Networking Forums

Bizarre network application problem

 
 
Ron Albright
      11-10-2005, 12:59 AM
I have a small PC running Fedora Core 3 with kernel 2.6.12-1.1380_FC3. On
it is a Java application that collects data from a serial port and sends
it to a server application. It can do this over a LAN (possibly a
broadband internet connection) or a dialup internet connection. In some
cases it has a fixed IP and in some it uses DHCP. The connection can be
either through a normal socket or an SSL socket.
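For readers less familiar with Java's socket APIs, the plain and SSL variants can share one code path and differ only in which `javax.net.SocketFactory` is used. The sketch below is hypothetical: the class name `TransportFactory` and the `useSsl` flag are assumptions. The real app builds its SSL factory from its own keys and passphrases, whereas this sketch uses the JVM's default `SSLSocketFactory` as a stand-in.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical sketch: one connect path for both transports.
final class TransportFactory {

    // Returns the JVM's default SSL factory (a stand-in for the app's own
    // key-configured factory) or the plain TCP factory.
    static SocketFactory chooseFactory(boolean useSsl) {
        return useSsl ? SSLSocketFactory.getDefault() : SocketFactory.getDefault();
    }

    // Opens a connected socket to host:port using the chosen transport.
    // For SSL, no handshake happens here; JSSE starts it on first I/O.
    static Socket open(boolean useSsl, String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket s = chooseFactory(useSsl).createSocket(); // unconnected socket
        try {
            s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return s;
        } catch (IOException e) {
            s.close(); // do not leak the socket on a failed connect
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseFactory(false).getClass().getName());
        System.out.println(chooseFactory(true) instanceof SSLSocketFactory);
    }
}
```

Because both transports go through the same `connect()` call, a hang that appears only with SSL suggests looking at what happens after the TCP connection is made (the handshake and first I/O) rather than at routing or addressing.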

It works fine under all conditions except using SSL over the LAN when the
network interface is configured using DHCP. It works with a normal socket
over dialup or over the LAN with a fixed IP or DHCP. It works with the SSL
socket over dialup or over the LAN with a fixed IP but not DHCP. The
switch from fixed IP to DHCP can be on the same subnet with the
only change being the ifcfg-eth0 script and bringing the network interface
down and up. Switch it back and it works. I can bring up the network
interface with DHCP and switch the application from a normal socket to SSL
without any other changes and it stops working. Switch it back and it
works.

The application has excellent logging and comprehensive exception
handling. Both sides show the connection but the app thread just seems to
freeze immediately after with no program exceptions of any kind.

A different small PC running Red Hat 7.3 and a somewhat older version
of the app works fine even over SSL using DHCP.

I have no idea where to even start looking for this one. Any pointers on
things to check would be appreciated.
 
muttley
      11-10-2005, 08:48 AM
run the older app on the new machine.

I think it's your machine.

 
Pep
      11-10-2005, 12:04 PM
muttley wrote:

> run the older app on the new machine.
>
> I think it's your machine.


Not necessarily.

Run the older app on FC3 and see what happens but also run the newer app on
RH7.3 to compare the results.

If the older app runs on FC3 then it points to the newer app being wrong.

Also, if the newer app runs on the RH7.3 then it points to the FC3 o/s being
wrong.

What are the differences between the older and newer apps?

Pep.

 
Chris Uppal
      11-10-2005, 12:23 PM
Ron Albright wrote:

> It works fine under all conditions except using SSL over the LAN when the
> network interface is configured using DHCP. It works with a normal socket
> over dialup or over the LAN with a fixed IP or DHCP. It works with the SSL
> socket over dialup or over the LAN with a fixed IP but not DHCP.

[...]
> I have no idea where to even start looking for this one. Any pointers on
> things to check would be appreciated.


After trying muttley's suggestion (and assuming that it didn't show up the
problem), I'd be inclined to focus very carefully on what was happening with
DHCP. First ensure that the settings supplied by DHCP are /exactly/ the same
as the static configuration (or it may be easier to change the static
configuration to be /exactly/ what DHCP would supply). Don't forget to
double-check things like the DNS and gateway info that may be supplied by DHCP.
I seem to remember that even the hostname can be supplied by DHCP, if so check
that too -- check /everything/. Does the problem still manifest ? Then fire
up Ethereal and look /very/ hard at the network behaviour, taking particular
note of the IP headers and the like (i.e. don't only look at the payload). For
comparison do the same thing with SSL turned off. Obviously there will be some
differences caused by the need to set up and use encryption, but a lot of the
traffic /should/ be identical (DNS lookups and the like). Where is the
difference ?

-- chris



 
Ron Albright
      11-10-2005, 11:07 PM
On Thu, 10 Nov 2005 13:23:35 +0000, Chris Uppal wrote:

> After trying muttley's suggestion (and assuming that it didn't show up the
> problem), I'd be inclined to focus very carefully on what was happening with
> DHCP.


I ran the new app on the old hardware/OS and still had the problem.
Running the old app on the new hardware/OS would take some significant
work. To Pep: The difference is the older version uses RMI while the newer
uses a straight SSL socket. But they both use identical code, keys and
passphrases to set up the socket factories.

> First ensure that the settings supplied by DHCP are /exactly/ the same
> as the static configuration (or it may be easier to change the static
> configuration to be /exactly/ what DHCP would supply). Don't forget to
> double-check things like the DNS and gateway info that may be supplied by DHCP.
> I seem to remember that even the hostname can be supplied by DHCP, if so check
> that too -- check /everything/. Does the problem still manifest ?


I've checked everything under ifconfig and netstat -r and they are
identical. What's confusing is under DHCP even when the app connection is
hanging, all other network functions seem to work including ssh to the
server. The hostname wasn't set and DHCP was supplying one, but I set the
hostname so it was the same under DHCP and fixed IP and it made no
difference.

I found that when I boot with DHCP, kill the app (the app starts on boot),
take the network interface down, set a fixed IP and bring the network
interface back up, the app still won't talk until I reboot with the fixed IP
configuration.

> Then fire
> up Ethereal and look /very/ hard at the network behaviour, taking particular
> note of the IP headers and the like (i.e. don't only look at the payload). For
> comparison do the same thing with SSL turned off. Obviously there will be some
> differences caused by the need to set up and use encryption, but a lot of the
> traffic /should/ be identical (DNS lookups and the like). Where is the
> difference ?


I used snoop, but I did this 4 ways: DHCP with SSL (no talk), DHCP without
SSL (talks), fixed IP with SSL (talks), and the above scenario prior to the
reboot (no talk). The only difference I see is 16 DNS-related packets
when DHCP is used, but this is the case both with and without SSL. Also,
all communications in the app use IPs, not domain names. In both cases where
it didn't talk there were 16 packets exchanged between the app box and
the server. I compared these 16 with the first 16 packets exchanged when
it did talk with SSL and a fixed IP. It appears the 16 packet headers are
identical in all cases.

The DNS packets seem to be related to the hostname assigned by the DHCP
server. I'm using a USR8200 for a router and DHCP. I'm also using its
brain-dead DNS as a local server. I haven't set up real DNS for the domain
configured in the router. Maybe it has something to do with the SSL layer
failing on the hostname and domain assigned by the USR8200. I'm going to try
connecting the app box directly to a brain-dead cheap router so only a
minimum of information is picked up by DHCP. It's the only thing I can
think of right now.
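If the half-configured DNS on the USR8200 is involved, one way to check on the deployment box, without a debugger, is to time the name lookups the JVM would make. This is a hypothetical diagnostic (the class `LookupTimer` and its methods are illustrative); the thread never established that the SSL layer performs any lookup, so a slow result is a lead, not a diagnosis.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: time lookups that could stall on a broken DNS setup.
final class LookupTimer {

    // Returns elapsed milliseconds for a reverse (address-to-name) lookup.
    static long timeReverseLookup(String ipLiteral) throws UnknownHostException {
        // getByName on an IP literal only parses it; no DNS query is made here.
        InetAddress addr = InetAddress.getByName(ipLiteral);
        long start = System.nanoTime();
        String name = addr.getHostName(); // triggers the reverse lookup
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(ipLiteral + " -> " + name + " in " + elapsedMs + " ms");
        return elapsedMs;
    }

    // Returns elapsed milliseconds to resolve this machine's own hostname.
    static long timeLocalHostLookup() {
        long start = System.nanoTime();
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("local host: " + local.getCanonicalHostName());
        } catch (UnknownHostException e) {
            System.out.println("local host lookup failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the server's IP as an argument; 127.0.0.1 is only a placeholder.
        String serverIp = args.length > 0 ? args[0] : "127.0.0.1";
        timeReverseLookup(serverIp);
        timeLocalHostLookup();
    }
}
```

Running it once on the USR8200 network and once on the Linksys would show whether either lookup takes noticeably longer in the failing setup.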

I appreciate the pointers so far and any more ideas will be greatly
appreciated because I'm still completely stumped.
 
Chris Uppal
      11-11-2005, 10:46 AM
Ron Albright wrote:

> I found that when I boot with DHCP, kill the app (the app starts on boot),
> take the network interface down, set a fixed IP and bring the network
> interface back up, the app still won't talk until I reboot with the fixed IP
> configuration.


I admit that I'm clutching at straws here, but that sounds a bit suspicious.
What happens if you remove the app from the startup sequence entirely, and only
run it once you are /sure/ that DHCP, /dev/random, etc, are all fully set up ?

I should have asked before. When you say the application freezes, what do you
mean ?

More specifically: Is it hanging in a send, or in a receive, or (even)
somewhere else ? What can you see when you run it under a debugger ? When you
sniff the network, where did the last packet get sent, was it from the app or
to it, was it to/from the server, or somewhere else ? Is the answer to that
question consistent with the answer to the first one (it might be a deadlock
between app and server caused by buffering, so that both ends "think" it's the
other's turn to speak next) ?
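The buffering deadlock described above can be made concrete. In Java, bytes written to a `BufferedWriter` stay inside the JVM until `flush()` is called, so a sender that forgets to flush and then waits for a reply can deadlock with a peer that is still waiting for the request. A minimal loopback sketch, assuming a line-based protocol (the `FlushDemo` class and its helpers are illustrative, not the poster's code):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the "both sides wait" buffering trap.
final class FlushDemo {

    // Writes one line-delimited message and pushes it onto the wire.
    static void send(BufferedWriter out, String msg) throws IOException {
        out.write(msg);
        out.newLine();
        out.flush(); // without this, a short message stays in the buffer
    }

    // Loopback exchange: client sends a message, server replies with an ack.
    static String roundTrip(String msg) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(2000); // fail rather than hang if the trap is hit
            peer.setSoTimeout(2000);
            BufferedWriter clientOut = new BufferedWriter(
                    new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8));
            BufferedReader clientIn = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            BufferedWriter peerOut = new BufferedWriter(
                    new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.UTF_8));
            BufferedReader peerIn = new BufferedReader(
                    new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));

            send(clientOut, msg);
            String received = peerIn.readLine();
            send(peerOut, "ACK " + received);
            return clientIn.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```

Removing the `flush()` call in `send` makes the server's `readLine()` throw `SocketTimeoutException` after two seconds instead of returning; without the timeouts, both sides would simply wait.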

If that doesn't suggest anything, and your DNS investigations don't turn up a
hint, then I'm afraid I've run out of ideas.

-- chris



 
Ron Albright
      11-11-2005, 07:02 PM
Man, I hope you're in Europe somewhere. The thought of getting up before
5:30 AM (central time USA) and actually having complex thought processes
working is a scary one for me. Unless you're still on the night before.

Hooking it up to a simpleton Linksys firewall router worked. I'm guessing
the incomplete DNS setup was causing some sort of security problem with
the SSL at some level. What really bothers me about this is that it was
locking up rather than failing cleanly. Unfortunately, since it's working, I
doubt I'll have the time to follow up on finding where the bug is. I
included some more info below for thread completeness for anyone
interested.
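On the "locking up rather than failing cleanly" point, one hedged way to turn a silent hang into an exception is to bound every blocking step. The sketch below (class `BoundedConnect` is hypothetical) connects with a timeout, sets a read timeout with `setSoTimeout()`, and, for an `SSLSocket`, calls `startHandshake()` explicitly so the handshake runs up front rather than implicitly inside the first `read()` or `write()` of a worker thread. The self-test uses a plain loopback socket whose peer never writes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import javax.net.SocketFactory;
import javax.net.ssl.SSLSocket;

// Hypothetical sketch: bound each blocking step so a stalled peer raises an error.
final class BoundedConnect {

    static Socket connect(SocketFactory factory, String host, int port,
                          int connectTimeoutMs, int readTimeoutMs) throws IOException {
        Socket s = factory.createSocket();
        try {
            s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Bounds every later read; in JSSE this should also bound handshake reads.
            s.setSoTimeout(readTimeoutMs);
            if (s instanceof SSLSocket) {
                // Handshake now, in this thread, instead of implicitly on first I/O.
                ((SSLSocket) s).startHandshake();
            }
            return s;
        } catch (IOException e) {
            s.close();
            throw e;
        }
    }

    // Returns true if a read from a silent peer times out instead of blocking.
    static boolean silentPeerTimesOut(int readTimeoutMs) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = connect(SocketFactory.getDefault(), lo.getHostAddress(),
                                     server.getLocalPort(), 2000, readTimeoutMs);
             Socket peer = server.accept()) { // accepted, but never writes
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out cleanly: " + silentPeerTimesOut(500));
    }
}
```

With the app's own SSL factory passed in, a stalled handshake would surface as a `SocketTimeoutException` that the existing logging could record.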

Thanks to everyone for the ideas.

On Fri, 11 Nov 2005 11:46:12 +0000, Chris Uppal wrote:

> What happens if you remove the app from the startup sequence entirely, and only
> run it once you are /sure/ that DHCP, /dev/random, etc, are all fully set up ?


It's being started in rc.local, so it should be the last thing started, but
even if that wasn't the case, I had tried shutting down and restarting the
app.

>
> I should have asked before. When you say the application freezes, what do you
> mean ?
>
> More specifically: Is it hanging in a send, or in a receive, or (even)
> somewhere else ?
> What can you see when you run it under a debugger ?


Unfortunately my development environment is all Windows here. The
deployment is on fully automated, headless, small-form-factor Linux PCs.
This problem doesn't manifest itself in the development environment. I
have no debugging tools on the deployment computers. From the logging I
can tell it is getting past the Socket.connect() and the
ServerSocket.accept() but not past reading or writing anything from or
to the connection. In other words, it appears threads on both sides go to
read and write and never return. After the connection the app writer
thread is supposed to send stuff to the server which the server then
acknowledges. The stuff is never being sent. The confusing part
is there are several log messages that should be spit out after the
connect but prior to any actual writes on the socket. These are not
happening. The reader thread is doing a read on the socket immediately
after the connection is established. So the best guess is the read call on
the socket is locking not only that thread but the entire process.
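With no debugger on the deployment boxes, a thread dump is a cheap way to answer the "send or receive?" question. On Sun/HotSpot JVMs, `kill -QUIT <pid>` (signal 3) prints every thread's stack to the JVM's standard output, which helps if that output is captured somewhere. The sketch below is a hypothetical in-process alternative (`ThreadDumper` is illustrative) built on `Thread.getAllStackTraces()`, which could be logged periodically from a daemon thread. If the whole process really is wedged, such a watchdog may not get to run, and the signal-based dump is the better bet.

```java
import java.util.Map;

// Hypothetical sketch: capture where every thread is blocked, without a debugger.
final class ThreadDumper {

    // Builds a text dump of all live threads and their current stack frames.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Starts a daemon thread that writes a dump to stderr every intervalMs.
    static Thread startWatchdog(long intervalMs) {
        Thread w = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(intervalMs);
                    System.err.println(dump());
                }
            } catch (InterruptedException stop) {
                // interrupted: stop dumping and let the thread end
            }
        }, "thread-dump-watchdog");
        w.setDaemon(true); // never keeps the JVM alive on its own
        w.start();
        return w;
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

A reader thread stuck in a socket read typically shows up in state `RUNNABLE` inside a native socket read frame, while threads waiting on a lock show `BLOCKED`, which would help tell an I/O wait from a lock contention problem.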

> When you
> sniff the network, where did the last packet get sent, was it from the app or
> to it, was it to/from the server, or somewhere else ?


The last packet was from the app to the server. That was one of the things
I focused on in the packets. Maybe one or more of the protocol setup
packets was being directed to a wrong address but the MAC addresses were
correct in every packet. In successful cases the 17th packet was also from
the app to the server. So it would appear the server was waiting on
something from the app that was never being sent.

> Is the answer to that
> question consistent with the answer to the first one (it might be a deadlock
> between app and server caused by buffering, so that both ends "think" it's the
> other's turn to speak next) ?


I'm not sure what level of buffering you're referring to but I would think
it would have to be at the OS level since the entire process appears to be
locking at the first operation (a read) on the socket.

>
> If that doesn't suggest anything, and your DNS investigations don't turn up a
> hint, then I'm afraid I've run out of ideas.


 
Chris Uppal
      11-14-2005, 09:44 AM
Ron Albright wrote:

> Man, I hope you're in Europe somewhere. The thought of getting up before
> 5:30 AM (central time USA) and actually having complex thought processes
> working is a scary one for me. Unless you're still on the night before.


<chuckle/>

Not to worry, I'm in the UK.


> What really bothers me about this is that it was
> locking up rather than failing cleanly. Unfortunately, since it's working, I
> doubt I'll have the time to follow up on finding where the bug is. I
> included some more info below for thread completeness for anyone
> interested.


A shame not to nail it, but such are the pressures of commercial life...

-- chris


 