WHY: the client process has been killed, but the connection on server is still ESTABLISHED

 
 
Andrew Jsyqf
      01-16-2009, 05:07 AM
Hi everyone,

I ran into an interesting problem yesterday while running a benchmark
test for epoll, and I really don't know why it happens.

I have a very simple client program, which just creates X TCP
connections to the server (named S1) and waits until it is killed. I
deployed this program to 100 physical servers and started all the
client programs simultaneously with gexec (a tool that lets me run a
program on multiple servers). I also have a very simple server program,
which runs on S1.
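
For anyone who wants to reproduce the setup, a client of the kind described above could look roughly like the sketch below. It is not the original poster's program: Python is used only for illustration, and the names `open_connections` and `run_client` are assumptions.

```python
import signal
import socket


def open_connections(host, port, count):
    """Open `count` TCP connections to (host, port) and return the sockets."""
    conns = []
    for _ in range(count):
        # create_connection() returns once the three-way handshake is done.
        conns.append(socket.create_connection((host, port)))
    return conns


def run_client(host, port, count):
    """Open the connections, then block until the process is signalled."""
    conns = open_connections(host, port, count)
    print(f"opened {len(conns)} connections, waiting to be killed")
    # signal.pause() is Unix-only; it sleeps until a signal arrives.
    # When the process is killed, the kernel closes the sockets for it.
    signal.pause()
```

Calling `run_client("S1", 8080, 200)` on each of the 100 machines would correspond to the X==200 case in the post.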

When X==100, i.e. 10,000 connections in total, everything is OK.
When X==200, i.e. 20,000 connections in total, a strange thing
happened:
After I killed all the client programs (I am pretty sure all of them
were killed, because I cannot find them with `ps aux` on any of the 100
servers), there were still a large number of TCP connections on S1, and
their state was ESTABLISHED (according to `netstat -tan | grep 8080`).
No matter how long I waited, these connections stayed there, still
ESTABLISHED. Then I looked at the connections on the client side, and
there were none at all (also checked with `netstat -tan | grep 8080`)!

8080 is the port on which the server program listens.

The following information is about S1:
uname -a:
Linux S1 2.6.24-23-server #1 SMP Thu Nov 27 19:19:15 UTC 2008 i686 GNU/Linux

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8160
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 8160
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


Could anybody kindly tell me why there are so many 'idle' connections
on S1?

Thank you very much!

Andrew
 
Andrew Jsyqf
      01-16-2009, 01:19 PM
My own analysis: since the connections on S1 are in the ESTABLISHED
state, the kernel on S1 never received a FIN. That implies the
connections on the client side should be in at least something like
FIN_WAIT. But what I saw is that all of the connections had disappeared.
Neither the client machines nor S1 have been restarted.

Can anyone explain this to me?

Thanks in advance.
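
To compare the two sides more precisely than by eyeballing `netstat -tan | grep 8080`, the states can be tallied per port. The sketch below assumes the usual six-column `netstat -tan` layout on Linux; the function name `count_states` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_states(netstat_output, port):
    """Count TCP states for sockets whose local port equals `port`."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected row: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states


# Hypothetical sample, not output captured from S1.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:8080           10.0.0.17:51234         ESTABLISHED
tcp        0      0 10.0.0.1:8080           10.0.0.18:51235         ESTABLISHED
tcp        0      0 10.0.0.1:22             10.0.0.99:40000         ESTABLISHED
"""

print(count_states(SAMPLE, 8080))
```

On a client machine the server's port appears in the foreign-address column instead, so the same idea would need `fields[4]` there.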
 
David Schwartz
      01-16-2009, 06:48 PM
On Jan 16, 6:19 am, Andrew Jsyqf <andrew.js...@gmail.com> wrote:

> My own analysis: since the connection's status on S1 is established,
> which means the kernel of S1 didn't receive FIN. It implies the
> connection status on the client side should be at least something like
> FIN_WAIT. But what I saw is, all of the connections disappeared. All
> the client machines and S1 have not been restart.
>
> Does anyone explain it to me?
>
> Thanks in advance.


Packet loss probably gets high near the server, and the clients have
probably given up trying to shut down the connections. It could also
be a safety mechanism of sorts in the clients. Windows XP, for example,
has special rate limiting to prevent clients from flooding servers with
large numbers of TCP establishment and teardown requests.

DS

 
Andrew Jsyqf
      01-17-2009, 05:37 AM
On Jan 17, 3:48 am, David Schwartz <dav...@webmaster.com> wrote:
> On Jan 16, 6:19 am, Andrew Jsyqf <andrew.js...@gmail.com> wrote:
>
> > My own analysis: since the connection's status on S1 is established,
> > which means the kernel of S1 didn't receive FIN. It implies the
> > connection status on the client side should be at least something like
> > FIN_WAIT. But what I saw is, all of the connections disappeared. All
> > the client machines and S1 have not been restart.
> >
> > Does anyone explain it to me?
> >
> > Thanks in advance.
>
> Probably packet loss gets high near the server. The clients have
> probably given up trying to shut down the connections. It could also
> be a safety of sorts in the clients. Windows XP, for example, has
> special rate limiting to prevent clients from flooding servers with
> large numbers of TCP establishment and tear down requests.
>
> DS


Hi David, thanks for your answer.

Limiting the number of connections is a reasonable way to prevent the
server from being flooded. But it is NOT reasonable to prevent the
client from sending a FIN packet to the server, which makes the server
wait forever. Besides, both the server and the clients are Linux
machines.

Andrew
 
Andrew Jsyqf
      01-17-2009, 06:02 AM
On Jan 17, 5:33 am, buck <b...@private.mil> wrote:
> On Thu, 15 Jan 2009 22:07:04 -0800 (PST), Andrew Jsyqf
> <andrew.js...@gmail.com> wrote:
> >Could anybody kindly tell me why there are so many 'idle' connections
> >in S1?
> >
> >Thank you very much !
> >
> >Andrew
>
> Because there is an insanely long timeout in the connection tracking
> code.  For a TCP connection, it is 5 days.  For a UDP connection,
> about 3 minutes.  You can change the 5 day setting, but that involves
> re-compiling.  Google should provide the name of the file to edit if
> you decide to pursue this.  I change 5 days to 2, which is STILL
> insanely long.
> --
> buck

Buck, thank you very much for your valuable post. I had little knowledge
of connection tracking before, as common networking books and courses
don't mention it at all. I tried to find related material with Google,
and most of the results seem to be firewall- or iptables-related (e.g.
http://www.kalamazoolinux.org/presen...conntrack.html ).
So I am not sure whether the connection tracking I have in mind is the
same as what you are talking about. If yes, how does it relate to my
question? If no, what is connection tracking, please?

Generally speaking, if no packet is lost in transit between the server
and the clients, the server should close the socket. Do you mean that
if the FIN packet is lost, or the client cheats, the server has to
wait 5 days to be notified?

Thanks again.

Andrew
 
buck
      01-17-2009, 08:35 PM
Andrew Jsyqf wrote:

> Buck, thanks for your valuable post very much. I have little knowledge
> about connection track before, as common network books or courses
> don't mention it at all. I tried to find some related topic by google,
> and most of result seem to be firewall or iptables related (ie.
> http://www.kalamazoolinux.org/presen...conntrack.html ).
> So I am not sure whether the connection tracking in my mind is same
> with what you are talking. If yes, what is the relationship with my
> question? If no, what is the connection tracking please?


Yes, the connection tracking I'm talking about is netfilter.

Check /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout*
In recent kernels these parameters may have moved, possibly to
/proc/sys/net/netfilter/

Run a couple of checks:
cat /proc/net/ip_conntrack
The third column is the remaining timeout in seconds.
egrep -c "ESTAB|TIME" /proc/net/ip_conntrack

This is speculation. I am not certain that what I am about to say is
true, but it is what I think is true. When a connection receives a FIN,
it switches from ESTABLISHED to TIME_WAIT and waits 5 days for the
timeout.
--
buck
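
The two checks above can also be scripted. The following is only a sketch: it assumes each line of /proc/net/ip_conntrack starts with the protocol name, the protocol number, the remaining timeout in seconds, and (for TCP) the state. The function name `summarize_conntrack` and the sample lines are invented for illustration.

```python
from collections import defaultdict


def summarize_conntrack(lines):
    """Return {state: (entry_count, max_remaining_timeout)} for TCP entries."""
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Assumed layout: proto-name proto-number timeout state key=value ...
        if len(fields) < 4 or fields[0] != "tcp":
            continue
        timeout, state = int(fields[2]), fields[3]
        summary[state][0] += 1
        summary[state][1] = max(summary[state][1], timeout)
    return {state: tuple(pair) for state, pair in summary.items()}


# Hypothetical entries, not taken from a real machine.
SAMPLE = [
    "tcp      6 431999 ESTABLISHED src=10.0.0.17 dst=10.0.0.1 sport=51234 dport=8080",
    "tcp      6 400000 ESTABLISHED src=10.0.0.18 dst=10.0.0.1 sport=51235 dport=8080",
    "tcp      6 117 TIME_WAIT src=10.0.0.19 dst=10.0.0.1 sport=51236 dport=8080",
    "udp      17 170 src=10.0.0.20 dst=10.0.0.1 sport=5353 dport=53",
]

print(summarize_conntrack(SAMPLE))
```

On a real system the input would be the lines of /proc/net/ip_conntrack, which normally requires root to read.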
 
David Schwartz
      01-18-2009, 01:34 AM
On Jan 16, 10:37 pm, Andrew Jsyqf <andrew.js...@gmail.com> wrote:

> Limiting the connection number is reasonable to prevent the server
> from being flooded. But it is NOT reasonable to prevent the client
> from sending FIN package to server, which makes server wait forever.
> On the other side, both server and clients are linux machine.


If the server waits forever, then it's broken.

DS
 
Andrew Jsyqf
      01-18-2009, 03:04 AM
On Jan 18, 5:35 am, buck <b...@private.mil> wrote:
> Andrew Jsyqf <andrew.js...@gmail.com> wrote in news:b3b0759f-d797-415a-
> a38d-a27178f00...@p36g2000prp.googlegroups.com:
>
> > Buck, thanks for your valuable post very much. I have little knowledge
> > about connection track before, as common network books or courses
> > don't mention it at all. I tried to find some related topic by google,
> > and most of result seem to be firewall or iptables related (ie.
> >http://www.kalamazoolinux.org/presen...conntrack.html).
> > So I am not sure whether the connection tracking in my mind is same
> > with what you are talking. If yes, what is the relationship with my
> > question? If no, what is the connection tracking please?
>
> Yes, the connection tracking I'm talking about is netfilter.
>
> Check /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout*
> In recent kernels these parameters may have moved, possibly to
> /proc/sys/net/netfilter/
>
> Run a couple of checks:
> cat /proc/net/ip_conntrack
>   The third column is the remaining timeout in seconds.
> egrep -c "ESTAB|TIME" /proc/net/ip_conntrack
>
> This is speculation.  I am not certain that what I am about to say is
> true, but it is what I think is true.  When a connection receives a FIN,
> it switches from ESTABLISHED to TIME_WAIT and waits 5 days for the
> timeout.
> --
> buck


Buck, David, thanks again.

There is a small mistake in the explanation above: when a connection
receives a FIN, its state switches from ESTABLISHED to CLOSE_WAIT, not
TIME_WAIT. Well, it is not important.

What confuses me is why S1 did not receive a FIN and stays in the
ESTABLISHED state. David said that some client OSes will not send a FIN
when a connection is closed, in order to prevent the server from being
flooded. But in my opinion, not only does that not help reduce the load
on the server, it also forces the server to maintain a lot of 'idle'
connections. Besides, even if the missing FIN on the server is due to
a network issue, the client OS should be responsible for retransmitting
the FIN to make sure the server is notified. But in my test, the client
processes exited and the server still holds many ESTABLISHED
connections. So no matter what the timeout value is, this should not
happen, but it did happen.

Is it a bug in the Linux kernel? It only happens when the number of
connections is over roughly 10K (not an exact threshold). If there are
not too many connections, everything is OK. If it is a bug, I would
like to report it to kernel.org. If someone can tell me how to fix it (I
really want to contribute to the kernel, but so far I have had no chance
and no guide), that would be wonderful.

Any more ideas?

Thanks again to all of you.
Andrew
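
The correction in this post about what a received FIN does can be written down as a small lookup table. This is a simplified sketch of the standard TCP state machine (RFC 793), not kernel code; the names `ON_FIN_RECEIVED` and `state_after_fin` are illustrative.

```python
# State a TCP endpoint moves to when it receives a FIN from its peer.
# Simplified: FIN_WAIT_1 assumes our own FIN has not been acknowledged yet.
ON_FIN_RECEIVED = {
    "ESTABLISHED": "CLOSE_WAIT",  # peer closed first; we have not closed yet
    "FIN_WAIT_1": "CLOSING",      # both sides sent a FIN at about the same time
    "FIN_WAIT_2": "TIME_WAIT",    # we closed first and the peer has now closed
}


def state_after_fin(state):
    """Return the next state after a FIN arrives; other states are unchanged."""
    return ON_FIN_RECEIVED.get(state, state)


print(state_after_fin("ESTABLISHED"))
```

In this table TIME_WAIT is only reached by the side that closed first, which is why a server that merely receives a FIN while ESTABLISHED ends up in CLOSE_WAIT.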
 
David Schwartz
      01-19-2009, 02:18 AM
On Jan 17, 8:04 pm, Andrew Jsyqf <andrew.js...@gmail.com> wrote:

> What makes me confused is why S1 did not receive FIN, and always be in
> ESTABLISHED state. David said, some client OS will not send FIN when
> connection is closed, because it prevents server from being flooded.
> But in my opinion, not only it doesn't help server reduce loading, but
> also makes server has to maintain a lot of 'idle' connection. On the
> other side, even if missing FIN on server is due to the network issue,
> the client OS should be responsible to retransmit the FIN to make sure
> server be notified.


That's impossible in principle. How many times do you retransmit the
FIN to "make sure" the server gets it? Three times? Ten times?

> But in my test, client process exit but server
> still contains so many ESTABLISHED connections. So no matter what
> number the timeout value is, it should not happen, but it did happen.
>
> Is it a bug of linux kernel? because it only happens when the
> connection number is over 10K (not a certain value). If the number of
> connection is not to many, everything is ok. If it is a bug, I would
> like to submit to kernel.org. If someone can tell me how to fix it (I
> really want to do something for kernel, but no chance, no guider),
> that will be wonderful.
>
> Any more idea?


This is normal behavior. There's no bug. TCP does not guarantee that
the two ends will remain synchronized in the face of packet loss, rate-
limiting, and the like. From your description, it sounds like the
server application is broken.

DS
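
The post does not say what a non-broken server would do. One common approach, offered here only as an assumption about what could help, is to turn on TCP keepalive for accepted sockets so the kernel eventually reports dead peers as errors. The function name `enable_keepalive` and the timing values are illustrative; the `TCP_KEEP*` options are platform-specific, so the sketch checks for them first.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning, set only where the platform exposes the option.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in tuning:
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
```

With these example values a vanished peer would be detected after roughly 60 + 10 * 5 = 110 seconds of silence, at which point reads on the socket fail and epoll reports an error or hang-up event. An application-level idle timeout is another option.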
 
Pascal Hambourg
      01-19-2009, 08:00 PM
Hello,

buck wrote:
>
> Yes, the connection tracking I'm talking about is netfilter.


The connection tracking in netfilter is not related in any way to the
actual socket handling by the TCP/IP stack. In other words, the contents
of /proc/net/ip_conntrack are independent of the output of netstat.

You wrote that changing a time-out value in the netfilter conntrack
involves recompiling. This is wrong. You yourself gave the way to change
it at run time:

> Check /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout*
> In recent kernels these parameters may have moved, possibly to
> /proc/sys/net/netfilter/


You have to recompile only when you want to change the default values.
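
As a rough illustration of the run-time route, the timeout files can be read like ordinary text files, and written the same way by root. The helper names `read_timeouts` and `write_timeout` are made up, and the sketch assumes each file holds a single integer number of seconds.

```python
import glob
import os


def read_timeouts(pattern):
    """Map the basename of each file matching `pattern` to its integer value."""
    values = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            values[os.path.basename(path)] = int(f.read().strip())
    return values


def write_timeout(path, seconds):
    """Set a new timeout value; on /proc this needs root privileges."""
    with open(path, "w") as f:
        f.write(f"{seconds}\n")
```

For example, `read_timeouts("/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout*")` would list the values under the path quoted above, where a 5-day timeout shows up as 432000 seconds.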
 