WHY: the client process has been killed, but the connection on the server is still ESTABLISHED

Discussion in 'Linux Networking' started by Andrew Jsyqf, Jan 16, 2009.

  1. Andrew Jsyqf

    Andrew Jsyqf Guest

    Hi everyone,

    I ran into an interesting problem yesterday while benchmarking epoll,
    and I really don't know why it happens.

    I have a very simple client program that just creates X TCP
    connections to the server (named S1) and waits until it is killed. I
    deployed this program to 100 physical servers and started all the
    client programs simultaneously with gexec (a tool that lets me run a
    program on multiple servers). I also have a very simple server program
    that runs on S1.
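
The client program itself is not shown in this thread. As a rough illustration only, a client of the kind described might look like the sketch below; the function name `open_connections` and its parameters are hypothetical, and the real test program may have been written quite differently.

```python
import socket


def open_connections(host, port, count, timeout=5.0):
    """Open `count` TCP connections to (host, port) and return the sockets.

    Hypothetical helper: it only mirrors the behaviour described in the post
    (connect X times, then sit idle until the process is killed).
    """
    conns = []
    for _ in range(count):
        # create_connection resolves the address and completes the handshake.
        conns.append(socket.create_connection((host, port), timeout=timeout))
    return conns
```

A caller would keep the returned list alive and sleep in a loop, for example `conns = open_connections("S1", 8080, 200)`, so that the sockets stay open until the process is killed.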

    When X==100, which means the total number of connections is 10,000,
    everything is OK.
    When X==200, which means the total number of connections is 20,000, a
    strange thing happens:
    After I kill all the client programs (I am pretty sure all of them
    are killed, because I cannot find them with `ps aux` on any of the 100
    servers), there are still a lot of TCP connections on S1, and
    their state is ESTABLISHED (as shown by `netstat -tan | grep 8080`).
    No matter how long I wait, these connections are still there and still
    ESTABLISHED. Then I looked at the connections on the client side:
    there are no connections at all (also checked with `netstat -tan | grep 8080`)!

    8080 is the port on which the server program listens.
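
With 20,000 connections, a per-state count is easier to read than raw `netstat` output. The sketch below tallies TCP states for one local port from the text of `/proc/net/tcp`. The function name `count_states` is hypothetical. It assumes the usual Linux layout (local address in the second column, hex state code in the fourth) and covers IPv4 only.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, local_port):
    """Count sockets per TCP state whose local port equals `local_port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # Local address looks like "0100007F:1F90" (hex IP, hex port).
        port_hex = fields[1].rsplit(":", 1)[1]
        if int(port_hex, 16) != local_port:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts
```

On the server this could be called as `count_states(open("/proc/net/tcp").read(), 8080)`.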

    The following information is about S1:
    uname -a:
    Linux S1 2.6.24-23-server #1 SMP Thu Nov 27 19:19:15 UTC 2008 i686 GNU/

    ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 8160
    max locked memory (kbytes, -l) 32
    max memory size (kbytes, -m) unlimited
    open files (-n) 65535
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 8160
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited

    Could anybody kindly tell me why there are so many 'idle' connections
    on S1?

    Thank you very much !

    Andrew Jsyqf, Jan 16, 2009

  2. Andrew Jsyqf

    Andrew Jsyqf Guest

    My own analysis: since the connections on S1 are in the ESTABLISHED
    state, the kernel on S1 did not receive a FIN. That implies the
    connections on the client side should be in at least something like
    FIN_WAIT. But what I saw is that all of the connections have
    disappeared. Neither the client machines nor S1 have been restarted.

    Can anyone explain this to me?

    Thanks in advance.
    Andrew Jsyqf, Jan 16, 2009

  3. Probably packet loss gets high near the server. The clients have
    probably given up trying to shut down the connections. It could also
    be a safety measure of sorts in the clients. Windows XP, for example, has
    special rate limiting to prevent clients from flooding servers with
    large numbers of TCP establishment and teardown requests.

    David Schwartz, Jan 16, 2009
  4. Andrew Jsyqf

    Andrew Jsyqf Guest

    Hi David, thanks for your answer.

    Limiting the number of connections is a reasonable way to prevent the
    server from being flooded. But it is NOT reasonable to prevent the
    client from sending a FIN packet to the server, which makes the server
    wait forever. Besides, both the server and the clients are Linux machines.

    Andrew Jsyqf, Jan 17, 2009
  5. Andrew Jsyqf

    Andrew Jsyqf Guest

    Buck, thank you very much for your valuable post. I knew little
    about connection tracking before, as common networking books and courses
    don't mention it at all. I tried to find related topics with Google,
    and most of the results seem to be firewall or iptables related (e.g.
    http://www.kalamazoolinux.org/presentations/20010417/conntrack.html ).
    So I am not sure whether the connection tracking I have in mind is the
    same as what you are talking about. If yes, how does it relate to my
    question? If no, what is connection tracking, please?

    Generally speaking, if no packet is lost in transit between the
    server and the clients, the server should close the socket. Do you mean
    that if the FIN packet is lost or the client cheats, the server has to
    wait for 5 days to be notified?

    Thanks again.

    Andrew Jsyqf, Jan 17, 2009
  6. buck

    buck Guest

    Yes, the connection tracking I'm talking about is netfilter.

    Check /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout*
    In recent kernels these parameters may have moved, possibly to

    Run a couple of checks:
    cat /proc/net/ip_conntrack
    The third column is the remaining timeout in seconds.
    egrep -c "ESTAB|TIME" /proc/net/ip_conntrack
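
The two checks above can also be combined in a small script. This is only a sketch: the function name `summarize_conntrack` is hypothetical, and it assumes each TCP line of `/proc/net/ip_conntrack` starts with the protocol name, the protocol number, the remaining timeout in seconds (the third column mentioned above), and then the conntrack state.

```python
from collections import Counter


def summarize_conntrack(text):
    """Return (per-state counts, largest remaining timeout) for TCP entries."""
    states = Counter()
    max_timeout = 0
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: "tcp 6 <timeout> <STATE> src=... dst=... ..."
        if len(fields) < 4 or fields[0] != "tcp":
            continue
        states[fields[3]] += 1
        max_timeout = max(max_timeout, int(fields[2]))
    return states, max_timeout
```

It could be run as `summarize_conntrack(open("/proc/net/ip_conntrack").read())` on a kernel that still exposes that file. A remaining timeout close to 432000 seconds corresponds to the 5 days discussed in this thread.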

    This is speculation. I am not certain that what I am about to say is
    true, but it is what I think is true. When a connection receives a FIN,
    it switches from ESTABLISHED to TIME_WAIT and waits 5 days for the
    buck, Jan 17, 2009
  7. If the server waits forever, then it's broken.

    David Schwartz, Jan 18, 2009
  8. Andrew Jsyqf

    Andrew Jsyqf Guest

    Buck, David, thanks again.

    There is a small mistake in your explanation above: when a connection
    receives a FIN, its state switches from ESTABLISHED to
    CLOSE_WAIT, not TIME_WAIT. Well, that is not important.

    What confuses me is why S1 did not receive a FIN and stays in the
    ESTABLISHED state. David said some client OSes will not send a FIN when
    a connection is closed, because that prevents the server from being
    flooded. But in my opinion, not only does it not help reduce the
    server's load, it also forces the server to maintain a lot of 'idle'
    connections. On the other hand, even if the missing FIN on the server
    is due to a network issue, the client OS should be responsible for
    retransmitting the FIN to make sure the server is notified. But in my
    test, the client processes exited and the server still holds many
    ESTABLISHED connections. So no matter what the timeout value is, this
    should not happen, but it did happen.

    Is it a bug in the Linux kernel? It only happens when the number of
    connections is over 10K (not an exact value). If the number of
    connections is not too large, everything is OK. If it is a bug, I would
    like to report it to kernel.org. If someone can tell me how to fix it (I
    really want to contribute to the kernel, but I have had no chance and no
    guide), that would be wonderful.

    Any more ideas?

    Thanks again to all of you.
    Andrew Jsyqf, Jan 18, 2009
  9. That's impossible in principle. How many times do you retransmit the
    FIN to "make sure" the server gets it? Three times? Ten times?
    This is normal behavior. There's no bug. TCP does not guarantee that
    the two ends will remain synchronized in the face of packet loss, rate-
    limiting, and the like. From your description, it sounds like the
    server application is broken.

    David Schwartz, Jan 19, 2009
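
The thread does not show the server code or say what a non-broken server would do. One common way for a server to notice peers that vanished without a FIN, offered here as an assumption rather than as the fix anyone in this thread applied, is to enable TCP keepalive on each accepted socket. The helper name `enable_keepalive` and the timing values are illustrative only.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive so a silent, dead peer is eventually detected.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between probes
    probes:   unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below exist on Linux; they are skipped
    # on platforms where Python's socket module does not define them.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With these example values, a peer that never answers would be given up on after roughly idle + interval * probes seconds, at which point the socket reports an error and epoll should wake the server so it can close the descriptor. An application-level idle timeout is another option.
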
  10. Hello,

    buck wrote:
    The connection tracking in netfilter is not related in any way to the
    actual socket handling by the TCP/IP stack. In other words, the contents
    of /proc/net/ip_conntrack are independent of the output of netstat.

    You wrote that changing a timeout value in the netfilter conntrack
    requires recompiling. This is wrong. You gave the way to change it at
    run time:
    You have to recompile only when you want to change the default values.
    Pascal Hambourg, Jan 19, 2009
  11. Andrew Jsyqf

    Andrew Jsyqf Guest

    Well, a lot of posts now :) Thanks to all of you.

    Anyway, why doesn't the client resend the FIN, and why do the sockets
    on S1 remain ESTABLISHED? The timeout value is not important and has
    nothing to do with it. I have looked through all the materials and
    found nothing. In my view, the TCP/IP stack should be responsible for
    resending the FIN packet.

    Andrew Jsyqf, Jan 20, 2009
