TCP connection stalls during SSL handshake

Discussion in 'Linux Networking' started by Tino Schwarze, Apr 12, 2011.

  1. Hi there,

    I'm currently chasing a rather weird problem. We've got a Java based
    program to connect to a Webshop (for importing/exporting data). Of
    course, we use SSL. (Webshop is on a webspace, client is behind DSL.)

    It used to work(tm) but somehow stopped working - the program could not
    access the Webshop anymore. Accessing by browser (from same machine)
    works.

    It's getting stranger... there's a proxy in between which is used by the
    Java app. I'm ssh-ing to the proxy machine (an OpenSUSE box, running
    kernel 2.6.27.56), I use lynx - it works. I use wget - it works.

    For testing, I made a very simple program which basically does

    new URL("https://my-url/").openConnection();

    and copies everything to stdout. Now the request just gets stuck.
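
For anyone who wants to reproduce this, here is a minimal sketch of what such a test program might look like. The class name `FetchToStdout`, the `copy` helper and the timeout values are my assumptions, not the original code; `https://my-url/` is the placeholder from the post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical reconstruction of the test program described above.
final class FetchToStdout {

    // Copies every byte from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL from the post; pass the real one as the first argument.
        String target = args.length > 0 ? args[0] : "https://my-url/";
        URLConnection conn = new URL(target).openConnection();
        // Assumption: timeouts are not part of the original one-liner. They are
        // added here so a stalled connection fails instead of hanging.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(30_000);
        try (InputStream in = conn.getInputStream()) {
            copy(in, System.out);
        }
    }
}
```

With the read timeout set, I would expect a stall like the one described to end in an exception rather than an indefinite hang, which at least makes the failure easier to log.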

    So I take a traffic dump, load it into Wireshark and see (sorry for the
    long lines):

    No. Time Source Destination Protocol Size Info
    1 0.000000 212.80.cli.ent 188.40.t.srv TCP 74 49047 > https [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=4228721087 TSER=0 WS=5
    2 0.031476 188.40.t.srv 212.80.cli.ent TCP 66 https > 49047 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460 SACK_PERM=1 WS=6
    3 0.031527 212.80.cli.ent 188.40.t.srv TCP 54 49047 > https [ACK] Seq=1 Ack=1 Win=5856 Len=0
    4 0.219837 212.80.cli.ent 188.40.t.srv SSLv2 166 Client Hello
    5 0.251510 188.40.t.srv 212.80.cli.ent TCP 60 https > 49047 [ACK] Seq=1 Ack=113 Win=5888 Len=0
    6 0.260881 188.40.t.srv 212.80.cli.ent TLSv1 1514 Server Hello
    7 0.260919 212.80.cli.ent 188.40.t.srv TCP 54 49047 > https [ACK] Seq=113 Ack=1461 Win=8768 Len=0
    8 0.267468 188.40.t.srv 212.80.cli.ent TCP 1514 [TCP segment of a reassembled PDU]
    9 0.267506 212.80.cli.ent 188.40.t.srv TCP 54 49047 > https [ACK] Seq=113 Ack=2921 Win=11680 Len=0
    10 0.295676 188.40.t.srv 212.80.cli.ent TLSv1 604 Certificate, Server Hello Done
    11 0.295715 212.80.cli.ent 188.40.t.srv TCP 54 49047 > https [ACK] Seq=113 Ack=3471 Win=14624 Len=0
    12 0.313933 212.80.cli.ent 188.40.t.srv TLSv1 321 Client Key Exchange
    13 0.318689 212.80.cli.ent 188.40.t.srv TLSv1 60 Change Cipher Spec
    14 0.321350 212.80.cli.ent 188.40.t.srv TLSv1 91 Encrypted Handshake Message
    15 0.353167 188.40.t.srv 212.80.cli.ent TCP 66 https > 49047 [ACK] Seq=3471 Ack=380 Win=6912 Len=0 SLE=386 SRE=423
    16 0.582048 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    17 1.046053 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    18 1.974052 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    19 3.830055 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    20 7.542049 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    21 14.966051 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    22 29.814055 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    23 59.510067 212.80.cli.ent 188.40.t.srv TLSv1 60 [TCP Retransmission] Change Cipher Spec
    24 70.992437 212.80.cli.ent 188.40.t.srv TCP 54 49047 > https [FIN, ACK] Seq=423 Ack=3471 Win=14624 Len=0
    25 71.023543 188.40.t.srv 212.80.cli.ent TCP 66 [TCP Dup ACK 15#1] https > 49047 [ACK] Seq=3471 Ack=380 Win=6912 Len=0 SLE=386 SRE=424

    The FIN,ACK gets sent when I press CTRL-C to abort the program.

    I captured all traffic going to the server (tcpdump ... host 188.40.t.srv).
    It looks like the TCP session gets stuck at packet 13. I don't know
    where to look next or whom to ask.
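
The sequence numbers in the dump can be cross-checked against the SACK block in packet 15. The sketch below is only an illustration: it assumes every client frame carries 54 bytes of Ethernet/IP/TCP headers (the size of the client's Len=0 ACKs in the capture), and the class and method names are made up.

```java
// Sketch: derive TCP payload byte ranges from the frame sizes in the capture.
final class SegmentRanges {

    // Assumption: header bytes per client frame, taken from the 54-byte Len=0 ACKs.
    static final int HEADER_BYTES = 54;

    // Returns {firstSeq, nextSeq} for each frame, starting at startSeq.
    static int[][] ranges(int startSeq, int[] frameSizes) {
        int[][] out = new int[frameSizes.length][2];
        int seq = startSeq;
        for (int i = 0; i < frameSizes.length; i++) {
            int payload = frameSizes[i] - HEADER_BYTES;
            out[i][0] = seq;
            out[i][1] = seq + payload;
            seq += payload;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {
            "Client Key Exchange", "Change Cipher Spec", "Encrypted Handshake Message"
        };
        int[] sizes = {321, 60, 91};     // frames 12, 13 and 14 from the dump
        int[][] r = ranges(113, sizes);  // client Seq=113 after the Client Hello
        for (int i = 0; i < r.length; i++) {
            System.out.println(names[i] + ": seq " + r[i][0] + " to " + r[i][1]);
        }
    }
}
```

Under that assumption the three segments cover 113 to 380, 380 to 386 and 386 to 423. Packet 15 acknowledges up to 380 and SACKs 386 to 423, so the server appears to have received frames 12 and 14 but never the 6-byte Change Cipher Spec in frame 13. That is the segment the client keeps retransmitting.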

    I checked that the following things work:
    - access $url via Browser, lynx and wget
    - access $url via said Java program via several other internet
    connections (other DSL providers, from other servers located at a
    certain hoster)

    So I can rule out:
    - a fundamental Java/server/SSL issue (because the Java program works with
    the same Java version on another system)
    - a fundamental network issue (because the browser, lynx and wget all work)

    Funny thing is, it worked once today - even though I didn't change
    anything.

    Any hints?

    Thanks,

    Tino.

    PS: I found someone with a very similar problem:
    http://seclists.org/wireshark/2010/Jun/85 but no answer or solution (and
    he's at least getting Dup ACKs, while I'm only seeing a Dup ACK after
    the connection is closed.)
     
    Tino Schwarze, Apr 12, 2011
    #1

  2. Tino Schwarze

    opendog1 Guest

    It's a bit late ... but what was the reason for the retransmissions? I have similar problems here.

    Thanks,
    Bela
     
    opendog1, Jun 24, 2013
    #2

  3. Tino Schwarze

    anoopk6 Guest

    I also ran into the same issue while performing an LDAP SSL operation from Java code. Any idea what the root cause of
    this problem is? Are there any workarounds?

     
    anoopk6, Oct 1, 2013
    #3
  4. Hi,

    you might suffer from entropy pool depletion. Quick check:
    # cat /proc/sys/kernel/random/entropy_avail
    should display something above 1000 and not go down below 200 during
    normal operation. It worked once because the entropy pool gathered
    enough entropy (e.g. from network activity or whatnot) to provide
    sufficient randomness.

    We've had connections to Oracle DB get stuck during handshake because of
    this issue. Google for haveged for a solution.
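
Since the affected program is Java, the quick check can also be done from Java. This is a minimal sketch: the 1000 and 200 thresholds are the rule of thumb from this post, and the class name and the "marginal" label are assumptions of mine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: poll the kernel's entropy estimate and label it with the thresholds above.
final class EntropyCheck {

    static final int HEALTHY = 1000; // "should display something above 1000"
    static final int LOW = 200;      // "not go down below 200"

    // Labels an entropy_avail reading; "marginal" is a made-up name for the middle band.
    static String classify(int bits) {
        if (bits < LOW) return "low";
        if (bits <= HEALTHY) return "marginal";
        return "ok";
    }

    // The proc file holds a single integer followed by a newline.
    static int parse(String fileContent) {
        return Integer.parseInt(fileContent.trim());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path p = Paths.get("/proc/sys/kernel/random/entropy_avail");
        // Sample once per second for ten seconds while the failing request runs.
        for (int i = 0; i < 10; i++) {
            int bits = parse(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
            System.out.println(bits + " " + classify(bits));
            Thread.sleep(1000);
        }
    }
}
```

Running it in a second terminal while the stalled request is in progress should show whether the value actually drops during the handshake.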

    HTH,

    Jamma.

     
    Jamma Tino Schwarze, Oct 1, 2013
    #4
  5. Tino Schwarze

    anoopk6 Guest

    Thanks. I will investigate the entropy parameter.

    The packet capture shows the following pattern repeating for more than 5 minutes:

    client ------ Client Key Exchange ----------------------> Server
    client ------ Client Key Exchange(Re-transmission)------> Server
    client <----- ACK -------------------------------------- Server
    client ------ Client Key Exchange(Re-transmission)------> Server
    client <------ Dup ACK --------------------------------- Server
    client ------ Client Key Exchange(Re-transmission)------> Server
    client <------ Dup ACK --------------------------------- Server

    Anoop


     
    anoopk6, Oct 2, 2013
    #5
  7. Tino Schwarze

    Rick Jones Guest

    What is the ACK number in the ACK from Server to client and how does
    that compare to the SEQuence number of the TCP segment carrying the
    Client Key Exchange? For that, "plain" tcpdump formatting rather than
    wireshark's (?) sometimes overly helpful formatting would be
    indicated. Are there checksum or other failures being reported in
    netstat -s?

    rick jones
     
    Rick Jones, Oct 2, 2013
    #7
  8. Tino Schwarze

    anoopk6 Guest

    Client Key Exchange Seq=172 Ack=6515 Len=274
    ACK Seq=6515 Ack=446 Len=0
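
A quick way to read those two lines: an ACK covers a segment when its Ack number is at least the segment's Seq plus Len. The sketch below only checks that arithmetic. It assumes these are Wireshark's relative sequence numbers and ignores wraparound, and the names are illustrative.

```java
// Sketch: does a cumulative ACK cover a given TCP segment?
final class AckCheck {

    // True if ack acknowledges every byte of the segment [seq, seq + len).
    // Assumption: relative sequence numbers, no 32-bit wraparound handling.
    static boolean fullyAcked(long seq, int len, long ack) {
        return ack >= seq + len;
    }

    public static void main(String[] args) {
        long seq = 172;  // Client Key Exchange Seq
        int len = 274;   // Client Key Exchange Len
        long ack = 446;  // Ack number in the server's ACK
        System.out.println("next expected byte: " + (seq + len));
        System.out.println("fully acked: " + fullyAcked(seq, len, ack));
    }
}
```

Here 172 + 274 = 446, which equals the server's Ack number, so the server is acknowledging the whole Client Key Exchange segment. If the client still retransmits it, one possible reading is that the ACK seen in the capture never reaches the client's TCP stack in a form it accepts.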

    Thanks
    Anoop


     
    anoopk6, Oct 3, 2013
    #8
  9. Tino Schwarze

    anoopk6 Guest

    Attaching a Wireshark screenshot: http://img59.imageshack.us/img59/6431/p63e.png

    Anoop

     
    anoopk6, Oct 21, 2013
    #9
  10. Yup, that looks exactly like the issue I was describing two years ago.
    Did you check the kernel entropy pool?
    /proc/sys/kernel/random/entropy_avail ?

    Jamma.
     
    Jamma Tino Schwarze, Oct 24, 2013
    #10
  11. Tino Schwarze

    anoopk6 Guest

    I checked entropy.

    The latest news is that the problem disappeared after removing a firewall from the network. But I am still curious how a TCP connection can get into this retransmit loop (as seen in the Wireshark screenshot), assuming there is a firewall in between. Why does the client keep retransmitting the same packet even after receiving an ACK from the server?
     
    anoopk6, Oct 25, 2013
    #11
