Linux ppp + MegaPOP dialup change = mrru related LCP timeout

Discussion in 'Linux Networking' started by Michael Shell, Jan 6, 2005.

  1. I have been using Newsguy's dialup ISP service for some time now and
    have been happy with it, until last month when my Linux box could no
    longer establish a ppp connection. The symptom was that pppd errored
    out with "LCP: timeout sending Config-Requests". I had been running
    pppd version 2.4.1; upgrading to 2.4.3 did not help.

    The strange thing is that I can connect to the Atlanta numbers
    (e.g., 678-538-1522) without issue, but the problem does show up on
    the Augusta, Georgia line (706-849-0578). The Augusta line is on the
    MegaPOP network which is owned by Starnet (www.starnetinc.com).

    So, I enabled my pppd's debug option to see what was going on. With
    the Atlanta number, all is well:


    rcvd [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x923d695a> <pcomp>
    <accomp> <auth pap>]
    sent [LCP ConfAck id=0x1 <asyncmap 0x0> <magic 0x923d695a> <pcomp>
    <accomp> <auth pap>]


    But, there is a problem with the Augusta number:


    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xc473d9d7> <pcomp> <accomp>]
    rcvd [LCP ConfReq id=0x1 <mru 1501> <asyncmap 0xa0000> <auth pap>
    <magic 0xfd874513> <pcomp> <accomp> <mrru 1524>
    <endpoint [local:77.64.63.34.2d.6c.6e.73.31]>]
    sent [LCP ConfRej id=0x1 <mrru 1524>]


    After this point the host (the peer, from my machine's perspective) seems
    to ignore my machine's ConfRej message and simply reissues its previous
    ConfReq options until my machine times out. I am not sure if the ppp
    software these ISPs use is even capable of full negotiation (that
    might be too much to ask). However, unless they do negotiate, they
    should not default to multilink operation (<mrru 1524> above), which
    is for use with multiple modems (so as to get a ppp multilink
    connection faster than 56Kbps).

    The behavior of the Augusta ppp host seems to me to be a violation
    of ppp standards. Specifically, section 5.1.1 of RFC1990
    (http://www.ietf.org/rfc/rfc1990.txt) states:


    The presence of this [mrru] LCP option indicates that the system
    sending it implements the PPP Multilink Protocol. If not rejected,
    the system will construe all packets received on this link as being
    able to be processed by a common protocol machine with any other
    packets received from the same peer on any other link on which
    this option has been accepted.
     
    Michael Shell, Jan 6, 2005
    #1

  2. ....
    Try compiling multilink support above and then use the multilink
    option; it *might* be a workaround since my ISP, also via a regular
    landline, will negotiate MP and be happy with just one MP connection.

    I won't try to give detailed answers to your other questions. But the
    ISP's PPP implementation is simply broken in my eyes. My ISP will
    also accept a Configure-Reject of mrru and complete PPP negotiations.
    In addition it will complete negotiations when using the nomultilink
    option. I believe this is as it should be and that, generally, all
    your conclusions are correct.
     
    Clifford Kite, Jan 6, 2005
    #2

  3. [...]
    [...]
    Isn't there a possibility that a zero ACCM can't be used here? This POP
    asks for 0x0A0000 whereas the other one asks for zero. It could be that
    your Config-Reject is getting lost because of ACCM problems and that is
    why the peer appears to be ignoring it. Does the peer respond to your
    Config-Request?


    [...]
    Won't it be funny if it turns out to be an ACCM issue?
     
    Alan McFarlane, Jan 6, 2005
    #3
  4. It's not an ACCM problem. At this point ACCM has not been negotiated
    and all Control Characters are escaped.
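
    A minimal Python sketch of the RFC 1662 transmit-side escaping rule may
    help illustrate this point. Until LCP reaches the Opened state the default
    ACCM of 0xffffffff applies, so every control character below 0x20 is
    escaped no matter what either side has requested. The function name
    escape_async is illustrative and is not taken from pppd.

```python
FLAG = 0x7E                 # frame delimiter
ESCAPE = 0x7D               # control escape
DEFAULT_ACCM = 0xFFFFFFFF   # in force until LCP has reached Opened


def escape_async(payload: bytes, accm: int = DEFAULT_ACCM) -> bytes:
    """Byte-stuff a payload per RFC 1662 (flags and FCS are not added here)."""
    out = bytearray()
    for byte in payload:
        in_accm = byte < 0x20 and (accm >> byte) & 1
        if in_accm or byte in (FLAG, ESCAPE):
            out.append(ESCAPE)
            out.append(byte ^ 0x20)   # escaped bytes are sent XOR 0x20
        else:
            out.append(byte)
    return bytes(out)


# Start of an LCP Configure-Request: address, control, protocol, code, id.
header = bytes([0xFF, 0x03, 0xC0, 0x21, 0x01, 0x01])

print(escape_async(header).hex(" "))              # default ACCM
print(escape_async(header, 0x000A0000).hex(" "))  # ACCM the Augusta peer asks for
```

    With the default ACCM the control field 0x03 goes out as 7d 23. With
    0xa0000 only XON (0x11) and XOFF (0x13) would be escaped, but that map
    cannot take effect before LCP has opened.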
     
    Clifford Kite, Jan 6, 2005
    #4
  5. Ahh yes, apologies.
     
    Alan McFarlane, Jan 6, 2005
    #5

  6. Thanks for the help, trying this, I get:


    CONNECT 45333/ARQ/V90/LAPM/V42BIS

    Connected!
    Serial connection established.
    using channel 1
    Starting negotiation on /dev/modem
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    rcvd [LCP ConfReq id=0x1 <mru 1501> <asyncmap 0xa0000> <auth pap> <magic 0x43f0e8b1> <pcomp> <accomp> <mrru 1524> <endpoint [local:77.64.63.34.2d.6c.6e.73.31]>]
    sent [LCP ConfAck id=0x1 <mru 1501> <asyncmap 0xa0000> <auth pap> <magic 0x43f0e8b1> <pcomp> <accomp> <mrru 1524> <endpoint [local:77.64.63.34.2d.6c.6e.73.31]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    rcvd [LCP ConfReq id=0x2 <mru 1501> <asyncmap 0xa0000> <auth pap> <magic 0x43f0e8b1> <pcomp> <accomp> <mrru 1524> <endpoint [local:77.64.63.34.2d.6c.6e.73.31]>]
    sent [LCP ConfAck id=0x2 <mru 1501> <asyncmap 0xa0000> <auth pap> <magic 0x43f0e8b1> <pcomp> <accomp> <mrru 1524> <endpoint [local:77.64.63.34.2d.6c.6e.73.31]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x1e9403cf> <pcomp> <accomp> <mrru 1500> <endpoint [MAC:00:20:10:71:48:21]>]
    LCP: timeout sending Config-Requests
    Connection terminated.



    What gets me is that the host never seems to alter its behavior
    regardless of what my machine sends - ConfRej or ConfAck. It is as
    if it never sees any of the data sent from my machine. I tried the
    asyncmap 0xa0000 option just for the heck of it, but it did not help.

    I note that the ppp standard has a lot of conditions for silently
    dropping packets. Could it be that they have a buggy ppp host that
    sees all Linux pppd generated LCP packets as being invalid? I have
    no idea how robust LCP packets are (7 bit, etc.). If so, I wonder
    how MS Windows does it differently.

    Another possibility is a modem firmware problem. That is, *after* my
    particular modem connects, their end never sees any of the data sent
    from my modem. I am using a TI chipset based 56Kbps hardware modem
    which I have never had a problem with. I can connect to my backup ISP
    without trouble, but with that Augusta number I have not been able to
    connect for a month (I would think that I would eventually get a "good"
    modem after dozens of tries). I tried connecting at 14.4Kbps, but this
    did not change anything. I even had the gall to bring my now ancient
    Hayes 2400 external modem out of mothballs, but I could hear from the
    tones that modern modems have long since forgotten about the pre-14.4K
    days. (IMHO, 9600bps was the last time everything worked as it should. ;)
    Of course, I would be able to check this with minicom if they still
    offered a "login: " prompt (which they don't).

    Yet another possibility is something related to this bogus "high speed
    dialup" (aka the AOL runner) feature everyone is offering. I sure hope
    that they do not require special bits for this to be sent during ppp
    negotiation.

    Now I am beginning to wonder if what they told me about MS Windows XP
    clients being able to connect is really true. Maybe that line is
    totally hosed and they are covering it up. ;)


    Mike Shell
     
    Michael Shell, Jan 7, 2005
    #6
  7. Okay, I focused on MP because it appears you use the same host and
    device file for each connection. There is only one other thing I know
    about that can cause the peer not to "hear" any of your LCP requests,
    given that a good serial connection is established and knowing that
    pppd is sending and receiving valid LCP requests.

    If the type of UART configured for the device file differs from the
    actual UART type then that would cause the problem. I still don't
    see how it's possible in this case since you can connect to the other
    POP, and seem to have no problem connecting to both until recently.
    But that's all I have left to suggest.

    (A FYI - the most common UART is a 16550A and configuring the device
    file for a 16550 won't work even though the package/manual for the
    serial device may say 16550. The UART type can be changed using the
    setserial program.)
    I'm no longer sure it's a buggy peer. PPP is a standard and though
    PPP implementations vary they should be compatible enough to provide
    a connection (cell-phones excepted).
    Since it was able to connect to the troublesome POP previously, I don't
    see how firmware could be the problem unless something broke. I'd expect
    the other POP connection would also fail if that happened.

    ....
    I *think* that is accomplished with a server at the ISP that caches
    web pages and client software provided by the ISP to MS clients.
     
    Clifford Kite, Jan 7, 2005
    #7
  8. OK, I decided to boot with MS Windows 2000 (same machine) and see if I
    could connect with that. Indeed, I could - the byte-level details of the
    log file are at the end of this post.

    Manually decoding the bytes in the MS log to a pppd-like format, I came up
    with this:


    sent [LCP ConfReq id=0x00 len=0x32 <asyncmap 0x00000000> <magic 0x3a0b158e> <pcomp> <accomp> <callback 0x06> <mrru 1614> <endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00>]
    sent [LCP ConfReq id=0x01 len=0x32 <asyncmap 0x00000000> <magic 0x3a0b158e> <pcomp> <accomp> <callback 0x06> <mrru 1614> <endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00>]
    rcvd [LCP ConfReq id=0x01 len=0x2c <mru 1501> <asyncmap 0x000a0000> <auth pap> <magic 0x48bbe142> <pcomp> <accomp> <mrru 1524> <endpoint local:77.64.63.34.2d.6c.6e.73.31>]
    sent [LCP ConfAck id=0x01 len=0x2c <mru 1501> <asyncmap 0x000a0000> <auth pap> <magic 0x48bbe142> <pcomp> <accomp> <mrru 1524> <endpoint local:77.64.63.34.2d.6c.6e.73.31>]
    rcvd [LCP ConfRej id=0x01 len=0x07 <callback 0x06>]
    sent [LCP ConfReq id=0x02 len=0x2f <asyncmap 0x00000000> <magic 0x3a0b158e> <pcomp> <accomp> <mrru 1614> <endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00>]
    rcvd [LCP ConfAck id=0x02 len=0x2f <asyncmap 0x00000000> <magic 0x3a0b158e> <pcomp> <accomp> <mrru 1614> <endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00>]


    What the heck is going on?! This is the exact same hardware, so now I
    don't think it is a modem firmware issue. The 0D 03 06 LCP option
    from Windows 2000 is strange. My interpretation, which may be incorrect,
    is that it is the callback option (0x0d = 13) of Section 2.3 of RFC1570.
    However, the operation code of 6 is odd in that RFC1570 only lists
    values up to 4. Furthermore, why in the heck would MS Windows be
    requesting a callback anyway?! The host does wake up to it and rejects
    it, after which all is well. I have no idea if pppd can be configured to
    issue this strange option - I would try it if I could.
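
    The manual decoding described above can be sketched in Python. This is
    only an illustration of walking the type/length/value options in an LCP
    packet. The function name decode_lcp and the short option labels (chosen
    to resemble pppd's debug output) are assumptions, and the sample bytes are
    the first Configure-Request from the Windows 2000 log at the end of this
    post.

```python
LCP_OPTION_NAMES = {
    1: "mru", 2: "asyncmap", 3: "auth", 5: "magic", 7: "pcomp",
    8: "accomp", 13: "callback", 17: "mrru", 19: "endpoint",
}


def decode_lcp(frame: bytes):
    """Return (code, identifier, [(option name, value bytes), ...])."""
    if frame[:2] != b"\xc0\x21":
        raise ValueError("not an LCP packet")
    code, ident = frame[2], frame[3]
    length = int.from_bytes(frame[4:6], "big")
    body = frame[6:2 + length]      # the length field counts from the code byte
    options = []
    pos = 0
    while pos + 2 <= len(body):
        opt_type, opt_len = body[pos], body[pos + 1]
        if opt_len < 2 or pos + opt_len > len(body):
            raise ValueError("malformed option")
        name = LCP_OPTION_NAMES.get(opt_type, f"type-{opt_type}")
        options.append((name, body[pos + 2:pos + opt_len]))
        pos += opt_len
    return code, ident, options


# First Configure-Request in the Windows 2000 log (protocol field included).
sample = bytes.fromhex(
    "C0 21 01 00 00 32 02 06 00 00 00 00 05 06 3A 0B"
    "15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01"
    "1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9"
    "00 00 00 00"
)

code, ident, options = decode_lcp(sample)
print(f"code={code} id={ident}")
for name, value in options:
    print(f"  <{name} {value.hex('.')}>")
```

    The callback option shows up as type 13 with a single value byte of 06,
    and the mrru value 06 4e is 1614 in decimal.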

    The $10,000 question is why does the host seem to see the Windows 2000
    generated LCP packets, but not those from Linux's pppd? Remember, I can
    connect to other numbers just fine under Linux with the same setup,
    options and dialscripts, so the serial line/modem cannot be broken.


    I tried using a pppd option:

    endpoint local:1c.79.3b.b1.2d.8c.47.d0.9b.fc.a8.ca.50.78.98.e9.00.00.00.00

    so as to more closely mimic MS Windows, but the host didn't respond any
    differently to it. Ditto for resetting the modem to factory defaults and
    trying the same mrru (1614) as MS Windows.

    I only see two possibilities:

    1. Something is going wrong at the byte level that causes the host
    to silently drop pppd's ConfAcks and ConfRejs. I am assuming that my
    pppd would put something in the debug output if it received and
    dropped something improper from the host. I want to look at the
    byte-level conversation between pppd and the host to see if anything
    differs from the LCP bytes MS Windows sends. What is the best way to
    eavesdrop on the conversation that flows through /dev/modem?

    2. That callback 0x06 invokes some special MS witchcraft.



    What a creepy situation!


    Mike




    Windows 2000 ppp log file details are as follows:
    -----
    ..
    ..
    [1072] 20:13:57:356: <PPP packet sent at 01/08/2005 01:13:57:356
    [1072] 20:13:57:356: <Protocol = LCP, Type = Configure-Req, Length = 0x34, Id = 0x0, Port = 5
    [1072] 20:13:57:356: <C0 21 01 00 00 32 02 06 00 00 00 00 05 06 3A 0B |.!...2........:.|
    [1072] 20:13:57:356: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
    [1072] 20:13:57:356: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
    [1072] 20:13:57:356: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:57:356:
    [1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:57:356: InsertInTimerQ called portid=0,Id=0,Protocol=0,EventType=3,fAuth=0
    [1072] 20:13:59:359: Recv timeout event received for portid=0,Id=0,Protocol=c021,fAuth=0
    [1072] 20:13:59:359: NotifyCaller(hPort=5, dwMsgId=9)
    [1072] 20:13:59:359: <PPP packet sent at 01/08/2005 01:13:59:359
    [1072] 20:13:59:359: <Protocol = LCP, Type = Configure-Req, Length = 0x34, Id = 0x1, Port = 5
    [1072] 20:13:59:359: <C0 21 01 01 00 32 02 06 00 00 00 00 05 06 3A 0B |.!...2........:.|
    [1072] 20:13:59:359: <15 8E 07 02 08 02 0D 03 06 11 04 06 4E 13 17 01 |............N...|
    [1072] 20:13:59:359: <1C 79 3B B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 |.y;.-.G.....Px..|
    [1072] 20:13:59:359: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:359:
    [1072] 20:13:59:359: InsertInTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
    [1016] 20:13:59:509: Packet received (46 bytes) for hPort 5
    [1072] 20:13:59:509: >PPP packet received at 01/08/2005 01:13:59:509
    [1072] 20:13:59:509: >Protocol = LCP, Type = Configure-Req, Length = 0x2e, Id = 0x1, Port = 5
    [1072] 20:13:59:509: >C0 21 01 01 00 2C 01 04 05 DD 02 06 00 0A 00 00 |.!...,..........|
    [1016] 20:13:59:519: Packet received (9 bytes) for hPort 5
    [1072] 20:13:59:509: >03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
    [1072] 20:13:59:509: >05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: <PPP packet sent at 01/08/2005 01:13:59:519
    [1072] 20:13:59:519: <Protocol = LCP, Type = Configure-Ack, Length = 0x2e, Id = 0x1, Port = 5
    [1072] 20:13:59:519: <C0 21 02 01 00 2C 01 04 05 DD 02 06 00 0A 00 00 |.!...,..........|
    [1072] 20:13:59:519: <03 04 C0 23 05 06 48 BB E1 42 07 02 08 02 11 04 |...#..H..B......|
    [1072] 20:13:59:519: <05 F4 13 0C 01 77 64 63 34 2D 6C 6E 73 31 00 00 |.....wdc4-lns1..|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: >PPP packet received at 01/08/2005 01:13:59:519
    [1072] 20:13:59:519: >Protocol = LCP, Type = Configure-Reject, Length = 0x9, Id = 0x1, Port = 5
    [1072] 20:13:59:519: >C0 21 04 01 00 07 0D 03 06 00 00 00 00 00 00 00 |.!..............|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: RemoveFromTimerQ called portid=0,Id=1,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:59:519: <PPP packet sent at 01/08/2005 01:13:59:519
    [1072] 20:13:59:519: <Protocol = LCP, Type = Configure-Req, Length = 0x31, Id = 0x2, Port = 5
    [1072] 20:13:59:519: <C0 21 01 02 00 2F 02 06 00 00 00 00 05 06 3A 0B |.!.../........:.|
    [1072] 20:13:59:519: <15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
    [1072] 20:13:59:519: <B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 00 00 00 |.-.G.....Px.....|
    [1072] 20:13:59:519: <00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:519:
    [1072] 20:13:59:519: InsertInTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
    [1016] 20:13:59:700: Packet received (49 bytes) for hPort 5
    [1072] 20:13:59:700: >PPP packet received at 01/08/2005 01:13:59:700
    [1072] 20:13:59:700: >Protocol = LCP, Type = Configure-Ack, Length = 0x31, Id = 0x2, Port = 5
    [1072] 20:13:59:700: >C0 21 02 02 00 2F 02 06 00 00 00 00 05 06 3A 0B |.!.../........:.|
    [1072] 20:13:59:700: >15 8E 07 02 08 02 11 04 06 4E 13 17 01 1C 79 3B |.........N....y;|
    [1072] 20:13:59:700: >B1 2D 8C 47 D0 9B FC A8 CA 50 78 98 E9 00 00 00 |.-.G.....Px.....|
    [1072] 20:13:59:700: >00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
    [1072] 20:13:59:700:
    [1072] 20:13:59:700: RemoveFromTimerQ called portid=0,Id=2,Protocol=c021,EventType=0,fAuth=0
    [1072] 20:13:59:700: FsmThisLayerUp called for protocol = c021, port = 5
    [1072] 20:13:59:700: LCP Local Options-------------
    [1072] 20:13:59:700: MRU=1500,ACCM=0,Auth=0,MagicNumber=973804942,PFC=ON,ACFC=ON
    [1072] 20:13:59:700: Recv Framing = PPP Multilink,SSHF=OFF,MRRU=1614,LinkDiscrim=0,BAP=OFF
    [1072] 20:13:59:700: ED Class = 1, ED Value = 1c793bb12d8c47d09bfca8ca507898e900000000
    [1072] 20:13:59:700: LCP Remote Options-------------
    [1072] 20:13:59:700: MRU=1501,ACCM=655360,Auth=c023,MagicNumber=1220272450,PFC=ON,ACFC=ON
    [1072] 20:13:59:700: Send Framing = PPP Multilink,SSHF=OFF,MRRU=1524,LinkDiscrim=0
    [1072] 20:13:59:700: ED Class = 1, ED Value = 776463342d6c6e73310000000000000000000000
    [1072] 20:13:59:700: LCP Configured successfully
    ..
    ..
     
    Michael Shell, Jan 8, 2005
    #8
  9. You are correct, it is the call-back option.
    It's a MS thing (used google to find this):

    http://www.microsoft.com/resources/.../2003/all/techref/en-us/w2k3tr_dura_tools.asp

    Search the page for CBCP, read about the 6, then search for CBCP again
    for a section titled CBCP. You may be able to make more sense of it
    than I could. My read of it is that the 6 tells the peer that the
    request is to use MS CBCP to negotiate call-back after authentication.

    Pppd has an undocumented option named `callback' that might generate
    what the MS side of your host generated. But you'll likely have to
    edit pppd/Makefile in the pppd source, uncomment the second line below:

    # Enable Microsoft proprietary Callback Control Protocol
    #CBCP=y

    and recompile. I don't have it compiled into pppd here and so can't
    readily test the option. It may or may not take a call-back number
    as a value (callback <number>) - I'm not a PPP implementor and my C
    reading skill is low.
    I've been hoping that James Carlson would participate but his last
    post here was on Monday. He's a regular poster here and the one most
    likely to come up with an answer. It is indeed a "creepy" problem;
    I've never seen anything quite like it.

    BTW, your tenacity and your manual translation of raw hex to Linux
    PPP-ese are admirable. Also impressive is knowing what information
    could be useful in finding a solution to the problem and trying to
    imitate the MS requests before posting - sometimes that works.
     
    Clifford Kite, Jan 8, 2005
    #9
  10. Well, I finally solved the mystery - and it took some doing to uncover
    it. I used serial port sniffers under both Linux (slsnif) and
    Windows to see exactly what each was sending over the link.

    Everything looked great with each PPP frame, which just deepened the
    riddle.

    I even made sure that Linux was using the exact same modem reset and
    initialization strings that MS Windows was using - to no avail. BTW,
    for future reference, Clifford's advice on the pppd code did indeed
    allow me to enable the <callback CBCP> option under Linux, but
    unfortunately this did not change anything either.

    I then noticed that the bad host was not even sending a TermAck in reply
    to my TermReq when I used control-c to shut down a connect attempt early.
    LCP Termination Requests are very simple packets - there isn't much that
    can go wrong with them.

    So I decided to take another look at my connect script. The final part
    of my chat script went like this:



    TIMEOUT 50 \
    SAY "\nWaiting for Connection..." \
    ECHO ON \
    "ONNEC" "\c" \
    "\n" "\r\n" \
    SAY "\nConnected!\n"



    What could possibly go wrong here you might ask? Plenty, if the PPP
    host has a framing parser so fragile that it cannot withstand a leading
    carriage return and/or line feed before the PPP negotiation sequence
    begins!

    That's right folks, the initial \r\n permanently broke the host's PPP
    frame receiver code! After that happens, you can send all the properly
    formed LCP packets you want and the host will never see any of them!
    But it will continue to send out its own ConfReqs. The person who
    wrote that crappy PPP code ought to be run out of town.

    The reason I even had this initial newline in there is that, some
    time in the past, some ISP's PPP or login code would not "wake up"
    until it received a CR or LF after connect.

    For the record, you can't do any of these:


    "\n" "\r\c" \
    "\n" "\n\c" \
    "\n" "\r\n\c" \


    However, you can do these:


    "\n" "\c" \
    "\n" "\N\c" \
    "\n" "\s\c" \


    So, nulls and spaces don't hang the receiver, but CR and/or LF do.

    I decided to put in a little delay for good measure, so:


    "\n" "\p\c" \


    is what I use now and all is well.
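
    To make the difference between these send strings concrete, here is a
    small Python model of what chat writes to the modem for each one. It is a
    simplified sketch based on the escapes described in the chat(8) manual
    page (\c suppresses the trailing carriage return that chat otherwise
    appends, \N is a null, \s a space, \p a short pause). It is not chat's
    actual code, and the function name chat_send_bytes is made up for the
    illustration.

```python
def chat_send_bytes(send_string: str) -> bytes:
    """Model the bytes chat would write for a send string (subset of escapes)."""
    simple = {"r": b"\r", "n": b"\n", "N": b"\x00", "s": b" ", "p": b""}
    out = bytearray()
    add_return = True           # chat appends a carriage return by default
    i = 0
    while i < len(send_string):
        ch = send_string[i]
        if ch == "\\" and i + 1 < len(send_string):
            esc = send_string[i + 1]
            if esc == "c":
                add_return = False    # \c suppresses the trailing return
            elif esc in simple:
                out += simple[esc]    # \p only pauses, so it adds no bytes
            else:
                raise ValueError(f"escape \\{esc} not modelled in this sketch")
            i += 2
        else:
            out += ch.encode("ascii")
            i += 1
    if add_return:
        out += b"\r"
    return bytes(out)


for s in [r"\r\n", r"\r\c", r"\n\c", r"\r\n\c", r"\c", r"\N\c", r"\s\c", r"\p\c"]:
    sent = chat_send_bytes(s)
    has_crlf = b"\r" in sent or b"\n" in sent
    print(f"{s!r:12} -> {sent!r:14} CR/LF on the wire: {has_crlf}")
```

    In this model the strings that failed are exactly the ones that put a CR
    or LF on the wire before PPP starts, and the ones that worked send nothing,
    a null, or a space.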


    I am pretty sure this will help somebody out in the future. Can you
    imagine what those poor souls with modems that happen to output a
    spurious new line just after initial connect will go through?!

    I'd sure like to know the name and version of this fragile PPP
    software so that people can be warned about it. Geeezzzz.



    Thanks for all your help and advice,

    Mike Shell
     
    Michael Shell, Jan 11, 2005
    #10
  11. Typically, dial-in servers attempt to detect what protocol the peer is
    using automatically. If the server sees a carriage return, then it
    assumes that it's a human at a regular tty, not a machine using PPP.

    It's obviously not the best way to do things. A better way than this
    is to spit out a text message welcoming the user (which will just be
    discarded by any PPP-speaking peer), but _continuously_ look for PPP
    data on input and switch modes when appropriate, rather than switching
    on the first one or two characters. Doing that right takes a little
    more than a minute's thought, though, so it's often not done.
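
    A rough Python sketch of the "keep looking for PPP" approach described
    above follows. The byte patterns are assumptions about how an initial LCP
    Configure-Request usually begins on an async line (flag, address, control,
    protocol c0 21, with or without escaping). They are not taken from any
    particular server, and the class name PppAutodetect is hypothetical.

```python
PPP_START_PATTERNS = (
    bytes.fromhex("7eff03c021"),        # address/control sent unescaped
    bytes.fromhex("7eff7d23c021"),      # control field 0x03 escaped as 7d 23
    bytes.fromhex("7e7ddf7d23c021"),    # address 0xff escaped as well
)


class PppAutodetect:
    """Watch a byte stream and report once an LCP frame header has been seen."""

    def __init__(self):
        self._window = b""
        self.ppp_seen = False

    def feed(self, data: bytes) -> bool:
        if self.ppp_seen:
            return True
        self._window += data
        self.ppp_seen = any(p in self._window for p in PPP_START_PATTERNS)
        # Keep only enough history to match a pattern split across reads.
        keep = max(len(p) for p in PPP_START_PATTERNS) - 1
        self._window = self._window[-keep:]
        return self.ppp_seen


detector = PppAutodetect()
for chunk in (b"\r\n", bytes.fromhex("7eff7d"), bytes.fromhex("23c021010100")):
    print(chunk.hex(" "), "->", detector.feed(chunk))
```

    A stray CR or LF is simply ignored here instead of committing the line to
    text mode, which is the behavior the paragraph above argues for.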

    Plus, there's the Windows-effect to consider: most ISP equipment these
    days is designed for the least-common-denominator. If it works with
    Windows DUN, then that's "good enough." It doesn't have to work well
    anywhere else.

    (The same is unfortunately true of a lot of consumer gear these days.)
    I don't think it's the server that's bad. The chat script was bad.
    Ask the ISP. But it's likely that it's one of the many commercial
    versions, and you just suffered from having a bad chat script.
     
    James Carlson, Jan 20, 2005
    #11

  12. That I can understand, but remember that the ISP in question continued
    to send valid PPP ConfReq requests while ignoring all my PPP ConfAck
    responses. Something is obviously broken on their end. There is no
    text-based login with their system. The very reason ISPs went to pure PPP
    login and skipped the text-based login altogether is the difficulty of
    handling tech support for all the different login/Login/username
    text-based configurations. Going deaf after the first CRLF rather defeats
    the purpose of the default PPP approach because it is, IMHO, unsafe to
    trust the first few characters after the initial connect: the client may
    still be chatting with the modem, or the modem itself may issue a CR at
    first connect (I've never personally seen this, but it would not surprise
    me in the least if some modems did just that). PPP was designed to handle
    these kinds of initial missteps.


    I agree that my end did something that it should not have. However, remember
    that the ISP continued to send valid ConfReq requests - and so this is a
    PPP protocol issue (because it happened within PPP negotiation) and I
    don't think the ISP is allowed to do this according to the PPP standards -
    invalid PPP data should be silently discarded and then one should resume
    scanning for valid PPP config requests - the latter of which was not done.



    Mike Shell
     
    Michael Shell, Jan 20, 2005
    #12
  13. If I remember correctly, your LCP negotiation started off strangely,
    with one side (probably theirs) suggesting an asyncmap (ACCM) of
    0xa0000, and the other (probably yours) suggesting 0. That's
    technically legal per RFC 1662, but is often in practice a good
    indicator of bugs in the peer implementation, and usually results in a
    failure to negotiate that's remarkably similar to what you saw.

    The fix is to add "asyncmap 0xa0000" to your configuration, after some
    obligatory swearing at the people who built the bad implementation.
    I'm pretty sure I know something like that.
    I'm not sure I understand what you're saying here, and I don't see any
    specific error that is directly traceable to a violation of any of the
    standards.

    If the other side cannot hear your side due to communications errors
    (which is what I expect is going on here during the failure scenario),
    then it rightly should continue sending the same Configure-Request
    messages at each Restart timer expiry until the restart limit is
    reached.
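
    The retransmission behavior described above can be sketched as a small
    Python simulation. It assumes the RFC 1661 defaults of a 3 second Restart
    timer and a Max-Configure of 10 (pppd exposes these as lcp-restart and
    lcp-max-configure). The function name and the reply_after_s parameter are
    made up for the illustration, and nothing here sleeps or touches a real
    link.

```python
def configure_request_schedule(max_configure: int = 10, restart_s: float = 3.0,
                               reply_after_s=None):
    """Return (send times in seconds, outcome) for one negotiation attempt.

    reply_after_s is when a usable reply arrives, or None if the peer never
    hears us. The same Configure-Request is resent at each timer expiry.
    """
    sends = []
    t = 0.0
    for _ in range(max_configure):
        sends.append(t)           # (re)send the Configure-Request
        t += restart_s            # wait for the Restart timer to expire
        if reply_after_s is not None and reply_after_s <= t:
            return sends, "reply received"
    return sends, "timeout sending Config-Requests"


sends, outcome = configure_request_schedule()
print(len(sends), "Configure-Requests sent, then:", outcome)
```

    With no reply at all this gives ten transmissions followed by a timeout,
    which matches the ten "sent [LCP ConfReq id=0x1 ...]" lines in the pppd
    debug log earlier in the thread.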

    There's no way that any of the PPP documents can require the peer to
    do what it is unable to do. If the packets are getting garbled in
    transit (which I expect is true, given the symptoms), there's not much
    the peer can do but allow the connection to fail and hope the human
    can fix things.

    Now if the peer is switching the ACCM too early (before LCP is in
    Opened state) or if the implementor confused the transmit and receive
    directions for the escaping logic (altogether *way* too common), then
    that's indeed an implementation bug. The real issue, though, is the
    lack of interoperability, not the conformance (or lack thereof) with
    respect to the standards.
     
    James Carlson, Jan 21, 2005
    #13


  14. See, this is what is so surprising about the whole thing and why
    nobody, including myself, suspected this type of bug triggered by
    the chat script. The comm link was/is fine and without error. However,
    when a leading CR and/or LF is sent to the host at the start of
    PPP negotiations, the host receiver will "lock-up" and never
    be able to see any of my PPP Conf Requests or Acks from that
    point on. However, the host will continue to transmit its own
    valid Conf Requests - indicating clearly that it is trying to
    establish a PPP connection. I can watch the whole thing unfold
    at the byte level using a serial line sniffer and I can
    reproduce the problem at will by sending one LF just prior to
    end of the chat script - as well as avoiding the problem and
    getting a good PPP negotiation by removing the spurious LF.

    I just know this bug is going to bite others and when it does
    it is a real bear to understand what the heck is going wrong.

    I did ask my ISP what software is being used, but it is
    unlikely that they'll ever tell me this. I'd sure like to
    know if anybody knows the make of crappy code that does
    this.



    Mike
     
    Michael Shell, Jan 21, 2005
    #14
  15. Seems to me that you have two options:
    1) Figure out what you can do to make the link work.
    2) Rant and rail against the rest of the world and how if only they did
    things better it would make it easier for you.


    That kind of bug is EXTREMELY common. Yes, there are some pretty bad
    programmers out there.
    As Carlson said, the reason may very well have been that it demands an
    asyncmap of 0xa0000 and you asked for 0. Nothing illegal, but it is very
    well known in the community that this is a recipe for disaster. There is
    a badly written program out there (by some organisation from Washington
    State) that breaks in that situation. Should it? No. Does it? Yes. Should
    it be fixed? Yes. Do you have the influence to get it done? Probably not.
    So, remove it.

    It has for many many years. So? Windows is set up not to trigger this bug.
    Do you think that MS is going to change things to make life for other
    operating systems easier? I would rather suspect it is there on purpose to
    make life as hard as possible for others.


    What code do you think most ISPs run?
     
    Bill Unruh, Jan 22, 2005
    #15
  16. On 22 Jan 2005 18:41:23 GMT

    I did and I am not. I simply wanted to track down the source of the
    problem for the benefit of future readers of this thread - a lesser poster
    would not have followed-up after he got his system working.


    This is a bit misleading because it implies a configuration problem
    with pppd rather than a *single* unescaped LF at the very start of
    PPP negotiation. If you had read the entire thread from the beginning
    before posting, you would have seen that a different pppd asyncmap
    setting was one of the very first things we checked, for on Jan 6th,
    Clifford Kite wrote:


    : It's not an ACCM problem. At this point ACCM has not been negotiated
    : and all Control Characters are escaped.


    Indeed, setting my asyncmap to match that requested by the host had no
    effect. Of course, the origin of my single unescaped LF was "outside"
    of pppd, but we would not expect this single rogue LF to hang the
    entire receiver of the host for the remainder of the call.


    Of course not.


    I agree that they often do just that, but somehow I don't feel that this
    is the case here. As I tried to point out several times, there might
    be some *hardware* running Windows that could be bitten by this bug.


    I don't know. After using dozens of dialup numbers from several
    different ISPs over half a decade with the exact same chat scripts, this
    is the first time I've run into a problem quite like this, and it does not
    occur with the several other dialup numbers that I have tried - which is
    why I am/was so curious about it.

    Have some heart. It took me a lot of effort to track down this "simple"
    problem even with the generous help and advice of other posters; and I am
    not a newbie. I am sure that the info here will help somebody out of a
    jam in the future - maybe even somebody running MS Windows. ;)



    Mike
     
    Michael Shell, Jan 22, 2005
    #16
  17. I agree that one would not expect it. On the other hand, I spent some time
    trying to understand what actually happened in the real world when I wrote
    www.theory.physics.ubc.ca/ppp-linux.html
    I came to the conclusion that the ways ISPs had of screwing up were
    infinite. Most of them are ways that should not happen, that properly
    written/set up pppd's would not do those things, but nevertheless they
    did. Perhaps I am just too cynical.
     
    Bill Unruh, Jan 23, 2005
    #17