Thanks Pete.
I changed the switch today and I'm still getting the bad checksums.
The switch is a 48-port HP ProCurve 2650. It comes with two
diagnostic utilities: one for pinging and one for checking the link.
Pinging works, of course. However, the link-checking utility fails
every check, even if I try it 20 times. Apparently this utility sends
information out to a specific MAC address, then checks it when it
comes back to see whether it's the same information that went out. In
this case it was different every time I tried it (40 times). That's
nothing new, because we knew it from the Ethereal capture, but this
utility does it at OSI layer 2. The implication is that there's
something wrong with the cabling, despite what the cabling experts
said.
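For what it's worth, here is roughly how I understand that link test,
as a minimal Python sketch. The function names and the two simulated
links are made up for illustration; the real utility runs on the switch
against actual Ethernet frames, and I have not seen how it is
implemented.

```python
import os


def link_test(send_and_receive, attempts=20, size=64):
    """Send random payloads over a link and count how many come back altered.

    `send_and_receive` is a hypothetical callable standing in for the
    round trip to the target MAC address and back.
    """
    failures = 0
    for _ in range(attempts):
        sent = os.urandom(size)
        received = send_and_receive(sent)
        if received != sent:
            failures += 1
    return failures


def clean_link(frame):
    # Simulated healthy link: the frame comes back unchanged.
    return frame


def corrupting_link(frame):
    # Simulated bad link: flips one bit in the first byte of every frame.
    return bytes([frame[0] ^ 0x01]) + frame[1:]


print(link_test(clean_link))                    # healthy link
print(link_test(corrupting_link, attempts=40))  # every attempt fails
```

A healthy link reports zero failures. What I'm seeing looks like the
second case, where every single attempt comes back different.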
What do you think?
Joe
Pete wrote:
> Hi,
>
> Look for a duplex mismatch. For example, if one end of a cable (say, the
> server) is configured for 100Mbps full duplex and the other end (the
> switch) is set to automatic, the automatic end will typically fall back to
> half duplex. Many people miss the fact that configuring specific parameters
> always turns autonegotiation off: you MUST configure both ends identically
> or leave them both on automatic. You will get an immense number of checksum
> errors and TCP retransmits on traffic going from the full duplex end to the
> auto end, enough that the checksum algorithm cannot detect them all, so
> occasional packets slip by with a corrupted payload...
>
> If the auto end (or actually, the end that currently runs half duplex on a
> link where the other end runs full duplex) is manageable, you may see reports
> of a lot of "late collisions" as well.
>
> Also note that there are switches that have weird autonegotiation algorithms
> and sometimes create a duplex/speed mismatch without showing it (3Com,
> notably). A software upgrade usually cures that.
>
> Again: EACH end of a cable must be either set to "auto" or to the same speed
> and duplex, or BAD things will happen.
>
> This can be especially tricky when you have fiber media converters between
> switches in the network. They are often configured to negotiate even though
> you WILL get severe problems if both ends don't happen to land on the same
> parameters.
>
> Later,
> Pete
> --
> Peter Josefson, Specialisthuset, Sweden
>
>
> "CoolHandJoe" wrote:
>
> > Hi all
> >
> > I've been troubleshooting a problem at a client's site that had
> > practically disabled the entire network. At random, users would lose
> > their connection to the server and/or to the internet, and they also
> > started getting corrupt PST files while using roaming profiles.
> >
> > I've since discovered that the problem was due to at least two
> > different things. One was a rogue router with the same address as the
> > router they actually use. It had DHCP turned on, even though DHCP was
> > being handled by the server. However, all the clients were receiving
> > the proper DNS information, so we couldn't tell until we entered the
> > IP address of the main router (a Linksys) and got a prompt for a D-Link
> > DI-604. It was obvious after that.
> >
> > Anyway, there is still corruption going on, and people are still
> > getting disconnected from the server and losing information. There are
> > no errors anywhere, and roaming profiles have been disabled so that
> > files being copied don't get corrupted. We've had professionals come in
> > and test the cables, and they're all fine. I installed a new network
> > card in the server, though there wasn't anything to indicate a problem
> > with the first one.
> >
> > Throughout this process I ran Ethereal a few times, and there were
> > always a great many bad checksum errors on traffic from the server to
> > any machine that was trying to transfer information. It wasn't one
> > computer in particular; it was every computer. There were also many
> > dropped packets, in some cases 9500 out of 10000, and they were all SMB
> > packets. So, a quick recap: from computer to server, no errors; from
> > server to computer, errors. I suspect that's the server telling the
> > computer that the received data is corrupt.
> >
> > I'm getting them a new switch for Monday. I've also done some
> > performance monitoring, and nothing looks out of place compared to the
> > baseline. If anything, it's lower across the board because of the
> > disabled roaming profiles.
> >
> > Anyone have any idea where I should look and what I can do to hunt down
> > the source of this problem?
> >
> > Joe
> >
> >