Mysterious problem on my client's Windows Server 2003 Network

 
 
CoolHandJoe
Guest
Posts: n/a

 
      11-12-2006, 01:47 AM
Hi all

I've been troubleshooting a problem at a client's site that had
practically disabled the entire network. At random, users would lose
their connection to the server and/or to the internet, and they also
started getting corrupt PST files while using roaming profiles.

I've since discovered that the problem had at least two different
causes. One was a rogue router with the same address as the router they
actually use. It had DHCP turned on, even though DHCP is handled by the
server. All the clients were receiving the proper DNS information, so
we couldn't tell until we entered the IP address of the main router
(a Linksys) and got a prompt for a D-Link DI-604. It was obvious after
that.

There is still corruption going on, though, with people getting
disconnected from the server and losing information. There are no
errors anywhere, and roaming profiles have been disabled so that files
being copied don't get corrupted. We've had professionals come in and
test the cables, and they're all fine. I also installed a new network
card in the server, though there was nothing to indicate a problem with
the first one.

Throughout this process I ran Ethereal a few times, and there were
always a great many bad-checksum errors on traffic from the server to
whichever machines were trying to transfer data. It wasn't any one
computer in particular; it was all of them. There were also many
dropped packets, in some cases 9500 of 10000, and all of the dropped
ones were SMB.

So, a quick recap: from computer to server, no errors; from server to
computer, errors. I suspect that's the server telling the computer that
the received data is corrupt.

I'm getting them a new switch for Monday. I've also done some
performance monitoring, and nothing looks out of place compared to the
baseline. If anything, it's lower across the board because roaming
profiles are disabled.

Does anyone have any idea where I should look and what I can do to hunt
down the source of this problem?
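
A duplicate gateway address like the rogue D-Link described above can often be spotted by checking whether one IP address answers from more than one MAC address. The following is a minimal Python sketch, assuming the (IP, MAC) pairs have already been collected from ARP replies by some other means. The function name and the sample addresses are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_ip_conflicts(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each IP seen with more than one MAC address to those MACs."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalise case before comparing
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Made-up sample data: (IP, MAC) pairs as might be gathered from ARP replies.
seen = [
    ("192.168.1.1", "00:11:22:33:44:55"),
    ("192.168.1.10", "00:aa:bb:cc:dd:01"),
    ("192.168.1.1", "00:66:77:88:99:AA"),
]
print(find_ip_conflicts(seen))
```

Any IP address that shows up in the result is worth tracing to a physical port.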

Joe

 
 
 
 
 
Pete
Guest
Posts: n/a

 
      11-13-2006, 05:37 PM
Hi,

Look for a duplex mismatch - for example, if one end (say, the server) of a
cable is configured to 100Mbps full duplex and the other end (the switch) to
automatic, the automatic end will fall back to half duplex: it can still
sense the link speed, but it cannot learn the duplex setting from an end
that is not negotiating. (Many people miss the fact that configuring
specific parameters always turns autonegotiation off - you MUST configure
both ends identically or leave them on automatic.) You will get an immense
number of checksum errors and TCP retransmits on traffic going from the full
duplex end to the auto end - enough that the checksum algorithm cannot catch
them all, letting occasional packets slip by with a corrupted payload...
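
To illustrate how corrupted data can get past a checksum at all: TCP uses a 16-bit one's-complement sum, which has known blind spots. The sketch below is a plain Python version in the style of RFC 1071, with made-up payloads. It only shows that such blind spots exist; it does not prove that this is what happened on Joe's network.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd\x00\x01"
# Same 16-bit words in a different order.
reordered = b"\xab\xcd\x12\x34\x00\x01"
# One word gained exactly what another word lost.
compensated = b"\x12\x35\xab\xcc\x00\x01"

for corrupted in (reordered, compensated):
    assert corrupted != original
    print(internet_checksum(corrupted) == internet_checksum(original))  # True
```

Both altered payloads produce the same checksum as the original, so a receiver relying on this sum alone would accept them.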

If the auto end (or actually, the end that currently runs half duplex on a
link where the other end runs full duplex) is manageable, you may see reports
of a lot of "late collisions" as well.

Also note that there are switches that have weird autonegotiation algorithms
and sometimes create a duplex/speed mismatch without showing it (3Com,
notably). A software upgrade usually cures that.

Again: EACH end of a cable must be either set to "auto" or to the same speed
and duplex, or BAD things will happen.
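
The rule above can be written down as a small model. This is a simplified Python sketch, not a description of any particular switch or NIC. It assumes that an autonegotiating end facing a forced peer senses the peer's speed and defaults to half duplex, and that two autonegotiating ends both support 100Mbps full duplex. The exact fallback speed can vary by hardware, but the duplex result is what matters here. `Port`, `effective_mode` and `duplex_mismatch` are names invented for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Port(NamedTuple):
    """Configured setting for one end of a cable; speed=None means 'auto'."""
    speed: Optional[int] = None  # Mbps, or None for autonegotiation
    full_duplex: bool = True     # ignored when speed is None


def effective_mode(me: Port, peer: Port, best_common_speed: int = 100) -> Tuple[int, bool]:
    """Return the (speed, full_duplex) that this end actually runs at."""
    if me.speed is not None:
        return (me.speed, me.full_duplex)  # forced: does what it is told
    if peer.speed is None:
        return (best_common_speed, True)   # both auto: negotiate the best mode
    # Auto end facing a forced peer: the speed can be sensed, the duplex
    # cannot, so the auto end falls back to half duplex (assumption above).
    return (peer.speed, False)


def duplex_mismatch(a: Port, b: Port) -> bool:
    """True when the two ends of the cable end up with different duplex."""
    return effective_mode(a, b)[1] != effective_mode(b, a)[1]


server = Port(speed=100, full_duplex=True)
print(duplex_mismatch(server, Port()))                 # True: forced vs auto
print(duplex_mismatch(Port(), Port()))                 # False: auto on both ends
print(duplex_mismatch(server, Port(100, True)))        # False: same forced values
```

Only the first combination, one end forced to full duplex and the other left on auto, produces a mismatch.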

This can be especially tricky when you have fiber media converters between
switches in the network. They are often configured to negotiate even though
you WILL get severe problems if both ends don't happen to land on the same
parameters.

Later,
Pete
--
Peter Josefson, Specialisthuset, Sweden

CoolHandJoe
Guest
Posts: n/a

 
      11-17-2006, 12:11 AM
Thanks Pete.

I changed the switch today and I'm still getting the bad checksums.
The switch is a 48-port HP ProCurve 2650. It comes with two diagnostic
utilities: one pings, and the other checks the link. Pinging works, of
course; however, the link-checking utility fails every check, even if I
try it 20 times. Apparently what this utility does is send data out to
a specific MAC address and then check it when it comes back to see if
it's the same data that went out. In this case it was different every
time I tried it (40 times). Of course that's nothing new, because we
knew it from the Ethereal capture, but this utility works at layer 2 of
the OSI model. The implication is that there's something wrong with the
cabling, despite what the cabling experts said.
What do you think?
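
The kind of test described above (send data out, compare what comes back) can be sketched in a few lines of Python. This is only an illustration of the idea, not how the ProCurve utility is implemented. The `echo` callable is a hypothetical stand-in for whatever actually carries the payload to the far end and back, and both example links are simulated.

```python
import os
from typing import Callable


def link_test(echo: Callable[[bytes], bytes], trials: int = 20, size: int = 64) -> int:
    """Send random payloads through `echo`; count how many come back altered."""
    failures = 0
    for _ in range(trials):
        payload = os.urandom(size)
        if echo(payload) != payload:
            failures += 1
    return failures


def clean_link(payload: bytes) -> bytes:
    """Simulated link that returns the data unchanged."""
    return payload


def flaky_link(payload: bytes) -> bytes:
    """Simulated link that flips one bit in the first byte of every payload."""
    return bytes([payload[0] ^ 0x01]) + payload[1:]


print(link_test(clean_link))  # 0
print(link_test(flaky_link))  # 20
```

A test like this reports that the data changed somewhere along the path, but not whether the cable, a port, or a duplex setting is responsible.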

Joe


 
Pete
Guest
Posts: n/a

 
      11-17-2006, 09:05 AM
Hi Joe,

I've never tried these utilities even though I have customers with 2650s,
so I can't tell how reliable they are or what they do in detail. However, it
definitely sounds like a cabling problem, OR a speed/duplex mismatch, OR a
faulty network card in the server.

Again (can't stress this enough), make sure BOTH ENDS are configured to
auto-negotiate speed and duplex OR to specific (same) values. If one end
(say, the server) is configured to 100full and the switch to auto, the switch
will run half duplex and you'll have the kind of problems you see.

Please note that although this is the most common cause of packet loss on
the Internet (80% according to Cisco), even experts are sometimes unaware of
what I wrote in my previous post - if one end is hardcoded and the other
automatic, the automatic end WILL run half duplex. You'll even find Microsoft
KB articles recommending that you configure a server to 100full (without
mentioning the switch)...

Later,
Pete
 