Networking Forums


linux crashing - possible e1000 driver?

Jason Keltz
03-02-2006, 04:22 PM
I posted the message below to linux.kernel, but I realize that it may be
more appropriate for this forum...

-----

We have an Intel SE7501WV2A system running Linux kernel 2.4.32 that is
crashing between every 1-4 days. The system has two on-board Intel
PRO/1000 MT Server Network Connections (Intel 82546EB Controller) that
are both in use. We ran extensive hardware diagnostics on the machine
and found no hardware errors. We got a similarly configured
Dell Precision WorkStation 450, which has an Intel 7505 chipset, and
used that as a temporary replacement for the 7501 box while we were
doing hardware testing. To our surprise, the Dell Precision box
crashed as well. We allowed the problem to happen a few more times on
the Dell Precision box just so that we could be sure.

We have a totally different Dell PowerEdge 1750 box (with Dell Intel
ServerWorks GC LE chipset) that is configured almost identically to the
above two machines, but does not have the Intel e1000 gigabit on-board.
Instead, that system has dual on-board Broadcom gigabit using the tg3
driver. This box, which has the same role as the original box (a user
time-sharing server), is not crashing on us at all, and more often than
not it carries a much higher load than the 7501 box.

When I say crash, I mean that logins to the box hang and the console
displays a black screen. Interestingly enough, however, the machine
remains pingable, and an nmap scan of the machine shows the ports that
it provides services on. The machine will answer on those ports, but
the services are not available. To help us solve this problem, we also
write the output of "ps" to a file once every minute, and that logging
stops at the time of the crash. Activity prior to the crash is
generally minimal. A serial console displays nothing until the machine
is rebooted.
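The once-a-minute "ps" log described above can be sketched in Python as follows. This is a hypothetical helper, not the poster's actual script: the function names, the UTC timestamp format, and the choice to record the load average and MemFree from /proc (instead of full "ps" output) are all assumptions. The explicit flush and os.fsync() after each line are meant to improve the odds that the last entry before a hang actually reaches the disk, so the time at which logging stopped can be read back after the reboot.

```python
import os
import time


def snapshot_line(now, loadavg_text, meminfo_text):
    """Build one timestamped log line from /proc/loadavg and /proc/meminfo text."""
    # /proc/loadavg looks like "0.12 0.08 0.01 1/123 4567"; keep the three averages.
    load = " ".join(loadavg_text.split()[:3])
    mem = {}
    for row in meminfo_text.splitlines():
        key, _, rest = row.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            mem[key.strip()] = int(fields[0])  # rows like "MemFree:" are in kB
    # UTC timestamp written as a single whitespace-free token.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return "%s load=%s MemFree=%skB" % (stamp, load, mem.get("MemFree", "?"))


def append_snapshot(path, line):
    """Append one line and ask the kernel to push it to disk right away."""
    with open(path, "a") as log:
        log.write(line + "\n")
        log.flush()             # move Python's buffer into the kernel
        os.fsync(log.fileno())  # request that the kernel write it to the disk


def last_snapshot(lines):
    """Return the timestamp of the last non-empty log line, or None."""
    last = None
    for line in lines:
        if line.strip():
            last = line.split()[0]
    return last


def run_forever(path, interval=60):
    """Take one snapshot per interval until the process is killed."""
    while True:
        with open("/proc/loadavg") as f_load, open("/proc/meminfo") as f_mem:
            line = snapshot_line(time.time(), f_load.read(), f_mem.read())
        append_snapshot(path, line)
        time.sleep(interval)
```

After a reboot, `last_snapshot(open(path))` on the (hypothetical) log path gives the timestamp of the final entry written before the hang.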

The one thing the crashing machines have in common is the Intel
on-board gigabit controller. I saw a few posts on the web from people
describing the same symptoms (Linux crashing), in particular when using
the e1000 module with on-board Intel NICs. However, in the few cases
that I found, the users claimed that upgrading the e1000 module made
their mysterious crashes go away. The version of the e1000 driver that
comes with Linux 2.4.32 is an older one, 5.7.6-k1-NAPI. I compiled
6.3.9-NAPI from SourceForge and put it in place on our server after the
last crash. The server lasted 4 days and then crashed again. After
that, it lasted only another 2 days before crashing again.
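Since the suspicion rests on which e1000 version is running, it may help to confirm after each swap that the kernel actually loaded the rebuilt module. `ethtool -i eth0` prints lines such as `driver:` and `version:`; the Python sketch below parses that text and compares only the numeric part of the version. The helper names are hypothetical, and ignoring suffixes such as `-k1-NAPI` is a simplifying assumption.

```python
import re


def parse_drvinfo(text):
    """Parse `ethtool -i <iface>` output into a dict such as {"driver": "e1000"}."""
    info = {}
    for row in text.splitlines():
        # Split at the first colon only, so "bus-info: 02:01.0" keeps its value intact.
        key, sep, value = row.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Turn "5.7.6-k1-NAPI" into (5, 7, 6); trailing suffix tags are ignored."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError("no numeric version in %r" % version)
    return tuple(int(part) for part in match.group(1).split("."))


def is_upgraded(ethtool_text, minimum):
    """True if the loaded driver's numeric version is at least `minimum`."""
    info = parse_drvinfo(ethtool_text)
    return version_tuple(info["version"]) >= version_tuple(minimum)
```

For example, `is_upgraded(subprocess.check_output(["ethtool", "-i", "eth0"], text=True), "6.3.9")` would check the running driver, assuming the interface is named eth0.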

I cannot guarantee that the problem is related to the e1000 module;
I'm just very suspicious of it. I have no way of getting into the
system when it "crashes", which makes the whole ordeal rather
frustrating! There also doesn't seem to be much support in Linux for
generating kernel dumps. I enabled the sysrq sequence on my kernel and
was hoping to use the "c" command to crash the kernel and get a dump
that I could examine, but pretty much every kernel dump facility that
I've read about seems to work only with 2.6 or with older versions of
2.4! Further, the "c" option that I keep reading about doesn't seem to
exist in 2.4.32 or 2.6... (I think it could be a Red Hat modification?)
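What "enabled" means depends on the value in /proc/sys/kernel/sysrq. As documented for later 2.6 kernels, 0 disables SysRq, 1 enables every function, and any other value is a bitmask of allowed function groups; on 2.4.32 we assume only zero versus non-zero matters. The sketch below decodes such a value. The bit names paraphrase the kernel's SysRq documentation, the helper names are hypothetical, and decoding the value does not add a "c" command to a kernel that lacks one.

```python
# Bit values as described in the kernel's SysRq documentation (later 2.6 kernels).
SYSRQ_BITS = {
    0x2: "console logging level",
    0x4: "keyboard control (SAK, unraw)",
    0x8: "debugging dumps of processes",
    0x10: "sync",
    0x20: "remount read-only",
    0x40: "signalling of processes",
    0x80: "reboot/poweroff",
    0x100: "nicing of real-time tasks",
}


def decode_sysrq(value):
    """Describe a kernel.sysrq value: 0 = off, 1 = everything, else a bitmask."""
    if value == 0:
        return []
    if value == 1:
        return ["all functions"]
    return [name for bit, name in sorted(SYSRQ_BITS.items()) if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```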

Anyhow -- does anyone have any ideas on how we might go about
diagnosing this problem? I have contacted Intel via mail for
suggestions and haven't heard anything back yet.

Thanks,

Jason.

Vishwas Pai
03-03-2006, 04:51 AM
Jason Keltz wrote:
>
> When I say crash, I mean that logins into the box hang, the console
> displays a black screen. However, interestingly enough, the machine
> remains pingable,


If it is pingable and you are using e1000 for networking,
then it may not be an e1000 driver issue. Also, it may
not be a kernel hang: see whether some application is using
too much CPU or memory. Try setting global virtual memory
limits, so that you get some time to debug if an application
starts leaking memory.

HTH --vishwas
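On Linux, a virtual-memory limit is the RLIMIT_AS resource limit, the one bash's `ulimit -v` sets; making it "global" would normally mean applying it from a login script or a PAM limits configuration. The Python sketch below applies it to a single command instead, which is enough to show the effect: a process that leaks memory gets allocation failures at the cap rather than pushing the whole box into swap. The function names and the limit sizes are assumptions.

```python
import resource
import subprocess
import sys


def cap_address_space(max_bytes):
    """Return a preexec_fn that lowers the child's RLIMIT_AS soft limit."""
    def _apply():
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process may lower limits but never raise its hard limit,
        # so the soft cap must not exceed a finite hard limit.
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))
    return _apply


def run_limited(cmd, max_bytes):
    """Run cmd with a virtual-memory cap and capture its output."""
    return subprocess.run(cmd, preexec_fn=cap_address_space(max_bytes),
                          capture_output=True, text=True)


if __name__ == "__main__":
    # A child that tries to allocate 8 GiB under a 1 GiB cap fails with MemoryError.
    result = run_limited([sys.executable, "-c", "b = bytearray(8 * 1024 ** 3)"], 1024 ** 3)
    print(result.returncode, result.stderr.strip().splitlines()[-1])
```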