#1
Hi people...

I have kernel 2.6.18.1 running very stably - it runs for days without problem. I wanted to upgrade to 2.6.19 or .20, and have similar problems with both. After a few hours of work, eth1 stops working, and no amount of rmmod or ifconfig can restart it. Until now, the only way I found, was to reboot.

Net card is a Winbond W89C940, works perfectly under 2.6.18.1. Driver is ne2k-pci/8390. The messages I found in dmesg are:

NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=282.
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out, lost interrupt? TSR=0x3, ISR=0x2, t=235.

As I mentioned above, everything works perfectly under 2.6.18.1, even for weeks. With both 2.6.19.2 and 2.6.20.1, eth1 quietly freezes after a few hours, while everything else continues to work OK. Strangely enough, I found several references to this problem with older kernels (2.2 and 2.4), but none for 2.6.

Suggestions please.

John
john.coppens@gmail.com
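The two dmesg lines above print the 8390's transmit status (TSR) and interrupt status (ISR) registers in hex. The following is a minimal Python sketch for decoding them. The bit names are assumed to match the `ENTSR_*` and `ENISR_*` definitions in the kernel's `drivers/net/8390.h` header, so check them against your own source tree. The helper names are made up for illustration. If those definitions hold, TSR=0x3 means the last packet went out without error and was not deferred. ISR=0x2 or 0x3 means a transmit-complete event is still pending in the chip. That reading is at least consistent with the driver's own "lost interrupt?" guess, but it is not a diagnosis.

```python
# Interrupt status register (ISR) bits, assumed to follow the ENISR_*
# definitions in the Linux drivers/net/8390.h header.
ISR_BITS = {
    0x01: "RX (packet received)",
    0x02: "TX (packet transmitted)",
    0x04: "RX_ERR (receive error)",
    0x08: "TX_ERR (transmit error)",
    0x10: "OVER (receive ring overrun)",
    0x20: "COUNTERS (tally counters need emptying)",
    0x40: "RDC (remote DMA complete)",
    0x80: "RESET (reset completed)",
}

# Transmit status register (TSR) bits, assumed to follow the ENTSR_*
# definitions in the same header.
TSR_BITS = {
    0x01: "PTX (packet transmitted without error)",
    0x02: "ND (transmit was not deferred)",
    0x04: "COL (collided at least once)",
    0x08: "ABT (aborted after excess collisions)",
    0x10: "CRS (carrier sense lost)",
    0x20: "FU (FIFO underrun)",
    0x40: "CDH (collision-detect heartbeat lost)",
    0x80: "OWC (out-of-window collision)",
}


def decode(value, table):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_timeout(tsr, isr):
    """Decode the TSR/ISR pair printed by the 8390 'Tx timed out' message."""
    return {"TSR": decode(tsr, TSR_BITS), "ISR": decode(isr, ISR_BITS)}


if __name__ == "__main__":
    # Register values taken from the two dmesg lines in the post.
    for tsr, isr in [(0x3, 0x3), (0x3, 0x2)]:
        print(f"TSR={tsr:#x}, ISR={isr:#x}:", decode_timeout(tsr, isr))
```

When run as a script, it prints the decoded bit names for both register pairs from the post.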
#2
On 3 Mar 2007, in the Usenet newsgroup comp.os.linux.networking, in article
<(E-Mail Removed).com>, (E-Mail Removed) wrote:

>I have kernel 2.6.18.1 running very stably - it runs for days without
>problem. I wanted to upgrade to 2.6.19 or .20, and have similar
>problems with both. After a few hours of work, eth1 stops working,
>and no amount of rmmod or ifconfig can restart it. Until now,
>the only way I found, was to reboot.

Are these kernels that you've compiled yourself, rather than kernels
from some distribution? From kernel.org:

[compton ~]$ grep 2.6.18[.]*[0-9]*.tar.gz$ temp/new/kernel/2.6-ls | cut -b44- | column
Sep 20 03:56 linux-2.6.18.tar.gz      Dec 02 00:21 linux-2.6.18.5.tar.gz
Oct 14 06:01 linux-2.6.18.1.tar.gz    Dec 17 00:28 linux-2.6.18.6.tar.gz
Nov 04 01:43 linux-2.6.18.2.tar.gz    Feb 20 06:52 linux-2.6.18.7.tar.gz
Nov 19 03:38 linux-2.6.18.3.tar.gz    Feb 23 23:55 linux-2.6.18.8.tar.gz
Nov 29 19:38 linux-2.6.18.4.tar.gz
[compton ~]$ grep 2.6.19[.]*[0-9]*.tar.gz$ temp/new/kernel/2.6-ls | cut -b44- | column
Nov 29 22:20 linux-2.6.19.tar.gz      Feb 20 06:51 linux-2.6.19.4.tar.gz
Dec 11 19:40 linux-2.6.19.1.tar.gz    Feb 24 00:29 linux-2.6.19.5.tar.gz
Jan 10 19:50 linux-2.6.19.2.tar.gz    Mar 03 01:06 linux-2.6.19.6.tar.gz
Feb 05 16:36 linux-2.6.19.3.tar.gz    Mar 03 05:29 linux-2.6.19.7.tar.gz
[compton ~]$ grep 2.6.20[.]*[0-9]*.tar.gz$ temp/new/kernel/2.6-ls | cut -b44- | column
Feb 04 18:59 linux-2.6.20.tar.gz      Feb 20 06:49 linux-2.6.20.1.tar.gz
[compton ~]$

>Net card is a Winbond W89C940, works perfectly under 2.6.18.1.
>Driver is ne2k-pci/8390.
>The messages I found in dmesg are:
>
>NETDEV WATCHDOG: eth1: transmit timed out
>eth1: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=282.

Some of the ChangeLog files are getting larger than the original 2.0.0
kernel tarball, but have you looked through them? I've seen something
recently (maybe the past ten days?) about dynamic interrupt problems, but
can't remember where. What does the interrupts list look like before/after
the problem? It may also have been in the ChangeLogs for the 2.6.21
candidates as well (/pub/linux/kernel/v2.6/testing/) - sorry I can't be
more specific.

Old guy
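Searching a multi-megabyte ChangeLog by hand is tedious. The sketch below assumes that the kernel.org `ChangeLog-2.6.x` files are plain `git log` output, with each entry starting on a line that begins with `commit `. It keeps only the entries that match a regular expression. `split_commits`, `matching_commits` and the script name `changelog_grep.py` are hypothetical, and the search pattern is only a starting point.

```python
import re
import sys


def split_commits(text):
    """Split git-log style ChangeLog text into one string per commit entry."""
    entries, current = [], []
    for line in text.splitlines():
        # A new entry starts at each "commit <id>" line (assumed format).
        if line.startswith("commit ") and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def matching_commits(text, pattern):
    """Return the commit entries whose text matches pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [entry for entry in split_commits(text) if rx.search(entry)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python changelog_grep.py ChangeLog-2.6.19 '8390|ne2k'
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for entry in matching_commits(f.read(), sys.argv[2]):
            print(entry, end="\n\n")
```

A narrow pattern such as `8390|ne2k` should pick out entries that touch the driver. A broad one such as `irq` will match a large share of the 2.6.19 ChangeLog.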
#3
On Mar 4, 3:54 pm, ibupro...@painkiller.example.tld (Moe Trin) wrote:
> On 3 Mar 2007, in the Usenet newsgroup comp.os.linux.networking, in article
> <1172984031.217986.65...@p10g2000cwp.googlegroups.com>, john.copp...@gmail.com
> wrote:
>
> >I have kernel 2.6.18.1 running very stably - it runs for days without
> >problem. I wanted to upgrade to 2.6.19 or .20, and have similar
> >problems with both. After a few hours of work, eth1 stops working,
> >and no amount of rmmod or ifconfig can restart it. Until now,
> >the only way I found, was to reboot.
>
> Are these kernels that you've compiled yourself, rather than kernels
> from some distribution? From kernel.org:

Hi Moe,

Thanks for the reply! Yes - I always compile my kernels (originals from kernel.org). The distro is mostly Slackware 11.0, but I do update programs as necessary.

> >NETDEV WATCHDOG: eth1: transmit timed out
> >eth1: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=282.
>
> Some of the ChangeLog files are getting larger than the original 2.0.0
> kernel tarball, but have you looked through them? I've seen something
> recently (maybe the past ten days?) about dynamic interrupt problems, but
> can't remember where. What does the interrupts list look like before/after
> the problem? It may also have been in the ChangeLogs for the 2.6.21
> candidates as well (/pub/linux/kernel/v2.6/testing/) - sorry I can't be
> more specific.

I've been reading some of the mail on the dynamic interrupts, but it doesn't seem applicable in this case. I did notice that one of the first items mentioned in version 2.6.19 is the removal of the interrupt table. I don't really know what that means, though; it does seem to be a rather large modification to the interrupt system.

I just recompiled 2.6.20.1, and set the timer back to 250Hz instead of 1000Hz, as this was the only difference I could imagine had any effect. Apart from that, all the hardware is the same (the Core sensors were added to this version, but I don't think that makes any difference?)
The IRQ assignments in /proc/interrupts are exactly the same before and after the event. Only the counters changed, including the 'MIS' counter, which went up by one:

Before:

           CPU0       CPU1
  0:      20404        735   XT-PIC-XT        timer
  1:        209          9   IO-APIC-edge     i8042
  6:          0          3   IO-APIC-edge     floppy
  7:        465      18042   IO-APIC-edge     parport0
  9:          0          0   IO-APIC-fasteoi  acpi
 12:         87         16   IO-APIC-edge     i8042
 14:        102          8   IO-APIC-edge     ide0
 15:         30         32   IO-APIC-edge     ide1
 16:          0          3   IO-APIC-fasteoi  ohci1394
 17:       4438          3   IO-APIC-fasteoi  libata, HDA Intel
 18:         19          1   IO-APIC-fasteoi  libata, ehci_hcd:usb2
 19:       6777          1   IO-APIC-fasteoi  eth0
 20:         27          1   IO-APIC-fasteoi  eth1
 21:          0          1   IO-APIC-fasteoi  ohci_hcd:usb1
 22:          0          0   IO-APIC-fasteoi  saa7130[0]
NMI:          0          0
LOC:      21047      21049
ERR:          1
MIS:          0

After:

           CPU0       CPU1
  0:    1271050     121694   XT-PIC-XT        timer
  1:        968          9   IO-APIC-edge     i8042
  6:          0          3   IO-APIC-edge     floppy
  7:     121424    1268688   IO-APIC-edge     parport0
  9:          0          0   IO-APIC-fasteoi  acpi
 12:     187823         16   IO-APIC-edge     i8042
 14:       4658          8   IO-APIC-edge     ide0
 15:         30         32   IO-APIC-edge     ide1
 16:          0          3   IO-APIC-fasteoi  ohci1394
 17:      10447      15857   IO-APIC-fasteoi  libata, HDA Intel
 18:         19          1   IO-APIC-fasteoi  libata, ehci_hcd:usb2
 19:     583328          1   IO-APIC-fasteoi  eth0
 20:     152992          1   IO-APIC-fasteoi  eth1
 21:          0          1   IO-APIC-fasteoi  ohci_hcd:usb1
 22:          0          0   IO-APIC-fasteoi  saa7130[0]
 23:     468394          1   IO-APIC-fasteoi  nvidia
NMI:          0          0
LOC:    1392690    1392692
ERR:          1
MIS:          1

(Note: nvidia wasn't present before because I hadn't started X yet, so the module hadn't been loaded.)

This time, it took about an hour to hang. I'm considering the possibility that this happens when my DSL line carrier disappears, though I don't know why it doesn't happen in 2.6.18.

Thanks - and still open for suggestions!

John
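Comparing the two snapshots by eye works, but the useful rows are the ones whose counters move. The following Python sketch parses text in the 2.6-era `/proc/interrupts` layout shown above and reports how much each row's total changed. The function names are hypothetical. The parser assumes that the counts are the leading integer fields after each `LABEL:`. You could sample the file every few seconds while the link is busy. That would show whether eth1's count stops climbing at the moment of the hang.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}.

    Assumes the first non-blank line is the "CPU0 CPU1 ..." header and that
    each later row looks like "LABEL: count [count ...] [chip] [device]".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Take up to ncpu leading integers; rows such as ERR and MIS have one.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (counts, " ".join(fields))
    return table


def diff_interrupts(before, after):
    """Yield (label, description, change in total count) for each row in after.

    Rows missing from the earlier snapshot (e.g. a module loaded later)
    are treated as having started from zero.
    """
    for label, (counts, desc) in after.items():
        old_counts = before.get(label, ([], ""))[0]
        yield label, desc, sum(counts) - sum(old_counts)


if __name__ == "__main__":
    # A few rows copied from the "Before" and "After" tables in the post.
    before = """\
           CPU0       CPU1
 20:         27          1   IO-APIC-fasteoi  eth1
MIS:          0
"""
    after = """\
           CPU0       CPU1
 20:     152992          1   IO-APIC-fasteoi  eth1
 23:     468394          1   IO-APIC-fasteoi  nvidia
MIS:          1
"""
    old, new = parse_interrupts(before), parse_interrupts(after)
    for label, desc, delta in diff_interrupts(old, new):
        print(f"{label:>4}: {delta:>8}  {desc}")
```

With the rows copied from the post, IRQ 20 (eth1) shows an increase of 152965 and MIS shows an increase of 1. On a live system, you would read `/proc/interrupts` twice with a pause in between, instead of using the pasted text.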
#4
On 4 Mar 2007, in the Usenet newsgroup comp.os.linux.networking, in article
<(E-Mail Removed).com>, (E-Mail Removed) wrote:

>ibupro...@painkiller.example.tld (Moe Trin) wrote:
>> john.copp...@gmail.com wrote:
>>>I have kernel 2.6.18.1 running very stably - it runs for days without
>>>problem. I wanted to upgrade to 2.6.19 or .20, and have similar
>>>problems with both.

-rw-rw-r--  1 536 536   53944 Oct 14 06:01 ChangeLog-2.6.18.1
-rw-r--r--  1 536 536   56117 Nov 04 01:32 ChangeLog-2.6.18.2
-rw-r--r--  1 536 536   19448 Nov 19 03:38 ChangeLog-2.6.18.3
-rw-r--r--  1 536 536     646 Nov 29 19:26 ChangeLog-2.6.18.4
-rw-rw-r--  1 536 536 3816102 Nov 29 22:11 ChangeLog-2.6.19

But looking in the testing directory, 2.6.19 branched back on October 5, so
everything _should_ show (yeah, I know) in the 2.6.19 ChangeLog.

>I've been reading some of the mail on the dynamic interrupts, but it
>doesn't seem applicable in this case.

My thought was the IRQ being moved about - but your results don't
indicate that.

>I just recompiled 2.6.20.1, and set the timer back to 250Hz instead of
>1000Hz, as this was the only difference I could imagine had any effect.

That is another possibility. The ne2k-pci isn't the most efficient card
in the world. What rate were you using on the 2.6.18.1 kernel?

Old guy
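The timer-rate question comes down to simple arithmetic. At CONFIG_HZ=250 one tick lasts 4 ms, and at 1000 Hz it lasts 1 ms, so the timer interrupt fires four times as often. Any figure the kernel logs in jiffies therefore changes meaning with HZ. The sketch below assumes that the `t=` value in the "Tx timed out" message is such a jiffies count, though this has not been checked against the 2.6.19/2.6.20 source. It converts the two values from the dmesg output for both rates. The helper names are hypothetical.

```python
def tick_period_ms(hz):
    """Length of one timer tick, in milliseconds, for CONFIG_HZ=hz."""
    return 1000.0 / hz


def jiffies_to_ms(jiffies, hz):
    """Convert a tick (jiffies) count to milliseconds for CONFIG_HZ=hz."""
    return jiffies * tick_period_ms(hz)


if __name__ == "__main__":
    # t= values from the two dmesg lines; assumed to be jiffies counts.
    for hz in (250, 1000):
        print(f"HZ={hz}: one tick = {tick_period_ms(hz)} ms")
        for t in (282, 235):
            print(f"  t={t} -> {jiffies_to_ms(t, hz)} ms")
```

On that assumption, t=282 would correspond to about 1.13 s at 250 Hz and 0.28 s at 1000 Hz. Knowing which HZ a log line came from matters before you compare timeouts across kernels.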
#5
On Mar 5, 4:55 pm, ibupro...@painkiller.example.tld (Moe Trin) wrote:
> >I just recompiled 2.6.20.1, and set the timer back to 250Hz instead of
> >1000Hz, as this was the only difference I could imagine had any effect.
>
> That is another possibility. The ne2k-pci isn't the most efficient card
> in the world. What rate were you using on the 2.6.18.1 kernel?

Hi Moe.

I know about the 'oldness' of the card, but it works fine under 2.6.18.1 ;-) (Still does. I can't really leave the machine on 2.6.19/20, because the card manages the DSL line, and else my wife complains that internet disappears.)

The rate on the 2.6.18 kernel was 250 Hz, so both are now the same. (I wanted to switch to 1kHz because a MIDI sequencer complained about lack of resolution.)

Cheers,
John
#8
Sorry about the multiple postings. I was using the Google interface to the group, and I hadn't noticed that in some cases, a simple 'Reload' of the page causes a repost of the last message!

Again, apologies!

John
#9
On 5 Mar 2007, in the Usenet newsgroup comp.os.linux.networking, in article
<(E-Mail Removed).com>, (E-Mail Removed) wrote:

>I know about the 'oldness' of the card, but it works fine under
>2.6.18.1 ;-) (Still does. I can't really leave the machine on
>2.6.19/20, because the card manages the DSL line, and else my wife
>complains that internet disappears.)

Yeah, well, there is _that_ problem as well ;-) My comment was directed
to the fact that increasing the interrupt rate may start over-running
things. Certainly it may be cheaper to buy a modern card that doesn't
have the problem.

>The rate on the 2.6.18 kernel was 250 Hz, so both are now the same.
>(I wanted to switch to 1kHz because a MIDI sequencer complained about
>lack of resolution.)

You may want to look at the 2.6.21.rc* ChangeLog files just the same,
as there were a number of reversions where some things were backed out
after discovering that an improvement in one point broke something
else in another place.

Old guy
#10
> You may want to look at the 2.6.21.rc* ChangeLog files just the same,
> as there were a number of reversions where some things were backed out
> after discovering that an improvement in one point broke something
> else in another place.

Hi Moe...

Few takers on this problem, it seems. I'll download the 2.6.21 ChangeLogs, but I'm hopeful the problem is solved: I upgraded the BIOS of the MoBo after I noticed a very remotely similar problem report with another MSI board. When I connected to the MSI site, there was a fresh version - just a month old. Call it coincidence...

I don't like to upgrade BIOSes without a motive, but on the other hand, there have been a couple of updates, and something must be hidden behind that vague reason: the site says "reason of the update: 'Upgrade'"...

I've been running 2.6.20.1 for about 9 hours now - a record!

Thanks for the suggestions.

John
Tags: kernel, transmit timed out