> I'm assuming the echo-replys are at least going out the correct interface?
As far as I could tell, yes. We've now moved the .91 machine back "out" of
the linux router's protection, so it will be a little harder to test with it,
but I do have a couple of other machines still "in" the protection.
> If so, whats the next hop, if any? If not does arp report the correct
> address? Even if it is hopping a router, is that next hop pingable /
> correct mac address?
When I was pinging from .91 to .80 (one of the trouble routers), I saw the
req go across the linux router and the reply come back to the linux router.
The next hop to the .91 box from the linux router was across a little
netgear switch/hub to the .91 machine. I'm sure the hub/switch works fine
because I've used it for some time now without any problems.
> Also, what network is this .91 on? And the router is .101, same network?
> and what is the ip of the router on the /25 net?
This is all on a public network, x.x.x.0/25 (.1 - .127).
The gateway out of this network is x.x.x.1. The linux router is .101 and
uses proxy arp and ip forwarding, with route entries for .91 - .100.
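
For anyone trying to picture that setup, here is a rough dry-run sketch of the kind of commands it implies. It only prints the commands and does not run them. The exact commands on the router may differ: NET is just a stand-in for the masked x.x.x prefix, gen_setup is a made-up helper name, and I'm assuming the protected side is eth1 and proxy arp is enabled on both interfaces.

```shell
#!/bin/sh
# Dry-run sketch: prints setup commands instead of executing them.
# NET is a placeholder for the masked public prefix (assumption).
NET="x.x.x"

gen_setup() {
    # Turn on IPv4 forwarding and proxy ARP (assumed on both interfaces).
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    echo "echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp"
    echo "echo 1 > /proc/sys/net/ipv4/conf/eth1/proxy_arp"
    # One host route per protected address, via the inside interface.
    host=91
    while [ "$host" -le 100 ]; do
        echo "route add -host $NET.$host dev eth1"
        host=$((host + 1))
    done
}

gen_setup
```

Each printed "route add -host" line corresponds to one of the UH entries in the route table further down.
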
One of the routers that was failing was .80, another one .11. In all cases
getting them to talk to .91 is the goal, as that hosts the web sites and
email. Last night .11 was failing intermittently and then suddenly cleared
up. This morning .80 started this same trash. That's when I pinged from
.91 => .80 and noticed both packets crossing the linux router as expected,
but the .91 box didn't seem to see the replies (and didn't have tcpdump on
it for me to double check). I ran out of time to check this out and had to
move .91 back outside the protected zone. After moving it out ping worked
fine. While the ping was failing with .80 all the rest of the web seemed
fine. HTTP requests were coming across the linux router and getting
answered without any problems. Same with the other services running on .91,
except those coming from .80.
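
A sketch of the capture that would settle whether the replies really leave the router toward .91, assuming tcpdump is available on the linux router. It prints one command per interface rather than running anything. NET is a placeholder for the masked prefix and capture_cmd is a made-up helper name.

```shell
#!/bin/sh
# Dry-run sketch: prints the tcpdump commands rather than running them.
# NET is a placeholder for the masked public prefix (assumption).
NET="x.x.x"

capture_cmd() {
    # $1 = interface to listen on, $2 = last octet of the remote router
    # -n: no name lookups, -e: show link-level (MAC) headers
    echo "tcpdump -n -e -i $1 icmp and host $NET.$2"
}

capture_cmd eth0 80
capture_cmd eth1 80
```

With -e the MAC addresses show up in the capture, so the destination MAC on the reply can be compared against the arp table.
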
> This is definatly a routing issue, perhaps you should post a watered-down
> route -n and arp
# arp -n
Address        HWtype  HWaddress          Flags Mask  Iface
x.x.x.91       ether   00:10:?C:?F:6C:48  C           eth0
x.x.x.94       ether   00:0C:29:?9:BA:A8  C           eth1
x.x.x.1        ether   00:04:27:4C:BA:E1  C           eth0
x.x.x.100      ether   00:11:2F:15:23:76  C           eth1
x.x.x.96       ether   00:0C:29:?9:BA:A8  C           eth1
As you can see, .91 is now on eth0 (the "public" interface).
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
x.x.x.95        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.94        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.98        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.99        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.96        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.97        0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.100       0.0.0.0         255.255.255.255 UH    0      0        0 eth1
x.x.x.0         0.0.0.0         255.255.255.128 U     0      0        0 eth0
x.x.x.0         0.0.0.0         255.255.255.128 U     0      0        0 eth1
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         x.x.x.1         0.0.0.0         UG    1      0        0 eth0
And here's the route table as it currently stands. About the only
abnormality I can see is the .0/25 network being on both interfaces. Should
the eth1 interface's entry for this net be deleted, since technically the
whole network (minus .94 - .100) is actually on eth0?
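
For reference, a sketch of what that deletion would look like with the net-tools route command, printed as a dry run and not executed. NET is a placeholder for the masked prefix and del_cmd is a made-up helper name. Whether removing the entry is the right fix is exactly the open question.

```shell
#!/bin/sh
# Dry-run sketch: prints the route command instead of running it.
# NET is a placeholder for the masked public prefix (assumption).
NET="x.x.x"

del_cmd() {
    # $1 = interface whose on-link /25 entry would be dropped
    echo "route del -net $NET.0 netmask 255.255.255.128 dev $1"
}

del_cmd eth1
```

The host routes for .94 - .100 are separate entries and would not be touched by this.
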
Thanks for your help!