Michael Graf <(E-Mail Removed)> wrote:
> I have a fairly clean install of redhat 9 running squid for a proxy
> serving approximately 800 people.
>
> At different intervals (every 5 minutes - every couple hours) this
> machine will send a burst of arp requests (a couple hundred up to
> several thousand) onto the network searching for _external_ IP
> addresses. The default gateway and subnet mask are set correctly,
The behaviour would seem to indicate that it thinks those particular
machines are local (and hence does ARP for them), which strongly suggests
that the traffic is not going via the gateway.
Assuming that the gateway is indeed correct, I would have to assume that
there is a routing error of some sort.
The fact that it is sending out such a large number of these ARP
requests is disturbing however.
> the arp table does not contain any external entries, but for some
> reason we keep getting this behavior.
Does it even list them as "(incomplete)"?
I suggest you do the following.
Use 'tcpdump -w arp.pcap arp' to take a capture of the traffic.
Take it over the course of, say, a day so you start to get meaningful
data you can analyse post-capture in ethereal (it has some useful things
in the Tools menu you might use).
Using that capture, build up a table of the external IP addresses that
are being requested. This would be most easily accomplished using
Ethereal's display filters.
Do a reverse-lookup on those addresses, and see if you can spot a
pattern.
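
If you would rather script the table than click through Ethereal, here
is a rough Python sketch. It assumes you feed it the text output of
'tcpdump -n -r arp.pcap', and that your local subnet is something like
192.168.1.0/24 (that value is only a placeholder, substitute your own).
The function name and the script name in the usage comment are made up
for the example.

```python
import ipaddress
import re
import socket
import sys
from collections import Counter

# Matches "who-has X tell Y" in tcpdump's text output (numeric, i.e. -n).
# Some versions print a MAC in parentheses after X, so allow for that.
WHO_HAS = re.compile(
    r"who-has (\d+(?:\.\d+){3})(?: \([^)]*\))? tell (\d+(?:\.\d+){3})"
)


def external_targets(lines, local_net):
    """Count ARP requests whose requested IP lies outside local_net."""
    net = ipaddress.ip_network(local_net)
    counts = Counter()
    for line in lines:
        m = WHO_HAS.search(line)
        if m and ipaddress.ip_address(m.group(1)) not in net:
            counts[m.group(1)] += 1
    return counts


def reverse_name(ip):
    """Best-effort reverse lookup; returns '?' if there is no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return "?"


# Hypothetical usage:
#   tcpdump -n -r arp.pcap | python arp_targets.py 192.168.1.0/24
if __name__ == "__main__" and len(sys.argv) == 2:
    for ip, n in external_targets(sys.stdin, sys.argv[1]).most_common():
        print(f"{n:6d}  {ip:15s}  {reverse_name(ip)}")
```

The counts also tell you whether the same few addresses are being
requested over and over, or whether nearly every request is different.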
Note that Squid does look up various well-known DNS names when it starts
(and maybe every so often later on?) to verify that DNS is working
correctly. These servers include IBM, Yahoo etc.
> /etc/sysconfig/network-scripts/ifcfg-eth1
Where's your eth0 in all this?
Other important things to think about:
Does the problem go away when you stop squid? (You'll need to do this
when no-one is using it).
Does the problem have a periodicity to it? This is most easily checked
by graphing ARP requests per minute and seeing if there is a period to
these occurrences. This can be the easiest way to track down annoying
things (I've done that myself not so long ago when investigating
load-spikes on my server).
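
For the graphing, something as crude as the following Python sketch
would probably do. It assumes the default tcpdump timestamp format
(hh:mm:ss.fraction at the start of each line) as printed by
'tcpdump -n -r arp.pcap', and the function names are my own invention.
Minutes with no requests are simply left out, and a capture that runs
past midnight will wrap around, so treat it as a starting point only.

```python
import re
from collections import Counter

# Default tcpdump timestamps look like "12:34:56.789012" at line start.
TIMESTAMP = re.compile(r"^(\d\d):(\d\d):\d\d")


def requests_per_minute(lines):
    """Count ARP who-has lines per hh:mm bucket."""
    counts = Counter()
    for line in lines:
        m = TIMESTAMP.match(line)
        if m and "who-has" in line:
            counts[f"{m.group(1)}:{m.group(2)}"] += 1
    return counts


def text_graph(counts, width=60):
    """Return one row per minute with a bar scaled to the busiest minute."""
    peak = max(counts.values(), default=0)
    rows = []
    for minute in sorted(counts):
        bar = "#" * max(1, counts[minute] * width // peak)
        rows.append(f"{minute} {counts[minute]:6d} {bar}")
    return rows
```

Printing the rows from text_graph(requests_per_minute(sys.stdin)) gives
a quick picture in which the bursts should stand out, and the gaps
between them can then be read straight off the left-hand column.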
Does the time between these storms happening vary with the usage of the
server/load of the network?
If you study a packet capture just preceding say, 5 or 10 of these
events, can you spot something that might be a trigger?
Are these arp requests being repeated, or do they seem to be fairly
unique?
Take a listing of the currently running processes when the event
happens. Can you see any process that may be running more than others
at each time?
ARP requests are almost never made directly by a process; they are done
by the kernel on behalf of a process. You can tell (with a bit of work)
what program is causing these ARP requests by having a look at the
requested address (the target protocol address field in the ARP
packet), and then (in real-time), running lsof to see who is trying to
connect to that address.
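
Since the timing is tight, you may want to script the lsof step. A
minimal Python sketch follows, assuming lsof is installed and that you
run it as root so it can see other users' sockets. It uses lsof's
field output ('-F pc', giving pid and command name) restricted to one
address with '-i@ADDR'. The function names are hypothetical.

```python
import subprocess


def parse_lsof_fields(output):
    """Turn `lsof -F pc` output into a list of (pid, command) tuples."""
    procs, pid = [], None
    for line in output.splitlines():
        if line.startswith("p"):
            pid = int(line[1:])
        elif line.startswith("c") and pid is not None:
            procs.append((pid, line[1:]))
            pid = None
        # Other field lines (e.g. "f" for file descriptors) are ignored.
    return procs


def who_is_connecting(addr):
    """Ask lsof which processes have sockets involving addr right now."""
    # -n / -P: skip DNS and port-name lookups so lsof returns quickly.
    result = subprocess.run(
        ["lsof", "-n", "-P", f"-i@{addr}", "-F", "pc"],
        capture_output=True,
        text=True,
    )
    return parse_lsof_fields(result.stdout)
```

Calling who_is_connecting() in a loop for an address you have just seen
in an ARP request should, with luck, catch the process while its
connection attempt is still pending.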
Once you have a likely culprit, you can use strace to verify it. You can
tell strace to attach to a currently running PID. You'll want to run it
something like this.
strace -p PID -e trace=connect
OR
strace -p PID -e trace=network
Hope at least some of this helps. This is a nice juicy problem. If you
need any help to write scripts to help track this down, just ask. I'm
really busy at present though, so it may take a wee while.
--
Cameron Kerr
(E-Mail Removed) :
http://nzgeeks.org/cameron/
Empowered by Perl!