abnormal (excessive) number of arp requests on subnet?

 
 
Rahul
      07-10-2010, 09:38 PM
I did a tcpdump like so:

tcpdump -c 1000 -ennqti eth3 \( arp or icmp \)

In a one minute period I get 1000 ARP requests. Is this normal? I
reproduce below the traffic in case this helps diagnosis. The network is
static, no new devices are being added or removed. The MAC<->IP
association is also static. Why is there such a lot of ARP traffic or is
this normal?

The network has ~265 servers. There is only a single physical network but
twin subnets: 10.0.x.x (primary traffic) and 172.16.x.x (monitoring),
i.e. each server has a single physical card but it responds to two MAC and
IP addresses.

Would increasing the size of my ARP cache be a solution? I'm a bit
confused because (as I understand ARP caching) my ARP cache size is set
to 512 or 1024 (not sure which) but the actual ARP table seems to have
only 265 entries (values below). Or is my understanding of ARP wrong?

cat /proc/net/arp | wc -l
265

ip neigh | wc -l
264

cat /proc/sys/net/ipv4/neigh/default/gc_thresh2
512

cat /proc/sys/net/ipv4/neigh/default/gc_thresh3
1024
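
A note on the counts above: /proc/net/arp begins with a column-header line, so `wc -l` on it reports one more than the number of entries, which would account for 265 versus the 264 from `ip neigh`. Either way the table is well under the gc_thresh2 and gc_thresh3 values shown, so it does not look size-limited. Here is a minimal Python sketch of that counting, assuming the usual /proc/net/arp column layout. The function name `count_arp_entries` and the sample rows (including the unresolved 10.0.9.9 entry) are made up for illustration.

```python
# Hypothetical sample in the usual /proc/net/arp layout (not taken from the thread).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.11        0x1         0x2         00:26:b9:58:ec:2a     *        eth3
10.0.3.2         0x1         0x2         00:26:b9:58:d7:2f     *        eth3
10.0.9.9         0x1         0x0         00:00:00:00:00:00     *        eth3
"""


def count_arp_entries(proc_net_arp_text):
    """Return (total, resolved) entry counts, skipping the header line."""
    rows = [
        line.split()
        for line in proc_net_arp_text.splitlines()[1:]  # [1:] drops the header
        if line.strip()
    ]
    # Flag bit 0x2 (ATF_COM) marks an entry whose MAC address is known.
    resolved = sum(1 for row in rows if int(row[2], 16) & 0x2)
    return len(rows), resolved


if __name__ == "__main__":
    total, resolved = count_arp_entries(SAMPLE)
    print(f"{len(SAMPLE.splitlines())} lines, {total} entries, {resolved} resolved")
```

On a live host the same function could be fed the contents of /proc/net/arp, e.g. `count_arp_entries(open("/proc/net/arp").read())`.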



############################
00:26:b9:58:ec:29 > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply 172.16.2.5 is-at 00:26:b9:58:ec:29
00:26:b9:58:ec:2a > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has 10.0.3.2 tell 10.0.0.11
00:26:b9:58:ec:2c > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply 172.16.0.11 is-at 00:26:b9:58:ec:2c
00:26:b9:58:ec:48 > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has 10.0.3.2 tell 10.0.1.66
00:26:b9:58:ec:4a > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply 172.16.1.66 is-at 00:26:b9:58:ec:4a
00:26:b9:58:ec:56 > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply 172.16.1.12 is-at 00:26:b9:58:ec:56
00:26:b9:58:ec:5a > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has 10.0.3.2 tell 10.0.0.52
################################
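
To put numbers on a capture like the one above, the lines can be tallied by ARP type and by whether the destination MAC is the broadcast address. This is a rough Python sketch, assuming the older tcpdump wording seen here (`arp who-has` / `arp reply`); newer tcpdump releases may phrase these lines differently, so the pattern would need adjusting. `classify_arp` is an illustrative name, and the sample is the seven packets quoted above.

```python
import re
from collections import Counter

# Matches "tcpdump -e" ARP lines in the format quoted in this thread.
ARP_LINE = re.compile(
    r"(?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}), ARP, length \d+: "
    r"arp (?P<kind>who-has|reply)"
)

SAMPLE = """\
00:26:b9:58:ec:29 > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply
172.16.2.5 is-at 00:26:b9:58:ec:29
00:26:b9:58:ec:2a > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has
10.0.3.2 tell 10.0.0.11
00:26:b9:58:ec:2c > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply
172.16.0.11 is-at 00:26:b9:58:ec:2c
00:26:b9:58:ec:48 > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has
10.0.3.2 tell 10.0.1.66
00:26:b9:58:ec:4a > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply
172.16.1.66 is-at 00:26:b9:58:ec:4a
00:26:b9:58:ec:56 > ff:ff:ff:ff:ff:ff, ARP, length 60: arp reply
172.16.1.12 is-at 00:26:b9:58:ec:56
00:26:b9:58:ec:5a > 00:26:b9:58:d7:2f, ARP, length 60: arp who-has
10.0.3.2 tell 10.0.0.52
"""


def classify_arp(capture_text):
    """Count (kind, 'broadcast' | 'unicast') pairs in tcpdump -e ARP output."""
    # Packets may be wrapped over two lines, so collapse all whitespace first.
    flat = " ".join(capture_text.split())
    counts = Counter()
    for match in ARP_LINE.finditer(flat):
        is_broadcast = match.group("dst") == "ff:ff:ff:ff:ff:ff"
        counts[(match.group("kind"), "broadcast" if is_broadcast else "unicast")] += 1
    return counts


if __name__ == "__main__":
    for (kind, cast), n in sorted(classify_arp(SAMPLE).items()):
        print(f"{kind:8s} {cast:9s} {n}")
```

Run over a full 1000-packet capture, a tally like this would show how much of the traffic is broadcast replies versus unicast queries.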



--
Rahul
 
Chris Cox
      07-11-2010, 01:15 AM
Rahul wrote:
> I did a tcpdump like so:
>
> tcpdump -c 1000 -ennqti eth3 \( arp or icmp \)
>
> In a one minute period I get 1000 ARP requests. Is this normal? I
> reproduce below the traffic in case this helps diagnosis. The network is
> static, no new devices are being added or removed. The MAC<->IP
> association is also static. Why is there such a lot of ARP traffic or is
> this normal?


In general, I'd say pretty normal. Things are always making queries... who-has
messages abound, as well as i-have messages.
 
Rahul
      07-12-2010, 07:31 PM
Moe Trin wrote:

Thanks Moe for a detailed analysis!


> BRIEFLY - ARP is used to resolve IP->MAC. The querying and answering
> systems will keep an individual entry for on the order of one minute.
> For the Linux kernel, this is NORMALLY a compile-time setting. You
> may be able to increase the timeout.
>


Why is the cache maintained on a time basis? Isn't it more logical to
specify the max number of ARP cache entries? Or are the two approaches
identical?

>>In a one minute period I get 1000 ARP requests. Is this normal?

>
> Depends. How "busy" is the network - how many hosts talking to how
> many hosts how often?


I know there are ~265 physical servers, so x2 = ~530 IP addresses. The
10.0.x.x subnet should be fairly busy, but I have no way to quantify it right
now. In fact, what tool does one use to answer the question you raised:
"How 'busy' is the network?"

Maybe the answer is in the RFCs you quoted. I'm reading them now. But if
anyone has pointers as to how to answer the above question please do
tell. I don't have access to the switches, so unfortunately I can't get any
switch-side stats. All monitoring will have to be server-side.
>
> Overlaying networks rarely serves any useful purpose other than to
> increase overhead. Are you sure this is needed?


I am not sure. Maybe my design decision was wrong. The situation is that
we have normal traffic as well as IPMI (maintenance mode) traffic
piggybacking over the same physical wire and adapters. Conceptually I
thought it made sense to keep those separate? But I am open to
suggestions if this was a bad idea.

> It's bad enough
> with 265 hosts in one collision domain, never mind 530.


But that is only relevant for broadcast traffic, correct? Unicast traffic
will be intelligently handled by the switch so that the collision domain
is only equal to the number of switch ports? Pardon my networking
ignorance if this is wrong.


> improvement in network speed.). Doing a traffic analysis (who is
> talking to who) could be a real eye-opener, suggesting a more
> efficient layout.


Is tcpdump the tool of choice for this? Or wireshark? Or something else?
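
For a first pass at "who is talking to who", plain tcpdump output can simply be tallied per source/destination pair. This is a small Python sketch, assuming lines in the form that `tcpdump -nnqt` prints for IPv4 (`IP src.port > dst.port: ...`). The sample lines, addresses, and ports are hypothetical, and `top_talkers` is an illustrative name.

```python
import re
from collections import Counter

# "IP <src> > <dst>:" as printed by tcpdump for IPv4 packets (assumed format).
IP_LINE = re.compile(r"\bIP (?P<src>\S+) > (?P<dst>\S+?):")

# Hypothetical sample lines, not taken from the thread.
SAMPLE = """\
IP 10.0.0.11.45678 > 10.0.3.2.2049: tcp 120
IP 10.0.0.11.45678 > 10.0.3.2.2049: tcp 0
IP 10.0.3.2.2049 > 10.0.0.11.45678: tcp 512
IP 10.0.1.66 > 10.0.3.2: ICMP echo request, id 1, seq 1, length 64
"""


def strip_port(endpoint):
    """'10.0.0.11.45678' -> '10.0.0.11'; portless addresses are left unchanged."""
    return ".".join(endpoint.split(".")[:4])


def top_talkers(lines):
    """Count packets per (source IP, destination IP) pair."""
    pairs = Counter()
    for line in lines:
        match = IP_LINE.search(line)
        if match:
            pairs[(strip_port(match.group("src")), strip_port(match.group("dst")))] += 1
    return pairs


if __name__ == "__main__":
    for (src, dst), n in top_talkers(SAMPLE.splitlines()).most_common():
        print(f"{src} -> {dst}: {n} packets")
```

Counting packets rather than bytes is a simplification; it is only meant to show which host pairs dominate.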


> ARP is used when host A wants to talk to host B. If it doesn't need
> to talk to B, why should it be caching B's MAC?


Is there a downside to having a larger ARP cache? I mean sure, it takes
more memory but these days RAM is cheap and anyway a 1000-row IP<->MAC
lookup table is not large.

>Also, how is your
> network _physically_ connected? Is this coax (10Base2 or 10Base5)


It's a 1GigE ethernet cable. I think it's CAT5e (1000BASE-T).

> cache ARP replies heard from "other" systems. If the network is
> using switches, _broadcast_ packets are heard by all (depending on


The network is switched. Each switch takes around 48 hosts so we have 6
Cisco-Catalyst switches interconnected with 10GigE fiber links.

> the switch), while _unicast_ packets (ARP replies) are heard only
> by the "interested" party. If using switches, you need also look at
> the timeouts in the individual switches as well.


Ah! Thanks! I didn't realize the switches have an ARP cache timeout too.
Makes sense. I'll ask my networking folks about that.


>
> My condolences.


For using Dell? I'm confused.

> Something is fucked with your capture data. Example - the first line
> shows Dull 58:ec:29 _broadcasting_ an ARP reply. That should be a
> unicast from Dull 58:ec:29 to the MAC of the querying system. In
> the second line, Dull ec:2a sends a _unicast_ query asking who is
> "10.0.3.2". That should be a broadcast unless this is a reconfirm.
> You may also want to look at RFC0826, which is the specification for
> ARP referenced in RFC1122.


Wow! You are right. I never noticed this. I will definitely dig deeper
into this. Something is not right.


--
Rahul
 
Rick Jones
      07-12-2010, 09:03 PM
Rahul wrote:
> Why is the cache maintained on a time basis?


It helps to bound "fail-over" time when an IP is migrated from being
associated with one MAC address to another.

> Is tcpdump the tool of choice for this? Or wireshark? Or something
> else?


If one is a fan of Star Trek "TOS" tcpdump can be thought of as the
mnemonic memory circuits made from stone knives and bearskins. It is
a basic CLI (command-line interface) packet capture utility.
Wireshark adds a gooey and whatnot. They both use libpcap to perform
the actual packet capture. The differences would be in what they can
decode and how they display it.

> Ah! Thanks! I didn't realize the switches have an ARP cache timeout too.
> Makes sense. I'll ask my networking folks about that.


Indeed, anything with an ARP cache needs to have a way to keep it
up-to-date.

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Rahul
      07-12-2010, 09:49 PM
Rick Jones wrote:

>> Why is the cache maintained on a time basis?

>
> It helps to bound "fail-over" time when an IP is migrated from being
> associated with one MAC address to another.
>


My other concern is that a lot of my codes are latency sensitive. Whenever
an IP is not found in the cache, an additional ARP lookup is needed, and I
am afraid that this will degrade my effective latency.

That's why I am trying to keep all my MAC<->IP pairs cached, if that is a
reasonable strategy.

--
Rahul
 
Pascal Hambourg
      07-12-2010, 10:05 PM
Hello,

Rahul wrote:
>
> I didn't realize the switches have an ARP cache timeout too.


They don't, at least not pure layer-2 switches, because they don't care
about ARP or any other protocol above Ethernet.

>> Something is fucked with your capture data. Example - the first line
>> shows Dull 58:ec:29 _broadcasting_ an ARP reply. That should be a
>> unicast from Dull 58:ec:29 to the MAC of the querying system.


That could be some kind of gratuitous ARP.
 
Rick Jones
      07-13-2010, 06:07 PM
Rahul wrote:
> My other concern is that a lot of my codes are latency
> sensitive. Thus whenever an IP is not found in the cache this means
> an additional ARP lookup will be needed. So I am afraid that this
> will degrade my effective latency.


Just *how* latency sensitive? If one request/response pair out of N,
where N could be quite large depending on how fast you are running,
has an extra RTT added to it is that really going to cause problems?
Is a LAN RTT even a non-trivial fraction of the service time of your
application?

> That's why I am trying to keep all my MAC<->IP pairs cached. If that is a
> reasonable strategy.


Reasonable is subjective.

Some platforms allow the addition of "permanent" entries in the ARP
cache via the likes of the arp command. If one does add a permanent
entry, s/he becomes responsible for dealing with the IP moving from
one MAC to another case themselves.

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Rahul
      07-13-2010, 07:57 PM
Rick Jones wrote:

> Is a LAN RTT even a non-trivial fraction of the service time of your
> application?


Yes, I think it is. It is an MPI application (computational chemistry:
VASP) using distributed memory that does a fair amount of small-packet
traffic.

> Just *how* latency sensitive?


It is hard to say since I don't know of a way to vary latency on demand (is
there a way? I'd be eager to know!) to test response. These are the data
points I have:

Using 6 servers with 8 cores each.

RT latency    Job runtime (normalised)
130 usec      10x
 18 usec      1.5x
  7 usec      1x

18 usec is my current network.

>If one request/response pair out of N,
> where N could be quite large depending on how fast you are running,
> has an extra RTT added to it is that really going to cause problems?


You are probably right. It won't matter. It depends on how large N is.
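
The "one exchange out of N" argument can be put into rough numbers. This is a back-of-the-envelope Python sketch, assuming an ARP miss costs exactly one extra LAN round trip (as framed in the quoted question) and that the 18 usec figure is the base RTT. The values of N and the function name `mean_rtt_with_arp` are hypothetical.

```python
def mean_rtt_with_arp(base_rtt_us, arp_penalty_us, exchanges_per_refresh):
    """Average RTT when one exchange in N pays an extra ARP round trip."""
    if exchanges_per_refresh < 1:
        raise ValueError("need at least one exchange per cache refresh")
    # The penalty is paid once and amortised over all N exchanges.
    return base_rtt_us + arp_penalty_us / exchanges_per_refresh


if __name__ == "__main__":
    base = 18.0  # usec, the current-network figure from the table above
    for n in (10, 1_000, 60_000):  # hypothetical exchanges per cache lifetime
        mean = mean_rtt_with_arp(base, base, n)
        extra = 100 * (mean - base) / base
        print(f"N={n:>6}: mean RTT {mean:.4f} usec (+{extra:.3f}%)")
```

Under these assumptions the added cost shrinks in proportion to 1/N, so a peer that is contacted thousands of times per cache lifetime sees a negligible increase in mean latency.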

--
Rahul
 
Rick Jones
      07-13-2010, 10:45 PM
Rahul wrote:
> Rick Jones wrote:


> > Is a LAN RTT even a non-trivial fraction of the service time of your
> > application?


> Yes, I think it is. It is an MPI application (computational chemistry:
> VASP) using distributed memory that does a fair amount of small-packet
> traffic.


I thought the goal of most MPI applications was to minimize the number
of MPI message passings?

> > Just *how* latency sensitive?


> It is hard to say since I don't know of a way to vary latency on demand (is
> there a way? I'd be eager to know!) to test response. These are the data
> points I have:


> Using 6 servers with 8 cores each.
>
> RT latency    Job runtime (normalised)
> 130 usec      10x
>  18 usec      1.5x
>   7 usec      1x


> 18 usec is my current network.


If your cluster is limited to 6 nodes, you might want to consider a
"cluster in a box" with an 8S (eight-socket) system; you should get rather
better than 18 usec RTT over loopback.

rick jones

> >If one request/response pair out of N,
> > where N could be quite large depending on how fast you are running,
> > has an extra RTT added to it is that really going to cause problems?


> You are probably right. It won't matter. It depends on how large N is.


> --
> Rahul


--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Rick Jones
      07-13-2010, 10:49 PM
Moe Trin wrote:
> They're not really ARP caches, so much as a lookup table of which
> host is connected to which port. When the switch loses it - or when
> it can't figure out which port a host is on, it will often broadcast
> the packet to all ports - not good for efficiency.


Given the added meaning of "broadcast" it might not be a bad idea to
put that as "it will transmit the packet on all ports".
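
The lookup-table behaviour described in the quote can be illustrated with a toy model. This Python sketch is only a simplified picture of a learning switch, not how any particular Catalyst behaves. The class name `LearningSwitch`, the 300-second ageing time, and the short MAC strings are assumptions chosen for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports, max_age=300.0):
        self.ports = list(ports)
        self.max_age = max_age  # seconds before a learned entry expires (assumed)
        self.table = {}         # MAC -> (port, time last seen)

    def forward(self, src, dst, in_port, now):
        """Return the list of ports the frame is transmitted on."""
        self.table[src] = (in_port, now)  # learn or refresh the sender
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.max_age:
            del self.table[dst]           # the entry has aged out
            entry = None
        if dst == BROADCAST or entry is None:
            # Unknown or broadcast destination: send on every other port.
            return [p for p in self.ports if p != in_port]
        # Known destination: one port, or none if it sits on the ingress port.
        return [] if entry[0] == in_port else [entry[0]]


if __name__ == "__main__":
    sw = LearningSwitch(ports=[1, 2, 3])
    print(sw.forward("aa", "bb", in_port=1, now=0.0))    # bb unknown: flooded
    print(sw.forward("bb", "aa", in_port=2, now=1.0))    # aa known: one port
    print(sw.forward("aa", "bb", in_port=1, now=400.0))  # bb aged out: flooded
```

The third call shows the case from the quote: once the entry for a host has expired, a unicast frame for it goes out on all other ports until that host is heard from again.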

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 