
multiple link aggregation questions: LAG /LACP /IEEE 802.3ad/ etc.

 
 
Rahul
Guest
Posts: n/a

 
      09-05-2008, 05:54 PM
I'm still a bit uncertain whether I've set up my Linux box and the
switches correctly in my quest for Ethernet channel bonding.

Goal: to bond eth0 and eth1 on each blade and thus attain close to 2Gbps
transmit and receive. i.e. desire Load balancing / bandwidth
aggregation. Do *not* care at all about fault-tolerance.

Equipment: 1 server (3 eth ports), 23 blades, 2 Dell 6248 switches. Each
switch has 48 Gbit ports.

I set up bond0 on each blade with mode=6 (adaptive load balancing). From
what I read this seems most suitable (correct me if I am wrong please!)
since it supports both transmit- and receive-side balancing.

Now comes the confusing parts:

1. Do I need Link Aggregation Groups (LAGs) on the switch or not for my
switch-to-port connections? I receive multiple conflicting views on this
online. http://www.linuxfoundation.org/en/Net:Bonding says "does not
require any special switch support...does not require any special switch
support..." So do many other tutorials that do not mention anything about
any switch side configs being required at all!

Others say it still needs LAGs. My "common-sense" says I ought to tell
the switch that two of my ports are going to the same blade somehow.

2. Will the switch see two MAC ids or just a single one for the bond0
device if I examined its address tables? For alb it ought to be both,
right? But I tried examining the ARP tables on the server and there, for
each blade IP, only the bond0 MAC is listed. Is that a sign something is
wrong or just my misinformed-ignorant paranoia? I read the specs on all
seven bonding modes (some for load balancing and others for fault
tolerance) and see that some seem to transmit both MAC ids and others just
a single one. True?

3. How about the switch-to-switch connections? If I want to connect 8 eth
cables switch-to-switch (to aggregate bandwidth again) do I need a LAG
here or not? (8 is the magic number because that's the max number of ports
my switch will allow me to aggregate).

4. Each LAG group has a LACP option. Enable or not? The core Linux specs
seem to have no mention of LACP; only company-specific info seems to
exist! (Cisco, Dell etc.) I guess LACP is related to IEEE 802.3ad? Is
that only a workaround to prevent having to manually aggregate ports into
LAG groups? Or does it have an advantage as a load-balancing protocol over
my chosen "Adaptive load balancing"?

I guess it boils down to two questions: (1) To LAG or not-to-LAG (same
for LACP) (2) Is my mode=6 (Adaptive load balancing) the appropriate
mode?
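
For question 2, one way to see which MAC each slave really has is to look at what the bonding driver reports in /proc/net/bonding/bond0. Below is a minimal Python sketch that parses that text. The sample text and its MAC addresses are made up for illustration, and the field names are an assumption based on typical driver output, which can differ between kernel versions.

```python
# Illustrative sample only. On a real node the text would come from
# open("/proc/net/bonding/bond0").read(); fields vary by kernel version.
SAMPLE = """\
Bonding Mode: adaptive load balancing
MII Status: up

Slave Interface: eth0
MII Status: up
Permanent HW addr: 00:11:22:33:44:55

Slave Interface: eth1
MII Status: up
Permanent HW addr: 00:11:22:33:44:56
"""


def parse_bonding_status(text):
    """Return (mode, {slave_name: permanent_mac}) from bonding status text."""
    mode = None
    slaves = {}
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "Permanent HW addr" and current is not None:
            slaves[current] = value
    return mode, slaves


mode, slaves = parse_bonding_status(SAMPLE)
print(mode)
for name, mac in slaves.items():
    print(name, mac)
```

Comparing the per-slave addresses found this way with what the switch shows in its address table is one way to check whether both links are in use. The ARP table on a peer would normally list only one MAC per IP at any given time.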


--
Rahul
 
 
 
 
 
Rick Jones
Guest
Posts: n/a

 
      09-06-2008, 12:29 AM
In comp.os.linux.networking Rahul <(E-Mail Removed)> wrote:
> I'm still a bit uncertain about the way that I've set up my Linux
> box and the switches correctly in my quest for Ethernet channel
> bonding.


> Goal: to bond eth0 and eth1 on each blade and thus attain close to
> 2Gbps transmit and receive. i.e. desire Load balancing / bandwidth
> aggregation. Do *not* care at all about fault-tolerance.


Do you expect that 2Gbps over a _single_ connection/flow?

> Equipment: 1 server (3 eth ports), 23 blades, 2 Dell 6248
> switches. Each switch has 48 Gbit ports.


You mentioned blades - I cannot recall from earlier which blades these
were, but are they connecting to the outside world through a _switch_
module in the blade chassis or a pass-through module?

> I setup bond0 on each blade. Used mode=6. Adaptive load
> balancing. From what I read this seems most suitable (correct me if
> I am wrong please!) since supports both transmit and receive side
> balancing.


> Now comes the confusing parts:


Cannot really help much there.

> 4. Each LAG group has a LACP option. Enable or not? The core Linux
> specs. seem to have no mention of LACP; only company specific info
> seems to exist! (Cisco, Dell etc.) I guess LACP is related to IEEE
> 802.3ad?


IIRC they are one and the same

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
Rahul
Guest
Posts: n/a

 
      09-08-2008, 11:40 PM
Rick Jones <(E-Mail Removed)> wrote in news:g9sitk$cs7$2@usenet01.boi.hp.com:

> Do you expect that 2Gbps over a _single_ connection/flow?


Thanks again for your comments Rick. Yes. Am I wrong in expecting that?

> You mentioned blades - I cannot recall from earlier which blades these
> were, but are they connecting to the outside world through a _switch_
> module in the blade chassis or a pass-through module?


Dell Power Edge 1435 ("nodes"). They have twin eth ports each. They connect
to a switch. The switch connects to a server. Server to world. I am *not*
interested in node-to-world performance. Mostly node-to-node and
node-to-server.

> IIRC they are one and the same


Could very well be! But then why does Dell have a separate toggle for LACP?
That implies I can have a LAG without LACP. Maybe it's just a Dell error. Could
switch users from other vendors comment on their configs, so that we can see
if this is a Dell-specific quirk?

--
Rahul
 
 
Rick Jones
Guest
Posts: n/a

 
      09-09-2008, 12:23 AM
Rahul <(E-Mail Removed)> wrote:
> Rick Jones <(E-Mail Removed)> wrote in news:g9sitk$cs7$2@usenet01.boi.hp.com:
> > Do you expect that 2Gbps over a _single_ connection/flow?


> Thanks again for your comments Rick. Yes. Am I wrong in expecting
> that?


I think so but my point of view may not be shared by others. IIRC the
only mode that will spread the _outbound_ traffic of a single
connection/flow across multiple links in the bond/trunk/aggregate is
mode-rr aka round-robin.

I've never been terribly fond of that mode because it leads to
out-of-order TCP segments, a resulting increase in ACKs and,
depending on the number of links in the bond/trunk/aggregate, spurious
TCP retransmissions.

I am not familiar with any switch with a similar round-robin mode for
the inbound traffic. Doesn't mean they don't exist mind you...

Those adaptive modes which are doing clever things with MAC addresses
are (probably) doing them for different destinations (IP addresses).
It would be necessary to _constantly_ be sending ARP refreshes (as in
an ARP frame for virtually every frame carrying a TCP segment) to get
traffic between a single pair of IPs to spread across different MAC
addresses.

IMO the best-if-not-only way to get > 1Gbit/s for a single TCP
connection is to use a 10G link.

> > You mentioned blades - I cannot recall from earlier which blades
> > these were, but are they connecting to the outside world through a
> > _switch_ module in the blade chassis or a pass-through module?


> Dell Power Edge 1435 ("nodes"). They have twin eth ports each. They
> connect to a switch. The switch connects to a server. Server to
> world. I am *not* interested in node-to-world performance. Mostly
> node-to-node and node-to-server.


The "nodes" connect directly to an external switch and not some switch
internal to the blade chassis? I'm not familiar with Dell blades, but
for HP C-Class blades, there are I/O modules which plug into the back
of the blade chassis to connect the eth ports on the blades themselves
with the outside world. Those can either be pass-through modules or
they can be actual switches. That is why I was asking about what was
in the blade chassis along with the blades themselves. If you have
switch modules you would need to bond/trunk/aggregate to _that_ switch
module, and then have another bond/trunk/aggregate between the
"chassis switch" and the external switch to which the server is
connected.

rick jones
--
The computing industry isn't as much a game of "Follow The Leader" as
it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
- Rick Jones
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
Rahul
Guest
Posts: n/a

 
      09-09-2008, 01:37 AM
Rick Jones <(E-Mail Removed)> wrote in news:ga4fl7$ndc$1@usenet01.boi.hp.com:

> The "nodes" connect directly to an external switch and not some switch
> internal to the blade chassis? I'm not familiar with Dell blades, but
> for HP C-Class blades, there are I/O modules which plug into the back
> of the blade chassis to connect the eth ports on the blades themselves
> with the outside world. Those can either be pass-through modules or
> they can be actual switches. That is why I was asking about what was
> in the blade chassis along with the blades themselves. If you have
> switch modules you would need to bond/trunk/aggregate to _that_ switch
> module, and then have another bond/trunk/aggregate between the
> "chassis switch" and the external switch to which the server is
> connected.
>


Rick, my bad. Maybe I confused you with my misleading usage of the term
"blades"? These are Dell Power Edge 1435 Rack Mount servers.
http://www.dell.com/content/products.../pedge_sc1435?
c=us&cs=555&l=en&s=biz

The backplane has twin eth ports. We connected these using ordinary CAT5e
cables to ports on a Dell switch. Switch is also a Dell Power Connect 6248
with 48 Gbit ports.

Does that clarify the situation better?

--
Rahul
 
 
Rahul
Guest
Posts: n/a

 
      09-09-2008, 01:59 AM
Rick Jones <(E-Mail Removed)> wrote in news:ga4fl7$ndc$1@usenet01.boi.hp.com:

> I think so but my point of view may not be shared by others. IIRC the
> only mode that will spread the _outbound_ traffic of a single
> connection/flow across multiple links in the bond/trunk/aggregate is
> mode-rr aka round-robin.
> I've never been terribly fond of that mode because it leads to
> out-of-order TCP segments, a resulting increase in ACKs and,
> depending on the number of links in the bond/trunk/aggregate, spurious
> TCP retransmissions.


Interesting. Any downsides to mode-rr? Is it transmit-side load balancing
only? Also, why do you think that some of the other "smarter" modes (alb
/ 802.3ad) do not achieve a bandwidth multiplier, can I ask? Just a
personal preference or anything fundamentally iffy about those modes?


> I am not familiar with any switch with a similar round-robin mode for
> the inbound traffic. Doesn't mean they don't exist mind you...


I thought a LAG was the same idea. If a switch cannot distinguish between
two similar links and clubs them together, doesn't that achieve the same
effect? Maybe I am wrong.

> Those adaptive modes which are doing clever things with MAC addresses
> are (probably) doing them for different destinations (IP addresses).
> It would be necessary to _constantly_ be sending ARP refreshes (as in
> an ARP frame for virtually every frame carrying a TCP segment) to get
> traffic between a single pair of IPs to spread across different MAC
> addresses.


Right. Which is why mode=6 (alb) will only (IMO) give a bandwidth
multiplier when speaking to *at least* two different peers. When talking
to a single peer (single IP) no advantage.
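
The arithmetic behind that can be written down as a best-case bound. The sketch below assumes every flow is pinned to exactly one link and that flows spread perfectly over the links; the function name and the 1 Gbit/s default are only illustrative, and real hashing can do worse by putting two flows on the same link.

```python
def best_case_gbps(num_flows, num_links, link_gbps=1.0):
    """Upper bound on aggregate throughput when each flow uses one link."""
    # A flow cannot use more than one link, so at most min(flows, links)
    # links carry traffic at the same time.
    busy_links = min(num_flows, num_links)
    return busy_links * link_gbps


print(best_case_gbps(1, 2))  # one peer, two bonded links
print(best_case_gbps(2, 2))  # two peers, two bonded links
```

Under these assumptions a single flow never exceeds the speed of one link, no matter how many links are in the bond.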

> IMO the best-if-not-only way to get > 1Gbit/s for a single TCP
> connection is to use a 10G link.


Too expensive for a university-research cluster!

--
Rahul
 
 
Rick Jones
Guest
Posts: n/a

 
      09-09-2008, 05:21 PM
Rahul <(E-Mail Removed)> wrote:
> Rick, my bad. Maybe I confused you with my misleading usage of the
> term "blades"? These are Dell Power Edge 1435 Rack Mount servers.
> http://www.dell.com/content/products.../pedge_sc1435?
> c=us&cs=555&l=en&s=biz


> The backplane has twin eth ports. We connected these using ordinary
> CAT5e cables to ports on a Dell switch. Switch is also a Dell Power
> Connect 6248 with 48 Gbit ports.


> Does that clarify the situation better?


Yes. Standalone systems. Understood. My end conclusion about
single-stream, aggregation and 10Gig still stands though.

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
Rick Jones
Guest
Posts: n/a

 
      09-09-2008, 05:30 PM
Rahul <(E-Mail Removed)> wrote:
> Rick Jones <(E-Mail Removed)> wrote in news:ga4fl7$ndc$1@usenet01.boi.hp.com:


> > I think so but my point of view may not be shared by others. IIRC the
> > only mode that will spread the _outbound_ traffic of a single
> > connection/flow across multiple links in the bond/trunk/aggregate is
> > mode-rr aka round-robin.
> > I've never been terribly fond of that mode because it leads to
> > out-of-order TCP segments, a resulting increase in ACKs and,
> > depending on the number of links in the bond/trunk/aggregate, spurious
> > TCP retransmissions.


> Interesting. Any downsides to mode-rr?


It leads to out-of-order TCP segments, which leads to an increase in
the number of ACKs, which will increase CPU utilization per KB
transferred (service demand in netperf-speak) and on the larger link
counts in a single aggregate, spurious TCP retransmissions which will
waste bandwidth and suppress the congestion window.
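
A toy model of the reordering, as a minimal Python sketch: segments are striped round-robin over links whose one-way delays differ slightly, and the receiver sees them in arrival-time order. The delay and gap values are hypothetical, and the model ignores TCP itself (no ACKs, no retransmissions); it only shows how the sequence numbers get shuffled.

```python
def arrival_order(num_segments, link_delays_us, send_gap_us):
    """Stripe segments round-robin over links; return seq numbers as they arrive."""
    arrivals = []
    for seq in range(num_segments):
        link = seq % len(link_delays_us)      # round-robin link choice
        sent_at = seq * send_gap_us           # segments leave back to back
        arrivals.append((sent_at + link_delays_us[link], seq))
    arrivals.sort()                           # receiver sees arrival-time order
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count segments that arrive after a higher-numbered segment."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


# Hypothetical numbers: two links with 50 us and 80 us delay, 10 us send gap.
order = arrival_order(6, [50, 80], 10)
print(order, count_out_of_order(order))
```

With equal delays on both links the segments arrive in order in this model; the reordering only appears once the links differ in delay or queueing.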

> Is it transmit-side load balancing only?


Yes.

> Also, why do you think that some of the other "smarter" modes (alb /
> 802.3ad) do not achieve a bandwidth multiplier, can I ask? Just a
> personal preference or anything fundamentally iffy about those
> modes?


Unless I've really misunderstood what is going on, the modes playing
tricks with ARP cannot on first principles affect a single flow. They
get traffic to flow over different links by handing out different MAC
addresses to queries for their one local IP. Even if we assume that
every segment sent on a TCP connection does an ARP cache lookup, the
only way it could get a new MAC address each time would be if there
was an ARP update between every TCP segment. I cannot imagine any of
the modes in Linux bonding doing something sooo terribly inefficient.
It would make mode-rr look positively pristine in comparison.
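
The idea of handing out different MAC addresses per peer can be sketched as a toy model in Python. This is a simplification for illustration, not the bonding driver's actual algorithm; the class name, the round-robin assignment, and the addresses are all assumptions.

```python
class ToyReceiveBalancer:
    """Toy model: each peer IP is told one slave MAC and keeps using it."""

    def __init__(self, slave_macs):
        self.slave_macs = list(slave_macs)
        self.assigned = {}      # peer IP -> slave MAC advertised to that peer
        self.next_index = 0

    def mac_for_peer(self, peer_ip):
        # A new peer gets the next slave in turn; a known peer keeps the MAC
        # it was given, as if its ARP cache entry were unchanged.
        if peer_ip not in self.assigned:
            index = self.next_index % len(self.slave_macs)
            self.assigned[peer_ip] = self.slave_macs[index]
            self.next_index += 1
        return self.assigned[peer_ip]


balancer = ToyReceiveBalancer(["00:11:22:33:44:55", "00:11:22:33:44:56"])
print(balancer.mac_for_peer("10.0.0.1"))
print(balancer.mac_for_peer("10.0.0.1"))  # same peer, same MAC again
print(balancer.mac_for_peer("10.0.0.2"))  # different peer, other slave
```

In this model all traffic from one peer lands on one slave until that peer is told a different MAC, which is why a single flow sees no gain.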

The point of link aggregation was to increase aggregate throughput and
provide a modicum of HA. Increasing the speed of a single flow was
not part of the design center.

> > I am not familiar with any switch with a similar round-robin mode for
> > the inbound traffic. Doesn't mean they don't exist mind you...


> I thought a LAG was the same idea. If a switch cannot distinguish between
> two similar links and clubs them together, doesn't that achieve the same
> effect? Maybe I am wrong.


All depends on what the switch does. My experience with other
switches (non-Dell) has been that when presented with an aggregate the
switch will hash on some addressing in the frame to pick the link on
which it will place the frame. Sometimes this is simply the MAC,
sometimes it may include the IP. I've heard unconfirmed rumours that
some switches may even go so far as to look at TCP/UDP port numbers.
However, none of that would result in traffic for a single flow
flowing over multiple links in parallel.
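
A minimal Python sketch of that kind of hashing, assuming a hypothetical hash that XORs the source and destination MAC addresses and takes the result modulo the number of links. Real switches use vendor-specific hash functions and may mix in IP addresses or port numbers; the point is only that identical addressing always yields the same link.

```python
def pick_link(src_mac, dst_mac, num_links):
    """Pick a link index from the XOR of the two MAC addresses."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % num_links


# Every frame between the same pair of MACs hashes to the same link.
a = pick_link("00:11:22:33:44:55", "00:11:22:33:44:66", 2)
b = pick_link("00:11:22:33:44:55", "00:11:22:33:44:66", 2)
# A different destination may land on another link.
c = pick_link("00:11:22:33:44:55", "00:11:22:33:44:67", 2)
print(a, b, c)
```

Because the hash input does not change from frame to frame within one flow, the link does not change either.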

> > Those adaptive modes which are doing clever things with MAC
> > addresses are (probably) doing them for different destinations (IP
> > addresses). It would be necessary to _constantly_ be sending ARP
> > refreshes (as in an ARP frame for virtually every frame carrying a
> > TCP segment) to get traffic between a single pair of IPs to spread
> > across different MAC addresses.


> Right. Which is why mode=6 (alb) will only (IMO) give a bandwidth
> multiplier when speaking to *at least* two different peers. When
> talking to a single peer (single IP) no advantage.


Right, and you said you needed an increase for comms to a single peer
right?

> > IMO the best-if-not-only way to get > 1Gbit/s for a single TCP
> > connection is to use a 10G link.


> Too expensive for a university-research cluster!


How did the line go in "The Right Stuff?" "No bucks, no Buck Rogers."


rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 