Networking Forums



load-balancing twin 10GbE?

 
 
Steve Wampler

 
      12-16-2008, 03:04 PM
Has anyone had any experience with load-balancing using
two 10GbE ports? (On one card or two?). I'm specifically
interested in performance measures transporting large data
volumes in one direction only. Did you see any issues with
backplane contention, etc.? What hardware (controllers and
Linux boxes)?

Thanks for any information!

-Steve
--
Steve Wampler -- (E-Mail Removed)
The gods that smiled on your birth are now laughing out loud.
 
 
Rick Jones

 
      12-16-2008, 09:02 PM
Steve Wampler <(E-Mail Removed)> wrote:
> Has anyone had any experience with load-balancing using
> two 10GbE ports? (On one card or two?). I'm specifically
> interested in performance measures transporting large data
> volumes in one direction only. Did you see any issues with
> backplane contention, etc.? What hardware (controllers and
> Linux boxes)?


IIRC most 10G NICs these days are PCIe 1.1 x8. I've been told that a
PCIe 1.1 x8 slot has a total of about 16 Gbit/s bandwidth after
overheads.

So, that could be used as a first approximation of a limit when
bonding the two ports of a single card.
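A rough back-of-envelope check of that figure, assuming PCIe 1.1 signals at 2.5 GT/s per lane with 8b/10b encoding and ignoring packet-level (TLP) overhead, which lowers real throughput further. The result applies to each direction separately, which is what matters for a one-way transfer:

```python
# Back-of-envelope PCIe 1.1 (Gen1) slot bandwidth.
# Assumptions: 2.5 GT/s per lane, 8b/10b line encoding; TLP/DLLP protocol
# overhead is ignored, so achievable throughput will be somewhat lower.
GEN1_GT_PER_LANE = 2.5        # gigatransfers per second, per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 bits on the wire


def pcie_gen1_gbit_per_direction(lanes: int) -> float:
    """Data bit rate per direction, after encoding overhead, in Gbit/s."""
    return lanes * GEN1_GT_PER_LANE * ENCODING_EFFICIENCY


slot_gbit = pcie_gen1_gbit_per_direction(8)
two_ports_gbit = 2 * 10.0     # both 10GbE ports running flat out, one direction

print(f"x8 slot: {slot_gbit:.1f} Gbit/s per direction")
print(f"two 10GbE ports: {two_ports_gbit:.1f} Gbit/s "
      f"({'fits' if two_ports_gbit <= slot_gbit else 'exceeds the slot'})")
```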

Not that I have any data myself but are you looking for a single
stream across the bond, or multiple streams?

Are you going back-to-back with 10G between systems or will there be a
switch in between?

If you want a single stream to try to take advantage of multiple links
in a bond you are pretty much limited to mode-rr, and so you run the very
real risk (certainty IMO) of reordered traffic at the receiver. I
suspect that will affect the receiver's ability to effectively employ
Large Receive Offload. The out-of-order traffic will result in an
increased ACK load, and if it is "enough" out of order it can trigger
spurious fast retransmissions. Further, while the bonding software on
the Linux host will control how traffic is spread on outbound, it is
the _switch_ which controls how traffic is spread on inbound, and if
the switch does not have a mode-rr equivalent, you might get 2 links
on transmit but only one link on receive.
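To see why round-robin striping (the Linux bonding driver calls this mode balance-rr) tends to reorder a single stream, here is a toy model: consecutive segments alternate between two links, and the two links are given slightly different delays. The delay values are invented for illustration; in practice the differences come from queueing, interrupt timing and so on.

```python
def rr_arrival_order(n_segments, link_delays_us, send_gap_us=1.0):
    """Order in which a receiver sees segments that were sent back-to-back
    and assigned to links round-robin. Delays are illustrative only."""
    arrivals = []
    for seq in range(n_segments):
        link = seq % len(link_delays_us)               # round-robin link choice
        arrive_at = seq * send_gap_us + link_delays_us[link]
        arrivals.append((arrive_at, seq))
    return [seq for _, seq in sorted(arrivals)]        # sort by arrival time


order = rr_arrival_order(8, link_delays_us=[10.0, 13.5])
print(order)  # [0, 2, 4, 1, 6, 3, 5, 7]
```

With equal link delays the same function returns the segments in order; any persistent skew between the links is enough to interleave them.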

If you are talking about multiple streams the discussion shifts to
what parts of which headers are looked at when making link choices -
again both in the host and in the switch.
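As an example of header-based link choice, older Linux bonding documentation described its layer3+4 transmit hash roughly as ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count. Newer kernels mix the bits differently, and switches use their own vendor-specific schemes. A sketch of that simplified formula, with hypothetical addresses and ports, shows that each connection is pinned to one link while separate connections can spread out:

```python
import ipaddress


def layer3_4_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Simplified layer3+4 style hash, after the formula in older Linux
    bonding docs. Every packet of one TCP connection has the same 4-tuple,
    so the whole connection always maps to the same link."""
    sip = int(ipaddress.IPv4Address(src_ip))
    dip = int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ ((sip ^ dip) & 0xFFFF)) % n_links


# Hypothetical: four connections from one sender to one receiver port.
links = [layer3_4_link("10.0.0.1", "10.0.0.2", sport, 5001, 2)
         for sport in range(40000, 40004)]
print(links)  # [0, 1, 0, 1]
```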

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
Steve Wampler

 
      12-17-2008, 02:17 AM
Rick Jones wrote:
> IIRC most 10G NICs these days are PCIe 1.1 x8. I've been told that a
> PCIe 1.1 x8 slot has a total of about 16 Gbit/s bandwidth after
> overheads.


Ah - good to know - thanks!

> So, that could be used as a first approximation of a limit when
> bonding the two ports of a single card.
>
> Not that I have any data myself but are you looking for a single
> stream across the bond, or multiple streams?


Single stream, at 960MB/s for 4 hours/day (typical) with possible 8 hour
duration (rarer). The source is an as-yet-unbuilt camera system. (There's
actually more than one, but we should be able to isolate the data flows.)
The 960MB is too close to 10Gb for me to believe we can get by with one
port - hence the interest in bonding.

> Are you going back-to-back with 10G between systems or will there be a
> switch in between?


We'd prefer to have a switch, if possible at those rates. The cameras
will be on a rotating platform with the target systems well off the
platform, so having to switch fibers to switch back-ends between
cameras isn't very attractive.

> If you want a single stream to try to take advantage of multiple links
> in a bond you are pretty much limited to mode-rr, and so you run the very
> real risk (certainty IMO) of reordered traffic at the receiver. I
> suspect that will affect the receiver's ability to effectively employ
> Large Receive Offload. The out-of-order traffic will result in an
> increased ACK load, and if it is "enough" out of order it can trigger
> spurious fast retransmissions. Further, while the bonding software on
> the Linux host will control how traffic is spread on outbound, it is
> the _switch_ which controls how traffic is spread on inbound, and if
> the switch does not have a mode-rr equivalent, you might get 2 links
> on transmit but only one link on receive.


Thanks - that's extremely useful! (or it will be as soon as I get a
translation back into English)


 
Rick Jones

 
      12-17-2008, 04:37 AM
Steve Wampler <(E-Mail Removed)> wrote:
> Rick Jones wrote:
> > Not that I have any data myself but are you looking for a
> > single stream across the bond, or multiple streams?


> Single stream, at 960MB/s for 4 hours/day (typical) with possible 8
> hour duration (rarer). The source is an as-yet-unbuilt camera
> system. (There's actually more than one, but we should be able to
> isolate the data flows.) The 960MB is too close to 10Gb for me to
> believe we can get by with one port - hence the interest in bonding.


Well, depending on the "oomph" you have on the sender and the
receiver, it is possible to achieve "link rate" with TCP over 10G
Ethernet even with a 1500 byte MTU, by employing TSO - TCP Segmentation
Offload - on the sender and LRO - Large Receive Offload - on the
receiver. It gets even easier if you can use Jumbo Frames of 9000 bytes
or more.
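To put a number on the per-packet load that TSO, LRO and jumbo frames help with, here is the frame rate needed at full 10 Gbit/s line rate. It assumes 38 bytes of Ethernet overhead per frame beyond the MTU (preamble/SFD 8, header 14, FCS 4, inter-frame gap 12). The 64 KiB offload unit size is an assumption for illustration, typical of TSO but not guaranteed:

```python
LINK_BPS = 10e9          # 10 Gbit/s
WIRE_OVERHEAD = 38       # bytes per frame on the wire beyond the IP packet


def frames_per_second(mtu):
    """Frames per second needed to fill the link with MTU-sized packets."""
    return LINK_BPS / ((mtu + WIRE_OVERHEAD) * 8)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: ~{frames_per_second(mtu):,.0f} frames/s")

# With TSO (sender) or LRO (receiver) the host stack deals in larger units;
# 64 KiB per unit is an assumed size for illustration.
units_per_second = LINK_BPS / (64 * 1024 * 8)
print(f"64 KiB offload units: ~{units_per_second:,.0f} per second")
```

Roughly 800,000 frames per second at a 1500 byte MTU versus under 20,000 offload units per second is the kind of gap that decides whether a host has enough "oomph": the per-packet cost in the host stack is paid far less often.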

> > Are you going back-to-back with 10G between systems or will there
> > be a switch in between?


> We'd prefer to have a switch, if possible at those rates. The
> cameras will be on a rotating platform with the target systems well
> off the platform, so having to switch fibers to switch back-ends
> between cameras isn't very attractive.


I'm not sure if any commercially available switches offer a mode-rr
(round robin) setting. Some use MAC addresses for picking a link in
the bond/trunk/team/aggregate, some can use IP address, some can use
TCP port numbers. But I'm not sure if any do round-robin.
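For contrast with a port-aware hash, a link choice based only on MAC addresses (similar in spirit to the Linux bonding layer2 policy; switch implementations vary) puts every connection between the same pair of hosts on the same link, no matter how many connections there are. The MAC addresses below are hypothetical:

```python
def layer2_link(src_mac, dst_mac, n_links):
    """Pick a link from the MAC address pair only (simplified)."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_links


camera = "00:1b:21:00:00:0a"   # hypothetical sender MAC
server = "00:1b:21:00:00:0b"   # hypothetical receiver MAC

# Four connections that differ only in TCP source port (ignored by this hash).
flows = [(camera, server, sport, 5001) for sport in range(40000, 40004)]
links_used = {layer2_link(src, dst, 2) for src, dst, _sport, _dport in flows}
print(links_used)  # {1}: every flow shares one link
```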

> > If you want a single stream to try to take advantage of multiple
> > links in a bond you are pretty much limited to mode-rr, and so you
> > run the very real risk (certainty IMO) of reordered traffic at the
> > receiver. I suspect that will affect the receiver's ability to
> > effectively employ Large Receive Offload. The out-of-order traffic
> > will result in an increased ACK load, and if it is "enough" out of
> > order it can trigger spurious fast retransmissions. Further, while
> > the bonding software on the Linux host will control how traffic is
> > spread on outbound, it is the _switch_ which controls how traffic is
> > spread on inbound, and if the switch does not have a mode-rr
> > equivalent, you might get 2 links on transmit but only one link on
> > receive.


> Thanks - that's extremely useful! (or it will be as soon as I get a
> translation back into English)


TCP will "work" when its segments arrive out of order, but for every
out-of-order segment a TCP receiver will generate an immediate ACK.
That ACK will have the sequence number of the first "missing" TCP
segment. That means that both the receiving and sending TCPs will
spend more CPU cycles in ACK processing.
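A minimal model of the receiver's side, assuming segments numbered 0, 1, 2, ... and one ACK per arriving segment (delayed ACKs are ignored for simplicity). Each ACK carries the number of the next segment expected, so a late segment produces a run of duplicate ACKs:

```python
def cumulative_acks(arrival_order):
    """Cumulative ACK sent for each arriving segment: the number of the
    first segment not yet received (the first "missing" one)."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrival_order:
        received.add(seg)
        while next_expected in received:   # advance past everything we hold
            next_expected += 1
        acks.append(next_expected)
    return acks


# Segment 1 is overtaken by segments 2, 3 and 4:
acks = cumulative_acks([0, 2, 3, 4, 1, 5])
print(acks)  # [1, 1, 1, 1, 5, 6]  (three duplicate ACKs for segment 1)
```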

A sending TCP has a heuristic called "fast retransmit" which works
based on the ass-u-me-ption that traffic is rarely reordered, so if
traffic arrives out of order at a receiver it implies some traffic was
lost. By default, if a sending TCP receives three duplicate ACKs
(ACKs saying the same sequence number is the next expected) the
sending TCP will assume that segment was lost and retransmit it.
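The sender's side of that heuristic, again as a simplified sketch: it counts duplicate ACKs and "retransmits" on the third one, leaving out SACK, recovery state and timers. Fed the ACK stream from a segment that was only late, not lost, it still retransmits:

```python
def fast_retransmits(acks, dupthresh=3):
    """Segments a sender would fast-retransmit for a given ACK stream.

    Classic rule: retransmit when the dupthresh-th duplicate ACK arrives.
    SACK, recovery state and retransmission timers are left out.
    """
    retransmitted = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:
                retransmitted.append(ack)   # resend the "missing" segment
        else:
            last_ack, dups = ack, 0
    return retransmitted


# ACKs produced when segment 1 arrives after segments 2, 3 and 4:
print(fast_retransmits([1, 1, 1, 1, 5, 6]))   # [1]
# Only two segments overtook it, so no retransmission:
print(fast_retransmits([1, 1, 1, 4, 5]))      # []
```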

Sending TCPs also maintain an idea of how much traffic they can send
at one time without triggering packet loss in the network. That is
called the congestion window. When a sending TCP has to retransmit it
will adjust its congestion window downwards - sometimes considerably.
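A toy illustration of the window reduction, using a Reno-like rule: grow by one segment per round trip, halve on a fast retransmit, never drop below two segments. This is a simplification of the standard congestion control rules; recent Linux kernels default to CUBIC, which cuts the window less sharply, but the shape of the effect is similar. The starting window of 100 segments is arbitrary:

```python
def cwnd_trace(rtts, events, initial=100, floor=2):
    """Congestion window (in segments) at the end of each round trip.

    +1 segment per RTT of congestion avoidance; halved (but not below
    `floor`) in any RTT that contains a fast retransmit.
    """
    cwnd, trace = initial, []
    for rtt in range(rtts):
        if rtt in events:
            cwnd = max(cwnd // 2, floor)   # back off after a retransmit
        else:
            cwnd += 1                      # additive increase
        trace.append(cwnd)
    return trace


print(cwnd_trace(6, events=set()))       # [101, 102, 103, 104, 105, 106]
print(cwnd_trace(6, events={1, 3, 5}))   # [101, 50, 51, 25, 26, 13]
```

If the retransmits are spurious (caused by reordering, not loss), every one of those halvings is throughput given away for nothing.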

So, lots of traffic reordering can result in spurious fast
retransmissions, which can result in smaller congestion windows, which
can result in lower performance.

The Linux TCP stack on the sending side can have its sensitivity to
duplicate ACKs "tuned" to the point of effectively eliminating fast
retransmissions. Of course then if there *is* a lost segment one
might end up waiting for a retransmission timeout and that is really
bad news. Enabling Selective ACKnowledgement may help.
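On Linux the initial duplicate-ACK threshold is exposed as the sysctl net.ipv4.tcp_reordering (default 3), and SACK as net.ipv4.tcp_sack. The sketch below is not the kernel's actual algorithm (which also adapts its reordering estimate per connection); it models one segment being overtaken by a given number of later segments and checks whether the resulting duplicate ACKs would reach a threshold:

```python
def late_segment_order(n, late_seg, displacement):
    """Arrival order when segment `late_seg` is overtaken by
    `displacement` later segments (all others arrive in order)."""
    order = [s for s in range(n) if s != late_seg]
    order.insert(late_seg + displacement, late_seg)
    return order


def max_dup_acks(order):
    """Longest run of duplicate cumulative ACKs for an arrival order,
    assuming one ACK per arriving segment."""
    received, expected = set(), 0
    last_ack, dups, worst = None, 0, 0
    for seg in order:
        received.add(seg)
        while expected in received:
            expected += 1
        dups = dups + 1 if expected == last_ack else 0
        last_ack = expected
        worst = max(worst, dups)
    return worst


def spurious_fast_retransmit(displacement, dupthresh):
    order = late_segment_order(20, late_seg=2, displacement=displacement)
    return max_dup_acks(order) >= dupthresh


for d in (1, 2, 3, 5, 8):
    print(f"late by {d}: threshold 3 -> {spurious_fast_retransmit(d, 3)}, "
          f"threshold 10 -> {spurious_fast_retransmit(d, 10)}")
```

The higher threshold tolerates deeper reordering, at the price described above: a genuinely lost segment needs more duplicate ACKs (or a timeout) before it is resent.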

Similarly, I think that many of the LRO schemes in NICs make use of
the "traffic is rarely reordered" assumption. So, when traffic
arrives out of order the NIC is not able to aggregate as many smaller
segments into one larger one to give to the host. So the receiving
host has more per-packet work to do because it is receiving more
packets.
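A toy model of that effect, assuming an LRO engine that can only extend an aggregate with the next in-sequence segment and caps each aggregate at 8 segments (both simplifications; the cap is arbitrary). The reordered sequence is the kind of interleaving that round-robin striping over two links can produce:

```python
def lro_deliveries(arrival_order, max_segs=8):
    """Count packets handed to the host by a simplified LRO engine.

    A segment is merged into the current aggregate only if it is the next
    sequence number and the aggregate is not full; anything else flushes the
    aggregate. Real LRO/GRO code also uses flush timers, header checks, etc.
    """
    deliveries = 0
    agg_len = 0
    expected_next = None
    for seg in arrival_order:
        if agg_len and seg == expected_next and agg_len < max_segs:
            agg_len += 1                 # extend the current aggregate
        else:
            if agg_len:
                deliveries += 1          # flush the aggregate to the host
            agg_len = 1                  # start a new one with this segment
        expected_next = seg + 1
    if agg_len:
        deliveries += 1                  # flush whatever is left at the end
    return deliveries


in_order = list(range(16))
reordered = [0, 2, 4, 1, 6, 3, 5, 7]
reordered += [s + 8 for s in reordered]   # same pattern for segments 8..15

print(lro_deliveries(in_order))    # 2
print(lro_deliveries(reordered))   # 15
```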

The above may not be modern English, but perhaps it isn't any worse
than Middle English now

Whan that Aprill, with his shoures soote
The droghte of March hath perced to the roote

http://www.librarius.com/cantales.htm

rick jones

I've heard occasional talk about 40 and 100Gbit Ethernet - not sure if
any of it is far enough along for an "observatory special" though.
There may be something that fast or faster in the telco space. In
either case we are probably talking some serious dollars though.

--
portable adj, code that compiles under more than one compiler
 
david

 
      12-17-2008, 09:27 AM
On Tue, 16 Dec 2008 20:17:50 -0700, Steve Wampler rearranged some
electrons to say:

> The 960MB is too close to 10Gb for me to believe we can
> get by with one port - hence the interest in bonding.
>


It is? 960MB is 9.6% of 10GB.

 
 
Steve Wampler

 
      12-17-2008, 01:47 PM
david wrote:
> On Tue, 16 Dec 2008 20:17:50 -0700, Steve Wampler rearranged some
> electrons to say:
>
>> The 960MB is too close to 10Gb for me to believe we can
>> get by with one port - hence the interest in bonding.
>>

>
> It is? 960MB is 9.6% of 10GB.
>


True, but it's 96% of 10Gb.

 
Steve Wampler

 
      12-17-2008, 01:52 PM
Rick Jones wrote:

> The above may not be modern English, but perhaps it isn't any worse
> than Middle English now


Yes - thanks! You've been a great help for this functional illiterate!

> I've heard occasional talk about 40 and 100Gbit Ethernet - not sure if
> any of it is far enough along for an "observatory special" though.
> There may be something that fast or faster in the telco space. In
> either case we are probably talking some serious dollars though.


Thanks again. We have a couple of years before spending any money, but need
to show costs and feasibility assuming current technology. So maybe
the serious dollars will be a little sillier by then...


 
Hactar

 
      12-17-2008, 08:23 PM
In article <(E-Mail Removed)>,
Steve Wampler <(E-Mail Removed)> wrote:
> david wrote:
> > On Tue, 16 Dec 2008 20:17:50 -0700, Steve Wampler rearranged some
> > electrons to say:
> >
> >> The 960MB is too close to 10Gb for me to believe we can
> >> get by with one port - hence the interest in bonding.
> >>

> >
> > It is? 960MB is 9.6% of 10GB.

>
> True, but it's 96% of 10Gb.


I get 75%:

960 1024 1024 * * (gives 1006632960)
10 1024 1024 1024 * * * 8 / (gives 1342177280)
/ (gives 0.75)

Have I forgotten something? Are bytes in TCP not octets? Is "10 Gb
ethernet" not really 10 * 2^30 bps?

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet?  -- Daniel Jensen
[TOFU := text oben, followup unten (text on top, follow-up below)]
 
 
Pascal Hambourg

 
      12-17-2008, 09:23 PM
Hello,

Hactar wrote:
>
> Is "10 Gb ethernet" not really 10 * 2^30 bps?


No. In telecom/networking multiplier prefixes have always been used with
their "classic" decimal meanings.
k = 10^3 (thousand)
M = 10^6 (million)
G = 10^9 (billion)

So 10 Gbit/s = 10 * 10^9 bit/s. Besides, it's the gross bit rate of the
link, including Ethernet framing overhead, not the payload bit rate.
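Applying the decimal prefixes to the numbers in this thread, and ignoring all framing and protocol overhead, 960 MB/s works out as follows. The binary MiB line is included only to show how much the interpretation of "MB" matters:

```python
LINK_BPS = 10 * 10**9            # 10 Gbit/s: G = 10^9


def share_of_link(bytes_per_sec):
    """Fraction of a 10 Gbit/s link's gross bit rate used by a byte rate."""
    return bytes_per_sec * 8 / LINK_BPS


decimal_rate = 960 * 10**6       # 960 MB/s with M = 10^6
binary_rate = 960 * 2**20        # 960 MiB/s, if "MB" meant 2^20 bytes

print(f"960 MB/s  = {decimal_rate * 8 / 1e9:.2f} Gbit/s = {share_of_link(decimal_rate):.1%}")
print(f"960 MiB/s = {binary_rate * 8 / 1e9:.2f} Gbit/s = {share_of_link(binary_rate):.1%}")
```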
 
 
Steve Wampler

 
      12-17-2008, 10:10 PM
Steve Wampler wrote:
>> It is? 960MB is 9.6% of 10GB.
>>

>
> True, but it's 96% of 10Gb.


The moment I sent this, I just *knew* someone would do
the math.

That's a (very) rough approximation to point out that
1GB ~= 10Gb. The actual % is lower, but without knowing
frame size, protocol (TCP vs UDP), etc., it's hard to get
an accurate number. Someone with more network smarts
than me (uh, that'd be most of you...) can do the real
math. Anyway, the % is high enough that I worry about
sending 960MB/s sustained down a 10GbE wire, though thanks
to Rick's input I'm worrying a little less than before!
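For the "real math", here is a rough upper bound on TCP payload throughput over a single 10GbE link. It assumes IPv4, 20-byte IP and TCP headers, the 12-byte TCP timestamp option (commonly enabled on Linux; an assumption here), and 38 bytes of Ethernet overhead per frame. It ignores CPU, PCIe and ACK effects, so real numbers will be lower:

```python
LINK_BPS = 10 * 10**9
ETH_OVERHEAD = 38      # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40    # IPv4 20 + TCP 20, no options
TCP_TIMESTAMPS = 12    # timestamp option (assumed enabled)


def max_tcp_goodput_bytes(mtu, timestamps=True):
    """Best-case TCP payload bytes/s on one 10GbE link for a given MTU."""
    payload = mtu - IP_TCP_HEADERS - (TCP_TIMESTAMPS if timestamps else 0)
    return LINK_BPS / 8 * payload / (mtu + ETH_OVERHEAD)


need = 960 * 10**6     # 960 MB/s, decimal
for mtu in (1500, 9000):
    g = max_tcp_goodput_bytes(mtu)
    print(f"MTU {mtu}: ~{g / 1e6:.0f} MB/s max; 960 MB/s uses {need / g:.0%}")
```

By this estimate 960 MB/s would use a bit over 80% of a single link's best-case TCP throughput with 1500-byte frames, and a bit under 80% with 9000-byte frames.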

