#1
Hi,

I'm currently trying to reduce our backup time. We have to transfer ~250GB/day (50GB-2000GB).
We tried connecting one of the file servers using bonding and 2 unused network interfaces to improve performance, but we are only seeing a small improvement (5-10%; the average is currently 60MB/s).

So I started to experiment on our cluster, where we have a couple of nodes with bonding interfaces. Using netcat I can transfer a 670MB file in 5.9 s, but that is only the theoretical value for 1 GBit.

Using scp I only see ~45MB/s (it's an 8-core node, so it's fast enough ...).

Is there anything that would improve these values substantially?
Is it impossible in principle to exceed the single-link speed?

Any hints are welcome!!

Bye, Peer

-------
OS: Novell SLES9 SP3, AMD64
Network: E1000 (from SLES)
Bonding (RR) <-> Alcatel OS6800 using static linkagg
Cluster nodes have 16GB RAM / 8 cores

Basic tuning using sysctl has been done:

net.ipv4.tcp_westwood = 0
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 10485760 10485760 10485760
net.ipv4.tcp_wmem = 10485760 10485760 10485760
net.ipv4.tcp_mem = 10485760 10485760 10485760
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

Peer-Joachim Koch
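For reference, the numbers in the post above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the sizes and rates are the ones quoted in the post, while the decimal units (1 MB = 10**6 bytes) and the helper names are assumptions made for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
MB = 10**6
GB = 10**9

def throughput_mb_s(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds

def hours_to_transfer(size_gb, rate_mb_s):
    """Time in hours to move size_gb at a sustained rate of rate_mb_s."""
    return (size_gb * GB) / (rate_mb_s * MB) / 3600

# Raw signalling rate of one gigabit link, before any protocol overhead.
gigabit_raw_mb_s = 1e9 / 8 / MB

# The netcat test from the post: 670 MB in 5.9 s.
netcat_rate = throughput_mb_s(670, 5.9)

print(f"raw 1 GBit link: {gigabit_raw_mb_s:.1f} MB/s")
print(f"netcat         : {netcat_rate:.1f} MB/s")
for label, rate in (("current backup", 60), ("scp", 45), ("netcat", netcat_rate)):
    print(f"{label:15s}: 250 GB in {hours_to_transfer(250, rate):.2f} h, "
          f"2000 GB in {hours_to_transfer(2000, rate):.2f} h")
```

The netcat run works out to roughly 90% of the raw rate of one gigabit link, which is consistent with a single link being saturated rather than two links being used.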
#2
On Apr 15, 11:25 pm, Peer-Joachim Koch <pk...@bgc-jena.mpg.de> wrote:
> I'm currently trying to reduce our backup time. We have to transfer
> ~250GB/day (50GB-2000GB).
> We tried to connect one of the file server using bonding and 2 unused
> network interfaces to improve the performance, but we are only seeing
> a small improvement (5-10%; averg. is currently 60MB/s).

Most bonding implementations assign a connection to a link, so a single connection will not run any faster.

> So I started to play a little bit on our cluster, where we have a couple
> of nodes with bonding interfaces. Using netcat I can transfer a file
> of 670MB within 5.9 sek, but this is only the theor. value for
> 1 GBit.

Yep, that's what I would expect.

> Using scp I can only see ~45MB/s. (it's a 8 core node, so it's fast
> enough ...).

Since SCP is not multi-threaded, the transfer speed is limited by how fast one core can do encryption. Last I checked, there was a huge speed difference between various ciphers, so you might try a few to see which is fastest.

DS
#3
David Schwartz wrote:
> On Apr 15, 11:25 pm, Peer-Joachim Koch <pk...@bgc-jena.mpg.de> wrote:
>
>> I'm currently trying to reduce our backup time. We have to transfer
>> ~250GB/day (50GB-2000GB).
>> We tried to connect one of the file server using bonding and 2 unused
>> network interfaces to improve the performance, but we are only seeing
>> a small improvement (5-10%; averg. is currently 60MB/s).
>
> Most bonding implementations assign a connection to a link, so a
> single connection will not run any faster.

So there is no way for a single application to benefit from a trunk?
Only multi-threaded applications or many individual applications can use the higher bandwidth.
The main problem is our TSM backup, so I'll have to see how I can set it up to use more individual threads, or use LAN-free ...

>
>> So I started to play a little bit on our cluster, where we have a couple
>> of nodes with bonding interfaces. Using netcat I can transfer a file
>> of 670MB within 5.9 sek, but this is only the theor. value for
>> 1 GBit.
>
> Yep, that's what I would expect.
>
>> Using scp I can only see ~45MB/s. (it's a 8 core node, so it's fast
>> enough ...).
>
> Since SCP is not multi-threaded, the transfer speed is limited by how
> fast one core can do encryption. Last I checked, there was a huge
> speed difference between various ciphers, so you might try a few to
> see which is fastest.
>
> DS

Thanks, Peer
#4
On Apr 16, 1:07 am, Peer-Joachim Koch <pk...@bgc-jena.mpg.de> wrote:
> > Most bonding implementations assign a connection to a link, so a
> > single connection will not run any faster.

> So there is no way to benefit from a trunk for a single application ?

No, just no way for a single TCP connection to benefit. The exact details, and possible workarounds, depend on the exact trunking implementation.

> Only multi threaded or many individual app. can use the higher bandwith.

In some implementations, it must be many different destination MAC addresses. In some it must be multiple destination IP addresses. In some implementations, it depends whether the interface is bridging or routing.

There may be a way to configure it to alternate packets out the two links. This may help with your particular problem, but it will cause tremendously reduced performance in some other cases (due to large numbers of out-of-order packets being received).

> The main problem is our TSM backup. Therefore I'll have see how I can
> set it up to use more individual threads or use lanfree ...

See if you can set it up to use more than one TCP connection. If possible, have it use two different destination addresses (both assigned to the same machine). I'm not sure what bonding implementation you are using, but you should definitely check its documentation.

DS
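To illustrate why a single connection stays on one link, and why a second destination address can help, here is a simplified sketch of hash-based link selection. The two functions are hypothetical stand-ins for a MAC-based and an IP-based policy; real bonding drivers and switches use their own hash inputs, so treat this as a toy model and check the documentation of the actual implementation. The MAC and IP addresses are made up.

```python
import ipaddress

def mac_based_link(src_mac: bytes, dst_mac: bytes, n_links: int) -> int:
    """Toy MAC-based policy: XOR the last byte of each MAC, modulo link count."""
    return (src_mac[-1] ^ dst_mac[-1]) % n_links

def ip_based_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Toy IP-based policy: XOR the two IPv4 addresses, modulo link count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_links

# Hypothetical addresses, for illustration only.
client_mac = bytes.fromhex("001b21aabb01")
server_mac = bytes.fromhex("001b21aabb02")

# Every packet between the same two MACs hashes to the same link,
# no matter how many TCP connections are opened.
print("MAC-based link:", mac_based_link(client_mac, server_mac, 2))

# With an IP-based policy, a second address on the same server
# can land on the other link.
for dst in ("10.0.0.20", "10.0.0.21"):
    print(f"IP-based link for {dst}:", ip_based_link("10.0.0.10", dst, 2))
```

In this toy model the two destination addresses map to links 0 and 1, which is the effect the suggestion above is after. Whether that happens in practice depends on the hash the sending side uses.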
#5
Peer-Joachim Koch wrote:
> Hi,
>
> I'm currently trying to reduce our backup time. We have to transfer
> ~250GB/day (50GB-2000GB).
> We tried to connect one of the file server using bonding and 2 unused
> network interfaces to improve the performance, but we are only seeing
> a small improvement (5-10%; averg. is currently 60MB/s).
>
> So I started to play a little bit on our cluster, where we have a couple
> of nodes with bonding interfaces. Using netcat I can transfer a file
> of 670MB within 5.9 sek, but this is only the theor. value for
> 1 GBit.
>
> Using scp I can only see ~45MB/s. (it's a 8 core node, so it's fast
> enough ...).
>
> Is there anything to improve this values substantial ?
> Is it principal not possible to jumb over the single link speed ?
>
> Any hints is welcome !!
>

Maybe not the answer you're looking for, but we had a similar problem (apart from the fact that we could not increase the line speed between the two locations) and solved it by using two NetApp filers. The NetApp filers can sync volumes (SnapMirror) at very frequent intervals, transferring only block-level changes.

This way, we can keep two storage arrays (one containing our production database and one backup in a different location) in sync without having to push a large burst of data every day.
#6
Linux bonding does offer the prospect of doing round-robin scheduling of packets across the links in the bond, but that comes at a price - packet reordering. Get "too much" of that and you start to get spurious retransmissions and clamping of the congestion window.

Also, the packet scheduling algorithms in the bonding code are for transmit only. A backup server is ostensibly a recv-mostly sort of thing. The scheduling of packets for inbound to the backup server would be determined by the algorithms in the switch to which it was connected.

You might also sniff the wire to see what sort of window sizes are being used. Also, check the CPU util of _each_ CPU on the server. I'm assuming your filesystem/whatnot can take in data >> 60 MB/s? Does the side sending the data to the server report any TCP retransmissions?

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
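The reordering effect of round-robin striping can be sketched with a small toy model. The assumptions here are purely illustrative: packets are sent at a fixed interval, alternate strictly between links, and each link adds a fixed one-way delay. The delay values and function names are made up and are not measurements from any real bond.

```python
def arrival_order(n_packets, send_interval_us, link_delays_us):
    """Return sequence numbers in the order the receiver sees them.

    Packet i is sent at i * send_interval_us on link i % n_links and
    arrives after that link's fixed delay (a toy model, not a measurement).
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delays_us)
        arrivals.append((seq * send_interval_us + link_delays_us[link], seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]

def count_out_of_order(order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late

# About 12 us between full-size frames at 1 Gbit/s; the delays are hypothetical.
equal = arrival_order(8, 12, [50, 50])
skewed = arrival_order(8, 12, [50, 80])
print("equal delays :", equal, "late:", count_out_of_order(equal))
print("skewed delays:", skewed, "late:", count_out_of_order(skewed))
```

With equal delays everything arrives in order. With one link only 30 us slower, every other packet is overtaken by its successor, which is the kind of pattern that can trigger duplicate ACKs and spurious retransmissions on the sender.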
#7
Hi,

thanks for all the answers.

To give a more general overview:

We are running a GFS (StorNext) as our "normal" file system. One server also works as the TSM client. 8 file systems are defined; each can deliver ~60-100MB/s on average. The file space is ~60TB with 22 million files.

2 dedicated NICs (Intel e1000) are used in a dedicated VLAN and configured using bonding without any further settings.

This link is connected to our TSM server (AIX, 8 cores), which is also connected to this VLAN through a dedicated network interface. At the moment we do not have much traffic on the file system, so I cannot measure much.

What I am currently trying, after reading all the posts, is to configure the client to use not just one task (and stream) but 2-4 streams. If everything works correctly, it *might* put 2 streams on each NIC ....

However, tuning this parameter on the client cut the backup time almost in half!

But I need more traffic to see whether the usage of the NICs has improved.

Maybe we have to try LAN-free backup ...

Bye, Peer

Rick Jones wrote:
> Linux bonding does offer the prospect of doing round-robin scheduling
> of packets across the links in the bond, but that comes at a price -
> packet reordering. Get "too much" of that and you start to get
> spurrious retransmissions and clamping of the congestion window.
>
> Also, the packet scheduling algorithms in the bonding code are for
> transmit only. A backup server is ostensibly a recv-mostly sort of
> thing. The scheduling of packets for inbound to the backup server
> would be determined by the algorithms in the switch to which it was
> connected.
>
> You might also sniff the wire to see what sort of window sizes are
> being used. Also, check the CPU util of _each_ CPU on the server.
> I'm assuming your filesystem/whatnot can take-in data >> 60 MB/s? Does
> the side sending the data to the server report any TCP
> retransmissions?
>
> rick jones
#8
On Mon, 21 Apr 2008 10:26:54 +0200, Peer-Joachim Koch wrote:
> Hi,
>
> thanks for all the answers.
>
> To give a more general overview:
>
> We are running a GFS (StorNEXT) as "normal" file system. One server is
> working also as TSM client. 8 file systems are defined, each is able to
> deliver ~60-100MB/s in avereage. The file space is ~60TB with 22Million
> files.
>
> 2 dedicated NIC's (intel e1000) are used in a dedicated vlan and
> configured using bionding without any further settings.
>
> This link is connected to our TSM Server (AIX 8 core) which is also
> connected to this vlan using a dedicated network interface. In the
> moment we do not have many traffic on the file system, therefore I can
> not measure many things.
>
> What I currently trying after reading all the posts is to configure the
> client to use not only one task (and streams), but to use 2-4 streams.
> When everything is working corrently, it *might* split 2 streams on each
> NIC ....
>
> However tuning this parameter on the client nearly dropped the backup
> time by a factor of 2 !
>
> But I need more transfer, to see, if the usage of the Nic's is improved.
>
> Maybe we have to try LANfree backup ...
>
> Bye, Peer
>
> Rick Jones schrieb:
>> Linux bonding does offer the prospect of doing round-robin scheduling
>> of packets across the links in the bond, but that comes at a price -
>> packet reordering. Get "too much" of that and you start to get
>> spurrious retransmissions and clamping of the congestion window.
>>
>> Also, the packet scheduling algorithms in the bonding code are for
>> transmit only. A backup server is ostensibly a recv-mostly sort of
>> thing. The scheduling of packets for inbound to the backup server
>> would be determined by the algorithms in the switch to which it was
>> connected.
>>
>> You might also sniff the wire to see what sort of window sizes are
>> being used. Also, check the CPU util of _each_ CPU on the server. I'm
>> assuming your filesystem/whatnot can take-in data >> 60 MB/s? Does the
>> side sending the data to the server report any TCP retransmissions?
>>
>> rick jones

1) Enable jumbo frames on your gigabit NICs and all network equipment between them, if you haven't already.

http://stromberg.dnsalias.org/~strombrg/jumbo.html

Path MTU Discovery would probably be a good idea too.

2) Use a protocol that will allow you to use large block sizes to reduce the CPU needs.

http://stromberg.dnsalias.org/~dstromberg/protocol-comparison.html

If you must use openssh, patch it for performance. Also, blowfish tends to be a good encryption algorithm for performance, despite what Schneier says about it not being sufficiently vetted yet (he's the author of the algorithm - and I haven't read Crypto-Gram in a while, so maybe he feels it's solid by now).

3) Use rsync to reduce the amount of data you're pushing again and again:

http://stromberg.dnsalias.org/~strom...up.remote.html

It'll turn your series of fullsaves and incrementals into what appears to be one fullsave and many incrementals from the perspective of network performance and disk use (except the inode use will be high).
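As a rough illustration of what jumbo frames buy on a gigabit link, the per-frame overhead can be worked out as follows. This is a sketch under stated assumptions: 38 bytes of Ethernet framing per frame on the wire (preamble, header, FCS and inter-frame gap), 40 bytes of IPv4 + TCP headers with no options, and the de facto 9000-byte jumbo MTU. The function names are made up for illustration.

```python
ETHERNET_OVERHEAD = 38   # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40      # IPv4 20 + TCP 20, assuming no options
LINK_BYTES_PER_S = 1e9 / 8

def efficiency(mtu):
    """Fraction of wire bytes that carry TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)

def frames_per_second(mtu):
    """Full-size frames per second needed to saturate the link."""
    return LINK_BYTES_PER_S / (mtu + ETHERNET_OVERHEAD)

for mtu in (1500, 9000):
    goodput_mb_s = LINK_BYTES_PER_S * efficiency(mtu) / 1e6
    print(f"MTU {mtu}: efficiency {efficiency(mtu):.3f}, "
          f"goodput {goodput_mb_s:.1f} MB/s, "
          f"{frames_per_second(mtu):.0f} frames/s")
```

Under these assumptions the payload gain is only a few percent, but the number of frames per second at line rate drops to roughly one sixth, and that reduction in per-packet work is where most of the CPU saving would come from.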
#9
Dan Stromberg <(E-Mail Removed)> wrote:
> 1) Enable jumbo frames on your gigabit NIC's and all network
> equipment between them, if you haven't already.

> http://stromberg.dnsalias.org/~strombrg/jumbo.html

Keeping in mind there are still switches out there (older ones at least) which do not support JF, and since JF is not a de jure standard, there can be NICs (and switches) for which the definition of "JumboFrame" is something other than the 9000 byte MTU "de facto" standard initiated (IIRC) by Alteon.

If the switch does not support JF, enabling JF on the NICs will result in odd losses of connectivity, with stuff like telnet/ssh _mostly_ working and stuff like FTP and perhaps HTTP not working well at all.

Even if the switch supports JF, unless one enables it across the _entire_ broadcast domain, UDP traffic for stuff like NFS can be fubared - the NIC with JF support will fragment to JF sizes, which will arrive at the NIC without JF and be dropped.

Don't assume that just because something like an FTP or netperf TCP_STREAM works that all is OK - the TCP MSS exchange at the beginning of a TCP connection will result in the smaller MSS being used, masking the MTU mismatch.

I _like_ JF, but it isn't a panacea. Also, as more and more NICs support LRO (Large Receive Offload) in addition to TSO (TCP Segmentation Offload) the benefit of JF becomes reduced.

rick jones
--
web2.0 n, the dot.com reunion tour...
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
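The masking effect described above can be sketched in a few lines. This is a simplified model with made-up helper names: it assumes IPv4 with no options (40 bytes of IP + TCP headers, 28 bytes of IP + UDP headers), treats the largest IP fragment as exactly the sender's MTU, and assumes the receiver silently drops anything larger than its own MTU.

```python
IP_TCP_HEADERS = 40   # IPv4 20 + TCP 20, assuming no options
IP_UDP_HEADERS = 28   # IPv4 20 + UDP 8

def advertised_mss(mtu):
    """MSS a host announces in its SYN, derived from its own MTU."""
    return mtu - IP_TCP_HEADERS

def negotiated_mss(mtu_a, mtu_b):
    """Each side sends no more than the peer's MSS, so the smaller one wins."""
    return min(advertised_mss(mtu_a), advertised_mss(mtu_b))

def udp_datagram_survives(sender_mtu, receiver_mtu, payload_bytes):
    """Simplified: fragments are cut to the sender's MTU and the receiver
    drops any packet larger than its own MTU."""
    largest_packet = min(payload_bytes + IP_UDP_HEADERS, sender_mtu)
    return largest_packet <= receiver_mtu

# One host with jumbo frames enabled, one without.
print("TCP MSS used      :", negotiated_mss(9000, 1500))
print("32 KB UDP datagram:", udp_datagram_survives(9000, 1500, 32768))
print("512 B UDP datagram:", udp_datagram_survives(9000, 1500, 512))
```

TCP quietly falls back to a 1460-byte MSS and keeps working, while a large UDP datagram (an NFS-sized transfer, for example) is fragmented to the jumbo size and lost. A TCP test passing therefore says nothing about whether the MTUs actually match.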
Tags: bonding, gigabit, improve, performance