Networking Forums

RH Linux equivalent to Solaris ndd command (for tcp_deferred_acks_max and tcp_naglim_def)

 
 
antoine

 
      01-19-2007, 06:33 AM
hello,

I'm migrating one system from Solaris to RH Linux Enterprise, and I'm
looking at setting up the same network performance optimizations that I
have on my Solaris box.

Performance optimization here is about reducing latency as much as
possible; I'm fighting for every single millisecond.

On Solaris, I've done the following two things:

1. disabled the Nagle algorithm
2. set tcp_deferred_acks_max to 0

To do so, I use the "ndd" command on Solaris, but I couldn't find this
command on RH.

ndd -set /dev/tcp tcp_naglim_def 1
ndd -set /dev/tcp tcp_deferred_acks_max 0

Does anyone know how I can achieve this on RH Linux?
Also, what would be the command to check the current values / status of
such parameters?

Thank you for your help!

-Antoine

 
 
 
 
 
Paul Colquhoun

 
      01-19-2007, 12:09 PM

For quite a few parameters, you can write values directly into
/proc/sys/net/ipv4/*

To get the parameters set automatically at every boot, look for
/etc/sysctl.conf

For some parameters, you may need to recompile the kernel with
appropriate settings.
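As a rough illustration of that mapping, a dotted sysctl name such as net.ipv4.tcp_low_latency corresponds to a file under /proc/sys with the dots turned into slashes. The Python helpers below (hypothetical names, assuming the usual Linux /proc layout) read a current value and format the matching /etc/sysctl.conf line; actually changing a value still requires root.

```python
from pathlib import Path
from typing import Optional

PROC_SYS = Path("/proc/sys")

def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name (e.g. net.ipv4.tcp_low_latency) to its /proc/sys file."""
    return PROC_SYS.joinpath(*name.split("."))

def read_sysctl(name: str) -> Optional[str]:
    """Return the current value as text, or None if this kernel has no such entry."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None

def sysctl_conf_line(name: str, value: str) -> str:
    """Format an /etc/sysctl.conf entry so the value is applied at every boot."""
    return f"{name} = {value}"

if __name__ == "__main__":
    name = "net.ipv4.tcp_low_latency"
    print(sysctl_path(name), "=", read_sysctl(name))
```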


--
Reverend Paul Colquhoun, ULC. http://andor.dropbear.id.au/~paulcol
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro
 
 
Rick Jones

 
      01-19-2007, 09:29 PM
antoine <(E-Mail Removed)> wrote:
> I'm migrating one system from solaris to RH linux enterprise,


Very good

> and I'm looking at setting up the same network performance
> optimizations that I have on my solaris box.


> performance optimization is about reducing latency to the max, I'm
> fighting for every single millisecond.


What did you do to get past the limit of no more than 8000 transactions
per second that one gets by default on Solaris (well, Solaris 10 at
least) when running netperf TCP_RR?
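For scale: a netperf TCP_RR test keeps one transaction in flight by default, so the transaction rate is simply the reciprocal of the mean round-trip time. A quick Python check of what 8000 transactions per second means per transaction (the function names are just for illustration):

```python
def rr_latency_usec(transactions_per_sec: float) -> float:
    """Mean round-trip time implied by a request/response rate with one transaction in flight."""
    return 1_000_000 / transactions_per_sec

def rr_rate(latency_usec: float) -> float:
    """Transactions per second implied by a mean round-trip time in microseconds."""
    return 1_000_000 / latency_usec

print(rr_latency_usec(8000))  # 125.0 (microseconds per round trip)
```

So that ceiling corresponds to about 125 microseconds per round trip, well below the millisecond granularity being fought over.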

> on solaris, I've done the 2 following things:


> 1. disable nagle algorithm
> 2. setting tcp deferred acks max to 0


Exactly how does your application behave? Does it ever have more than
one "transaction" in flight at a time? By how much did those things
affect your measured latency? How big are the requests and responses
for your transactions?

> to do so, I use the "ndd" command on solaris, but I couldn't find
> this command on RH.


> ndd -set /dev/tcp tcp_naglim_def 1
> ndd -set /dev/tcp tcp_deferred_acks_max 0


> does anyone know I can achieve this on RH Linux ?


> also, what would be the command to check the current values / status of
> such parameters ?


sysctl (for example, "sysctl -a" lists the current values, and
"sysctl -w name=value" changes one)

You may also need/want to experiment with ethtool interrupt coalescing
settings:

ftp://ftp.cup.hp.com/dist/networking...cy_vs_tput.txt

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
 
antoine

 
      01-23-2007, 03:46 AM
I've found this page:
http://www-128.ibm.com/developerwork.../l-hisock.html

that describes the way to disable the Nagle algorithm at the application
level, but I don't have access to this application's code.
My only solution (one that works well on Solaris) is to make a change at
the system level.
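For reference, the usual application-level switch is the TCP_NODELAY socket option, which on Linux is set per socket with setsockopt() rather than system-wide. A minimal Python sketch (the helper names are hypothetical; Python's socket module wraps the same setsockopt()/getsockopt() calls a C program would make):

```python
import socket

def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the IPPROTO_TCP level; a nonzero value disables Nagle.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

def nagle_disabled(sock: socket.socket) -> bool:
    """Read the option back: the per-socket analogue of checking a system setting."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

if __name__ == "__main__":
    with make_nodelay_socket() as s:
        print("Nagle disabled:", nagle_disabled(s))
```

On the Java client side, java.net.Socket.setTcpNoDelay(true) sets the same option.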

However, I still couldn't find an equivalent for disabling the Nagle
algorithm or for setting deferred ACKs to 0.
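On Linux, the closest per-socket counterpart to forcing immediate ACKs is, as far as I know, the TCP_QUICKACK socket option. Unlike Solaris' tcp_deferred_acks_max it is not a global setting, and it is not permanent: the kernel may switch back to delayed ACKs later, so programs that rely on it typically set it again after reads. A sketch assuming a Linux host (the constant is absent on other platforms, hence the check); the helper name is hypothetical:

```python
import socket

def request_quickack(sock: socket.socket) -> bool:
    """Ask Linux to ACK incoming data immediately on this socket.

    Returns False where TCP_QUICKACK is not available (non-Linux platforms).
    The setting is not sticky, so it is usually re-applied after each recv().
    """
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
    return True

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("quickack requested:", request_quickack(s))
```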

The only thing I could find is the

/proc/sys/net/ipv4/tcp_low_latency

flag, which is set to "0" by default.

It certainly "sounds" interesting to me, but I can't find anything on
the net that explains EXACTLY what it does...

Any insight?

 
Michael Heiming

 
      01-23-2007, 10:02 PM
In comp.os.linux.networking antoine <(E-Mail Removed)>:
[..]

> the only thing I could find is the


> /proc/sys/net/ipv4/tcp_low_latency


> flag that is set to "0" by default.


> it certainly "sounds" interesting to me, but I can't find anything on
> the net neither that would explain EXACTLY what it does...


> any insight ?


The answer is directly in the kernel source Documentation:

Documentation/networking/ip-sysctl.txt

--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo (E-Mail Removed) | perl -pe 'y/a-z/n-za-m/'
#bofh excuse 273: The cord jumped over and hit the power switch.
 
 
antoine

 
      01-23-2007, 11:22 PM
that's right, that's the information I got:

tcp_low_latency - BOOLEAN
If set, the TCP stack makes decisions that prefer lower
latency as opposed to higher throughput. By default, this
option is not set meaning that higher throughput is preferred.
An example of an application where this default should be
changed would be a Beowulf compute cluster.
Default: 0

But it still does not tell me what kind of "decisions" the TCP stack
makes! I still don't know whether it has anything to do with the Nagle
algorithm or anything else :-(
Of course I will test this flag, but it's a bit obscure...

-Antoine

 
Rick Jones

 
      01-24-2007, 12:10 AM
antoine <(E-Mail Removed)> wrote:
> I've found this page:
> http://www-128.ibm.com/developerwork.../l-hisock.html


> that describes the way to disable nagle algorithm at the application
> level, but I don't have access to this application code.


So someone else is providing you with Solaris and Linux binaries I
presume?

> my only solution (that works well with solaris) is to do a change at
> the system level.


My intuition may be oversaturated with 18 years of experience, but it
is beginning to sound like you are having to kludge around a poorly
written application. It isn't perhaps trying to write requests or
responses to the socket in multiple write calls, is it? A truss
(Solaris) or strace (Linux) of the application, perhaps combined with
a tcpdump trace, could be very helpful there...
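To illustrate the alternative being hinted at: when a request or response is built from several pieces, handing all of them to the kernel in one call (a gather write, like writev() in C) gives TCP the whole message at once instead of a small second write that can end up waiting behind Nagle. A Python sketch using socket.sendmsg(), with a local socketpair standing in for the real TCP connection (Nagle itself only applies to TCP); the message contents are made up:

```python
import socket

def send_message(sock: socket.socket, *parts: bytes) -> int:
    """Send all parts of one logical message with a single system call.

    sendmsg() is a gather write (like writev()), so the kernel sees the whole
    message at once rather than as several small writes. A robust sender would
    still check the returned count and retry any unsent remainder.
    """
    return sock.sendmsg(list(parts))

if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        sent = send_message(a, b"ACK|", b"order=42|", b"status=received\n")
        print(sent, b.recv(1024))
```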

> however I still couldn't find the equivalent of disabling nagle
> algorithm nor setting deferred acks to 0.


> the only thing I could find is the


> /proc/sys/net/ipv4/tcp_low_latency


> flag that is set to "0" by default.


> it certainly "sounds" interesting to me, but I can't find anything on
> the net neither that would explain EXACTLY what it does...


That would be the source code

> any insight ?


I think someone else has already pointed at the sysctl text. From
what _little_ I know about it, if the application you are running is
indeed busted with respect to writing logically associated data in
separate send calls, that setting may not help much. I think it may
cause some processing to not be deferred until the user's context
comes into play.


> Rick Jones wrote:
>> What did you do to get past the no better than 8000 transactions
>> per second one gets by default on Solaris (well, 10 at least) when
>> running netperf TCP_RR?


I'm still curious about that one

rick jones
--
a wide gulf separates "what if" from "if only"
 
 
antoine

 
      01-24-2007, 12:23 AM
> So someone else is providing you with Solaris and Linux binaries I
> presume?


Yes, this is a vendor solution that we've purchased: a stock trading
engine that receives order requests from my own in-house trading
application over a socket connection with a simple text-based API. I
have done all the network optimization at the software level (through
Java) on MY app.
The order engine is then connected to another "system" in the
order-sending process (there are several like this one, one after the
other, before the order goes to the market).
The goal is of course to minimize the time it takes to reach the
market...
On Solaris, optimizing my app at the software level and changing the
parameters on the server improved performance quite a bit, but now that
we're migrating the server to Linux, I'm back to worse performance, and
I have an intuition that the network optimization has at least a little
to do with it...

> > my only solution (that works well with solaris) is to do a change at
> > the system level.

> My intuition may be oversaturated with 18 years of experience, but it
> is beginning to sound like you are having to kludge around a poorly
> written application. It isn't perhaps trying to write requests or
> responses to the socket in multiple write calls is it? A truss
> (solaris) or strace (linux) of the application, perhaps combined with
> a tcpdump trace could be very helpful there...


I think it's exactly the way the app is working, but I don't have any
way to modify it, and I BELIEVE it's more a feature than a bug:
- the client app is sending an order request
- the server is FIRST replying with a message that says "I've heard
you, I'm going to handle the request"
- the server then sends another reply saying something like "I've
handled your request here, it's sent somewhere else"

in the same way, when there are executions on an order, each execution
message might not be sent "as soon as available"...

ALSO, as I'm very often sending several requests at the same time, the
server replies to each request in separate messages, which are
aggregated by the Nagle algorithm, so that the FIRST reply is delayed
(it's waiting for more data to send), unless I disable Nagle...

> > it certainly "sounds" interesting to me, but I can't find anything on
> > the net neither that would explain EXACTLY what it does...

> That would be the source code

right :-)

> >> What did you do to get past the no better than 8000 transactions
> >> per second one gets by default on Solaris (well, 10 at least) when
> >> running netperf TCP_RR?

>I'm still curious about that one


Well, I didn't know about that limitation, and I'm not sure I
understand it correctly, but I doubt I reach 8000 transactions per
second...

-Antoine

 
 
Rick Jones

 
      01-24-2007, 02:03 AM
antoine <(E-Mail Removed)> wrote:
>> So someone else is providing you with Solaris and Linux binaries I
>> presume?


> Yes, this is a vendor solution that we've purchased. a stock trading
> engine that is receiving order requests from my own in-house trading
> application: socket connection, simple text-based API. I have done
> all network optimization at the software level - through java - on
> MY app.


Java... So, your app is the client, yes?

> the order engine is then connected to another "system" in the order
> sending process (there are a few like this one after the other
> before it goes to the market).


> the goal is of course to minimize the time it takes to reach the
> market...


> on solaris, optimizing my app at software level, and changing the
> parameters on the server improved performances quite well, but now
> that we're migrating to linux (the server), I'm back with not as
> good performances, and an intuition the network optimization has at
> least a little to do with it...


>> My intuition may be oversaturated with 18 years of experience, but
>> it is beginning to sound like you are having to kludge around a
>> poorly written application. It isn't perhaps trying to write
>> requests or responses to the socket in multiple write calls is it?
>> A truss (solaris) or strace (linux) of the application, perhaps
>> combined with a tcpdump trace could be very helpful there...


> I think it's exactly the way the app is working, but I don't have
> any way to modify it, and I BELIEVE it's more a feature than a bug:
> - the client app is sending an order request
> - the server is FIRST replying with a message that says "I've heard
> you, I'm going to handle the request"
> - the server then sends another reply saying something like "I've
> handled your request here, it's sent somewhere else"


Well, I've often said that "99 times out of 10" setting TCP_NODELAY is
a kludge, but if the above is accurate, it would be one of those 100th
out of 10 situations.

I was more concerned that at, say, step two (the server saying "I hear
you"), the server might be sending that message with more than one
send call.

> in the same way, when there are executions on an order, each
> execution message might not be sent "as soon as available"...


> ALSO, as I'm very often sending several requests at the same time,
> the server is replying to each request in different messages, that
> are aggregated by Nagle algorithm, so that the FIRST reply is
> delayed (it's waiting for more data to send), unless I disable
> Nagle...


I think your understanding of Nagle is a little off. It is supposed
to work this way:

1) Is this user's send(), plus any queued, unsent data, >= the MSS for
the connection? If yes, send immediately (modulo things like the
congestion window). If no, go to step 2.

2) Is this connection otherwise idle, i.e. do we have no unACKed data
outstanding to the other side? If yes, send immediately (again modulo
stuff like the congestion window); otherwise go to step 3.

3) wait for either

a) more sends from the user to get >= MSS
b) ACK's from the remote to make the connection "idle"
c) the retransmission timer to expire

This suggests that the first reply from the server will not be
delayed; it is the second reply from the server which will be delayed.
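The three steps can be written as a small decision function. This is a simplified Python model of the rule as described in this post, not kernel code: real stacks also apply the congestion window and refinements of their own, every name here is hypothetical, and the 1460-byte MSS in the examples assumes a 1500-byte Ethernet MTU.

```python
from enum import Enum

class Action(Enum):
    SEND_NOW = "send now"
    WAIT = "wait for more data, an ACK, or a timer"

def nagle_decision(send_len: int, queued_unsent: int, mss: int,
                   unacked_bytes: int, nodelay: bool = False) -> Action:
    """Decide whether a write may go out immediately under the Nagle rule."""
    if nodelay:
        # TCP_NODELAY disables the algorithm entirely.
        return Action.SEND_NOW
    # Step 1: enough data to fill a full-sized segment.
    if send_len + queued_unsent >= mss:
        return Action.SEND_NOW
    # Step 2: connection otherwise idle, nothing unacknowledged in flight.
    if unacked_bytes == 0:
        return Action.SEND_NOW
    # Step 3: hold the data until more sends, an ACK, or a timer expires.
    return Action.WAIT

# "I hear you" on an idle connection, then the response right behind it.
print(nagle_decision(40, 0, 1460, unacked_bytes=0).name)   # SEND_NOW
print(nagle_decision(60, 0, 1460, unacked_bytes=40).name)  # WAIT
```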

Soooo, if I've understood correctly, a single transaction at the app
level would look like:

Client                                        Server

Request ->
                            <- Server "I hear you"
                            <- Server "response"

Now, in a mostly perfect world, that would be the same picture at the
TCP level:

Client                                        Server

Request + TCP ACK of previous Server stuff ->
                            <- Server "I hear you" + TCP ACK of Request piggyback
                            <- Server "response" + TCP ACK of Request piggyback

But with Nagle enabled, that server response is indeed going to be
delayed awaiting the delayed ACK from the client's TCP stack:

Client                                        Server

Request + TCP ACK of previous Server stuff ->
                            <- Server "I hear you" + TCP ACK of Request piggyback

TCP ACK of IHY ->
                            <- Server "response" + TCP ACK of Request piggyback
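To put a rough number on that stall: under Nagle, the small response cannot leave until the client's delayed ACK for "I hear you" arrives. A back-of-the-envelope Python model in integer microseconds, assuming the client has nothing of its own to piggyback the ACK on; the one-way delay and delayed-ACK timeout are inputs to measure or look up for your own stacks, and the figures in the example are placeholders, not measurements:

```python
def response_departure_us(ihy_sent_us: int, response_ready_us: int,
                          one_way_us: int, delayed_ack_us: int,
                          nodelay: bool) -> int:
    """Earliest time the server's small "response" segment can leave."""
    if nodelay:
        return response_ready_us
    # With Nagle, the small response waits while "I hear you" is unACKed.
    # The client holds its ACK for the delayed-ACK timeout, then it crosses the wire.
    ack_arrives_us = ihy_sent_us + one_way_us + delayed_ack_us + one_way_us
    # If the ACK already arrived, the connection is idle and the response goes at once.
    return max(response_ready_us, ack_arrives_us)

print(response_departure_us(0, 20, 50, 40_000, nodelay=False))  # 40100
print(response_departure_us(0, 20, 50, 40_000, nodelay=True))   # 20
```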

However, if you have set the deferred ack max to zero on both ends and
it enables immediate ACK like I think it does, what you are really
going to see on the wire is:

Client                                        Server
ACK of prev server data ->
Request ->
                            <- TCP ACK of Req
                            <- Server IHY
ACK of IHY ->
                            <- Server Response
ACK of Server Rsp ->

So, what would have been simply three segments on the wire is actually
6 segments on the wire, and in broad handwaving terms since all of
those are small, the TCP level CPU utilization is 2X what it would
have otherwise been.
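The segment arithmetic, spelled out in Python. Each list is one steady-state transaction taken from the diagrams in this post; the "ACK of prev server data" line is the previous transaction's final ACK, so it is counted once, as this transaction's ACK of the server response. Treating every small segment as costing about the same is the same broad handwave as in the text:

```python
IDEAL = [
    "Request + ACK of previous server data",
    'Server "I hear you" + ACK of request',
    'Server "response"',
]

IMMEDIATE_ACK = [
    "Request",
    "TCP ACK of request",
    'Server "I hear you"',
    'ACK of "I hear you"',
    "Server response",
    "ACK of server response",
]

ratio = len(IMMEDIATE_ACK) / len(IDEAL)
print(len(IDEAL), len(IMMEDIATE_ACK), ratio)  # 3 6 2.0
```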

If you just set TCP_NODELAY (disable Nagle) on both ends, it should
become a three-segment exchange. I am surprised that it was necessary
to set both tcp_naglim_def and tcp_deferred_acks_max.

Given the behaviour you have described, the server software vendor
does indeed have a bug if they offer no way to set TCP_NODELAY on the
connection. Hold their feet to the fire to get one.

Meanwhile, are the message sizes pretty much predictable? Notice that
the Nagle algorithm takes the connection MSS (Maximum Segment Size)
into consideration. If the server response is the larger of the
messages, you could set the PathMTU on the server to be that plus 40
or so bytes (the IPv4 and TCP headers), and then the send of the server
response would indeed be >= MSS and would go out immediately even
without setting TCP_NODELAY.
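The header arithmetic behind that trick, as a Python sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options; with TCP timestamps in use (12 bytes of options per segment), the usable payload per segment shrinks accordingly. The 200-byte response size is only an example, and the function names are hypothetical:

```python
IPV4_HEADER = 20
TCP_HEADER = 20

def payload_per_segment(mtu: int, tcp_options: int = 0) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_options

def mtu_for_payload(payload: int, tcp_options: int = 0) -> int:
    """MTU at which a message of `payload` bytes exactly fills one segment."""
    return payload + IPV4_HEADER + TCP_HEADER + tcp_options

response_len = 200                   # example server response size
mtu = mtu_for_payload(response_len)  # 240
print(mtu, payload_per_segment(mtu), response_len >= payload_per_segment(mtu))  # 240 200 True
```

Matching the message to the segment size matters: with a smaller MTU the message would be split, and its small tail could again be held back by Nagle.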

Now, if the server app vendor drags their feet (normally the issue is
getting those sorts of folks to _remove_ a bogus TCP_NODELAY setting),
one could in theory write a small shim library which intercepted, say,
the accept() or connect() call (depending on which way the server
worked) and added a setsockopt() call to set TCP_NODELAY.
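A real shim for a vendor binary would have to be a C shared object preloaded into the server process (on Linux, typically via LD_PRELOAD), which Python cannot provide. What the wrapped accept() would do is easy to show, though. A Python sketch of just that step, using a hypothetical NoDelayListener class and a loopback connection for illustration:

```python
import socket

class NoDelayListener(socket.socket):
    """Listening socket whose accept() sets TCP_NODELAY on every new connection."""

    def accept(self):
        conn, addr = super().accept()
        # The extra step a shim would add after the real accept() returns.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return conn, addr

if __name__ == "__main__":
    with NoDelayListener(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        with socket.create_connection(srv.getsockname()) as client:
            conn, _ = srv.accept()
            with conn:
                print(conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
```

For a server that initiates its connections instead, the same setsockopt() would follow connect().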

Now, since you mention multiple requests outstanding at a time, keep
in mind that setting TCP_NODELAY is an explicit tradeoff of perceived
latency against aggregate throughput. The semantics of stock trading
may require that, I suppose.

rick jones
--
Wisdom Teeth are impacted, people are affected by the effects of events.
 
 
Rick Jones

 
      01-24-2007, 02:04 AM
Might be good to see tcpdump traces of both configs - Solaris and
Linux.

rick jones
 