"prg" <(E-Mail Removed)> ha scritto nel messaggio
news:(E-Mail Removed) oups.com...
First of all, thank you prg for your time! You have been very kind.
> Have you investigated which clients/apps are making "so many"
> connections? Are you sure this _is_ "too many"?
I'm not sure that it is a problem; I'm just investigating some strange
behaviour.
Sometimes a node of the cluster half-freezes. If I already have a shell
open I can resolve the situation; otherwise
I have to do a hard reboot! I can't log in, either from the terminal console
or remotely with ssh, and I cannot run ps, top or 'ls -l'. If I do kill -9
$PID_SLAPD
everything goes fine: the node unlocks and then I can restart slapd.
I read that slapd (I have an old 2.0, the version currently in Debian woody
stable) uses the select() call, which is limited by FD_SETSIZE, fixed
at 1024. If the connections in TIME_WAIT are not removed from the current
fd list (because of a programming error or some other reason), then
FD_SETSIZE is smaller than the real length of the queue and slapd silently
drops the connections.
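
One way to test this theory is to count the descriptors slapd actually holds, since that is the number that matters for select(). The following is only a rough sketch: it assumes a Linux /proc filesystem, permission to inspect the process, and the usual FD_SETSIZE of 1024. The function names are made up for illustration.

```python
import os

FD_SETSIZE = 1024  # assumed typical value; not read from the running system


def summarize_fds(fd_names, limit=FD_SETSIZE):
    """Return (count, highest_fd, over_limit) for an iterable of fd names."""
    fds = sorted(int(name) for name in fd_names if str(name).isdigit())
    highest = fds[-1] if fds else -1
    # select() can only watch descriptors numbered below FD_SETSIZE
    return len(fds), highest, highest >= limit


def summarize_process(pid):
    """Read /proc/<pid>/fd (Linux only; needs rights to inspect the process)."""
    return summarize_fds(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    import sys

    # Pass the slapd PID as the first argument; defaults to this script's PID.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    count, highest, over = summarize_process(pid)
    print(f"pid {pid}: {count} open fds, highest {highest}, over limit: {over}")
```

Running it with the slapd PID just before a freeze would show whether the highest descriptor is anywhere near 1024.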
I think that nsswitch has the same problem. In fact, when I do 'ls -l' it
has to resolve owner and group, so it queries LDAP, which is not responding.
The server has a complex configuration. Unfortunately I inherited this
configuration from the previous administrator, who no longer works with
us.
Note: the server has worked for months (more than 10!). In the meantime our
accounts have grown and there are now more than 3000. Maybe this
growing number has passed some known threshold and this is creating
problems? I get 6-10 new users/day.
> You can look at $ man tcp for some sysctrl values you can set. Eg.,
> tcp_max_tw_buckets may be useful for testing _before_ going to the
> trouble of kernel recompile. Other "quick close" settings may violate
> TCP rfcs in some respects, IIRC.
You are right. I have looked at other parameters like tcp_max_tw_buckets, but
it has a minimum value that seems better not to change: the documentation
says to enlarge this value, but not to reduce it.
I'm just trying to understand IF the connections in TIME_WAIT could be a
problem for my slapd.
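
To put a number on this, the connection states can be counted straight from /proc/net/tcp. This is a minimal sketch, assuming the usual Linux layout of that file (fourth column is the state in hex, with 06 meaning TIME_WAIT) and the standard LDAP port 389. The function name and the port filter are illustrative only.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as shown in /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_states(lines, local_port=None):
    """Count sockets per TCP state, optionally only for one local port."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":
            continue  # skip the header and any malformed line
        local, state = fields[1], fields[3]
        port = int(local.rsplit(":", 1)[1], 16)  # port is hex after the colon
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as handle:
        for name, total in sorted(count_states(handle, local_port=389).items()):
            print(f"{name}: {total}")
```

Sampling this every few seconds while the node is healthy, and again when it starts to freeze, would show whether TIME_WAIT really grows on the slapd port.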
For now I can't recompile the kernel or update the software; I just have to
find what is not configured well.
> Note that the default behavior adjusts according to available memory.
I have to investigate the memory availability. For the time being, I think
total memory is something like 2GB (but I'm not totally sure).
If you want I can provide some significant net parameters of my
configuration (e.g. "sysctl -A | grep net" etc.)
> In my RH box here (an old box) /usr/src/linux-xxx/include/net/tcp.h
> shows:
> #define TCP_TIMEWAIT_LEN (60*HZ)
> /* how long to wait to destroy TIME-WAIT * state, about 60 seconds */
>
> TIME_WAIT state would normally be set on the client. Do you have a
> client that is "relaying" connection requests?
I have several clients such as pop3d, imapd and lmtp, plus local and remote
clients such as web servers and win2k PCs. I don't know if they could relay
connection requests.
There are also some Apache reverse proxies, but I have to check.
> Anyway, I would be skeptical that your current numbers are "too many".
I'm skeptical too. The strange thing is that the old kernel 2.2 has a
configurable parameter for the TIME_WAIT length. BSD and Solaris also have a
configurable parameter, but kernel 2.4 has no such parameter and it is
hardcoded in tcp.h ... it seems weird, doesn't it?
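
With the length fixed at 60 seconds, the number of sockets sitting in TIME_WAIT can be estimated from the closing rate alone. A back-of-the-envelope sketch follows; the rate of 5 closes per second is a made-up figure, not a measurement from this server.

```python
TIMEWAIT_SECONDS = 60  # from TCP_TIMEWAIT_LEN (60*HZ) in the tcp.h quoted above


def expected_time_wait(closes_per_second, hold_seconds=TIMEWAIT_SECONDS):
    """Steady-state estimate: closing rate multiplied by the hold time."""
    return closes_per_second * hold_seconds


if __name__ == "__main__":
    rate = 5  # hypothetical connections closed per second
    print(f"{rate}/s for {TIMEWAIT_SECONDS}s: about {expected_time_wait(rate)} sockets")
```

So a few hundred TIME_WAIT entries would be normal for a busy directory server, which fits the doubt that the current numbers are "too many".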
Thanks for your help
Maurizio