cannot ssh from master node to slave node in cluster

Discussion in 'Linux Networking' started by Yeo, Sep 9, 2003.

    PROBLEM
    -------
    I was previously able to ssh from the master node (server) to a slave
    node (oscarnode1), so sshd was working at that point. Now I can't log
    in to oscarnode1. Pinging oscarnode1 still works, and I can still ssh
    from server to all the other nodes.

    This started while I was running MPI programs (using MPICH) from the
    master node on the slave nodes with "mpirun -np 8 allgatherv2".
    Everything worked fine until my program apparently consumed too much
    memory on oscarnode1 (as seen in /var/log/messages), causing
    oscarnode1 to 'die'. Now I can't run:
    1) "ssh oscarnode1"
    2) "mpirun -np 8 allgatherv2" (possibly because of the 'dead'
    oscarnode1)
    Output from /var/log/messages:
    -----------
    Sep 7 20:19:09 oscarnode1 sshd(pam_unix)[2000]: session opened for user csyeo by (uid=0)
    Sep 7 20:20:56 oscarnode1 sshd(pam_unix)[2018]: session opened for user csyeo by (uid=0)
    Sep 7 20:21:17 oscarnode1 kernel: Out of Memory: Killed process 2001(allgatherv2).
    Sep 7 20:21:28 oscarnode1 sshd(pam_unix)[2000]: session closed for user csyeo
    Sep 7 20:21:42 oscarnode1 sshd(pam_unix)[2082]: session opened for user csyeo by (uid=0)
    -----------
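
    The kernel entry in this log is the OOM killer terminating allgatherv2
    (PID 2001). To check whether this keeps happening, a hedged sketch in
    Python could scan syslog lines for that message. find_oom_kills and
    OOM_LINE are hypothetical names, and the regular expression assumes the
    exact wording shown above; other kernel versions phrase it differently.

```python
import re

# Hypothetical pattern tuned to the kernel line shown in this post, e.g.
# "Sep 7 20:21:17 oscarnode1 kernel: Out of Memory: Killed process 2001(allgatherv2)."
# Other kernel versions word this message differently; adjust as needed.
OOM_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"Out of Memory: Killed process\s+(?P<pid>\d+)\s*\((?P<name>[^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(lines):
    """Return (timestamp, host, pid, process name) for each OOM-killer entry.

    Assumes one syslog entry per line, as in the real file (the copy in
    this post was line-wrapped when pasted).
    """
    kills = []
    for line in lines:
        match = OOM_LINE.match(line.strip())
        if match:
            kills.append((match.group("stamp"), match.group("host"),
                          int(match.group("pid")), match.group("name")))
    return kills
```

    On the node itself this could be called as
    find_oom_kills(open("/var/log/messages")) (reading that file usually
    needs root). For the five entries above it returns
    [("Sep 7 20:21:17", "oscarnode1", 2001, "allgatherv2")].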


    When I run ssh with the -v option to debug, it stops responding after
    printing the following:
    -----------
    [csyeo@server csyeo]$ ssh -v oscarnode1
    OpenSSH_3.1p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Rhosts Authentication disabled, originating port will not be trusted.
    debug1: restore_uid
    debug1: ssh_connect: getuid 508 geteuid 0 anon 1
    debug1: Connecting to oscarnode1 [192.168.1.1] port 22.
    debug1: temporarily_use_uid: 508/100 (e=0)
    debug1: restore_uid
    debug1: temporarily_use_uid: 508/100 (e=0)
    debug1: restore_uid
    debug1: Connection established.
    debug1: read PEM private key done: type DSA
    debug1: read PEM private key done: type RSA
    debug1: identity file /home/csyeo/.ssh/identity type 0
    debug1: identity file /home/csyeo/.ssh/id_rsa type 1
    debug1: identity file /home/csyeo/.ssh/id_dsa type 2
    [no further output; ssh hangs here]
    -----------
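
    This trace stops right after the identity files are loaded. In the
    oscarnode2 trace that follows, the next line is "Remote protocol
    version ...", which the client prints once it has read the server's
    SSH identification string. That suggests the TCP connection to port 22
    was accepted but sshd on oscarnode1 never sent its banner. A minimal
    Python sketch for testing that assumption without the ssh client
    (read_ssh_banner is a hypothetical helper, not an OpenSSH tool):

```python
import socket
import time
from typing import Optional


def read_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the server's "SSH-..." identification line, or None.

    None means the TCP connection was accepted but no identification line
    arrived within `timeout` seconds (or the server closed the connection
    first). DNS failures, refused connections and connect timeouts
    propagate as exceptions.
    """
    deadline = time.monotonic() + timeout
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(256)
            except socket.timeout:
                return None  # connected, but sshd never sent its banner
            if not chunk:
                return None  # server closed the connection without a banner
            buf += chunk
            # The server may send other lines first; the identification
            # line is the first one that starts with "SSH-".
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                line = line.rstrip(b"\r")
                if line.startswith(b"SSH-"):
                    return line.decode("ascii", errors="replace")
```

    For example, read_ssh_banner("oscarnode2") would be expected to return
    something like "SSH-1.99-OpenSSH_3.1p1" (matching "Remote protocol
    version 1.99" in the trace below), while None for oscarnode1 would
    point at sshd or the node itself (for instance, one still short of
    memory) rather than at the network.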


    For comparison, a successful ssh to another slave node, oscarnode2:
    -----------
    [csyeo@server csyeo]$ ssh -v oscarnode2
    OpenSSH_3.1p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Rhosts Authentication disabled, originating port will not be trusted.
    debug1: restore_uid
    debug1: ssh_connect: getuid 508 geteuid 0 anon 1
    debug1: Connecting to oscarnode2 [192.168.1.2] port 22.
    debug1: temporarily_use_uid: 508/100 (e=0)
    debug1: restore_uid
    debug1: temporarily_use_uid: 508/100 (e=0)
    debug1: restore_uid
    debug1: Connection established.
    debug1: read PEM private key done: type DSA
    debug1: read PEM private key done: type RSA
    debug1: identity file /home/csyeo/.ssh/identity type 0
    debug1: identity file /home/csyeo/.ssh/id_rsa type 1
    debug1: identity file /home/csyeo/.ssh/id_dsa type 2
    debug1: Remote protocol version 1.99, remote software version OpenSSH_3.1p1
    debug1: match: OpenSSH_3.1p1 pat OpenSSH*
    Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_3.1p1
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    -----------
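
    Checking the two traces marker by marker makes the divergence point
    explicit. A small hypothetical Python helper for that (MILESTONES and
    last_milestone are illustrative names; the marker strings are copied
    from these two OpenSSH_3.1p1 traces and may differ in other versions):

```python
# Hypothetical helper: ordered milestones from an OpenSSH 3.x "ssh -v" trace.
# The marker strings are copied from the two traces in this post; other
# OpenSSH versions may print different debug text.
MILESTONES = [
    ("Connecting to", "TCP connect attempted"),
    ("Connection established.", "TCP connection established"),
    ("Remote protocol version", "server version banner received"),
    ("SSH2_MSG_KEXINIT received", "key exchange started"),
]


def last_milestone(trace: str) -> str:
    """Return the label of the last consecutive milestone found in `trace`."""
    reached = "nothing"
    for marker, label in MILESTONES:
        if marker not in trace:
            break  # later milestones cannot have happened without this one
        reached = label
    return reached
```

    Given the oscarnode1 trace it returns "TCP connection established",
    and given the oscarnode2 trace "key exchange started", which places the
    hang between the TCP connect and the server's version banner.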


    OS version on all nodes:
    Red Hat Linux release 7.3 (Valhalla)


    ssh version on all nodes:
    OpenSSH_3.1p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f


    Many Thanks,
    Yeo
     