Dear All,
I have a networking question, and was hoping someone on the group
could help. I have a Linux cluster in which one
node acts as the front end to the world and has two Ethernet
cards: one to communicate with the outside world via a public
IP address, and one to communicate with an internal network
(192.168.0.x) of a dozen Linux PCs, all sharing one 100 Mbps
Ethernet switch.
Now, each of these dozen internal PCs (dual Xeons) acts as an independent
computing node, running intensive calculations on its two CPUs
(using LAM-MPI) with little or no communication with the master node
(or with the other nodes) through the switch.
Each of these internal nodes has (or could have) a second Ethernet
card, a gigabit one.
Here is the question: I would like to double up each of these internal
nodes, so that each one is paired, via its gigabit card, with
a new twin with which it communicates as fast as possible.
In practice, I'd like to go from a cluster of twelve 2-processor nodes
that do not communicate much with each other to a cluster of
twelve (2-processor + 2-processor) nodes, where each (2-proc + 2-proc) node
does a lot of internal communication over LAM-MPI and the
gigabit ports, but very little over the switch or to the outside
world.
The naive way to set this up would be for each of the original
12 dual-Xeon nodes to keep the slow Ethernet card configured with its 192.168.0.x
address, and to configure the gigabit card with the address 10.0.0.1.
Each twin would have only a gigabit card, configured as 10.0.0.2,
and would be linked to its partner by a direct cable (I
understand that no crossover cable is needed for direct gigabit connections).
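To make the routing side of this concrete, here is a minimal Python sketch of how a Linux host chooses the outgoing interface among directly connected subnets by longest-prefix match. It is a simplification (real routing also involves metrics, policy rules and source addresses), and the interface names `eth0`/`eth1` and the example address 192.168.0.11 are assumptions for illustration only. With 10.0.0.1/24 on the gigabit card, anything addressed to 10.0.0.2 leaves through that card and never touches the switch.

```python
"""Hypothetical model of one node pair's interfaces and a simplified
longest-prefix-match choice among directly connected subnets."""
import ipaddress

# Assumed names/addresses: eth0 = 100 Mbps card, eth1 = gigabit card.
ORIGINAL_NODE = {
    "eth0": ipaddress.ip_interface("192.168.0.11/24"),  # example address on the switch
    "eth1": ipaddress.ip_interface("10.0.0.1/24"),      # direct cable to the twin
}
# The twin has only the gigabit card (assumed to be called eth0).
TWIN_NODE = {
    "eth0": ipaddress.ip_interface("10.0.0.2/24"),
}


def pick_interface(interfaces, destination):
    """Return the interface whose connected subnet contains `destination`
    with the longest prefix, or None if no connected subnet matches."""
    dest = ipaddress.ip_address(destination)
    best_name, best_len = None, -1
    for name, iface in interfaces.items():
        net = iface.network
        if dest in net and net.prefixlen > best_len:
            best_name, best_len = name, net.prefixlen
    return best_name


if __name__ == "__main__":
    print(pick_interface(ORIGINAL_NODE, "10.0.0.2"))     # eth1: twin reached over gigabit
    print(pick_interface(ORIGINAL_NODE, "192.168.0.1"))  # eth0: cluster traffic via switch
    print(pick_interface(TWIN_NODE, "192.168.0.1"))      # None: no connected route
```

As the last line suggests, the twin can reach nothing beyond its partner unless the original node is also set up to forward traffic for it.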
The advantage of this setup over a 24-port gigabit switch is
that I only need fast communication within each pair of 2-CPU nodes; in addition,
the communication is very intensive, and I'm pretty sure that a 24-port
gigabit switch would be badly overwhelmed.
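The switch concern can be put into numbers with a worst-case estimate: a switch would have to carry line-rate traffic on all 24 ports in both directions at once. The minimal Python sketch below does that arithmetic, assuming the 1 Gbit/s 1000BASE-T line rate and the usual data-sheet convention of counting each full-duplex port twice; no particular switch model is assumed, and the rated figure to compare against would come from the vendor's specifications.

```python
"""Worst-case switching capacity a gigabit switch needs to stay
non-blocking when every port is saturated in both directions."""


def non_blocking_capacity_gbps(ports, link_gbps=1.0):
    """Capacity (Gbit/s) needed for all ports at line rate, full duplex
    (each port counted once for sending and once for receiving)."""
    return ports * link_gbps * 2


def is_non_blocking(rated_gbps, ports, link_gbps=1.0):
    """True if a switch's rated capacity covers the worst case above."""
    return rated_gbps >= non_blocking_capacity_gbps(ports, link_gbps)


if __name__ == "__main__":
    needed = non_blocking_capacity_gbps(24)
    print(f"24 saturated gigabit ports need {needed:g} Gbit/s of switching capacity")
```

A switch rated below that figure would have to delay or drop traffic when all twelve pairs run flat out, whereas twelve direct cables give each pair a dedicated 1 Gbit/s in each direction with no shared fabric at all.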
Does this make sense? (I guess so, but I wonder whether it would really
deliver gigabit performance, and in particular whether special precautions
are needed so that LAM-MPI on one of the 2-CPU computers knows that it should
communicate with its 2-CPU partner only via the 10.0.0.x direct route.)
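On the LAM-MPI side, one possible approach (an untested assumption, worth verifying) is to boot a separate LAM universe for each pair from a boot schema that names the two nodes by their 10.0.0.x addresses, so that LAM reaches the partner through the direct link rather than through a 192.168.0.x name. The Python sketch below only generates such a schema; the file name `bhost.pair` is hypothetical, and the one-host-per-line `cpu=2` layout follows LAM's boot-schema convention.

```python
"""Hypothetical helper that writes a LAM/MPI boot schema for one node
pair, listing the direct-link (10.0.0.x) addresses."""
from pathlib import Path

# Assumed addressing: every pair reuses the same /24 on its own cable.
PAIR_ADDRESSES = ("10.0.0.1", "10.0.0.2")
CPUS_PER_NODE = 2  # dual Xeon


def boot_schema(addresses=PAIR_ADDRESSES, cpus=CPUS_PER_NODE):
    """Return boot-schema text: a comment, then one host per line with cpu=N."""
    lines = ["# LAM boot schema for one node pair (direct gigabit link)"]
    lines += [f"{addr} cpu={cpus}" for addr in addresses]
    return "\n".join(lines) + "\n"


def write_schema(path="bhost.pair"):
    """Write the schema to `path` (hypothetical file name); return the Path."""
    target = Path(path)
    target.write_text(boot_schema())
    return target


if __name__ == "__main__":
    print(boot_schema(), end="")
```

The schema would then be used on the original node with something like `lamboot -v bhost.pair` followed by `mpirun C ./my_program` (where `my_program` is a placeholder), `C` asking for one process per listed CPU, i.e. four. Comparing the RX/TX byte counters of the two cards in `ifconfig` before and after a run is a simple way to check that the MPI traffic really stays on the gigabit link.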
Thanks for your help,
nicola marzari