[NCLUG] Question about IP forwarding

Mon Apr 26 13:10:28 MDT 2010

My company has a Linux cluster which I have recently rebuilt with Fedora 10 and OpenSharedRoot. The cluster consists of an NFS server which holds the cluster root and allows the individual diskless worker nodes to PXE boot Fedora. The NFS server has two NICs: one NIC connects to our LAN (10.50.x.x/16) while the other NIC connects to a private switched LAN where all the worker nodes are connected (192.168.234.x/24).

What I would like to do is let people on the LAN access the worker nodes directly by their 192.168.234.x addresses. My first attempt was to enable IP forwarding on the cluster's NFS server and then go to individual workstations and set up a static route to the 192.168.234.x subnet, using the NFS server as the gateway, i.e., from a Linux workstation I would type:

  route add -net 192.168.234.0 netmask 255.255.255.0 gw 10.50.2.10

This works like a charm. I can now SSH into the individual nodes that are behind the NFS server from that particular workstation.

The problem has been scaling this beyond one workstation. On our LAN, we have a Cisco ASA 5510 at 10.50.0.1 -- all our workstations have that as their default gateway. I figured it would be a simple matter to set up a static route for the 192.168.234.x subnet on the Cisco device, and that the Cisco would receive packets for the 192.168.234.x subnet, pass them on to my NFS server, and all would be good. Except that it isn't. I'm now able to ping the worker nodes just fine, but SSH sessions hang before even reaching the login prompt.

I did some wiresharking on the NFS server and noticed that the first SSH packet appeared to come in fine, but wireshark colored subsequent ones red and black and seemed to say something or other about sequence numbers being fubar. I don't know enough to troubleshoot it past this. One of my ideas was that packets *into* the private network were being sent to the Cisco and forwarded to the NFS server, but return packets were bypassing the Cisco and being delivered directly from the NFS server to the workstations (because both the NFS server and the workstations have NICs in the 10.50.x.x subnet). I don't know enough about TCP/IP to know if this would be the problem, although I have an inkling that I may be incorrect in my assumption that I can have two devices acting as routers on the same network, and have one router "hand off" packets to another.
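
The suspected asymmetry can be sketched as a small shell check. This is only an illustration of the reasoning, not a diagnosis: it assumes the NFS server treats 10.50.0.0/16 as directly connected (as described above), the function names are made up for this example, and 10.50.3.7 is a hypothetical workstation address. It models only the "is the destination on my own subnet?" decision, not the kernel's full routing table.

```shell
#!/bin/sh
# Sketch: would a reply from the NFS server to a workstation go straight
# out the LAN NIC (never passing through the Cisco), or via a gateway?

# Assumption taken from the post: the server's LAN side is 10.50.0.0/16.
SERVER_LAN_NET=10.50.0.0
SERVER_LAN_PREFIX=16

# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_subnet ADDRESS NETWORK PREFIXLEN
# Succeeds if ADDRESS falls inside NETWORK/PREFIXLEN.
in_subnet() {
    addr_int=$(ip_to_int "$1")
    net_int=$(ip_to_int "$2")
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( addr_int & mask )) -eq $(( net_int & mask )) ]
}

# reply_path WORKSTATION_ADDRESS
# Prints "direct" if the workstation is on the server's own LAN subnet
# (the reply skips the Cisco), otherwise "gateway".
reply_path() {
    if in_subnet "$1" "$SERVER_LAN_NET" "$SERVER_LAN_PREFIX"; then
        echo "direct"
    else
        echo "gateway"
    fi
}

# Hypothetical workstation on the 10.50.x.x LAN:
reply_path 10.50.3.7    # prints "direct"
```

If this model matches what the server really does, inbound packets would take the path workstation -> Cisco -> NFS server -> node, while replies would go node -> NFS server -> workstation, so the Cisco would only ever see one direction of each TCP connection.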

So, my question is: is what I am trying to do feasible, and am I approaching it the correct way?

-- Marcio