[NCLUG] A *nix DFS alternative?

DJ Eshelman djsbignews at gmail.com
Tue Feb 16 11:38:15 MST 2010


I think I may have posed this question before but I'm still having 
trouble believing Microsoft has the only solution.

Here's the situation:

I want to start a business for my wife that will serve professional 
photographers and others who want their photos professionally 
retouched and that, hopefully, can also be sold as a storage solution 
(we'll get to that later- Storage as a Service for photographers and AV 
studios that have high storage needs but low budgets- not exactly Iron 
Mountain's niche market).

We want to have the main server (facing clients) at my office where we 
have bandwidth to spare and can handle upwards of 20 Mbit/sec transfers, 
not to mention be on a good Xen (or maybe ESXi) server, so I can sell 
this reliably and scalably down the road.

What I want at home is twofold- both the ability to have near-immediate 
bit-level sync over a VPN (preferably with good compression as RAW 
photographs tend to be quite bulky), and the ability to work directly 
from this server at home independently of the main server at the 
office.  It's a branch server, in a sense, but with different 
permissions/user accounts and completely isolated file storage.  This 
gives a good level of backup/redundancy as I can just do delta backups.

Now, in the Windows Server world I can accomplish this with a fair 
amount of decency, but I'm required to:
1)  Pay licensing for two servers (booo!), and for the backup solution 
because Windows NT Backup sucks.
2)  Have both servers part of the same domain structure, which I don't 
need (booo!)
3)  Live with CIFS/NTFS, which, while it has gotten better, still kind 
of sucks in comparison to modern FS's
4)  Maintain a separate Linux front end website, which would SUCK to 
have to integrate.  There's no way I'd want to have an IIS web server 
facing the public- that's just asking for trouble.

However- with Windows Server, I'd have a fairly easy way to do this:
DFS exists in a common namespace (//domain.local/namespace/sharename) 
and can synchronize with a variety of options including time and 
bandwidth throttling- both of which I want to have in this case, as 
bogging down our connection mid-day is pretty much unacceptable, yet I 
still need files to sync during the day.
What's more it's built around the branch office concept, so there's 
built-in shadow-copy based deltas (that means you can go back on either 
server- the shadow copies are set aside and the new bits of the file are 
written- so you can easily go back to a file's old data without having 
to restore a backup).
The main thing is that it uses a level of compression for the file 
deltas- making it very efficient across high-latency links.
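
To make the throttling part concrete, here is a minimal Python sketch of 
what time-of-day bandwidth capping could look like on the sending side. 
This is not how DFS does it internally; the function names, the schedule 
format and the 250 KB/s daytime cap are all assumptions made up for 
illustration (the 2.5 MB/s default is just the 20 Mbit/sec figure above 
expressed in bytes).

```python
import time
from datetime import time as clock


def rate_limit_for(now, schedule, default_bps):
    """Pick the bytes/sec cap for wall-clock time `now`.

    `schedule` is a list of (start, end, bytes_per_sec) rows; the first
    row whose window contains `now` wins, otherwise the default applies.
    """
    for start, end, bps in schedule:
        if start <= now < end:
            return bps
    return default_bps


def throttled_chunks(data, chunk_size, bytes_per_sec, sleep=time.sleep):
    """Yield chunks of `data`, pausing after each one so that the
    average send rate stays near `bytes_per_sec`."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield chunk
        # Sleeping len(chunk)/rate seconds per chunk caps the average rate.
        sleep(len(chunk) / bytes_per_sec)


# Hypothetical policy: cap sync traffic during business hours only.
SCHEDULE = [(clock(8, 0), clock(18, 0), 250_000)]
DEFAULT_BPS = 2_500_000  # roughly 20 Mbit/sec
```

A sender would call rate_limit_for() before each file (or every few 
seconds) and pass the result to throttled_chunks(), so files keep 
syncing during the day, just more slowly.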

Of course the default answer is to use rsync or complex HA solutions- 
unfortunately neither of these will work well in this case.  I have 
searched high and low for bit-level (delta) solutions and if they are 
out there, they are lost in a sea of dead projects, and so far the only 
solutions supporting immediate sync are HA solutions that will not work 
in this case because both servers need to access the data simultaneously 
in different locations.

So I need:
1)  Bit-Level sync (delta- changes only, not the whole file every single 
time)
2)  Automatic sync (the files arrive or change and immediately begin to 
synchronize)
3)  Very low overhead AND scalable (we're talking about storage that 
could grow to several terabytes in a matter of a few years if we're 
successful in this)
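
To illustrate what I mean by requirement 1, here is a rough Python 
sketch of a changes-only transfer. It compares fixed-size blocks by 
hash, which is deliberately simpler than rsync's rolling-checksum 
algorithm (it will not cope well with data inserted in the middle of a 
file). The block size and all function names are assumptions for the 
example, not taken from any existing tool.

```python
import hashlib
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def compute_delta(old: bytes, new: bytes,
                  block_size: int = BLOCK_SIZE) -> Tuple[list, int]:
    """Return ([(block_index, block_bytes), ...], new_length) covering
    only the blocks of `new` that differ from the same block in `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = []
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[index]:
            delta.append((index, block))
    return delta, len(new)


def apply_delta(old: bytes, delta: list, new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    # Start from the old bytes, truncated or zero-padded to the new size.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta:
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)
```

In a real setup the receiving side would send its block hashes over the 
wire and only the delta would travel back, so a small retouch to a large 
RAW file costs a few blocks rather than the whole file.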

I'd LIKE (but don't necessarily need)
1)  Independent permissions structure but access to the same files
2)  Built-in compression at a higher level than IPSEC - I know there are 
a few projects out there but I'm not sure how battle-tested they are
3)  WAN optimization (sending all the TCP data without waiting for 
an ack, just getting a checksum at the end of the stream)
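
For items 2 and 3, the general shape I have in mind could be sketched 
like this in Python: compress the changed blocks as one stream and 
verify a single checksum at the end. This only shows the idea at the 
application level; it does nothing about TCP acks, and the function 
names and the choice of zlib and SHA-256 are my own assumptions rather 
than a reference to any particular project.

```python
import hashlib
import zlib


def pack_stream(blocks):
    """Compress an iterable of byte blocks into one payload and return
    (payload, hex_digest), where the digest covers the uncompressed data."""
    compressor = zlib.compressobj(6)  # 6 is zlib's usual speed/size tradeoff
    digest = hashlib.sha256()
    out = bytearray()
    for block in blocks:
        digest.update(block)
        out += compressor.compress(block)
    out += compressor.flush()
    return bytes(out), digest.hexdigest()


def unpack_stream(payload, expected_digest):
    """Decompress `payload` and check the end-of-stream checksum once."""
    data = zlib.decompress(payload)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch, stream must be resent")
    return data
```

How much this saves depends heavily on the data; RAW formats that are 
already compressed in-camera may not shrink much further.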

This whole solution would also apply to several office/branch office 
situations I deal with daily, though their real-time data needs would be 
much higher.

Any ideas?



