[NCLUG] A *nix DFS alternative?

John Gilmore j.arthur.gilmore at gmail.com
Sun Feb 21 18:10:12 MST 2010


One of the problems mentioned with git is that it can munge binary
files: when it doesn't realize a file is binary, it attempts to merge
changes as if they were text.

For an uncompressed image, that should actually work fine.

And that implies git would give you change tracking without storing
complete copies of each version of a file.

I really doubt that any of the other approaches discussed here will
give you change control and multiple versions without keeping complete
copies of each revision, because they're going to treat the images as
opaque binary files and not attempt to look deeper. So if that's a
requirement, you'll probably have to use git, SVN, or something like
them. The file you want to look at is ".gitattributes", which controls
diff and merge behavior per file pattern. You can actually force git
to do diffs but not automatic merges of differing files - it would
throw an error if the file was changed in both places instead of
attempting to merge, but would still store only the differences.
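
For example, a minimal ".gitattributes" sketch (the patterns here are
assumptions; substitute whatever file types you actually handle):

    # keep text diffs but refuse automatic merges for image files
    *.raw   diff -merge
    *.tif   diff -merge
    # or mark a type fully binary: no diff, no merge, no EOL conversion
    *.psd   binary

With "merge" unset, git takes the current branch's copy as the
tentative result and flags a conflict whenever both sides touched the
file - exactly the "throw an error" behavior above.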

The other approach that MIGHT give you something similar would be
block-level mirroring with a versioning logical volume. Sounds like
trouble waiting to happen to me.

OTOH, I don't think that the M$ approach would give you this either.

But you don't need bit-level version storage, just bit-level change
transmission, so this may not be relevant.

Also, all version-control-based systems are going to try to give you a
"working copy" of every file in the dataset. That's required for a
source compile, but it isn't what you want when working with a large
repository of images! This is possibly mitigated by using branches,
probably one branch per customer. But you'd still have to do something
odd to check single files out and in.
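
For what it's worth, git can read a single file out of a repository
without populating a full working copy; a sketch (the branch and path
names are made up):

    # extract one file from a customer branch without a full checkout
    git show customer-jones:shoots/wedding/IMG_1234.CR2 > IMG_1234.CR2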

Within the class of programs originally intended to track changes in
source code, I don't think you need to look any further than git. The
others typically have larger repositories, keep multiple whole copies
of files, etc. - all bad things for your application. I think you
could fairly easily make git work by wrapping single-file
checkin/upload in web-foo for your clients, and wrapping single-file
checkout in something similar for your wife, the one doing the file
modifications.
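
The check-in side of that wrapper could be as simple as this sketch
(every host, path, and branch name here is hypothetical; the web-foo
would just shell out to something like it):

    #!/bin/sh
    # commit one uploaded file into a customer's branch (a sketch)
    file="$1"; branch="$2"
    cd /srv/photo-repo || exit 1
    git checkout "$branch" || exit 1
    cp "/srv/uploads/$file" "$file" || exit 1
    git add "$file"
    git commit -m "client upload: $file"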

Managing bandwidth is going to be a bit sticky no matter what you use,
I think. Do you want client-uploaded files to be instantly moved to
home? Probably not: if a single user uploads the latest wedding shoot,
you want that to wait until night, when you have plenty of bandwidth.
BUT the three pictures he's paying to have modified need to be
available at home immediately so your wife can work on them.

That part obviously cannot be an automated process. Your wife will
have to log into the office server and download the pictures she
needs, so you'll want a script in place to selectively fetch just
those files on command.
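
Something like this rsync sketch would do it (the host and paths are
assumptions):

    #!/bin/sh
    # fetch named files from the office server on demand (a sketch)
    for f in "$@"; do
        rsync -avz "office.example.com:/srv/photo-repo/$f" "$HOME/work/"
    done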

OTOH, copying files back to the office can be automated. Assuming your
wife will only be modifying files by hand, they can be uploaded to the
office automatically without worrying about bandwidth. Just make sure
she knows that if SHE wants to upload HER latest wedding shoot, she'll
have to stop the automatic process, and it'll be restarted
automatically that evening. Or, I suppose, just make the job choke and
die if more than X MB of files have been merged into the local
repository.
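
A size-capped version of the automatic upload might look like this
(the outbox path is made up, and 200 MB is purely a stand-in for X):

    #!/bin/sh
    # automatic upload with a size cap (a sketch; 200 MB stands in for X)
    limit_kb=$((200 * 1024))
    pending_kb=$(du -sk "$HOME/work/outbox" | cut -f1)
    if [ "$pending_kb" -gt "$limit_kb" ]; then
        echo "outbox over the limit; leaving it for the evening run" >&2
        exit 1
    fi
    rsync -avz "$HOME/work/outbox/" office.example.com:/srv/incoming/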

Or look into "pyshaper", which can limit bandwidth per process. That
way you can give the interactive stuff more bandwidth and throttle the
automatic stuff during the daytime.
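
If the automatic copy is rsync-based anyway, rsync's built-in throttle
is a simpler alternative to per-process shaping; a sketch (the rate
and paths are illustrative):

    # cap the daytime automatic sync at ~256 KB/s; interactive
    # transfers stay unthrottled
    rsync -avz --bwlimit=256 "$HOME/work/outbox/" \
        office.example.com:/srv/incoming/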

On 2/19/10, Stephen Warren <swarren at wwwdotorg.org> wrote:
> DJ Eshelman wrote:
>> I think I may have posed this question before but I'm still having
>> trouble believing Microsoft has the only solution.
>>
>> Here's the situation:
>>
>> I want to start a business for my wife that will service professional
>> photographers and others that want to have photos professionally
>> retouched and, hopefully also sold as a storage solution (we'll get to
>> that later- Storage as a Service for photographers and AV studios that
>> have high storage needs but low budgets- not exactly IronMountain's
>> niche market).
>>
>> We want to have the main server (facing clients) at my office where we
>> have bandwidth to spare and can handle upwards of 20 Mbit/sec transfers,
>> not to mention be on a good Xen (or maybe ESXi) server, so I can sell
>> this reliably and scalably down the road.
>>
>> What I want at home is twofold- both the ability to have near-immediate
>> bit-level sync over a VPN (preferably with good compression as RAW
>> photographs tend to be quite bulky), and the ability to work directly
>> from this server at home independently of the main server at the
>> office.  It's a branch server, in a sense, but with different
>> permissions/user accounts and completely isolated file storage.  This
>> gives a good level of backup/redundancy as I can just do delta backups.
>
> I should probably read the whole thread, but I use unison for this kind of
> thing. It is basically a multi-master rsync.
>
> In the past, I've had file-stores on two servers sync using a cron job
> that ran every 10 minutes; almost immediate. As I think others have noted,
> inotify/similar could decrease latency here.
>
> More recently, I share our photo tree across 3 laptops and a server using
> manual unison invocations.
