[NCLUG] Re: parallel processing users?
John L. Bass
jbass at dmsd.com
Mon Oct 17 17:39:02 MDT 2005
Chad Perrin <perrin at apotheon.com> writes:
[snip lots of stuff about clustering, parallel processing, and
distributed computing]
Sorta on-topic:
Has anyone here worked with Condor for high-throughput computing?
If so: Do you know anything about how well it would manage a "virtual
node" made up of a parallel virtual machine?
I've got a Condor cluster behind me right now, and the need may well
develop to have Condor manage a PVM of some description made up of
laptops that come and go on this network. I'm looking for ideas and
feasibility advice.
If the data involved is at all critical (financial, or otherwise high-value and
hard to replace or correct), then you need to start thinking about SEUs
(single-event upsets) caused by neutron strikes. The probability is relatively
low for a single machine, but it grows quickly as you add machines and memory
to the cluster. Machines whose CPUs, caches, memory, and disk subsystems are
not protected by parity/ECC are at risk of silent data corruption that is
nearly impossible to track down after the fact. Clusters of machines that lack
complete parity/ECC protection are open to random data corruption, especially
at this altitude. Most notebooks, and the vast majority of commodity PCs, lack
this protection.
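
To put rough numbers on how the risk scales with cluster size, here is a
back-of-the-envelope sketch in Python. It assumes upsets arrive independently
at a constant rate (a Poisson model), which is a simplification. The function
name, the fit_per_mbit value, and the altitude_factor multiplier are all
placeholders I made up for illustration, not measurements; plug in figures
from your memory vendor and site if you have them.

```python
import math

def p_at_least_one_upset(machines, gib_per_machine, fit_per_mbit,
                         hours, altitude_factor=1.0):
    """Probability of one or more upsets somewhere in the cluster.

    Assumes a Poisson process. fit_per_mbit is a soft-error rate in FIT
    (failures per 1e9 device-hours) per Mbit of unprotected memory;
    altitude_factor is a hypothetical multiplier on that rate.
    """
    mbits = machines * gib_per_machine * 8 * 1024   # GiB -> Mbit (binary units)
    upsets_per_hour = mbits * fit_per_mbit * altitude_factor / 1e9
    return 1.0 - math.exp(-upsets_per_hour * hours)

if __name__ == "__main__":
    # Placeholder rate, chosen only to make the arithmetic visible.
    ASSUMED_FIT_PER_MBIT = 1.0
    for n in (1, 10, 100, 1000):
        p = p_at_least_one_upset(n, gib_per_machine=1,
                                 fit_per_mbit=ASSUMED_FIT_PER_MBIT,
                                 hours=24 * 30)
        print(f"{n:5d} machines: P(>=1 upset in 30 days) = {p:.4f}")
```

The expected number of upsets is linear in machine count and memory size, so
the chance of a completely clean run falls off exponentially as the cluster
grows. Without parity/ECC, none of those upsets would be reported.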
John