[NCLUG] Re: parallel processing users?

John L. Bass jbass at dmsd.com
Mon Oct 17 17:39:02 MDT 2005


	Chad Perrin <perrin at apotheon.com> writes:
	[snip lots of stuff about clustering, parallel processing, and
	distributed computing]

	Sorta on-topic:
	Has anyone here worked with Condor for high-throughput computing?
	If so: Do you know anything about how well it would manage a "virtual
	node" made up of a parallel virtual machine?

	I've got a Condor cluster behind me right now, and the need may well
	develop to have Condor manage a PVM of some description made up of
	laptops that come and go on this network.  I'm looking for ideas and
	feasibility advice.

If the data involved is at all critical (financial data, or other high-value data
that is hard to replace or correct), then you need to start thinking about
single-event upsets (SEUs) caused by neutron strikes. The probability of an upset
is relatively low for a single machine, but it grows quickly as you add more
machines and more memory to the cluster.
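To see why the risk climbs with cluster size, here is a minimal sketch that assumes
upsets are independent and arrive at a constant rate per megabit of memory (a simple
Poisson model). The rate parameter `fit_per_mbit` (upsets per 10^9 device-hours per
Mbit) and the example numbers are hypothetical placeholders, not measured values;
real rates depend on the hardware, shielding, and altitude.

```python
import math

def prob_at_least_one_upset(fit_per_mbit: float, mbit_per_node: float,
                            nodes: int, hours: float) -> float:
    """Probability of at least one upset across the whole cluster.

    Assumes (hypothetically) independent upsets at a constant rate, so the
    cluster-wide rate is the per-Mbit rate times the total memory, and
    P(>=1 upset) = 1 - exp(-rate * time).
    """
    # Total expected upsets per hour across every node's memory.
    rate_per_hour = fit_per_mbit * mbit_per_node * nodes / 1e9
    return 1.0 - math.exp(-rate_per_hour * hours)

# Hypothetical inputs: 10 FIT/Mbit, 1 GiB (8192 Mbit) per node, 30 days.
for n in (1, 10, 100):
    p = prob_at_least_one_upset(10, 8192, n, 24 * 30)
    print(f"{n:4d} nodes: P(at least one upset) = {p:.3f}")
```

Because the expected number of upsets scales linearly with total memory, a chance
that looks negligible on one box becomes close to a certainty across a large cluster
over the same period.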

Machines whose CPUs, caches, memory, and disk subsystems are not protected by
parity/ECC are at risk of silent data corruption that is nearly impossible to track
down after the fact. Clusters of machines that lack complete parity/ECC protection
are open to random data corruption, especially at this altitude, where the
cosmic-ray neutron flux is higher than at sea level.
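As an illustration of what parity buys you, here is a minimal software sketch of
even parity over a single memory word. It is not how any particular CPU or memory
controller implements it; `store`, `load`, and `flip_bit` are hypothetical helpers.
The point is that a flipped bit is caught when the word is read back, whereas
without the extra bit the corrupted value is returned silently.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2

def store(word: int) -> tuple[int, int]:
    """Hypothetical 'write': keep the data word alongside its parity bit."""
    return word, parity(word)

def load(word: int, stored_parity: int) -> int:
    """Hypothetical 'read': recompute parity and refuse corrupted data."""
    if parity(word) != stored_parity:
        raise ValueError("parity error: memory word was corrupted")
    return word

def flip_bit(word: int, bit: int) -> int:
    """Simulate a single-event upset flipping one bit."""
    return word ^ (1 << bit)

data, p = store(0xB2)
try:
    load(flip_bit(data, 3), p)  # one flipped bit: detected
except ValueError as err:
    print("caught:", err)

# Two flipped bits cancel out in the parity check and slip through.
print(hex(load(flip_bit(flip_bit(data, 1), 5), p)))
```

Plain parity only detects an odd number of flipped bits and cannot say which bit
changed; ECC schemes add enough check bits to correct single-bit errors and detect
double-bit errors, which is why full ECC coverage matters for long-running cluster jobs.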

Most notebooks, and the vast majority of commodity PCs, lack this protection.

John


