<div dir="auto">I've had good results with Unixsurplus on Ebay.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jun 18, 2023, 11:33 AM Sean Reifschneider <<a href="mailto:jafo00@gmail.com">jafo00@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>>datails of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s</div><div><br></div>5TB of ram in a cluster seems impossibly low. I just got a home server off ebay for $330 (landed) that came with a 1/4 TB of RAM.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 13, 2023 at 9:56 PM Bob Proulx <<a href="mailto:bob@proulx.com" target="_blank" rel="noreferrer">bob@proulx.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">j dewitt wrote:<br>
> What: Tuesday June 13th, 2023 NCLUG Meeting

Tonight we had a full house! AWESOME! Summer is here and people are
coming out.

Mory started things off with a very nice talk, "ExaFlop Clusters Use
Linux". I'll just note down some words about the supercomputers he
talked about. "Just shy of 10,000 systems." "I brought my compute
cluster tonight." "Frontier ExaFlop AMD-HPE, a multi-million dollar
machine, has their documentation online. Anyone can read all of the
details of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s
I/O transfer."

Warewulf is a computer cluster implementation toolkit that facilitates
installing a cluster and administering it over the long term. Cluster
nodes run from RAM instead of from disk: load the OS into RAM and then
run. Otherwise, across so many nodes, there would be too much disk
failure. And it is all about speed. PXE, TFTP, DHCP, NFS, no local
disk storage.

https://en.wikipedia.org/wiki/Warewulf
https://warewulf.org/

Reboots are very slow when there is so much RAM, so reboots are
avoided. Instead Warewulf uses overlays, which are live and created on
the fly. Demo! Mory's compute cluster (a few laptops) demonstrated
booting and loading the system into RAM.
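
As a sketch of that overlay workflow (Warewulf 4 syntax; "site" is a
hypothetical overlay name):

    # create an overlay and drop a file into it
    wwctl overlay create site
    wwctl overlay import site /etc/motd

    # rebuild the overlay images; nodes fetch runtime overlays while
    # running, so changes land without a reboot
    wwctl overlay build

The overlay then gets attached to nodes or profiles with wwctl node
set or wwctl profile set.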

There were some complaints about the proprietary NVIDIA driver. It was
a pain to make work in the overlay. Since it is not part of the OS it
has to be installed separately. But then it always must be installed
separately. Not impossible, just more difficult to get going. But
required in order to use the GPUs in the compute cluster.
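
One way to handle that kind of out-of-OS software (a sketch, not
necessarily the recipe Mory used; the image name is hypothetical and
the driver package name depends on which repository is enabled) is to
install it into the node image chroot and rebuild:

    # open a shell inside the node image chroot
    wwctl container shell rocky-8

    # inside the chroot: kernel headers plus the vendor driver package
    dnf install -y kernel-devel nvidia-driver
    exit

    # repack the chroot into the boot image that nodes load into RAM
    wwctl container build rocky-8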

The way things work is that there are chroots with the raw file system
for the OS. That gets packed into .img files. Then there are overlays
on top of them that contain the application.
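
Roughly, assuming Warewulf 4 and a chroot sitting at a hypothetical
/srv/chroots/rocky-8:

    # register a raw chroot file system as a node image
    wwctl container import /srv/chroots/rocky-8 rocky-8

    # pack it into the compressed image that nodes load into RAM
    wwctl container build rocky-8

    # the overlays layered on top at boot
    wwctl overlay list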

https://openhpc.community/downloads/
https://github.com/stanfordhpccenter/OpenHPC/tree/main/hpc-for-the-rest-of-us/recipes/rocky8/warewulf4/slurm

Warewulf manages all aspects of the cluster. PXE boot. DHCP. DNS.
And on and on for everything that is needed to diskless boot each of
the cluster machines and make them available in the cluster. We did
have some conversation about NTP. Time synchronization is critical
just the same: nodes cannot authenticate if their time is offset from
the manager host.
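
A minimal sketch of the usual fix, assuming chrony on the nodes and
the Warewulf master serving NTP at a hypothetical 10.0.2.254:

    # /etc/chrony.conf on each compute node, delivered via an overlay
    server 10.0.2.254 iburst
    # step the clock at boot if it is badly off, then slew
    makestep 1.0 3

Then chronyc tracking on a node shows the remaining offset from the
master.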

We had a little discussion about Intel HyperThreading. It makes the
OS process scheduler more complicated. It was amusing that at least
five people voiced that they disable HT in order to improve total
throughput for high performance computing. (And I am in that camp
too, because when we benchmarked we were faster with HT off than on
and could get more simulations through.) So many of us disable HT as
a matter of routine now.
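
For reference, the two standard ways to turn SMT off on Linux (the
runtime knob needs kernel 4.19 or later):

    # disable SMT at runtime, as root; sibling CPUs go offline at once
    echo off > /sys/devices/system/cpu/smt/control

    # or permanently, add "nosmt" to the kernel command line, e.g.
    # GRUB_CMDLINE_LINUX="... nosmt" in /etc/default/grub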

Mory was very enthused about supercomputing! But all good things must
come to an end! We had many new people so we decided to do a round
robin to give everyone who wanted to say something to the group a
chance to do so. Then we adjourned the meeting. Many of us then went
to Slyce Pizza afterward for dinner.