Tuesday June 13th, 2023 NCLUG Meeting
Sean Reifschneider
jafo00 at gmail.com
Sun Jun 18 17:32:40 UTC 2023
> details of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s
> I/O transfer."
5TB of RAM in a cluster seems impossibly low. I just got a home server off
eBay for $330 (landed) that came with 1/4 TB of RAM.
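
Back-of-envelope with the talk's own numbers, treating my little
server's quarter terabyte as a floor per node:

    ~10,000 nodes x 0.25 TB/node = ~2,500 TB = ~2.5 PB

so a 5TB cluster-wide figure would be off by about three orders of
magnitude.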
On Tue, Jun 13, 2023 at 9:56 PM Bob Proulx <bob at proulx.com> wrote:
> j dewitt wrote:
> > What: Tuesday June 13th, 2023 NCLUG Meeting
>
> Tonight we had a full house! AWESOME! Summer is here and people are
> coming out.
>
> Mory started things off with a very nice talk, "ExaFlop Clusters Use
> Linux". I'll just note down some words about the supercomputers he
> talked about. "Just shy of 10,000 systems." "I brought my compute
> cluster tonight." "Frontier ExaFlop AMD-HPE, a multi-million dollar
> machine, has their documentation online. Anyone can read all of the
> details of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s
> I/O transfer."
>
> Warewulf is a computer cluster implementation toolkit that eases
> installing a cluster and administering it over the long term.
> Cluster nodes run from RAM instead of from disk: load the OS into
> RAM and then run. Otherwise there would be too many disk failures,
> and it is all about speed. PXE, TFTP, DHCP, NFS, no local disk
> storage.
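>
> A minimal sketch of standing a node up with Warewulf 4's wwctl tool.
> The node name, addresses, and image URI here are illustrative, in the
> style of the Warewulf 4 quickstart; exact flags differ across
> versions:
>
>     # import an OS image ("container") to be the nodes' root filesystem
>     wwctl container import docker://ghcr.io/hpcng/warewulf-rockylinux:8 rocky-8
>     # point the default profile at that image
>     wwctl profile set default --container rocky-8
>     # register a compute node
>     wwctl node add n0001 --ipaddr 10.0.2.1 --discoverable
>     # (re)write the DHCP/TFTP/NFS configuration Warewulf drives
>     wwctl configure --all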
>
> https://en.wikipedia.org/wiki/Warewulf
> https://warewulf.org/
>
> Reboots are very slow when there is so much RAM to initialize, so
> reboots are avoided. Instead Warewulf uses overlays, which are live
> and created on the fly. Demo! Mory demonstrated booting and loading
> the system into RAM on his compute cluster (a few laptops).
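>
> A rough sketch of the overlay workflow with wwctl. The overlay and
> file names are illustrative; "generic" was the default runtime
> overlay in the Warewulf 4 releases I have seen:
>
>     wwctl overlay list                    # system and runtime overlays
>     wwctl overlay edit generic /etc/motd  # drop a file into an overlay
>     wwctl overlay build                   # rebuild; running nodes pick
>                                           # up runtime overlays live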
>
> There were some complaints about the proprietary NVIDIA driver. It
> was a pain to get working in the overlay. Since it is not part of
> the OS it has to be installed separately, and then it must always be
> installed separately. Not impossible, just more difficult to get
> going, but required in order to use the GPUs in the compute cluster.
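>
> One hedged sketch of how that separate install can look, assuming the
> rocky-8 image from above and NVIDIA's CUDA repository for EL8 (the
> repo URL, module stream, and package names vary by distro and driver
> series):
>
>     # open a shell inside the image's chroot; exiting repacks the image
>     wwctl container exec rocky-8 /bin/bash
>     # inside the chroot:
>     dnf install -y kernel-devel kernel-headers dnf-plugins-core
>     dnf config-manager --add-repo \
>         https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
>     dnf module install -y nvidia-driver:latest-dkms
>     exit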
>
> The way things work is that there are chroots with the raw file
> system for the OS. That gets packed into .img files, and then there
> are overlays on top of them that contain the application.
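>
> In wwctl terms the chroot is the "container" and the packing is a
> build (same illustrative image name as above):
>
>     wwctl container build rocky-8   # pack the chroot into an image
>     wwctl container list            # list images and their build state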
>
> https://openhpc.community/downloads/
>
> https://github.com/stanfordhpccenter/OpenHPC/tree/main/hpc-for-the-rest-of-us/recipes/rocky8/warewulf4/slurm
>
> Warewulf manages all aspects of the cluster: PXE boot, DHCP, DNS,
> and on and on, everything that is needed to diskless-boot each of
> the cluster machines and make them available in the cluster. We did
> have some conversation about NTP; time synchronization is critical
> just the same, and nodes cannot authenticate if their time is offset
> from the manager host.
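>
> A quick way to check a node's clock with chrony, assuming chronyd is
> the NTP client and the head node (address made up) is the time
> source:
>
>     # /etc/chrony.conf on a compute node
>     server 10.0.2.254 iburst
>
>     chronyc tracking     # current offset from the reference clock
>     chronyc sources -v   # time sources and their reachability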
>
> We had a little discussion about Intel HyperThreading. It makes the
> OS process scheduler more complicated. Amusingly, at least five
> people voiced that they disable HT in order to improve total
> performance for high performance computing. (I am in that camp too:
> when we benchmarked, we were faster with HT off than on and could
> get more simulations through.) So many of us disable HT as a matter
> of routine now.
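>
> For reference, recent Linux kernels let you toggle SMT at runtime
> through sysfs, no BIOS visit required (root needed; the firmware
> setting is what makes it permanent):
>
>     cat /sys/devices/system/cpu/smt/control    # on, off, forceoff, ...
>     echo off | sudo tee /sys/devices/system/cpu/smt/control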
>
> Mory was very enthused about supercomputing! But all good things
> must come to an end! We had many new people, so we decided to do a
> round robin to give everyone who wanted to say something to the
> group a chance to do so. Then we adjourned the meeting. Many of us
> then went to Slyce Pizza afterward for dinner.
>