Tuesday June 13th, 2023 NCLUG Meeting
Phil Marsh
microcraftx at gmail.com
Mon Jun 19 15:53:52 UTC 2023
I've had good results with Unixsurplus on eBay.
On Sun, Jun 18, 2023, 11:33 AM Sean Reifschneider <jafo00 at gmail.com> wrote:
> >details of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s
>
> 5TB of RAM in a cluster seems impossibly low. I just got a home server
> off eBay for $330 (landed) that came with a 1/4 TB of RAM.
>
> On Tue, Jun 13, 2023 at 9:56 PM Bob Proulx <bob at proulx.com> wrote:
>
>> j dewitt wrote:
>> > What: Tuesday June 13th, 2023 NCLUG Meeting
>>
>> Tonight we had a full house! AWESOME! Summer is here and people are
>> coming out.
>>
>> Mory started things off with a very nice talk "ExaFlop Clusters Use
>> Linux". I'll just note down some words about the super computers he
>> talked about. "Just shy of 10,000 systems." "I brought my compute
>> cluster tonight." "Frontier ExaFlop AMD-HPE, a multi-million dollar
>> machine, has their documentation online. Anyone can read all of the
>> details of it." "Just shy of 39,000 GPUs." "5TB of RAM. 12.8TBits/s
>> I/O transfer."
>>
>> Warewulf is a computer cluster implementation toolkit that makes it
>> easier to install a cluster and administer it over the long term.
>> Cluster nodes run from RAM instead of from disk: the OS is loaded
>> into RAM and runs from there. Otherwise there would be too much disk
>> failure. And it is all about speed. PXE, TFTP, DHCP, NFS, no local
>> disk storage.
>>
>> https://en.wikipedia.org/wiki/Warewulf
>> https://warewulf.org/
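>>
>> For anyone who wants to try it at home, the basic flow with
>> Warewulf 4's wwctl looks something like this (a rough sketch from
>> memory; image names and flags vary by version):
>>
>>     # Import a node image from a registry and assign it to the
>>     # default profile
>>     sudo wwctl container import \
>>         docker://ghcr.io/hpcng/warewulf-rockylinux:8 rocky-8
>>     sudo wwctl profile set default --container rocky-8
>>
>>     # Define a compute node with its boot-time network identity
>>     sudo wwctl node add n0001 --ipaddr 10.0.2.1 \
>>         --hwaddr 08:00:27:aa:bb:cc
>>
>>     # (Re)write the dhcpd/tftp/NFS configuration Warewulf manages
>>     sudo wwctl configure --all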
>>
>> Reboots are very slow when there is so much RAM, so reboots are
>> avoided. Instead it uses overlays, which are live and created on
>> the fly. Demo! Mory's compute cluster (a few laptops) gave a
>> demonstration of booting and loading the system into RAM.
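>>
>> For reference, the overlay workflow is roughly this (again a
>> sketch; "generic" is the default runtime overlay name in
>> Warewulf 4):
>>
>>     # List the overlays Warewulf knows about
>>     wwctl overlay list
>>
>>     # Drop a file into the runtime overlay; running nodes pick it
>>     # up live, no reboot needed
>>     wwctl overlay edit generic /etc/motd
>>
>>     # Repack the overlay images for the provisioner to serve
>>     wwctl overlay build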
>>
>> Some complaints about the proprietary NVIDIA driver. It was a pain
>> to make work in the overlay. Since it is not part of the OS, it
>> always has to be installed separately. Not impossible. Just more
>> difficult to get going. But required in order to use the GPUs in
>> the compute cluster.
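>>
>> To give a flavor of why that is awkward: the driver has to go into
>> the image's chroot rather than onto a node, something like the
>> following with Warewulf 4 (assuming an NVIDIA package repo is
>> already set up in the image; the package names here are
>> illustrative):
>>
>>     # Open a shell inside the image's chroot; Warewulf repacks the
>>     # image when you exit
>>     sudo wwctl container exec rocky-8 /bin/bash
>>
>>     # Inside the chroot -- the kernel module has to match the
>>     # kernel *in the image*, not the host's uname -r, which is the
>>     # usual trap
>>     dnf install -y kernel-devel nvidia-driver   # illustrative names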
>>
>> The way things work is that there are chroots with the raw file
>> system for the OS. That then gets packed into .img files. Then
>> there are overlays on top of them that contain the application.
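>>
>> So after editing a chroot by hand, the image has to be repacked,
>> e.g. (paths vary by install):
>>
>>     # The raw file system lives under Warewulf's state directory,
>>     # e.g. /var/lib/warewulf/chroots/rocky-8/rootfs
>>
>>     # Repack that chroot into the compressed .img the nodes boot
>>     sudo wwctl container build rocky-8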
>>
>> https://openhpc.community/downloads/
>>
>> https://github.com/stanfordhpccenter/OpenHPC/tree/main/hpc-for-the-rest-of-us/recipes/rocky8/warewulf4/slurm
>>
>> Warewulf manages all aspects of the cluster. PXE boot. DHCP. DNS.
>> And on and on for everything that is needed to diskless-boot each
>> of the cluster machines and make them available in the cluster. We
>> had some conversation about NTP, though. Time synchronization is
>> critical just the same, and nodes cannot authenticate if their time
>> is offset from the manager host.
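>>
>> A quick way to check that a node's clock is actually locked to the
>> cluster's time source (assuming chrony, which Rocky uses by
>> default):
>>
>>     chronyc tracking      # reference ID, current offset, skew
>>     chronyc sources -v    # the servers being polled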
>>
>> We had a little discussion about Intel HyperThreading. It makes the
>> OS process scheduler more complicated. It was amusing that we had at
>> least five people who voiced that they disable HT in order to improve
>> the total performance for high performance computing. (And I am in
>> that camp too because when we benchmarked we were faster with HT off
>> than on and could get more simulations through.) So many of us
>> disable HT as a matter of routine now.
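>>
>> HT is usually disabled in the firmware, but recent kernels can also
>> toggle it at runtime through sysfs, which is handy for before/after
>> benchmarking:
>>
>>     # Current SMT state: on / off / forceoff / notsupported
>>     cat /sys/devices/system/cpu/smt/control
>>
>>     # Turn it off without touching the BIOS
>>     echo off | sudo tee /sys/devices/system/cpu/smt/control
>>
>>     # Verify: "Thread(s) per core" should now be 1
>>     lscpu | grep -i thread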
>>
>> Mory was very enthused about supercomputing! But all good things must
>> come to an end! We had many new people so we decided to do a round
>> robin to give everyone that wanted to say something to the group a
>> chance to do so. Then we adjourned the meeting. Many of us then went
>> to Slyce Pizza afterward for dinner.
>>
>