Inspiration

Binghamton University has two major HPC clusters; however, both run SLURM in user space. If an organization needs to run a heavy computation quickly and temporarily, on the fly, significant setup is required.

What it does

User Cluster searches the local network for nodes that one can log in to, with authorization built on top of SSH. Commands are distributed over sockets using protobuf messages, and state stays consistent because the nodes are assumed to share the same NFS mount.
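
To make the wire format concrete, here is a minimal sketch of length-prefixed message framing over a TCP socket. In the real project the payload would be a serialized protobuf command; here it is plain bytes so the example stays self-contained, and the host/port and function names are illustrative only.

    import socket
    import struct

    def send_command(host: str, port: int, payload: bytes) -> None:
        """Send one length-prefixed message to a worker node."""
        with socket.create_connection((host, port)) as sock:
            # 4-byte big-endian length prefix so the receiver knows how much to read
            sock.sendall(struct.pack(">I", len(payload)) + payload)

    def _recv_exact(conn: socket.socket, n: int) -> bytes:
        """Read exactly n bytes or raise if the connection closes early."""
        data = b""
        while len(data) < n:
            chunk = conn.recv(n - len(data))
            if not chunk:
                raise ConnectionError("socket closed mid-message")
            data += chunk
        return data

    def recv_command(conn: socket.socket) -> bytes:
        """Read one length-prefixed message from a connected socket."""
        (length,) = struct.unpack(">I", _recv_exact(conn, 4))
        return _recv_exact(conn, length)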

How I built it

A TCP socket server plus some rudimentary Python.
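
Below is a minimal sketch of the kind of TCP server each node could run, using only the standard library. The handler simply echoes back what it received; the class name and port are placeholders, not the project's actual interface.

    import socketserver

    class CommandHandler(socketserver.StreamRequestHandler):
        def handle(self):
            line = self.rfile.readline().strip()
            # In the real project this is where a received command would be
            # deserialized and executed on the local node.
            self.wfile.write(b"received: " + line + b"\n")

    if __name__ == "__main__":
        # Threading server so multiple managers can connect at once
        with socketserver.ThreadingTCPServer(("0.0.0.0", 5555), CommandHandler) as srv:
            srv.serve_forever()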

Challenges I ran into

The socket pipe kept breaking (broken-pipe errors mid-send); I wrapped the sends in try/except and moved on.
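
A rough sketch of that workaround: catch the broken-pipe error, reopen the connection, and retry the send. The function name and retry policy here are illustrative, not the project's real code.

    import socket

    def send_with_retry(host: str, port: int, payload: bytes, retries: int = 1) -> None:
        """Send payload, reconnecting and retrying once if the pipe breaks."""
        for attempt in range(retries + 1):
            try:
                with socket.create_connection((host, port), timeout=5) as sock:
                    sock.sendall(payload)
                return
            except (BrokenPipeError, ConnectionResetError, OSError):
                if attempt == retries:
                    raise  # give up after the final retry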

Accomplishments that I'm proud of

Nodes are matched via regex, which makes it easy to assemble sizable clusters from hostname patterns.
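
A small sketch of what regex-based node matching can look like: given a hostname pattern, keep every discovered host that matches. The hostnames below are made up for illustration.

    import re

    def match_nodes(pattern: str, hosts: list[str]) -> list[str]:
        rx = re.compile(pattern)
        return [h for h in hosts if rx.fullmatch(h)]

    discovered = [
        "node01.cs.binghamton.edu",
        "node02.cs.binghamton.edu",
        "printer.cs.binghamton.edu",
    ]
    print(match_nodes(r"node\d+\.cs\.binghamton\.edu", discovered))
    # ['node01.cs.binghamton.edu', 'node02.cs.binghamton.edu']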

What I learned

Synchronization across multiple processes and nodes, and how to start a process on a remote server without keeping an active session open.
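
One common way to do the latter, sketched below, is to launch the remote command under nohup and detach it so the SSH session can close. This is a general technique under the assumption of SSH access, not necessarily exactly what the project does, and the names are placeholders.

    import subprocess

    def run_detached(host: str, command: str) -> None:
        """Start a command on a remote host and return without keeping a session."""
        # Redirecting output lets ssh exit instead of waiting on open pipes.
        remote = f"nohup {command} > /dev/null 2>&1 &"
        subprocess.run(["ssh", host, remote], check=True)

    # run_detached("node01", "python3 heavy_job.py")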

What's next for User Cluster

Implement arguments that alias to utilities such as ulimit, for memory and CPU limits (a rough sketch follows). A proper algorithm for choosing nodes, rather than just cycling through them for each command. Detaching the manager from the shell, for p
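
As a hedged sketch of the ulimit-style idea, Python's standard resource module can apply memory and CPU limits to a spawned command on POSIX systems; the function name and limit values here are purely illustrative.

    import resource
    import subprocess

    def run_limited(command: list[str], mem_bytes: int, cpu_seconds: int) -> None:
        """Run a command with address-space and CPU-time limits (POSIX only)."""
        def apply_limits():
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        subprocess.run(command, preexec_fn=apply_limits, check=True)

    # run_limited(["python3", "job.py"], mem_bytes=2 * 1024**3, cpu_seconds=600)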
