Many people want to learn more about machine learning and experiment hands-on, but don't have access to the necessary hardware. On the other hand, many people with powerful GPUs don't use them continuously, and they sit idle much of the time. We wanted to bridge this gap.
What it does
GPUppy allows you to run or train your machine learning models on other people's powerful GPUs while they are idle.
How we built it
We built a distributed system that executes machine learning workloads across a pool of GPU-enabled devices. On the client side, wrapper scripts package the source code and its context and push them to a centralized scheduling server, which dispatches tasks to idle workers. Workers stream their output back over websockets as the task executes, so the client sees results in real time. Finally, the client uses rsync to pull the finished model artifacts off the server.
Challenges we ran into
This project had a lot of moving parts, and getting them all to work together correctly was difficult.
Accomplishments that we're proud of
Making it work relatively well.
What we learned
NVIDIA is... interesting
What's next for GPUppy
- Use Sia for job and artifact storage
- Use blockchain technology for more secure billing