Inspiration

Modern machine learning and data processing often require large amounts of computing power. It is not uncommon to have many idle machines at any given time. At the same time, setting up traditional distributed computing infrastructure is complex and usually requires dedicated clusters, cloud infrastructure, or specialized platforms. We wanted to explore whether it was possible to turn ordinary computers on the same network into an ad hoc compute cluster. Our goal was to build a system where machines could automatically discover each other, join a shared pool of resources, and execute distributed workloads collaboratively. This project was inspired by ideas behind distributed systems like cluster computing, container orchestration platforms, and volunteer computing networks, but adapted for a lightweight local environment that can run across everyday machines.

What it does

The project gives the user's PC the role of coordinator for a local cluster. Other PCs can opt in to the cluster and take on the worker role. Workers automatically discover the coordinator node using LAN broadcast, and the coordinator can then distribute work across the workers' GPU and CPU resources. Additionally, tasks are executed inside Docker containers, which allows the system to run workloads written in different languages or requiring specific dependencies. This makes the platform flexible and capable of running a wide range of workloads, from simulations to machine learning jobs.
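The discovery step described above can be sketched roughly as follows: the worker sends a small "hello" datagram to the LAN broadcast address and waits for a coordinator to answer. This is a minimal illustration, not the project's actual code; the port number, message format, and `find_coordinator` function are all assumptions made for the example.

```python
import json
import socket

DISCOVERY_PORT = 50505  # hypothetical port; the real project may use a different one


def find_coordinator(target: str = "255.255.255.255", timeout: float = 3.0):
    """Broadcast a hello datagram and wait for a coordinator to answer.

    Returns the coordinator's address, or None if nothing replied in time.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    try:
        sock.sendto(json.dumps({"type": "worker_hello"}).encode(),
                    (target, DISCOVERY_PORT))
        data, (host, _port) = sock.recvfrom(1024)
        if json.loads(data).get("type") == "coordinator_here":
            return host  # the coordinator's LAN address
        return None
    except OSError:  # timed out, or no route to the broadcast address
        return None
    finally:
        sock.close()
```

Making the target address a parameter keeps the function testable: pointing it at `127.0.0.1` lets a loopback stand-in play the coordinator during tests.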

How we built it

We built this project by drafting the specifications and architecture with the assistance of ChatGPT, then divided the work into four parts across the group. Aneesh handled the coordinator role and the general backend, Ira and Ben built the worker components of the backend, and Venkatesh created the front end. The coordinator server is built using Python and FastAPI. It manages worker registration, task scheduling, and job tracking. Communication between the coordinator, workers, and dashboard is handled through WebSockets to support real-time updates.
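The coordinator's core bookkeeping (registration, scheduling, job tracking) can be sketched as a small in-memory state machine, independent of the web layer. This is an illustrative simplification, not the project's implementation; the `Coordinator` class and its method names are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    worker_id: str
    busy: bool = False


@dataclass
class Coordinator:
    workers: dict = field(default_factory=dict)       # worker_id -> Worker
    pending: deque = field(default_factory=deque)     # queued task ids
    assignments: dict = field(default_factory=dict)   # task_id -> worker_id

    def register(self, worker_id: str) -> None:
        """A worker joins the pool; try to hand it queued work immediately."""
        self.workers[worker_id] = Worker(worker_id)
        self._dispatch()

    def submit(self, task_id: str) -> None:
        """A new job arrives; queue it and dispatch if a worker is free."""
        self.pending.append(task_id)
        self._dispatch()

    def complete(self, task_id: str) -> None:
        """A worker finished a task; free it up and dispatch the next job."""
        worker_id = self.assignments.pop(task_id)
        self.workers[worker_id].busy = False
        self._dispatch()

    def _dispatch(self) -> None:
        """Greedily assign queued tasks to idle workers."""
        for worker in self.workers.values():
            if not self.pending:
                break
            if not worker.busy:
                task_id = self.pending.popleft()
                worker.busy = True
                self.assignments[task_id] = worker.worker_id
```

In the real system each state transition would be triggered by a WebSocket message from a worker or the dashboard, but the scheduling logic itself stays this simple.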

Challenges we ran into

There were multiple challenges in creating the project. Early on, we spent a lot of time deciding what project we wanted to make and which direction to take. We ran into issues accounting for the different dependencies of the GPUs and software the cluster would require. The Wi-Fi provided by the university was also intercepting our traffic, preventing us from dispatching complex tasks. Ben was new to training AI models and had to research the process. Aneesh did not have much experience with networking and had to learn on the fly. Venkatesh had issues with Git, as it had been a while since he had used it. For Ira, it was her first time splitting her effort across multiple tasks.

Accomplishments that we're proud of

The moment we tested the signal and established a connection. The general work we committed toward the project. The camaraderie we built along the way. The resilience we demonstrated in staying up for the sake of the project's completion.

What we learned

Through this project we gained hands-on experience with distributed systems concepts such as node discovery, task scheduling, and resource management. We also learned how containerization enables reproducible workloads across heterogeneous machines. Another key takeaway was the importance of well-defined communication protocols when building systems composed of multiple independent components. Finally, we learned how to coordinate development across multiple team members while designing a system that requires tight integration between modules. We also learned about networking, AI models, AI training, GPUs, CPUs, the cloud, Node.js, Electron, collaborative coding, AI-assisted coding, and much more.

What's next for Rain

The development of Rain will continue. Our next course of action is to expand the system to support larger clusters across multiple networks, potentially enabling distributed computing beyond a single local network.
