Inspiration
I was inspired by the researchers who used a game, Foldit, to crowdsource knowledge about how proteins fold. Players guessed at protein structures, and their solutions fed back into the research that eventually led to breakthroughs like AlphaFold. I believe people have even more to offer, and crowdsourcing compute for LLMs is part of that.
What it does
We offer distributed inference and training for LLMs, hosted on our swarm of "nodes": compute units ranging from laptops to enterprise VMs. Enterprises can train cheaply on this compute and pay for the time in our cryptocurrency.
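As a sketch of the pay-per-time model, assuming hypothetical per-node rates (every name and number below is illustrative, not taken from our implementation):

```python
# Hypothetical sketch of pay-per-time billing across heterogeneous nodes.
# NodeUsage, invoice, and all rates are illustrative, not our actual code.
from dataclasses import dataclass

@dataclass
class NodeUsage:
    node_id: str
    gpu_hours: float        # metered compute time on this node
    rate_per_hour: float    # price in tokens, set per node class

def invoice(usages: list[NodeUsage]) -> float:
    """Total tokens owed for a job, summed over every node it touched."""
    return sum(u.gpu_hours * u.rate_per_hour for u in usages)

job = [NodeUsage("laptop-1", 2.0, 1.5), NodeUsage("enterprise-vm-a", 0.5, 4.0)]
print(invoice(job))  # 2.0*1.5 + 0.5*4.0 = 5.0
```

A laptop meters cheaper hours than an enterprise VM, and the invoice settles in tokens once the job finishes.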
How we built it
We built it on Hedera and Petals, combining the services and modules they offer into this product.
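The core idea Petals gives us is partitioning a model's transformer blocks across swarm nodes so a client can chain through them for inference. A minimal sketch of that partitioning, assuming a simple round-robin policy (Petals' real scheduler is more sophisticated and load-aware):

```python
# Illustration of block partitioning across a swarm. The round-robin
# assignment is our simplification, not Petals' actual scheduling policy.
def assign_blocks(num_blocks: int, nodes: list[str]) -> dict[str, list[int]]:
    """Round-robin a model's transformer blocks across available nodes."""
    plan: dict[str, list[int]] = {n: [] for n in nodes}
    for block in range(num_blocks):
        plan[nodes[block % len(nodes)]].append(block)
    return plan

print(assign_blocks(8, ["laptop-1", "vm-a", "vm-b"]))
# {'laptop-1': [0, 3, 6], 'vm-a': [1, 4, 7], 'vm-b': [2, 5]}
```

A client then runs its tokens through each node's blocks in order, so no single machine needs to hold the whole model.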
Challenges we ran into
CUDA is awful to get working, and we hit major limits because our cloud compute quota was tiny. Since we could only get a few instances running, we haven't been able to operate a large-scale network yet. Training has also been unstable, and some other components are brittle, though they work. There's noticeable latency even for small models, likely because the network is so small and we're missing optimizations.
Accomplishments that we're proud of
The individual components all do what we wanted! We've joined a swarm, moved crypto around and built some structure around that, and gotten inference and training working for certain models.
What we learned
We learned that it's possible to do really cool things on top of open source! Petals is a little-known framework with only one or two contributors as of a year ago. Despite that, we got it running with modern tooling and turned it into a really cool demo. A full product at this scale could revolutionize how we do LLM inference and training.
What's next for Roko
With the experience we've gained, we'd love to provision a large AWS/GCP account and start from scratch to build an enterprise-quality product. We'd need funding for compute during development, for getting the cluster running, and for advertising so people know they can join. We also have to make it far easier to use than the original framework, whose public swarm has only two active nodes because only its creators can navigate it.
Built With
- hedera
- petals
- python
