Inspiration

I was inspired by the researchers who built Foldit, a game that crowdsources knowledge about how proteins fold. Players guess how proteins fold, and the researchers iteratively used those results to improve their protein-folding methods. I believe people have even more to offer, and crowdsourcing compute for LLMs is part of that.

What it does

We offer distributed inference and training for LLMs, hosted on our swarm of 'nodes': compute units that range from laptops to enterprise VMs. This lets enterprises train cheaply on that compute and pay in our crypto for the time they use.
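To make the pay-for-time idea a bit more concrete, here is a minimal Python sketch of how time spent on each node could be metered and turned into a payout. It is only an illustration under our own assumptions: `Node`, `Job`, `payouts`, `total_cost`, and the `rate_per_second` pricing field are hypothetical names made up for this example, and nothing here touches Hedera or Petals.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """One compute unit in the swarm (hypothetical model, not our real schema)."""
    node_id: str
    kind: str               # e.g. "laptop" or "enterprise-vm"
    rate_per_second: float  # assumed price: tokens charged per second of compute


@dataclass
class Job:
    """Tracks how many seconds each node spent on one inference or training job."""
    seconds_by_node: Dict[str, float] = field(default_factory=dict)

    def record(self, node_id: str, seconds: float) -> None:
        """Add compute time for a node; repeated calls accumulate."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.seconds_by_node[node_id] = self.seconds_by_node.get(node_id, 0.0) + seconds


def payouts(job: Job, nodes: Dict[str, Node]) -> Dict[str, float]:
    """Tokens owed to each node: seconds worked times that node's rate."""
    return {
        node_id: seconds * nodes[node_id].rate_per_second
        for node_id, seconds in job.seconds_by_node.items()
    }


def total_cost(job: Job, nodes: Dict[str, Node]) -> float:
    """What the customer pays for the job: the sum of all node payouts."""
    return sum(payouts(job, nodes).values())
```

A caller would register the nodes once, call `job.record(...)` as each node reports time, and then use `payouts` to decide how much crypto to send to each node operator. Pricing per node rather than per job is just one possible design; it lets a laptop and an enterprise VM charge different rates for the same second of work.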

How we built it

We built it on Hedera and Petals, combining the services and modules they offer into a single service.

Challenges we ran into

CUDA is awful to get working, and our tiny cloud compute quota has been a major limit. Because we could only get a few instances running, we haven't been able to operate a large-scale network yet. We've also had issues with training, which doesn't seem very stable. Some other components are brittle, but they work. There's also noticeable latency even for small models, likely because the network is so small and there are optimizations we are missing.

Accomplishments that we're proud of

The individual components have done everything we wanted them to do! We've run on a swarm, we've moved crypto around and built some structure around that, and we've gotten inference and training working for certain models.

What we learned

We learned that it's possible to do really cool things on top of open source! The framework we used is almost unknown, with 1-2 contributors as of a year ago. Despite that, we were able to get it running with modern tooling and turn it into a really cool demo! Not to mention, a full product at this scale could revolutionize how we do LLM inference and training.

What's next for Roko

With the experience we've gained from this, we'd love to set up on a large AWS/GCP account and start from scratch to develop an enterprise-quality product! We'd need some money for compute during development, for getting the cluster running, and then for advertising to people that they can join. We also need to make sure it's super easy to use, unlike the original framework. (The original framework only has 2 active nodes, because only its creators are able to navigate it well enough to use it.)
