Inspiration
GPT-5 Pro and Gemini 2.5 Deep Think, both of which spend extra compute exploring multiple reasoning paths in parallel.
What it does
A collection of notebooks for training models to reason with parallel thinking paths.
How we built it
We built it using Claude and Codex.
Challenges we ran into
Hugging Face tooling was very frustrating to work with.
Accomplishments that we're proud of
Working through the Hugging Face problems and not giving up.
What we learned
Hugging Face tooling turned out to be a very poor fit for what we were trying to do.
What's next for parallel rl
Productionize it and publicize it.
Built With
- GRPO
- Hugging Face
- verl
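A minimal sketch, not taken from the project's notebooks, of the group-relative advantage at the core of GRPO: several completions (reasoning paths) are sampled for the same prompt, and each one's reward is normalized against the mean and standard deviation of its group. The function names are hypothetical, and full GRPO trainers such as verl also apply a clipped policy-ratio objective and often a KL penalty, both omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Paths that beat the group average get a positive advantage, worse ones a
    negative advantage. eps avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def batch_advantages(groups: list[list[float]]) -> list[list[float]]:
    """Apply the group-relative normalization independently to each prompt's group."""
    return [group_relative_advantages(g) for g in groups]


if __name__ == "__main__":
    # Four hypothetical reasoning paths for one prompt, rewarded 1.0 if the answer is correct.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline comes from the group itself, no separate value model is needed, which is part of why GRPO pairs naturally with sampling many thinking paths per prompt.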