Inspiration

GPT-5 Pro and Gemini 2.5 Deep Think.

What it does

A bunch of notebooks which enable training with parallel thinking paths.
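As a rough illustration of how GRPO (listed under Built With) can score several parallel thinking paths sampled for the same prompt, here is a minimal sketch of the group-relative advantage step. It is not taken from the notebooks. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each path's reward against its own group.

    `rewards` holds one scalar reward per parallel thinking path,
    all sampled for the same prompt (hypothetical setup).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps keeps the division finite when every path gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four parallel paths on one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Paths that score above the group mean get a positive advantage and paths below it get a negative one, so no separate value model is needed to provide a baseline.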

How we built it

Using Claude and Codex.

Challenges we ran into

Hugging Face was painful to work with.

Accomplishments that we're proud of

Dealing with Hugging Face and not giving up.

What we learned

Hugging Face is utterly, utterly bad for anything.

What's next for parallel rl

Productionize it and publicize it.

Built With

  • grpo
  • huggingface
  • verl