Inspiration

I’ve always felt that coding with LLMs is a bit too “take it or leave it.” You get one answer, and if it’s wrong or not in the style you want… tough. But devs have preferences. Sometimes I want the fastest solution, sometimes I want something clean and readable that fits my codebase, and sometimes I want something super memory-efficient.

So I built Code Quintet to flip that dynamic. Instead of one answer, you get five—each optimized differently—and you pick the one that fits your goals.

What it does

Code Quintet takes a coding problem, generates five solution variants with different styles, runs them in isolated Daytona sandboxes, and gives you a clean comparison of correctness, runtime, errors, and overall performance.

The whole thing is human-in-the-loop: I don’t assume what you want. You tell me your priority—speed, correctness, memory, readability—and Code Quintet surfaces the variant that best matches it.

How I built it

I built a small pipeline around HumanEval, pulling the prompt, function signature, and tests. Then I designed five “personalities” for the LLM—baseline, fast, memory-efficient, no-external-libs, and ultra-readable.
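The exact prompt wording isn’t shown in this writeup, but the five personalities boil down to a problem prompt plus a variant-specific modifier. A minimal sketch (with illustrative stand-in wording, not the actual modifiers) might look like:

```python
# Hypothetical prompt modifiers for the five variant "personalities".
# The exact wording used in Code Quintet isn't public; these are stand-ins.
VARIANT_MODIFIERS = {
    "baseline": "Solve the problem in a straightforward way.",
    "fast": "Optimize for the lowest possible runtime.",
    "memory": "Minimize memory usage, even at some cost to speed.",
    "no_libs": "Use only the Python standard library -- no external packages.",
    "readable": "Prioritize clarity: descriptive names, comments, simple control flow.",
}

def build_prompt(problem: str, signature: str, variant: str) -> str:
    """Combine the HumanEval problem text with a variant-specific modifier."""
    return f"{VARIANT_MODIFIERS[variant]}\n\n{problem}\n\nImplement:\n{signature}"
```

Each of the five LLM calls then differs only in which modifier gets prepended.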

Each variant is generated with a different prompt modifier. I pass the code into Daytona sandboxes, run the tests, measure runtime, and collect everything into a summary table.
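The run-and-measure step can be sketched roughly as below. In Code Quintet this executes inside a Daytona sandbox; here a local subprocess stands in purely for illustration, since the real sandbox API calls aren’t shown in this writeup:

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path

def run_variant(code: str, tests: str, timeout: float = 10.0) -> dict:
    """Execute one candidate solution against its tests and time it.
    NOTE: a local subprocess is a stand-in here for the Daytona sandbox
    that the real pipeline uses."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + tests)
        start = time.perf_counter()
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout,
            )
            passed = proc.returncode == 0
            error = proc.stderr.strip() or None
        except subprocess.TimeoutExpired:
            passed, error = False, "timeout"
        return {
            "passed": passed,
            "runtime_s": round(time.perf_counter() - start, 4),
            "error": error,
        }
```

Running this for all five variants yields the rows of the summary table: pass/fail, runtime, and any error message per variant.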

On top of that, I added a clean CLI, JSON artifacts for every run, and a nice little user flow to tie it all together.

Challenges I ran into

Latency. Latency. Latency.

Running five LLM calls and five sandbox executions back-to-back was painfully slow at first—plus my WiFi was very slow. The initial version ran everything sequentially...

To fix that, I parallelized the whole pipeline, slimmed down the code, reduced unnecessary network calls, and used tqdm progress bars to track everything cleanly as it ran. Once I got concurrency dialed in, the whole system felt way snappier.
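Since the five variants are independent, the concurrency fix is a natural fit for a thread pool. A minimal sketch, assuming a generic `execute` callable that stands in for the generate-then-sandbox step:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all_variants(variants: dict, execute) -> dict:
    """Run every variant concurrently instead of sequentially.
    `execute` is a stand-in for the LLM call + sandbox run; in the real
    pipeline the as_completed loop is wrapped in tqdm so each finishing
    variant advances a progress bar."""
    results = {}
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = {pool.submit(execute, payload): name
                   for name, payload in variants.items()}
        for future in as_completed(futures):
            # Collect results as they finish, not in submission order.
            results[futures[future]] = future.result()
    return results
```

Threads (rather than processes) are enough here because the work is I/O-bound: waiting on LLM responses and sandbox executions, not local CPU.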

I also had the usual fun bugs with sandboxes, code extraction, and test imports.
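The code-extraction bugs mostly come down to one thing: models sometimes return bare code and sometimes wrap it in a markdown fence. A small illustrative helper (not the exact one in Code Quintet) that handles both cases:

```python
import re

def extract_code(response: str) -> str:
    """Pull the code out of an LLM reply.
    Handles both a fenced ```python block and a bare-code response,
    which is the ambiguity that caused the extraction bugs.
    Illustrative helper, not the project's actual implementation."""
    match = re.search(r"`{3}(?:python)?\s*\n(.*?)`{3}", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```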

Accomplishments I’m proud of

  • Built a working ensemble system end-to-end as a solo builder
  • Integrated Daytona sandboxes seamlessly
  • Ran real HumanEval tests—no mock data
  • Made the whole workflow feel surprisingly nice from the CLI
  • And honestly, just seeing five solutions race against each other feels awesome
  • Sketched out a really fun idea for a variant marketplace
  • Added an HTML UI so users can browse their runs!

What I learned

I learned that LLMs benefit a lot from diversity. Asking for five different approaches surfaces totally different trade-offs.

I also learned that sandboxing is essential if you want to trust code. HumanEval reminded me how subtle “correctness” can be. And the human-in-the-loop pattern—letting people choose their preferred solution—feels like a direction more coding tools should embrace.

What’s next for Code Quintet

I want to turn this into a Cursor extension that works live inside the editor. Imagine highlighting a function and instantly seeing five solution variants appear in real time—tested, ranked, and ready to drop into your codebase.

The long-term dream is a system that learns from developer preferences and adapts its variants to each person’s style.
