# R2

## Inspiration

Most people now have access to several AI models (Claude, ChatGPT, Gemini, Grok, Llama, Mistral), but almost nobody knows which one to use for which task, or how to prompt each one to get its best output. The same idea, dropped into Claude, Gemini, or ChatGPT, can produce wildly different results depending on how the prompt is structured. We wanted to close that gap for non-technical users: tell us your idea, and we'll pick the right model from the ones you actually have access to and write the prompt that gets the most out of it, all in plain Spanish.

## What we learned

The biggest surprise was just how differently these models behave when prompted:

- Claude rewards structure: role, context, objectives, constraints, output format. The more scaffolding you give it, the more nuanced its reasoning.
- Gemini prefers concise, well-segmented prompts and responds especially well when you explicitly ask it to "think step by step".
- ChatGPT (GPT-4o, o3) is the most forgiving with conversational prompts; its reasoning models benefit from being told how much effort to spend.
- Grok shines when the task involves real-time or current data. Prompt engineering matters less than picking it for the right use case.
- Llama is open-source and less "post-trained" for chat, so it needs more explicit instructions.
- Mistral is a multilingual powerhouse: in our testing it handled Spanish prompts better than most.

We also learned that a recommendation engine doesn't need a giant model to work. Gemini 2.5 Flash, given the right structured knowledge base, can pick the best tool from a pool of six and write a tailored prompt in seconds — for a fraction of a cent per call.

## How we built it

Stack: Next.js 14 (App Router) + TypeScript on Vercel, with Gemini 2.5 Flash as the engine.

- Design first. We mocked up the UI in Claude Design, then iterated from a 3-screen wizard down to a single-page experience after testing how non-technical users navigate. Apple-style minimalism, Geist font, dark theme.
- A six-document knowledge base. We hand-curated a .txt usage doc for each provider covering model tiers, ideal use cases, and prompting best practices. These ship with the app and are loaded server-side at boot.
- A smart, cheap backend. A Next.js API route receives the user's idea and selected models, then injects only the relevant docs into a Gemini system prompt. Gemini returns a strict JSON object: the recommended model, a one-line reason in Spanish, and the optimized prompt.
- Cost guardrails. IP-based rate limiting (10 requests per 24 hours) keeps the free tier sustainable without forcing users to sign up.
- Pixel-perfect port. The CSS from the Claude Design prototype was carried over verbatim so the production app looks identical to the mockup.
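
The backend path described above (check the rate limit, then inject only the selected providers' docs into the system prompt) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: `ModelId`, `isAllowed`, `buildSystemPrompt`, the in-memory `Map` store, and the prompt wording are all hypothetical. Only the six providers and the limit of 10 requests per 24 hours come from the write-up.

```typescript
// Hypothetical provider IDs: the write-up names six providers but not their identifiers.
type ModelId = "claude" | "chatgpt" | "gemini" | "grok" | "llama" | "mistral";

const LIMIT = 10;                       // requests allowed per window (from the write-up)
const WINDOW_MS = 24 * 60 * 60 * 1000;  // 24 hours in milliseconds

interface Bucket {
  count: number;        // requests seen in the current window
  windowStart: number;  // timestamp (ms) when the current window began
}

// Assumption: a simple in-memory store. On a serverless host each instance has
// its own memory, so a real deployment would likely need a shared store.
const buckets = new Map<string, Bucket>();

// Fixed-window limiter: returns true and counts the request if the IP is under
// the limit, false otherwise. `now` is injectable to make testing easy.
function isAllowed(ip: string, now: number = Date.now()): boolean {
  const bucket = buckets.get(ip);
  if (!bucket || now - bucket.windowStart >= WINDOW_MS) {
    buckets.set(ip, { count: 1, windowStart: now });
    return true;
  }
  if (bucket.count >= LIMIT) return false;
  bucket.count += 1;
  return true;
}

// Builds a system prompt containing only the docs for the models the user selected.
function buildSystemPrompt(docs: Record<ModelId, string>, selected: ModelId[]): string {
  const sections = selected.map((id) => `### ${id}\n${docs[id]}`);
  return [
    "Recommend exactly one model from the allowed list and write an optimized prompt for it.",
    `Allowed models: ${selected.join(", ")}`,
    ...sections,
  ].join("\n\n");
}
```

A fixed window is the simplest way to express "10 requests per 24 hours"; a sliding window would be smoother at the boundary but needs per-request timestamps.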

## Challenges we faced

- Token budget pressure. Six model docs are a lot of context to feed an LLM on every call. We solved it by including only the docs for the user's selected models and capping each doc at 5K characters.
- Reliable structured output. Free-form text from an LLM is hard to render. We used Gemini's JSON mode with an explicit responseSchema so the frontend gets a parseable object rather than a malformed string.
- The "right model" recommendation is itself a prompt-engineering problem. Getting Gemini to consistently pick from the user's allowed set (and not invent new models) required tightening the system prompt and validating the returned ID server-side.
- Geist font in Next.js 14. next/font/google didn't bundle Geist, so we had to switch to Vercel's official geist npm package. A small thing, but it blocked the first build.
- Designing for non-technical users while still surfacing real choice. We almost shipped a "magic black box" with no model picker, then realized that seeing which model was recommended (and why) is what makes the tool trustworthy.
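
Two of the fixes above, the per-doc character cap and the server-side check on the returned model ID, are easy to illustrate. The following is a minimal sketch assuming the JSON object has string fields named `model`, `reason`, and `prompt`; those field names, along with `capDoc` and `parseRecommendation`, are hypothetical and not taken from the project.

```typescript
const MAX_DOC_CHARS = 5000; // the "5K-character cap per doc" from the write-up

// Truncates a doc to the cap. Note that `slice` counts UTF-16 code units,
// so this is an approximation of "characters" for non-BMP text.
function capDoc(text: string, maxChars: number = MAX_DOC_CHARS): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Assumed response shape; the real schema may use different field names.
interface Recommendation {
  model: string;   // ID of the recommended model
  reason: string;  // one-line justification
  prompt: string;  // optimized prompt for that model
}

// Parses the model's raw JSON text and rejects anything that is malformed
// or that recommends a model outside the user's allowed set.
function parseRecommendation(raw: string, allowed: readonly string[]): Recommendation {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Response is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Response is not a JSON object");
  }
  const { model, reason, prompt } = data as Record<string, unknown>;
  if (typeof model !== "string" || typeof reason !== "string" || typeof prompt !== "string") {
    throw new Error("Missing or non-string field");
  }
  if (!allowed.includes(model)) {
    throw new Error(`Model "${model}" is not in the allowed set`);
  }
  return { model, reason, prompt };
}
```

Even with a response schema enforcing the shape, the allowed-set check is worth keeping on the server, since a schema constrains structure but not whether the chosen ID is one the user actually selected.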