Builder GPS — Devpost Submission
Tagline
Pick an outcome, not sessions. AI picks your AABW week — and reroutes it the moment your plans change.
Inspiration
Agentic AI Build Week packs 44+ sessions across 5 days. You can realistically attend 6–10. Pick wrong and your week is wasted on workshops that don't move your goal forward.
The way most builders approach it is the same: scroll Discord for workshop announcements, screenshot the ones that look good, paste them into a notes app, then guess at what fits. No tool asks the question that actually matters — "what are you trying to ship?"
So we built one.
What it does
Builder GPS turns a 30-second goal into a 5-day path — and now researches the web while doing it.
- You declare an outcome. "Ship a Stripe Connect payments demo by Friday."
- An agent decomposes it. Not a single LLM call — a multi-turn tool-calling loop that searches the open web, self-evaluates its capability list, and iterates until it's confident.
- AI picks your path. From the 44-session catalog, 6–10 sessions across Day 1–5 that hit those exact capabilities — sequenced for prerequisites.
- You mark sessions live. Attended → that knowledge unlocks deeper material later. Skipped → we re-route around the gap. Every change comes with a one-line explanation ("Added INTEG-04 because the auth foundation in ENABLE-03 unlocks deeper material").
- You take it with you — three ways:
  - Calendar: Export to Apple/Google Calendar with native reminders, or subscribe to a live URL so reroutes sync automatically.
  - Inline resources: Each prerequisite capability shows 2–3 real YouTube/docs/blog links, sourced from the web searches the agent ran during decomposition.
  - AI assistant context: One-click `.md` export of goal + capabilities + resources + the agent's reasoning trace. Drop it into Claude Code / Cursor / Aider and your AI is briefed on your whole week.
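For readers curious how a calendar export with native reminders can work, here is a minimal sketch that renders sessions as an iCalendar (RFC 5545) file with a display alarm per event. It is an illustration under assumptions, not the project's actual exporter: the `build_ics` and `escape_text` names, the `builder-gps.example` UID domain, and the sample session title, date, and duration are all hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

ICS_TIME = "%Y%m%dT%H%M%SZ"  # UTC "Z" form; assumes start times are already in UTC


def escape_text(value):
    """Escape the characters RFC 5545 reserves inside TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def build_ics(sessions, reminder_minutes=15):
    """Render sessions as one VCALENDAR with a VEVENT + VALARM per session.

    Line folding at 75 octets is omitted to keep the sketch short.
    """
    stamp = datetime.now(timezone.utc).strftime(ICS_TIME)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//builder-gps//sketch//EN"]
    for s in sessions:
        end = s["start"] + timedelta(minutes=s["minutes"])
        lines += [
            "BEGIN:VEVENT",
            f"UID:{s['id']}@builder-gps.example",  # hypothetical domain
            f"DTSTAMP:{stamp}",
            f"DTSTART:{s['start'].strftime(ICS_TIME)}",
            f"DTEND:{end.strftime(ICS_TIME)}",
            f"SUMMARY:{escape_text(s['title'])}",
            "BEGIN:VALARM",  # this is what calendar apps show as a reminder
            "ACTION:DISPLAY",
            "DESCRIPTION:Session starting soon",
            f"TRIGGER:-PT{reminder_minutes}M",
            "END:VALARM",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # RFC 5545 requires CRLF line endings


# Placeholder session; the title, date, and duration are invented for the demo.
sample = [{
    "id": "INTEG-04",
    "title": "Auth foundations, part 1",
    "start": datetime(2025, 1, 6, 17, 0, tzinfo=timezone.utc),
    "minutes": 60,
}]
ics_text = build_ics(sample)
```

Serving the same text from a stable URL is one way a "subscribe" link could pick up reroutes, since calendar clients re-fetch subscribed feeds periodically.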
The product remembers you — close the tab, come back tomorrow, your path is still there. No login. Cookie-bound UUID = anonymous but persistent.
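A cookie-bound anonymous identity can be sketched in a few lines of standard-library Python. This is a hedged illustration rather than the app's real handler: the `builder_id` cookie name, the one-year lifetime, and the `resolve_builder` helper are assumptions. The `cross_site` switch mirrors the `SameSite=None; Secure` fix described under Challenges.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "builder_id"        # hypothetical cookie name
ONE_YEAR = 365 * 24 * 60 * 60     # assumed lifetime, in seconds


def resolve_builder(cookie_header, cross_site=False):
    """Return (builder_id, set_cookie_value_or_None) for an incoming request.

    A valid existing cookie is reused; otherwise a fresh UUID is minted and
    the Set-Cookie header value needed to persist it is returned.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        try:
            return str(uuid.UUID(jar[COOKIE_NAME].value)), None
        except ValueError:
            pass  # malformed value: fall through and mint a new ID

    builder_id = str(uuid.uuid4())
    out = SimpleCookie()
    out[COOKIE_NAME] = builder_id
    morsel = out[COOKIE_NAME]
    morsel["path"] = "/"
    morsel["max-age"] = ONE_YEAR
    morsel["httponly"] = True
    # Browsers only accept SameSite=None together with Secure.
    morsel["samesite"] = "None" if cross_site else "Lax"
    if cross_site:
        morsel["secure"] = True
    return builder_id, morsel.OutputString()


first_id, set_cookie = resolve_builder(None, cross_site=True)
same_id, no_header = resolve_builder(f"{COOKIE_NAME}={first_id}")
```

On the first visit the server sends `set_cookie`; on later visits the same ID comes back and no new header is needed, which is what lets a returning builder hydrate into an existing path without logging in.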
How we built it
| Layer | Tech | Why |
|---|---|---|
| Frontend | Next.js 15 (App Router) + TypeScript + Tailwind v4 + Framer Motion | Standalone build, < 50 MB Docker image |
| State | TanStack Query + Zustand | Live optimistic mutations on the reroute |
| Backend | Python 3.11 + FastAPI + Pydantic v2 | Type-safe LLM I/O via structured outputs |
| Decompose agent | Cerebras + gpt-oss-120b | Tool calling on a 120B open model. 1M tokens/day free tier. Loops up to 10 iterations with parallel search_web calls per turn. |
| Path compute | Cerebras + gpt-oss-120b (JSON mode) | Same model as the decompose agent, used here in single-turn structured mode. Pydantic-validated output. |
| Critic | Cerebras + gpt-oss-120b (JSON mode) | Reviews the candidate path; if "weak", compute_path re-runs with the critic's suggested constraint. Up to 2 retries. |
| Web search tool | Tavily | The agent calls this once per drafted capability so each prerequisite has its own grounded resources |
| Embeddings | Voyage + voyage-3-lite | 512-dim catalog embedded at deploy time for semantic retrieval |
| Storage | SQLite (single file on Railway volume) | YAGNI — no Postgres needed for 1-week demo |
| Deploy | Railway (api + web, separate containers, shared project) | Two-service hackathon deploy in one dashboard |
The agent pipeline. A goal goes through 4 stages:
1. `decomposeAgent()`: Cerebras gpt-oss-120b in a tool-calling loop. The agent has two tools: `search_web` (Tavily) and `evaluate_coverage` (rule-based grader). It searches, proposes capabilities, self-grades, iterates. The prompt forces one parallel `search_web` call per drafted capability, so a typical run does 1 goal search + 5–8 per-capability searches + 1 grade + 1 commit, which works out to ~5–9 LLM turns per goal because most searches are emitted in parallel within a single turn. Every Tavily search is cached per capability and exposed via `/path/resources`.
2. `computePath()`: Cerebras gpt-oss-120b in JSON mode picks 6–10 sessions from the catalog against the decomposed capabilities. Single structured call, Pydantic-validated.
3. `evaluatePath()` (critic): a second Cerebras call grades the candidate path; if the verdict is "weak", `computePath()` re-runs with the critic's suggested constraint. Up to 2 retries. The critic is allowed to fail silently, so flaky critic responses don't break the user request.
4. `buildMarkdown()`: assembles the kitchen-sink agent-context export (goal + capabilities + resources + path + the agent's full reasoning trace) for download.
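The compute-then-critique stages can be sketched as a small retry loop. This is a minimal illustration assuming the two LLM calls are passed in as plain callables; the `Verdict` shape, the `path_with_critic` name, and the stub functions are hypothetical, and the real calls would go to the model rather than to stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RETRIES = 2  # "Up to 2 retries" from the pipeline description


@dataclass
class Verdict:
    rating: str                       # "ok" or "weak"
    constraint: Optional[str] = None  # the critic's suggested constraint


def path_with_critic(
    capabilities: list[str],
    compute_path: Callable[[list[str], Optional[str]], list[str]],
    evaluate_path: Callable[[list[str]], Verdict],
) -> list[str]:
    """Compute a path, then let the critic trigger up to MAX_RETRIES re-runs."""
    path = compute_path(capabilities, None)
    for _ in range(MAX_RETRIES):
        try:
            verdict = evaluate_path(path)
        except Exception:
            break  # critic fails silently: keep the current candidate
        if verdict.rating != "weak":
            break
        path = compute_path(capabilities, verdict.constraint)
    return path


# Stubs standing in for the two model calls.
def fake_compute(caps, constraint):
    return ["ENABLE-03"] if constraint is None else ["ENABLE-03", "INTEG-04"]


def fake_critic(path):
    if len(path) < 2:
        return Verdict("weak", "add an integration session")
    return Verdict("ok")


def flaky_critic(path):
    raise RuntimeError("critic timed out")


improved = path_with_critic(["auth"], fake_compute, fake_critic)
unreviewed = path_with_critic(["auth"], fake_compute, flaky_critic)
```

Catching the critic's exception inside the loop is what keeps a flaky review from failing the request: the user still gets the first candidate path, just without the extra polish.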
All schemas are Pydantic-validated; if the model returns output that doesn't match the schema, the request fails loudly instead of corrupting state. A provider-agnostic agent_client abstraction lets us swap Cerebras for Claude Sonnet 4.6 post-Devpost by changing one env var.
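One way to get a one-env-var provider swap is a small registry keyed by provider name. The sketch below is an assumption about the shape of such an abstraction, not the actual `agent_client` module: the `AGENT_PROVIDER` variable, the class names, and the stubbed `complete` method are all hypothetical.

```python
import os
from typing import Callable, Mapping, Optional, Protocol


class AgentClient(Protocol):
    """The only surface the pipeline depends on."""

    def complete(self, prompt: str) -> str: ...


class CerebrasClient:
    def complete(self, prompt: str) -> str:
        return f"[cerebras stub] {prompt}"  # a real client would call the API


class ClaudeClient:
    def complete(self, prompt: str) -> str:
        return f"[claude stub] {prompt}"  # a real client would call the API


PROVIDERS: dict[str, Callable[[], AgentClient]] = {
    "cerebras": CerebrasClient,
    "claude": ClaudeClient,
}


def get_agent_client(env: Optional[Mapping[str, str]] = None) -> AgentClient:
    """Pick the provider from AGENT_PROVIDER (hypothetical variable name)."""
    env = os.environ if env is None else env
    name = env.get("AGENT_PROVIDER", "cerebras").lower()
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown AGENT_PROVIDER: {name!r}") from None


default_client = get_agent_client({})
swapped_client = get_agent_client({"AGENT_PROVIDER": "claude"})
```

Because callers only see the `AgentClient` protocol, the rest of the pipeline stays unchanged when the variable flips.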
Challenges we ran into
- Llama tool-calling on Groq is brittle. The adapter occasionally rejected the model's own function-call syntax with `tool_use_failed` (stray whitespace between function name and JSON args). We added a retry-with-temp-0 layer and eventually moved decomposition to Cerebras + gpt-oss-120b, where tool calling is rock solid.
- Capability slug ↔ Tavily cache slug mismatch. The agent searches with broad phrasing ("MCP server GitHub integration") but the final capability slugs are decomposition-specific ("mcp-protocol-fundamentals-and-server-scaffolding"). Direct slug lookup returned empty resources. Our first fix attached the largest cached search to every unmatched capability, which produced the inverse bug: the same 3 resources appeared on 3+ different capability cards, making the timeline look broken. The real fix has two parts: (1) a greedy unique matcher (each cached search assigned to at most one capability, with orphan caps falling back to unused caches), and (2) an updated agent prompt that forces one parallel `search_web` call per drafted capability. End state: every capability gets its own grounded resources, no duplicates.
- `SameSite=Lax` + Public Suffix List = silent cookie drops. Railway's `*.up.railway.app` is on the PSL, so our `web-*` and `api-*` subdomains are cross-site to browsers. The cookie wasn't being sent on cross-site fetch requests, so `/builder/me` returned 404 on every reload. Fix: `SameSite=None; Secure`, set from an environment variable via config.
- Pydantic Settings + `CORS_ORIGINS=*`. Pydantic tried to JSON-decode `*` and crashed the app on startup. Solved by typing the field as `str` and exposing a `cors_origins_list` property.
- Monorepo Dockerfile vs. Next.js + pnpm workspaces. Eventually inlined the shared types package and made `apps/web` a standalone single-package build. Cut deploy time from 6 min to 2.
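The greedy unique matcher from the slug-mismatch fix can be sketched as follows. The source does not say how a capability is scored against a cached search, so this sketch assumes a simple token-overlap (Jaccard) score; the `match_resources` name, the extra capability slugs and queries, and the example URLs are hypothetical.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens, so slugs and free-text queries compare."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(slug, query):
    """Jaccard overlap between a capability slug and a search query (assumed)."""
    a, b = tokens(slug), tokens(query)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_resources(capabilities, cache):
    """Map each capability slug to one cached search's resources, no repeats.

    cache maps a search query to its list of resource links.
    """
    pairs = sorted(
        ((similarity(cap, q), cap, q) for cap in capabilities for q in cache),
        key=lambda p: p[0],
        reverse=True,
    )
    assigned, used = {}, set()
    for score, cap, q in pairs:  # best-scoring pairs claim their cache first
        if score > 0 and cap not in assigned and q not in used:
            assigned[cap] = q
            used.add(q)
    leftovers = [q for q in cache if q not in used]
    for cap in capabilities:  # orphan capabilities fall back to unused caches
        if cap not in assigned and leftovers:
            assigned[cap] = leftovers.pop(0)
    return {cap: (cache[assigned[cap]] if cap in assigned else [])
            for cap in capabilities}


caps = [
    "mcp-protocol-fundamentals-and-server-scaffolding",
    "stripe-connect-onboarding",
    "webhook-signature-verification",
]
cached = {
    "MCP server GitHub integration": ["https://example.com/mcp-1"],
    "Stripe Connect onboarding flow": ["https://example.com/stripe-1"],
    "deploying FastAPI to Railway": ["https://example.com/railway-1"],
}
resources = match_resources(caps, cached)
```

Here the third capability shares no tokens with any query, so it takes the one cache nobody else claimed. That avoids both failure modes described above: no empty cards, and no card repeating another card's links.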
Accomplishments we're proud of
- Shipped a complete deployed prototype in 5 days — frontend, backend, multi-LLM agentic pipeline, persistence, calendar export, web search, markdown context export
- A genuinely agentic decompose loop: the LLM autonomously researches a goal on the open web before committing to capabilities, and must pass a rule-based coverage grader before it commits
- Inline resource recommendations for free — the same Tavily searches the agent ran during decomposition become user-facing resource cards. Zero extra API spend.
- Markdown agent-context export — the killer meta-tooling moment for AABW judges. Drop the file into Claude Code and your AI is briefed on your whole week.
- Sub-3-second reroute experience — mark a session, AI recomputes the path, animated diff renders inline with per-change explanations
- Returning users hydrate into their existing path on reload — no login, just a cookie
What we learned
- LLM JSON mode + Pydantic did the most for product reliability: we never had a parse failure in production
- Decomposition prompting is harder than path-picking — most of our prompt iteration went into getting the model to extract crisp capabilities from messy goals
- The Public Suffix List is a load-bearing piece of internet infrastructure that you only learn about when it breaks your cookies
What's next
- Real-time schedule sync — partner with the AABW organizers to replace the mock 44 sessions with the live agenda
- Attendance verification — let builders check in at sessions to auto-mark attended
- Goal templates — "Ship a payments demo," "Build my first agent," "Find a co-founder" — common starting points
- Multi-builder mode — for teams of 2–4 splitting capabilities across the week
Built With
- cerebras
- docker
- fastapi
- framer-motion
- gpt-oss-120b
- next.js
- pydantic
- python
- railway
- sqlite
- tailwindcss
- tavily
- typescript
- voyage-ai