Builder GPS — Devpost Submission

Tagline

Pick an outcome, not sessions. AI picks your AABW week — and reroutes it the moment your plans change.


Inspiration

Agentic AI Build Week packs 44+ sessions across 5 days. You can realistically attend 6–10. Pick wrong and your week is wasted on workshops that don't move your goal forward.

The way most builders approach it is the same: scroll Discord for workshop announcements, screenshot the ones that look good, paste them into a notes app, then guess at what fits. No tool asks the question that actually matters — "what are you trying to ship?"

So we built one.


What it does

Builder GPS turns a 30-second goal into a 5-day path — and now researches the web while doing it.

  1. You declare an outcome. "Ship a Stripe Connect payments demo by Friday."
  2. An agent decomposes it. Not a single LLM call — a multi-turn tool-calling loop that searches the open web, self-evaluates its capability list, and iterates until it's confident.
  3. AI picks your path. From the 44-session catalog, it selects 6–10 sessions across Days 1–5 that cover those exact capabilities, sequenced so prerequisites come first.
  4. You mark sessions live. Attended → that knowledge unlocks deeper material later. Skipped → we re-route around the gap. Every change comes with a one-line explanation ("Added INTEG-04 because the auth foundation in ENABLE-03 unlocks deeper material").
  5. You take it with you — three ways:
    • Calendar: Export to Apple/Google Calendar with native reminders, or subscribe to a live URL so reroutes sync automatically.
    • Inline resources: Each prerequisite capability shows 2–3 real YouTube/docs/blog links, sourced from the web searches the agent ran during decomposition.
    • AI assistant context: One-click .md export of goal + capabilities + resources + the agent's reasoning trace. Drop it into Claude Code / Cursor / Aider and your AI is briefed on your whole week.
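
The calendar export in step 5 could look roughly like the sketch below. It is an illustration rather than our actual exporter: the session fields (id, title, start, minutes), the UID domain, and the 10-minute reminder are all assumptions made up for the example. It emits a minimal iCalendar document with one VEVENT per session and a VALARM for the native reminder.

```python
from datetime import datetime, timedelta, timezone


def _escape(text: str) -> str:
    """Escape characters that are special inside iCalendar TEXT values."""
    return (
        text.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def to_ics(sessions: list) -> str:
    """Render sessions as a minimal iCalendar document.

    Each session is a dict with hypothetical keys: id, title,
    start (timezone-aware datetime), and minutes (duration).
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//builder-gps-sketch//EN"]
    for s in sessions:
        start = s["start"].astimezone(timezone.utc)
        end = start + timedelta(minutes=s["minutes"])
        lines += [
            "BEGIN:VEVENT",
            f"UID:{s['id']}@builder-gps.example",  # placeholder domain
            f"DTSTAMP:{stamp}",
            f"DTSTART:{start:%Y%m%dT%H%M%SZ}",
            f"DTEND:{end:%Y%m%dT%H%M%SZ}",
            f"SUMMARY:{_escape(s['title'])}",
            # VALARM lets the calendar app raise its own reminder.
            "BEGIN:VALARM",
            "ACTION:DISPLAY",
            "DESCRIPTION:Session starting soon",
            "TRIGGER:-PT10M",
            "END:VALARM",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar lines are CRLF-terminated.
    return "\r\n".join(lines) + "\r\n"
```

Serving the same text from a stable URL is what would make the "subscribe" option work: the calendar app re-fetches it, so a rerouted path shows up without a new export.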

The product remembers you — close the tab, come back tomorrow, your path is still there. No login. Cookie-bound UUID = anonymous but persistent.
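
A minimal sketch of the cookie-bound UUID idea, using only the Python standard library. The cookie name builder_id, the 30-day lifetime, and the helper name are assumptions for illustration, not our production code: if the request carries a valid UUID cookie we reuse it, otherwise we mint a new one and return the Set-Cookie value to send back.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "builder_id"  # hypothetical cookie name


def builder_id_from(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (builder_id, set_cookie_value).

    set_cookie_value is None when the incoming cookie was already valid,
    otherwise it is the value for a Set-Cookie response header.
    """
    jar = SimpleCookie(cookie_header or "")
    if COOKIE_NAME in jar:
        try:
            # Round-trip through uuid.UUID to reject malformed ids.
            return str(uuid.UUID(jar[COOKIE_NAME].value)), None
        except ValueError:
            pass  # malformed id: fall through and mint a fresh one

    new_id = str(uuid.uuid4())
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 30  # assumed 30-day lifetime
    out[COOKIE_NAME]["httponly"] = True
    return new_id, out[COOKIE_NAME].OutputString()
```

The id carries no personal data, which is what keeps the builder anonymous while still letting the backend look up a stored path.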


How we built it

| Layer | Tech | Why |
| --- | --- | --- |
| Frontend | Next.js 15 (App Router) + TypeScript + Tailwind v4 + Framer Motion | Standalone build, < 50 MB Docker image |
| State | TanStack Query + Zustand | Live optimistic mutations on the reroute |
| Backend | Python 3.11 + FastAPI + Pydantic v2 | Type-safe LLM I/O via structured outputs |
| Decompose agent | Cerebras + gpt-oss-120b | Tool calling on a 120B open model. 1M tokens/day free tier. Loops up to 10 iterations with parallel search_web calls per turn. |
| Path compute | Cerebras + gpt-oss-120b (JSON mode) | Same model as the decompose agent, used here in single-turn structured mode. Pydantic-validated output. |
| Critic | Cerebras + gpt-oss-120b (JSON mode) | Reviews the candidate path; if "weak", compute_path re-runs with the critic's suggested constraint. Up to 2 retries. |
| Web search tool | Tavily | The agent calls this once per drafted capability so each prerequisite has its own grounded resources |
| Embeddings | Voyage + voyage-3-lite | 512-dim catalog embedded at deploy time for semantic retrieval |
| Storage | SQLite (single file on Railway volume) | YAGNI: no Postgres needed for a 1-week demo |
| Deploy | Railway (api + web, separate containers, shared project) | Two-service hackathon deploy in one dashboard |
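
To make the Embeddings row concrete, here is a toy sketch of semantic retrieval over a pre-embedded catalog. It assumes cosine similarity as the ranking function, and it uses 3-dimensional made-up vectors in place of real 512-dim voyage-3-lite embeddings; the function names are hypothetical and no Voyage API call is shown.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float], catalog: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the ids of the k catalog sessions most similar to the query."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [session_id for session_id, _ in ranked[:k]]


# Toy vectors standing in for embeddings computed at deploy time.
CATALOG = {
    "ENABLE-03": [1.0, 0.0, 0.0],
    "INTEG-04": [0.9, 0.1, 0.0],
    "OTHER-01": [0.0, 0.0, 1.0],
}
```

Because the catalog vectors are computed once at deploy time, only the query (for example a capability description) would need embedding per request.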

The agent pipeline. A goal goes through 4 stages:

  1. decomposeAgent(): Cerebras gpt-oss-120b in a tool-calling loop. The agent has two tools: search_web (Tavily) and evaluate_coverage (a rule-based grader). It searches, proposes capabilities, grades its own list, and iterates. The prompt forces one parallel search_web call per drafted capability, so a typical run makes 1 goal search, 5–8 per-capability searches, 1 grade, and 1 commit. Because most searches are emitted in parallel within a single turn, that comes to roughly 5–9 LLM turns per goal. Every Tavily search is cached per capability and exposed via /path/resources.
  2. computePath(): Cerebras gpt-oss-120b in JSON mode picks 6–10 sessions from the catalog against the decomposed capabilities. Single structured call, Pydantic-validated.
  3. evaluatePath() (critic): a second Cerebras call grades the candidate path; if the verdict is "weak", computePath() re-runs with the critic's suggested constraint. Up to 2 retries. The critic is allowed to fail silently, so a flaky critic response never breaks the user request.
  4. buildMarkdown(): assembles the kitchen-sink agent-context export (goal + capabilities + resources + path + the agent's full reasoning trace) for download.
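
The critic stage (stage 3) can be sketched as a small retry loop. This is a simplified illustration under stated assumptions: the compute and critic functions are passed in as plain callables, and the verdict keys "verdict" and "suggested_constraint" are hypothetical names, not our real schema.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_RETRIES = 2  # matches "up to 2 retries" in stage 3

Path = Dict[str, Any]


def path_with_critic(
    capabilities: List[str],
    compute_path: Callable[[List[str], Optional[str]], Path],
    evaluate_path: Callable[[Path], Dict[str, Any]],
) -> Path:
    """Compute a path, re-running it while the critic calls it weak."""
    path = compute_path(capabilities, None)
    for _ in range(MAX_RETRIES):
        try:
            verdict = evaluate_path(path)
        except Exception:
            # The critic may fail silently: keep the path we already have.
            break
        if verdict.get("verdict") != "weak":
            break
        # Feed the critic's suggestion back in as an extra constraint.
        path = compute_path(capabilities, verdict.get("suggested_constraint"))
    return path
```

In this sketch the worst case is three compute calls (the initial one plus two retries), and a critic exception costs nothing beyond the call that failed.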

All schemas are Pydantic-validated; if the model returns output that doesn't match a schema, the request fails loudly instead of corrupting state (the critic is the one deliberate exception). A provider-agnostic agent_client abstraction lets us swap Cerebras for Claude Sonnet 4.6 post-Devpost by changing one env var.
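
One way a provider-agnostic client switch like this could be wired is a small registry keyed by an env var. The sketch below is illustrative only: the AGENT_PROVIDER variable name, the complete() method, and both stub classes are assumptions, and the stubs return canned strings instead of calling any real provider SDK.

```python
import os
from typing import Mapping, Optional, Protocol


class AgentClient(Protocol):
    """The minimal surface the pipeline would depend on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class CerebrasClient:
    def complete(self, prompt: str) -> str:
        return f"[cerebras stub] {prompt}"  # stand-in for a real API call


class ClaudeClient:
    def complete(self, prompt: str) -> str:
        return f"[claude stub] {prompt}"  # stand-in for a real API call


PROVIDERS = {"cerebras": CerebrasClient, "claude": ClaudeClient}


def make_agent_client(env: Optional[Mapping[str, str]] = None) -> AgentClient:
    """Pick the client class named by AGENT_PROVIDER (default: cerebras)."""
    env = os.environ if env is None else env
    name = env.get("AGENT_PROVIDER", "cerebras").lower()
    if name not in PROVIDERS:
        raise ValueError(f"unknown AGENT_PROVIDER: {name!r}")
    return PROVIDERS[name]()
```

Callers only ever see the AgentClient interface, so switching providers touches configuration rather than pipeline code.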


Challenges we ran into

  • Llama tool-calling on Groq is brittle. The adapter occasionally rejected the model's own function-call syntax with tool_use_failed (stray whitespace between function name and JSON args). We added a retry-with-temp-0 layer and eventually moved decomposition to Cerebras + gpt-oss-120b, where tool calling is rock solid.
  • Capability slug ↔ Tavily cache slug mismatch. The agent searches with broad phrasing ("MCP server GitHub integration") but the final capability slugs are decomposition-specific ("mcp-protocol-fundamentals-and-server-scaffolding"). Direct slug lookup returned empty resources. Our first fix attached the largest cached search to every unmatched capability — which produced the inverse bug: the same 3 resources appeared on 3+ different capability cards, making the timeline look broken. The real fix has two parts: (1) a greedy unique matcher (each cached search assigned to at most one capability, with orphan caps falling back to unused caches), and (2) an updated agent prompt that forces one parallel search_web call per drafted capability. End state: every capability gets its own grounded resources, no duplicates.
  • SameSite=Lax + Public Suffix List = silent cookie drops. Railway's *.up.railway.app is on the PSL, so browsers treat our web-* and api-* subdomains as cross-site. The cookie wasn't sent on cross-site fetch requests, so /builder/me returned 404 on every reload. Fix: set SameSite=None; Secure, driven by an env var in config.
  • Pydantic Settings + CORS_ORIGINS=*. With the field typed as a list, Pydantic Settings tried to JSON-decode * and crashed the app on startup. Solved by typing the field as str and exposing a cors_origins_list property that splits it.
  • Monorepo Dockerfile vs. Next.js + pnpm workspaces. Eventually inlined the shared types package and made apps/web a standalone single-package build. Cut deploy time from 6 min to 2.
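
The greedy unique matcher from the slug-mismatch bullet above can be sketched as follows. The scoring rule (count of shared lowercase tokens between a capability slug and a cached search query) is an assumption chosen for the example, as are the function names; the two properties it demonstrates are the ones described: each cached search goes to at most one capability, and orphan capabilities fall back to unused caches.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens, so slugs and queries compare evenly."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_unique(capabilities: List[str], cached_queries: List[str]) -> Dict[str, str]:
    """Assign each capability at most one cached query, never reusing a query."""
    # Score every (capability, query) pair by token overlap, best first.
    # sorted() is stable, so ties keep their original order.
    pairs = sorted(
        (
            (len(tokens(cap) & tokens(query)), cap, query)
            for cap in capabilities
            for query in cached_queries
        ),
        key=lambda pair: -pair[0],
    )
    assigned: Dict[str, str] = {}
    used: Set[str] = set()
    for score, cap, query in pairs:
        if score == 0:
            break  # everything after this has no overlap at all
        if cap in assigned or query in used:
            continue
        assigned[cap] = query
        used.add(query)

    # Orphan capabilities fall back to whichever caches are still unused.
    unused = [query for query in cached_queries if query not in used]
    for cap in capabilities:
        if cap not in assigned and unused:
            assigned[cap] = unused.pop(0)
    return assigned
```

If there are more capabilities than cached searches, the leftover capabilities stay unassigned here, which avoids repeating the same resources across cards.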

Accomplishments we're proud of

  • Shipped a complete deployed prototype in 5 days: frontend, backend, multi-stage agentic LLM pipeline, persistence, calendar export, web search, markdown context export
  • A genuinely agentic decompose loop — the LLM autonomously researches a goal on the open web before committing to capabilities, and self-evaluates with a forcing function
  • Inline resource recommendations for free — the same Tavily searches the agent ran during decomposition become user-facing resource cards. Zero extra API spend.
  • Markdown agent-context export — the killer meta-tooling moment for AABW judges. Drop the file into Claude Code and your AI is briefed on your whole week.
  • Sub-3-second reroute experience — mark a session, AI recomputes the path, animated diff renders inline with per-change explanations
  • Returning users hydrate into their existing path on reload — no login, just a cookie

What we learned

  • LLM JSON mode + Pydantic made the product far more reliable: we never had a parse failure in production
  • Decomposition prompting is harder than path-picking — most of our prompt iteration went into getting the model to extract crisp capabilities from messy goals
  • The Public Suffix List is a load-bearing piece of internet infrastructure that you only learn about when it breaks your cookies

What's next

  • Real-time schedule sync — partner with the AABW organizers to replace the mock 44 sessions with the live agenda
  • Attendance verification — let builders check in at sessions to auto-mark attended
  • Goal templates — "Ship a payments demo," "Build my first agent," "Find a co-founder" — common starting points
  • Multi-builder mode — for teams of 2–4 splitting capabilities across the week

Updates

Big drop tonight: Builder GPS is now actually agentic.

The goal-decomposer used to be one single-shot LLM call. Now it's a multi-turn tool-calling agent — it searches the open web (via Tavily), self-grades its capability list, iterates until it's confident, then commits. Two side effects: every prerequisite capability shows 2-3 real resource links inline (YouTube tutorials, official docs, blog write-ups), and you can now hit one button to export your whole 5-day plan as a markdown file to paste into Claude Code / Cursor as project context.