Inspiration
AI agents usually fail for boring reasons: vague instructions, messy repos, and unclear rules about money, email, and irreversible actions. We kept seeing “run this in Cursor” moments where a few missing constraints could turn into a bad booking, a bad refactor, or a leaked assumption pulled from stale docs. There was no quick pre-flight step that treated the task and the workspace as one safety surface. Agent Brief is that checklist: turn a messy human request plus real project context into something an agent can execute without guessing.
What it does
Agent Brief is a local web app (Next.js on localhost) that runs beside your repo. It:
- Scans the workspace (docs and configs, with sensible skips like node_modules, .git, and .env) to understand what an agent might read and trust.
- Analyzes your task for ambiguity, missing constraints, and risky permissions.
- Produces a structured pre-flight report: readiness scores, expandable “context nutrition” rows (why, evidence, fixes), Safety Issues (with an “Agent OSHA” flavor so risks stick in memory), an approval queue, a human-readable work order, a receipt template, and a Copy for Cursor handoff that packages the execution contract for your agent.
The pitch: messy request + messy workspace → safe, explicit work order.
How we built it
We used the Next.js App Router for UI and API routes in one package, with a workspace scanner module that walks the tree, caps depth and size, and concatenates file contents under clear headers for the model. The /api/analyze route sends a single structured prompt to CLōD (an OpenAI-compatible API) using DeepSeek V3, with stream: true so the UI can render sections as the JSON arrives. The client parses the stream and progressively fills in score cards, nutrition rows, safety issues, approvals, the work order, and the receipt. Pre-flight resolution (resolving safety items and answering approvals) updates the work order client-side, so the final brief matches the user's choices without a second LLM call in the MVP. Styling follows a dark, product-style shell: Inter + JetBrains Mono, a resizable two-panel layout, and presets for quick demos.
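As a rough illustration of the scanner described above, here is a minimal sketch. It is not the project's actual module: the function name `scanWorkspace`, the limits, the header format, and the exact skip list are all assumptions chosen for the example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Illustrative values only; the real scanner's limits and skip list may differ.
const SKIP_NAMES = new Set(["node_modules", ".git"]);
const MAX_DEPTH = 3;
const MAX_FILE_BYTES = 8000;
const MAX_TOTAL_CHARS = 40000;

// Walks `root`, skips risky or noisy entries, and returns one string in which
// each file's contents sit under a header naming its relative path.
export function scanWorkspace(root: string): string {
  const sections: string[] = [];
  let total = 0;

  const walk = (dir: string, depth: number): void => {
    if (depth > MAX_DEPTH) return; // depth cap
    const entries = fs
      .readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => a.name.localeCompare(b.name)); // stable ordering
    for (const entry of entries) {
      // Skip dependency/VCS folders and any .env-style secrets file.
      if (SKIP_NAMES.has(entry.name) || entry.name.startsWith(".env")) continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full, depth + 1);
      } else if (entry.isFile()) {
        if (fs.statSync(full).size > MAX_FILE_BYTES) continue; // per-file size cap
        const body = fs.readFileSync(full, "utf8");
        const section = `=== FILE: ${path.relative(root, full)} ===\n${body}\n`;
        if (total + section.length > MAX_TOTAL_CHARS) continue; // overall budget
        total += section.length;
        sections.push(section);
      }
    }
  };

  walk(root, 0);
  return sections.join("\n");
}
```

Keeping the caps as plain constants makes it easy to tune how much context reaches the model without touching the walk itself.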
Challenges we ran into
- Structured output over streaming: getting consistently parseable JSON while streaming required careful handling of partial chunks and fallbacks when the model drifts.
- Context limits vs. useful workspace signal: balancing how much of the repo to include without blowing the token budget or leaking secrets (hence the skips, caps, and optional extra context).
- Making “safety” actionable: scores alone are not enough; we needed expandable evidence, fix text, and patches that actually change the work order.
- Hackathon time: we prioritized the end-to-end demo path (scan → analyze → resolve → copy) over persistence, auth, and automated tests.
- Solo build: the team was a single competitor, who ran into Cursor's usage limits during the build.
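The streamed-JSON challenge above can be approached in several ways. One common technique, sketched here under our own assumptions rather than as the code Agent Brief ships, is to buffer the text, close any open strings and brackets, try to parse, and fall back to the last good snapshot when parsing fails. The names `closeOpenJson` and `StreamingJsonBuffer` are hypothetical, and the sketch assumes the model's text deltas have already been extracted from the stream.

```typescript
// Appends the closers a truncated JSON prefix is missing, so it has a chance to parse.
function closeOpenJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of partial) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  let repaired = partial;
  if (inString) {
    if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
    repaired += '"';
  } else {
    repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  }
  return repaired + closers.reverse().join("");
}

// Accumulates chunks and always returns the most recent snapshot that parsed.
export class StreamingJsonBuffer {
  private buffer = "";
  private lastGood: unknown = null;

  push(chunk: string): unknown {
    this.buffer += chunk;
    try {
      this.lastGood = JSON.parse(closeOpenJson(this.buffer));
    } catch {
      // Prefix is not repairable yet (e.g. a key without a value); keep the old snapshot.
    }
    return this.lastGood;
  }
}
```

With this shape, the UI can re-render from each returned snapshot, and a chunk that cuts off mid-key simply leaves the previous state on screen until more text arrives.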
Accomplishments that we're proud of
- A workspace-aware flow that is not just “rewrite my prompt” but audits environment + task together.
- The work order as the product: a readable execution contract (goal, allowed/blocked actions, approvals, missing info, success criteria, receipt) instead of a wall of JSON in the main UI.
- Streaming UI that feels alive in a demo and matches how people expect modern AI tools to behave.
- Copy for Cursor as a practical handoff: zero integration magic, but immediate usefulness for real workflows.
What we learned
- Most agent failures are contract failures: unstated permissions, unstated “done,” and unstated sources of truth.
- Filesystem context changes the quality of risk detection dramatically compared to prompt-only tools.
- For a short build window, one strong LLM pass + deterministic client updates beats two fragile round-trips.
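To make the "deterministic client updates" idea concrete, here is a small sketch of how an approval answer could be folded into the work order without another model call. The `WorkOrder` shape, its field names, and `applyDecision` are hypothetical stand-ins for illustration, not the app's real types.

```typescript
// Hypothetical, simplified work-order shape for illustration.
interface WorkOrder {
  goal: string;
  allowedActions: string[];
  blockedActions: string[];
  pendingApprovals: string[];
}

interface ApprovalDecision {
  action: string;
  approved: boolean;
}

// Pure function: returns a new work order reflecting the user's decision.
// The same inputs always yield the same output, so no LLM round-trip is needed.
export function applyDecision(order: WorkOrder, decision: ApprovalDecision): WorkOrder {
  const without = (list: string[]): string[] =>
    list.filter((a) => a !== decision.action);
  return {
    ...order,
    pendingApprovals: without(order.pendingApprovals),
    allowedActions: decision.approved
      ? [...without(order.allowedActions), decision.action]
      : without(order.allowedActions),
    blockedActions: decision.approved
      ? without(order.blockedActions)
      : [...without(order.blockedActions), decision.action],
  };
}
```

Because the function does not mutate its input, the UI can keep earlier states around, for example to undo a decision or to show a before/after diff of the brief.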
What's next for Agent Brief
- Stronger validation of streamed JSON and richer error recovery.
- Tests for the workspace scanner (mock FS) and snapshot tests for prompt assembly.
- Optional file watch or re-scan on demand, history of briefs, and tighter Cursor-oriented formats.
- Exploration of multiple providers and tighter guardrails for enterprise-style policies, while keeping local-first, privacy-conscious defaults.
Built With
- clod
- nextjs