Inspiration

AI agents usually fail for boring reasons: vague instructions, messy repos, and unclear rules about money, email, and irreversible actions. We kept seeing “run this in Cursor” moments where a few missing constraints could turn into a bad booking, a bad refactor, or a leaked assumption pulled from stale docs. There was no quick pre-flight step that treated the task and the workspace as one safety surface. Agent Brief is that checklist: turn a messy human request plus real project context into something an agent can execute without guessing.

What it does

Agent Brief is a local web app (Next.js on localhost) that runs beside your repo. It:

  • Scans the workspace (docs and configs, with sensible skips like node_modules, .git, and .env) to understand what an agent might read and trust.
  • Analyzes your task for ambiguity, missing constraints, and risky permissions.
  • Produces a structured pre-flight report: readiness scores, expandable “context nutrition” rows (why, evidence, fixes), Safety Issues (with an “Agent OSHA” flavor so risks stick in memory), an approval queue, a human-readable work order, a receipt template, and a Copy for Cursor handoff that packages the execution contract for your agent.

The pitch: messy request + messy workspace → safe, explicit work order.

How we built it

We used the Next.js App Router to keep the UI and API routes in one package, plus a workspace scanner module that walks the tree, caps depth and size, and concatenates file contents under clear headers for the model. The /api/analyze route sends a single structured prompt to CLōD (an OpenAI-compatible API) running DeepSeek V3, with stream: true so the UI can render sections as the JSON arrives. The client parses the stream and progressively fills the score cards, nutrition rows, safety issues, approvals, work order, and receipt. Pre-flight resolution (resolving safety items and answering approvals) updates the work order client-side, so the final brief matches the user's choices without a second LLM call in the MVP. Styling follows a dark, product-style shell: Inter + JetBrains Mono, a resizable two-panel layout, and presets for quick demos.
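To make the scanner step concrete, here is a minimal sketch of a depth- and size-capped walk that skips sensitive paths and joins file contents under headers. It is an illustration, not the project's actual module: the name scanWorkspace, the ScanOptions fields, and the "=== FILE: ... ===" header format are all assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Names skipped outright (the write-up mentions node_modules, .git, and .env).
const SKIP = new Set(["node_modules", ".git", ".env"]);

// Hypothetical option names; the real scanner's caps are not documented here.
export interface ScanOptions {
  maxDepth: number;      // how many directory levels below the root to descend
  maxFileBytes: number;  // files larger than this are skipped
  maxTotalBytes: number; // overall budget for the concatenated context
}

export function scanWorkspace(root: string, opts: ScanOptions): string {
  const parts: string[] = [];
  let total = 0;

  const walk = (dir: string, depth: number): void => {
    if (depth > opts.maxDepth) return;
    // Sort entries so the assembled prompt is stable between runs.
    const entries = fs
      .readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => a.name.localeCompare(b.name));
    for (const entry of entries) {
      if (SKIP.has(entry.name)) continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full, depth + 1);
        continue;
      }
      if (!entry.isFile()) continue;
      const size = fs.statSync(full).size;
      // Skip files that are too large or would exceed the total budget.
      if (size > opts.maxFileBytes || total + size > opts.maxTotalBytes) continue;
      total += size;
      const rel = path.relative(root, full);
      parts.push(`=== FILE: ${rel} ===\n${fs.readFileSync(full, "utf8")}`);
    }
  };

  walk(root, 0);
  return parts.join("\n\n");
}
```

Something like scanWorkspace(".", { maxDepth: 3, maxFileBytes: 20000, maxTotalBytes: 200000 }) would return one string ready to embed in the prompt. A synchronous walk keeps the sketch short; an API route handling large repos would probably prefer the async fs APIs.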

Challenges we ran into

  • Structured output over streaming: getting consistently parseable JSON while streaming required careful handling of partial chunks and fallbacks when the model drifts.
  • Context limits vs. useful workspace signal: balancing how much of the repo to include without blowing tokens or leaking secrets (hence skips, caps, and optional extra context).
  • Making “safety” actionable: scores alone are not enough; we needed expandable evidence, fix text, and patches that actually change the work order.
  • Hackathon time: we prioritized the end-to-end demo path (scan → analyze → resolve → copy) over persistence, auth, and automated tests.
  • Solo entry: the team is a single competitor, so we ran into Cursor usage limits.
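The partial-chunk problem in the first bullet can be handled in several ways. One option, sketched below, is to close any open strings, arrays, and objects in the buffered text and try to parse the result, returning null when the prefix still cannot be repaired. This is an assumed technique, not necessarily what Agent Brief does, and parsePartialJson is a hypothetical helper name.

```typescript
// Best-effort parse of a JSON prefix received so far from a stream.
// Returns the parsed value, or null if the prefix cannot be repaired yet.
export function parsePartialJson(buffer: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  // Track which brackets are still open and whether we are inside a string.
  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let candidate = buffer;
  if (escaped) candidate = candidate.slice(0, -1); // drop a dangling backslash
  if (inString) candidate += '"';                  // close an unfinished string
  candidate = candidate.replace(/,\s*$/, "");      // drop a trailing comma
  candidate += closers.reverse().join("");         // close open arrays/objects

  try {
    return JSON.parse(candidate);
  } catch {
    // For example, the stream stopped in the middle of an object key.
    return null;
  }
}
```

A client could call this after every chunk and keep the last non-null result as the rendered snapshot, so a momentarily unparseable prefix never blanks the UI. It does not help when the model drifts away from the schema entirely; that case still needs a separate fallback.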

Accomplishments that we're proud of

  • A workspace-aware flow that is not just “rewrite my prompt” but audits environment + task together.
  • The work order as the product: a readable execution contract (goal, allowed/blocked actions, approvals, missing info, success criteria, receipt) instead of a wall of JSON in the main UI.
  • Streaming UI that feels alive in a demo and matches how people expect modern AI tools to behave.
  • Copy for Cursor as a practical handoff: zero integration magic, but immediate usefulness for real workflows.
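As a rough illustration of the "work order as execution contract" idea, the sketch below models the fields listed above and renders them as plain text for pasting into an agent. The WorkOrder field names, the section titles, and the formatForCursor function are assumptions made for this example, not the app's real schema.

```typescript
// Hypothetical shape, based on the fields named in the write-up.
export interface WorkOrder {
  goal: string;
  allowedActions: string[];
  blockedActions: string[];
  approvals: string[];
  missingInfo: string[];
  successCriteria: string[];
  receiptTemplate: string;
}

// Render one titled section as a bullet list, with a placeholder when empty.
function section(title: string, items: string[]): string {
  const body =
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none)";
  return `${title}\n${body}`;
}

// Produce the plain-text handoff that a "Copy for Cursor" button might copy.
export function formatForCursor(order: WorkOrder): string {
  return [
    `GOAL\n${order.goal}`,
    section("ALLOWED ACTIONS", order.allowedActions),
    section("BLOCKED ACTIONS", order.blockedActions),
    section("REQUIRES APPROVAL", order.approvals),
    section("MISSING INFO", order.missingInfo),
    section("SUCCESS CRITERIA", order.successCriteria),
    `RECEIPT\n${order.receiptTemplate}`,
  ].join("\n\n");
}
```

Plain text with fixed section titles keeps the handoff readable for a person reviewing it and unambiguous for the agent receiving it, which matches the "zero integration magic" goal.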

What we learned

  • Most agent failures are contract failures: unstated permissions, unstated “done,” and unstated sources of truth.
  • Filesystem context changes the quality of risk detection dramatically compared to prompt-only tools.
  • For a short build window, one strong LLM pass + deterministic client updates beats two fragile round-trips.
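The "deterministic client updates" point can be pictured as a pure function that folds a user's decision into the brief, with no model call involved. The sketch below is an assumption about how such an update could look; the Approval and Brief shapes and the resolveApproval name are hypothetical.

```typescript
// Hypothetical shapes for an approval-queue item and the parts of the brief it affects.
export interface Approval {
  id: string;
  question: string;
  action: string; // the action the agent wants permission for
}

export interface Brief {
  allowedActions: string[];
  blockedActions: string[];
  pendingApprovals: Approval[];
}

export type Decision = "approve" | "deny";

// Apply one decision and return a new brief; the input is never mutated,
// which suits React-style state updates.
export function resolveApproval(brief: Brief, id: string, decision: Decision): Brief {
  const approval = brief.pendingApprovals.filter((a) => a.id === id)[0];
  if (!approval) return brief; // unknown id: nothing to change

  return {
    allowedActions:
      decision === "approve"
        ? [...brief.allowedActions, approval.action]
        : brief.allowedActions,
    blockedActions:
      decision === "deny"
        ? [...brief.blockedActions, approval.action]
        : brief.blockedActions,
    pendingApprovals: brief.pendingApprovals.filter((a) => a.id !== id),
  };
}
```

Because the same input always yields the same brief, this step is easy to unit test and cannot drift the way a second model round-trip could.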

What's next for Agent Brief

  • Stronger validation of streamed JSON and richer error recovery.
  • Tests for the workspace scanner (mock FS) and snapshot tests for prompt assembly.
  • Optional file watch or re-scan on demand, history of briefs, and tighter Cursor-oriented formats.
  • Exploration of multiple providers and tighter guardrails for enterprise-style policies—still with local-first, privacy-conscious defaults.

Built With

  • clod
  • nextjs