Inspiration
Modern developers rely heavily on third-party APIs and technical specifications, but these documents are often incomplete, ambiguous, or inconsistent. During development, this leads to confusion, wasted time, and constant back-and-forth communication with providers.
We were inspired by how much friction exists in simply understanding documentation before any code is written. We wanted to build a “first-pass reviewer”: an intelligent system that catches issues early and helps developers move faster with confidence.
What it does
SpecSentinel is an agentic AI system that analyzes technical specification documents and automatically identifies:
- Ambiguous or unclear definitions
- Missing required fields
- Contradictions across sections
- Undefined or inconsistent terminology
Beyond detection, it generates actionable clarification questions that developers can directly use when communicating with API or exchange providers.
The system outputs a structured report with issues, severity levels, and suggested next steps, turning hours of manual review into seconds.
How we built it
We built SpecSentinel as a full-stack, agentic review pipeline for technical spec PDFs. The backend uses FastAPI and a sequential orchestration flow: ingest the PDF, validate the extracted structure against a canonical reference, run ambiguity detection with Gemini plus retrieval context, then synthesize everything into a prioritized report and an outreach-ready email draft. We used typed models end to end so each stage exchanges structured data instead of raw text blobs.
On the frontend, we built a React dashboard with upload, live pipeline progress, severity-grouped findings, PDF context, and an email panel for quick copy-and-send workflows.
For demo reliability, we added two acceleration paths: a golden cached demo mode and a reusable per-document cache keyed by PDF hash.
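To make the typed stage outputs and the per-document cache concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the names `Finding`, `Report`, `ReviewPipeline`, and `fake_analyze` are hypothetical, SHA-256 is an assumed choice of hash, and the analysis step is a stand-in for the real ingest, validation, and Gemini stages.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed severity ordering; the real pipeline may use different levels.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    section: str
    severity: str  # one of the keys in SEVERITY_ORDER
    message: str


@dataclass
class Report:
    pdf_hash: str
    findings: List[Finding] = field(default_factory=list)


def pdf_cache_key(pdf_bytes: bytes) -> str:
    """Derive a cache key from the PDF content (SHA-256 is an assumption)."""
    return hashlib.sha256(pdf_bytes).hexdigest()


class ReviewPipeline:
    """Runs a (stand-in) analysis once per distinct PDF and caches the report."""

    def __init__(self, analyze: Callable[[bytes], List[Finding]]):
        self._analyze = analyze
        self._cache: Dict[str, Report] = {}
        self.analyze_calls = 0  # counts cache misses, useful for debugging

    def run(self, pdf_bytes: bytes) -> Report:
        key = pdf_cache_key(pdf_bytes)
        if key in self._cache:
            return self._cache[key]  # repeat run: skip the slow stages
        self.analyze_calls += 1
        findings = sorted(
            self._analyze(pdf_bytes),
            key=lambda f: SEVERITY_ORDER[f.severity],
        )
        report = Report(pdf_hash=key, findings=findings)
        self._cache[key] = report
        return report


def fake_analyze(pdf_bytes: bytes) -> List[Finding]:
    """Placeholder for the real extraction and ambiguity-detection stages."""
    return [
        Finding("3.2", "low", "Term 'session' is used before it is defined."),
        Finding("4.1", "high", "Required field 'order_id' has no type."),
    ]


pipeline = ReviewPipeline(fake_analyze)
first = pipeline.run(b"%PDF-1.7 example bytes")
second = pipeline.run(b"%PDF-1.7 example bytes")
```

Because the key is derived from the file content rather than the file name, re-uploading the same document under a different name would still hit the cache in this sketch.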
Challenges we ran into
- PDF parsing quality varies a lot by section and table formatting, so extraction needed careful handling.
- Ambiguity detection can over-flag soft language; we had to reduce noise and focus on actionable findings.
- Keeping LLM outputs trustworthy required grounding checks against actual extracted source text.
- End-to-end runtime on first pass was too slow for a short live demo, so we needed cache strategies.
- Environment toggles were easy to misconfigure in separate terminals, especially for demo mode flags.
- Some integration-style polling tests were timing-sensitive under cold starts, which made validation less predictable.
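The grounding check mentioned above can be illustrated with a small sketch. It assumes each LLM finding carries a verbatim quote from the spec, and it keeps only findings whose quote actually appears in the extracted text after whitespace and case normalization. The names `Claim`, `normalize`, and `keep_grounded` are hypothetical, and the real checks may well be more tolerant (for example, fuzzy matching).

```python
import re
from typing import List, NamedTuple


class Claim(NamedTuple):
    quote: str  # text the model says it found in the spec
    issue: str  # the problem the model reports about that text


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so PDF line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def keep_grounded(claims: List[Claim], source_text: str) -> List[Claim]:
    """Drop claims whose quote cannot be found in the extracted source text."""
    haystack = normalize(source_text)
    grounded = []
    for claim in claims:
        needle = normalize(claim.quote)
        if needle and needle in haystack:
            grounded.append(claim)
    return grounded


source = "The  order_id field\nMUST be unique."
claims = [
    Claim("order_id field must be unique", "Uniqueness scope is not stated."),
    Claim("price is optional", "Conflicts with section 2."),  # not in source
]
grounded = keep_grounded(claims, source)
```

Exact substring matching is deliberately strict: it rejects paraphrased quotes, which trades some recall for findings that can always be traced back to the document.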
Accomplishments that we're proud of
- A real multi-stage pipeline that works on actual spec PDFs, not just prompt demos.
- Structured, typed outputs across all stages, which made orchestration and UI integration clean.
- A useful final artifact for teams: severity summary, prioritized questions, and an email draft.
- Clear progress visibility in the UI so users can see what each stage is doing.
- Practical demo hardening with golden mode and per-PDF cache for faster repeat runs.
- Tight iteration during the hackathon: major UI redesign, signal-to-noise improvements, and regression tests.
What we learned
- Agent workflows are much more reliable when every stage has strict schemas and validation.
- Grounding and post-checking are essential when you want trustworthy LLM-assisted findings.
- Demo readiness is an engineering problem, not just a presentation problem; caching and observability matter.
- Small UX details like severity filtering and scrollable drafts significantly improve real usability.
- Operational details, like how environment variables are scoped to a terminal process, can make or break live demos.
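As an illustration of the first point, here is a minimal sketch of validating one stage's raw model output against a strict schema before passing it on. It uses only the standard library. The field names, the allowed severity values, and the `StageFinding` and `parse_finding` names are assumptions for the example, not the project's real schema.

```python
import json
from dataclasses import dataclass

# Assumed set of severity levels for the example.
ALLOWED_SEVERITIES = {"high", "medium", "low"}


@dataclass(frozen=True)
class StageFinding:
    section: str
    severity: str
    question: str


def parse_finding(raw: str) -> StageFinding:
    """Parse one JSON finding and reject anything that breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("section", "severity", "question"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']!r}")
    return StageFinding(data["section"], data["severity"], data["question"])


valid = parse_finding(
    '{"section": "2.1", "severity": "high", "question": "Is order_id unique?"}'
)
```

Failing loudly at the stage boundary means a malformed model response stops the run at the point it was produced instead of surfacing later as a confusing UI or report error.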
What's next for SpecSentinel
- Add a true revision diff workflow so teams can compare updated specs against prior runs.
- Improve latency further with smarter incremental processing and cache invalidation controls.
- Expand reference coverage and evaluation datasets beyond the current canonical subset.
- Strengthen ranking and deduplication so high-impact issues always surface first.
- Add team-ready capabilities such as persistence, history, and collaboration-friendly exports.
- Package deployment for easier judge and pilot onboarding with one-command startup and clearer run profiles.
Built With
- css
- docker
- dockerfile
- html
- javascript
- powershell
- python