## Inspiration
Insurance adjusters review 30+ claims per week, each involving 4–10 documents that need to be cross-referenced line by line: first notice of loss (FNOL) reports, policies, medical bills, police reports. They spend 30+ hours a week just reading paperwork. Missed fraud costs the U.S. insurance industry $80B+ annually. We wanted to build something that matches how adjusters actually work (on the phone, in the field, thinking out loud) rather than typing into a chatbot.
## What it does
DocuVoice is a voice-first document analysis platform. Upload documents to a workspace, and an AI agent reads, extracts structured fields, and cross-references everything — surfacing discrepancies, exposure risks, red flags, and missing information — before you even start talking. Then connect to a real-time voice session and have a natural conversation with an agent that has perfect recall of every document, every field, every number.
Example: Upload an FNOL, policy, medical bills, and police report for an auto claim. The agent automatically finds that the FNOL reports 2 passengers but the police report says 3, that medical costs are at 94% of the policy BI limit, and that treatment started before the recorded accident date. Then you ask questions, and the agent calls tools mid-conversation to search documents, compare fields, calculate exposure ratios, and generate adjuster notes.
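The three checks in this example can be sketched in a few lines of Python. This is only an illustration of the kind of cross-referencing described, not the project's actual code: the field names (`passengers`, `bi_limit`, `service_date`), the 90% alert threshold, and the dollar amounts and dates are all assumptions chosen to reproduce the 2 vs. 3 passengers and 94% figures above.

```python
from datetime import date


def cross_reference(fnol: dict, police_report: dict, policy: dict, bills: list[dict]) -> list[str]:
    """Return human-readable findings for one auto claim (hypothetical field names)."""
    findings = []

    # Discrepancy: the same field disagrees between two documents.
    if fnol["passengers"] != police_report["passengers"]:
        findings.append(
            f"Passenger count mismatch: FNOL reports {fnol['passengers']}, "
            f"police report says {police_report['passengers']}"
        )

    if bills:
        # Exposure: billed medical costs as a share of the bodily injury (BI) limit.
        billed = sum(bill["amount"] for bill in bills)
        ratio = billed / policy["bi_limit"]
        if ratio >= 0.9:  # assumed alert threshold
            findings.append(f"Medical costs are at {ratio:.0%} of the BI limit")

        # Red flag: earliest treatment is dated before the recorded accident.
        first_treatment = min(bill["service_date"] for bill in bills)
        if first_treatment < fnol["accident_date"]:
            findings.append(
                f"Treatment started {first_treatment.isoformat()}, before the "
                f"accident date {fnol['accident_date'].isoformat()}"
            )

    return findings


# Illustrative inputs only; the amounts and dates are made up for this sketch.
findings = cross_reference(
    fnol={"passengers": 2, "accident_date": date(2024, 3, 10)},
    police_report={"passengers": 3},
    policy={"bi_limit": 25_000},
    bills=[
        {"amount": 14_000, "service_date": date(2024, 3, 8)},
        {"amount": 9_500, "service_date": date(2024, 3, 12)},
    ],
)
for line in findings:
    print(line)
```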
## How we built it
Amazon Nova Sonic 2 powers the entire voice conversation: speech-to-speech with a 1M token context window. All documents are injected directly into the conversation context (no RAG needed), so the agent can read every detail without a retrieval step. The agent calls 5 function tools mid-conversation: search_documents, compare_fields, calculate_exposure, flag_red_flags, and generate_summary.
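To make the tool-calling step concrete, here is a minimal, framework-free sketch of how named tools might be registered and dispatched when the model requests one. It deliberately does not use the LiveKit or Bedrock APIs; the registry, the `dispatch` helper, and the argument shapes are assumptions for illustration. Only two of the five tools are shown, and the others would register the same way.

```python
import json
from typing import Any, Callable

# Registry of tool name -> Python function (hypothetical structure).
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {}


def tool(fn):
    """Register a function under its own name so the agent can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def calculate_exposure(claimed_total: float, policy_limit: float) -> dict[str, Any]:
    """Ratio of claimed costs to the policy limit."""
    if policy_limit <= 0:
        raise ValueError("policy_limit must be positive")
    return {"ratio": round(claimed_total / policy_limit, 4)}


@tool
def compare_fields(field: str, values_by_document: dict[str, Any]) -> dict[str, Any]:
    """Report whether every document agrees on the value of one field."""
    distinct = {repr(value) for value in values_by_document.values()}
    return {"field": field, "match": len(distinct) <= 1, "values": values_by_document}


def dispatch(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string the model can read back."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = TOOLS[name](**json.loads(arguments_json))
    except (TypeError, ValueError) as exc:
        # Bad JSON, wrong argument names, or invalid values become an error
        # payload instead of crashing the live voice session.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as data rather than raising is one possible design choice here: it lets the agent apologize or retry in conversation instead of dropping the call.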
Amazon Nova Pro handles document field extraction (via Instructor + Bedrock Converse API) and cross-document findings generation — producing structured, severity-rated findings with Pydantic-validated schemas.
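A severity-rated finding schema could look roughly like the sketch below. The project validates with Pydantic; to keep this example dependency-free, a standard-library dataclass with manual checks stands in for it. The field names and the three-level severity scale are assumptions; the category names are taken from the four kinds of findings described earlier.

```python
from dataclasses import dataclass
from enum import Enum

# Categories mirror the findings the agent surfaces; the identifiers are assumed.
CATEGORIES = {"discrepancy", "exposure_risk", "red_flag", "missing_information"}


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Finding:
    category: str
    severity: Severity
    summary: str
    source_documents: tuple[str, ...] = ()

    def __post_init__(self):
        # Manual validation, standing in for what a Pydantic model would enforce.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not self.summary.strip():
            raise ValueError("summary must not be empty")


def parse_finding(raw: dict) -> Finding:
    """Convert one model-produced dict into a validated Finding."""
    return Finding(
        category=raw["category"],
        severity=Severity(raw["severity"]),  # raises ValueError on an unknown level
        summary=raw["summary"],
        source_documents=tuple(raw.get("source_documents", [])),
    )
```

Rejecting malformed output at this boundary means a bad model response fails loudly during processing rather than surfacing as a confusing finding in the voice session.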
Amazon Nova Lite runs fast domain classification to validate that uploaded documents actually belong to the workspace domain (e.g., rejects a recipe PDF uploaded to an insurance claim).
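The gate around that classification step might be structured like this sketch. The classifier is passed in as a plain callable so the example does not assume any particular Bedrock call; the `(label, confidence)` return shape, the confidence threshold, and the excerpt length are all hypothetical.

```python
from typing import Callable

# A classifier takes document text and returns (domain label, confidence in [0, 1]).
Classifier = Callable[[str], tuple[str, float]]


def validate_upload(
    text: str,
    workspace_domain: str,
    classify: Classifier,
    min_confidence: float = 0.6,   # assumed threshold
    excerpt_chars: int = 4000,     # classify only a leading excerpt to stay fast
) -> tuple[bool, str]:
    """Accept a document only if it is confidently classified into the workspace domain."""
    label, confidence = classify(text[:excerpt_chars])
    if label == workspace_domain and confidence >= min_confidence:
        return True, "accepted"
    return False, (
        f"rejected: looks like '{label}' ({confidence:.2f}), "
        f"expected '{workspace_domain}'"
    )
```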
The frontend is Next.js 16 with React 19, TypeScript, Tailwind CSS v4, and shadcn/ui. The backend is FastAPI with DynamoDB (single-table design) and S3 for document storage. Voice sessions run on LiveKit Agents v1.4 with Silero VAD. Production is deployed on EC2 with Docker, ECR, and Caddy for auto-TLS.
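For readers unfamiliar with single-table design, the sketch below shows one common way to lay out keys so that a workspace and everything in it share a partition. The `PK`/`SK` attribute names and the `WORKSPACE#`, `DOC#`, and `FINDING#` prefixes are assumptions, not the project's actual schema, and the in-memory `query_prefix` helper only imitates what a DynamoDB query with a `begins_with` sort-key condition would return.

```python
def workspace_key(workspace_id: str) -> dict[str, str]:
    return {"PK": f"WORKSPACE#{workspace_id}", "SK": "META"}


def document_key(workspace_id: str, document_id: str) -> dict[str, str]:
    return {"PK": f"WORKSPACE#{workspace_id}", "SK": f"DOC#{document_id}"}


def finding_key(workspace_id: str, finding_id: str) -> dict[str, str]:
    return {"PK": f"WORKSPACE#{workspace_id}", "SK": f"FINDING#{finding_id}"}


def query_prefix(items: list[dict], pk: str, sk_prefix: str) -> list[dict]:
    """In-memory stand-in for a query on one partition with a sort-key prefix."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


# Everything for one workspace lives under the same partition key.
table = [
    {**workspace_key("w1"), "domain": "insurance"},
    {**document_key("w1", "002"), "filename": "police_report.pdf"},
    {**document_key("w1", "001"), "filename": "fnol.pdf"},
    {**finding_key("w1", "001"), "severity": "high"},
    {**document_key("w2", "001"), "filename": "other.pdf"},
]
docs = query_prefix(table, "WORKSPACE#w1", "DOC#")
```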
## Challenges we ran into
- Getting Nova Sonic 2 tool calling to work reliably during live voice sessions required careful prompt engineering and context structuring
- Balancing the 1M token context window — fitting all document text plus system prompts, findings, and tool definitions without hitting limits on large claim files
- The async document processing pipeline needed to fall back to OCR (Textract) gracefully when PyMuPDF couldn't extract text from scanned PDFs
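The OCR fallback from the last challenge can be sketched as follows. The two extractors are injected as callables so the example stays runnable without PyMuPDF or Textract installed; the function names, the per-page character threshold, and the decision to treat any embedded-extraction error as "no text" are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

MIN_CHARS_PER_PAGE = 50  # assumed threshold below which a PDF is treated as scanned


async def extract_text(
    pdf_bytes: bytes,
    page_count: int,
    extract_embedded: Callable[[bytes], str],      # e.g. a PyMuPDF-based function
    run_ocr: Callable[[bytes], Awaitable[str]],    # e.g. a Textract-based coroutine
) -> tuple[str, str]:
    """Return (text, method), where method is "embedded" or "ocr"."""
    try:
        # Run the blocking parser in a worker thread so the event loop stays free.
        text = await asyncio.to_thread(extract_embedded, pdf_bytes)
    except Exception:
        text = ""  # a corrupt or image-only PDF falls through to OCR

    if len(text.strip()) >= MIN_CHARS_PER_PAGE * max(page_count, 1):
        return text, "embedded"
    return await run_ocr(pdf_bytes), "ocr"
```

Trying the embedded text layer first keeps the common case cheap, and OCR is only invoked for documents that actually need it.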
## Accomplishments that we're proud of
- No RAG — Full document context injection with Nova Sonic 2's 1M token window eliminates retrieval latency entirely
- Findings-first agent — The agent leads with what matters instead of waiting to be asked
- Voice-native — Built for how professionals actually work, not how chatbots want them to work
- Production deployed — Live at https://novasonic-hackathon.sumanpaudel.me with real AWS infrastructure
## What we learned
Nova Sonic 2's speech-to-speech architecture fundamentally changes what's possible with voice AI. Eliminating the STT → LLM → TTS pipeline means sub-second response latency with full reasoning capabilities. The 1M token context window means you can skip the entire RAG infrastructure for document-heavy use cases — simpler architecture, better accuracy, faster responses.
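Skipping RAG still means checking that everything fits. Below is a minimal sketch of assembling the injected context against a token budget. The four-characters-per-token estimate is a rough heuristic rather than the model's real tokenizer, and the reserved headroom, the `<document>` wrapper format, and the function names are assumptions.

```python
CONTEXT_LIMIT = 1_000_000   # context window size stated above
CHARS_PER_TOKEN = 4         # crude estimate, not the actual tokenizer


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1


def build_context(
    system_prompt: str,
    findings: str,
    documents: dict[str, str],
    reserved_tokens: int = 50_000,      # assumed headroom for tools and the conversation
    context_limit: int = CONTEXT_LIMIT,
) -> str:
    """Concatenate the prompt, findings, and full document text, failing early if over budget."""
    parts = [system_prompt, findings]
    budget = context_limit - reserved_tokens - sum(estimate_tokens(p) for p in parts)

    for name, text in documents.items():
        block = f'<document name="{name}">\n{text}\n</document>'
        cost = estimate_tokens(block)
        if cost > budget:
            raise ValueError(f"{name} does not fit: needs ~{cost} tokens, {budget} left")
        budget -= cost
        parts.append(block)

    return "\n\n".join(parts)
```

Failing before the session starts is one option; a real pipeline might instead summarize or drop the lowest-priority documents when a large claim file exceeds the budget.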
## What's next for DocuVoice
- Multi-domain support — legal contract review, financial due diligence, HR compliance
- Batch claim processing for high-volume adjusting teams
- Report export for compliance and audit trails
- Multi-language support leveraging Nova Sonic 2's multilingual capabilities
## Built With
- amazon-web-services
- boto3
- dynamodb
- ec2
- ecr
- fastapi
- livekit
- nextjs
- nova