docuvoice

## Inspiration

Insurance adjusters review 30+ claims per week, each involving 4–10 documents that need to be cross-referenced line by line — FNOLs, policies, medical bills, police reports. They spend 30+ hours a week just reading paperwork. Missed fraud costs the U.S. insurance industry $80B+ annually. We wanted to build something that matches how adjusters actually work — on the phone, in the field, thinking out loud — not typing into a chatbot.

## What it does

DocuVoice is a voice-first document analysis platform. Upload documents to a workspace, and an AI agent reads, extracts structured fields, and cross-references everything — surfacing discrepancies, exposure risks, red flags, and missing information — before you even start talking. Then connect to a real-time voice session and have a natural conversation with an agent that has perfect recall of every document, every field, every number.

Example: Upload an FNOL, policy, medical bills, and police report for an auto claim. The agent automatically finds that the FNOL reports 2 passengers but the police report says 3, that medical costs are at 94% of the policy BI limit, and that treatment started before the recorded accident date. Then you ask questions, and the agent calls tools mid-conversation to search documents, compare fields, calculate exposure ratios, and generate adjuster notes.
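The three findings in this example come down to field comparisons and one ratio. Below is a minimal sketch of that logic. The field names (`passenger_count`, `accident_date`), the dates, and the dollar amounts are hypothetical, chosen only to reproduce the 94% figure; the project's real extraction schema is not shown in this write-up.

```python
from datetime import date

def compare_field(name, docs):
    """Return {doc_name: value} if documents disagree on a field, else None."""
    values = {doc: fields[name] for doc, fields in docs.items() if name in fields}
    return values if len(set(values.values())) > 1 else None

def exposure_ratio(total_billed, policy_limit):
    """Fraction of the policy limit consumed by billed amounts."""
    if policy_limit <= 0:
        raise ValueError("policy_limit must be positive")
    return total_billed / policy_limit

# Hypothetical extracted fields for two of the uploaded documents.
docs = {
    "fnol": {"passenger_count": 2, "accident_date": date(2026, 1, 10)},
    "police_report": {"passenger_count": 3, "accident_date": date(2026, 1, 10)},
}

mismatch = compare_field("passenger_count", docs)   # documents disagree
ratio = exposure_ratio(23_500, 25_000)              # billed vs. BI limit
# Treatment that starts before the recorded accident date is a red flag.
treatment_before_accident = date(2026, 1, 8) < docs["fnol"]["accident_date"]
```

Keeping these checks as plain functions makes them usable both in the pre-conversation findings pass and as tools the agent calls during a session.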

## How we built it

Amazon Nova Sonic 2 powers the entire voice conversation as a speech-to-speech model with a 1M token context window. All documents are injected directly into the conversation context (no RAG needed), so the agent can draw on any detail without a retrieval round trip. The agent calls five function tools mid-conversation: `search_documents`, `compare_fields`, `calculate_exposure`, `flag_red_flags`, and `generate_summary`.
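One way to picture mid-conversation tool calling is a small registry that maps tool names to Python functions. This is a library-free sketch, not the project's code: in DocuVoice the invocation is handled by LiveKit Agents and the model. The JSON call shape (`name`, `arguments`) and the two stubbed tool bodies are assumptions made for illustration.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function under its own name so it can be called by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_documents(query, documents):
    """Return names of documents whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [name for name, text in documents.items() if q in text.lower()]

@tool
def calculate_exposure(total_billed, policy_limit):
    """Return billed amount as a fraction of the policy limit."""
    return {"ratio": round(total_billed / policy_limit, 4)}

def dispatch(call_json):
    """Run one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        # Return an error payload instead of raising, so a voice session
        # could recover gracefully from a hallucinated tool name.
        return {"error": f"unknown tool: {call['name']}"}
    return {"result": fn(**call["arguments"])}
```

For example, `dispatch('{"name": "calculate_exposure", "arguments": {"total_billed": 23500, "policy_limit": 25000}}')` returns `{"result": {"ratio": 0.94}}`.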

Amazon Nova Pro handles document field extraction (via Instructor + Bedrock Converse API) and cross-document findings generation — producing structured, severity-rated findings with Pydantic-validated schemas.
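A structured, severity-rated finding might look roughly like the sketch below. To stay dependency-free it uses standard-library dataclasses as a stand-in for the Pydantic models the project uses. The field names and the three severity levels are assumptions; the write-up does not list the real schema.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity
    source_documents: tuple
    detail: str = ""

    def __post_init__(self):
        # Coerce raw strings such as "high"; unknown values raise ValueError.
        object.__setattr__(self, "severity", Severity(self.severity))
        if not self.title.strip():
            raise ValueError("title must not be empty")
        if not self.source_documents:
            raise ValueError("a finding must cite at least one document")

def parse_findings(raw_items):
    """Validate a list of dicts (e.g. parsed model output) into Finding objects."""
    return [
        Finding(**{**item, "source_documents": tuple(item["source_documents"])})
        for item in raw_items
    ]
```

Rejecting malformed output at this boundary means the voice agent only ever sees findings with a known severity and at least one cited document.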

Amazon Nova Lite runs fast domain classification to validate that uploaded documents actually belong to the workspace domain (e.g., rejects a recipe PDF uploaded to an insurance claim).

The frontend is Next.js 16 with React 19, TypeScript, Tailwind CSS v4, and shadcn/ui. The backend is FastAPI with DynamoDB (single-table design) and S3 for document storage. Voice sessions run on LiveKit Agents v1.4 with Silero VAD. Production is deployed on EC2 with Docker, ECR, and Caddy for auto-TLS.

## Challenges we ran into

- Getting Nova Sonic 2 tool calling to work reliably during live voice sessions required careful prompt engineering and context structuring.
- Balancing the 1M token context window: fitting all document text plus system prompts, findings, and tool definitions without hitting limits on large claim files.
- The async document processing pipeline needed to handle OCR fallback (Textract) gracefully when PyMuPDF couldn't extract text from scanned PDFs.
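The OCR fallback described above can be sketched as a small function that tries native extraction first and switches to OCR when too little text comes back. The two extractors are passed in as plain callables, standing in for PyMuPDF and Textract, so the sketch makes no claims about either library's API. The `min_chars` threshold is an assumed heuristic, not a value from the project.

```python
def extract_text(pdf_bytes, native_extract, ocr_extract, min_chars=50):
    """Try native extraction first; fall back to OCR when it yields too little text.

    Returns a (text, method) tuple where method is "native" or "ocr".
    """
    try:
        text = native_extract(pdf_bytes)
    except Exception:
        # A parser failure is treated the same as an empty result.
        text = ""
    if len(text.strip()) >= min_chars:
        return text, "native"
    # Scanned PDFs typically have no text layer, so native extraction
    # returns almost nothing and OCR takes over.
    return ocr_extract(pdf_bytes), "ocr"
```

Returning the method alongside the text lets the pipeline record which documents went through OCR, which helps when reviewing extraction quality later.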

## Accomplishments that we're proud of

- No RAG: full document context injection with Nova Sonic 2's 1M token window eliminates retrieval latency entirely.
- Findings-first agent: the agent leads with what matters instead of waiting to be asked.
- Voice-native: built for how professionals actually work, not how chatbots want them to work.
- Production deployed: live at https://novasonic-hackathon.sumanpaudel.me with real AWS infrastructure.
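Full-context injection only works while everything fits in the window, so a budget check before each session is a natural guard. The sketch below rests on stated assumptions: the 4-characters-per-token estimate is a rough heuristic rather than the model's tokenizer, and `reserve_tokens` is a hypothetical allowance for findings, tool definitions, and conversation turns.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate via ceiling division; not a real tokenizer."""
    return -(-len(text) // chars_per_token)

def build_context(system_prompt, documents, budget_tokens=1_000_000, reserve_tokens=50_000):
    """Concatenate documents into one context, failing loudly if over budget."""
    parts = [system_prompt]
    for name, text in documents.items():
        # Label each document so the agent can cite its source by name.
        parts.append(f"=== {name} ===\n{text}")
    context = "\n\n".join(parts)
    used = estimate_tokens(context)
    available = budget_tokens - reserve_tokens
    if used > available:
        raise ValueError(f"context needs ~{used} tokens, only {available} available")
    return context, used
```

Failing before the session starts is preferable to a truncated context, since a silently dropped document would undermine the agent's recall of the claim file.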

## What we learned

Nova Sonic 2's speech-to-speech architecture changes what is practical with voice AI. Removing the STT → LLM → TTS pipeline gave us sub-second response latency with full reasoning capabilities. The 1M token context window let us skip the RAG infrastructure entirely for this document-heavy use case: the architecture is simpler, there is no retrieval step that can miss a relevant passage, and responses arrive faster.

## What's next for DocuVoice

- Multi-domain support: legal contract review, financial due diligence, HR compliance
- Batch claim processing for high-volume adjusting teams
- Report export for compliance and audit trails
- Multi-language support leveraging Nova Sonic 2's multilingual capabilities

## Built With

amazon-web-services
boto3
dynamodb
ec2
ecr
fastapi
livekit
nextjs
nova

Updates

Suman Paudel started this project — Mar 16, 2026 01:30 PM EDT
