## Inspiration

Insurance adjusters review 30+ claims per week, and each claim involves 4–10 documents that must be cross-referenced line by line: FNOLs (first notices of loss), policies, medical bills, and police reports. Adjusters spend 30+ hours a week just reading paperwork, and missed fraud costs the U.S. insurance industry $80B+ annually. We wanted to build something that matches how adjusters actually work (on the phone, in the field, thinking out loud) instead of typing into a chatbot.

## What it does

DocuVoice is a voice-first document analysis platform. Upload documents to a workspace, and an AI agent reads, extracts structured fields, and cross-references everything — surfacing discrepancies, exposure risks, red flags, and missing information — before you even start talking. Then connect to a real-time voice session and have a natural conversation with an agent that has perfect recall of every document, every field, every number.

Example: upload an FNOL, a policy, medical bills, and a police report for an auto claim. The agent automatically finds that the FNOL reports 2 passengers while the police report says 3, that medical costs have reached 94% of the policy's bodily injury (BI) limit, and that treatment started before the recorded accident date. You then ask questions, and the agent calls tools mid-conversation to search documents, compare fields, calculate exposure ratios, and generate adjuster notes.
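In DocuVoice these findings are generated by Nova Pro from the extracted fields. To make concrete what each of the three example findings actually tests, here is a deterministic, rule-based sketch in Python. It is a swapped-in technique (plain rules, not an LLM), and the `ClaimFields` record, the 80% warning threshold, the 90% "high" cutoff, and the claim numbers in the usage lines are all hypothetical, chosen only to reproduce the 94% ratio from the example.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified field record; the real extraction schema is not shown in this writeup.
@dataclass
class ClaimFields:
    fnol_passengers: int
    police_passengers: int
    medical_total: float
    bi_limit: float
    accident_date: date
    first_treatment_date: date


@dataclass
class Finding:
    severity: str  # "high" or "medium" in this sketch
    message: str


def cross_check(c: ClaimFields, exposure_warn: float = 0.8) -> list[Finding]:
    """Rule-based stand-in for LLM-generated cross-document findings."""
    findings: list[Finding] = []
    # 1. Passenger counts should agree between the FNOL and the police report.
    if c.fnol_passengers != c.police_passengers:
        findings.append(Finding(
            "high",
            f"Passenger count mismatch: FNOL reports {c.fnol_passengers}, "
            f"police report says {c.police_passengers}",
        ))
    # 2. Exposure: medical costs as a fraction of the bodily injury (BI) limit.
    if c.bi_limit > 0:
        ratio = c.medical_total / c.bi_limit
        if ratio >= exposure_warn:
            severity = "high" if ratio >= 0.9 else "medium"
            findings.append(Finding(severity, f"Medical costs at {ratio:.0%} of the BI limit"))
    # 3. Treatment should not predate the recorded accident.
    if c.first_treatment_date < c.accident_date:
        findings.append(Finding("high", "Treatment started before the recorded accident date"))
    return findings


# Illustrative claim: made-up values, 47,000 / 50,000 = 94% of the BI limit.
claim = ClaimFields(2, 3, 47_000.0, 50_000.0, date(2024, 3, 10), date(2024, 3, 8))
for f in cross_check(claim):
    print(f.severity, f.message)
```

Rules like these are cheap and predictable, which is why they can be useful as a sanity check alongside model-generated findings; the LLM approach covers discrepancies nobody thought to write a rule for.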

## How we built it

Amazon Nova Sonic 2 powers the entire voice conversation: speech-to-speech with a 1M token context window. All documents are injected directly into the conversation context (no RAG needed), so the agent never waits on a retrieval step to reach any detail. The agent can call 5 function tools mid-conversation: search_documents, compare_fields, calculate_exposure, flag_red_flags, and generate_summary.
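On the backend, mid-conversation tool calling generally comes down to a dispatcher: the model emits a tool name plus JSON arguments, the server runs the matching handler, and a JSON result goes back into the conversation. The sketch below shows that pattern for two of the five tools. The in-memory `FIELDS` store, the argument names, and the result shapes are hypothetical; it does not reproduce Nova Sonic 2's event format or LiveKit's tool API.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory store of extracted fields: document name -> field name -> value.
FIELDS: dict[str, dict[str, Any]] = {}


def compare_fields(field: str, doc_a: str, doc_b: str) -> dict[str, Any]:
    """Look up one field in two documents and report whether the values agree."""
    a = FIELDS.get(doc_a, {}).get(field)
    b = FIELDS.get(doc_b, {}).get(field)
    return {"field": field, "values": {doc_a: a, doc_b: b}, "match": a == b}


def calculate_exposure(amount: float, limit: float) -> dict[str, Any]:
    """Exposure ratio = amount / limit (None when no positive limit is known)."""
    ratio = round(amount / limit, 4) if limit > 0 else None
    return {"amount": amount, "limit": limit, "ratio": ratio}


# Two of the five tools; argument names are illustrative, not the project's schemas.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "compare_fields": compare_fields,
    "calculate_exposure": calculate_exposure,
}


def handle_tool_call(name: str, input_json: str) -> str:
    """Dispatch a tool call (name + JSON arguments) and return a JSON string for the model."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps(handler(**json.loads(input_json)))
    except (TypeError, ValueError) as exc:  # bad arguments or malformed JSON
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON instead of raising keeps a live voice session from stalling: the model sees the error text and can rephrase the call or tell the adjuster what went wrong.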

Amazon Nova Pro handles document field extraction (via Instructor + Bedrock Converse API) and cross-document findings generation — producing structured, severity-rated findings with Pydantic-validated schemas.
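Instructor's core idea is validate-and-retry: parse the model's output against a schema, and if validation fails, re-ask with the error attached. The project uses Pydantic models for this; to keep the sketch dependency-free it substitutes a stdlib dataclass with hand-written checks. The `Finding` fields, the three severity levels, `max_retries`, and the injected `call_model` function (standing in for a Bedrock Converse call) are all assumptions, not DocuVoice's actual schema or code.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Finding:
    title: str
    severity: Severity
    documents: tuple[str, ...]

    @classmethod
    def from_dict(cls, d: dict) -> "Finding":
        # Raises KeyError/TypeError/ValueError on schema violations, like a validation error.
        title = d["title"]
        if not isinstance(title, str) or not title.strip():
            raise ValueError("title must be a non-empty string")
        docs = d["documents"]
        if not isinstance(docs, list) or not all(isinstance(x, str) for x in docs):
            raise ValueError("documents must be a list of strings")
        return cls(title, Severity(d["severity"]), tuple(docs))


def extract_findings(call_model: Callable[[str], str], prompt: str, max_retries: int = 2) -> list[Finding]:
    """Ask the model for JSON findings; on a schema error, re-ask with the error appended."""
    last_error = ""
    for _ in range(max_retries + 1):
        full_prompt = prompt if not last_error else f"{prompt}\n\nYour previous reply was invalid: {last_error}"
        reply = call_model(full_prompt)
        try:
            return [Finding.from_dict(item) for item in json.loads(reply)["findings"]]
        except (KeyError, TypeError, ValueError) as exc:  # JSONDecodeError is a ValueError
            last_error = str(exc)
    raise ValueError(f"model output failed validation after {max_retries + 1} attempts: {last_error}")
```

Feeding the validation error back to the model usually fixes formatting slips (an unknown severity label, a missing field) on the next attempt, while the retry cap bounds cost and latency.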

Amazon Nova Lite runs fast domain classification to check that uploaded documents actually belong to the workspace's domain (e.g., it rejects a recipe PDF uploaded to an insurance claim).
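A domain gate like this is mostly a prompt plus a strict parser. The sketch below shows one way to structure it; the prompt wording, the JSON reply format, and the 4,000-character excerpt limit are hypothetical, and the actual Nova Lite call is left out.

```python
import json


def build_domain_prompt(domain: str, excerpt: str, max_chars: int = 4000) -> str:
    """Hypothetical classification prompt; only a leading excerpt is sent to keep the call fast."""
    return (
        f"Workspace domain: {domain}\n"
        "Does the following document belong to this domain? "
        'Answer with JSON: {"belongs": true|false, "reason": "..."}\n\n'
        f"Document excerpt:\n{excerpt[:max_chars]}"
    )


def parse_domain_verdict(reply: str) -> bool:
    """Fail closed: anything other than a clear {"belongs": true} counts as a rejection."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("belongs") is True
```

Failing closed keeps junk out of the shared context, at the cost of occasionally rejecting a valid document when the model replies off-format; routing unparseable replies to a manual "needs review" state is a reasonable alternative.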

The frontend is Next.js 16 with React 19, TypeScript, Tailwind CSS v4, and shadcn/ui. The backend is FastAPI with DynamoDB (single-table design) and S3 for document storage. Voice sessions run on LiveKit Agents v1.4 with Silero VAD. Production is deployed on EC2 with Docker, ECR, and Caddy for auto-TLS.
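Single-table design means workspaces, documents, and findings share one DynamoDB table, distinguished by key structure. A common layout, sketched below, puts everything for a workspace under one partition key and encodes the entity type as a sort-key prefix, so "all documents in workspace X" becomes one Query. The `WS#`/`DOC#`/`FINDING#` prefixes, the S3 path layout, and the in-memory `query_prefix` stand-in are hypothetical, not DocuVoice's actual schema.

```python
from typing import Iterable


# Hypothetical key scheme: one partition per workspace, entity type encoded in the sort key.
def workspace_key(workspace_id: str) -> dict[str, str]:
    return {"PK": f"WS#{workspace_id}", "SK": "META"}


def document_key(workspace_id: str, document_id: str) -> dict[str, str]:
    return {"PK": f"WS#{workspace_id}", "SK": f"DOC#{document_id}"}


def finding_key(workspace_id: str, finding_id: str) -> dict[str, str]:
    return {"PK": f"WS#{workspace_id}", "SK": f"FINDING#{finding_id}"}


def s3_document_key(workspace_id: str, document_id: str, filename: str) -> str:
    # Raw files live in S3; the DynamoDB document item would store this key plus extracted fields.
    return f"workspaces/{workspace_id}/documents/{document_id}/{filename}"


def query_prefix(items: Iterable[dict], pk: str, sk_prefix: str) -> list[dict]:
    """In-memory stand-in for a DynamoDB Query: exact PK match, SK begins_with, sorted by SK."""
    return sorted(
        (i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)),
        key=lambda i: i["SK"],
    )
```

Against a real table with boto3, the equivalent query would use `Key("PK").eq("WS#w1") & Key("SK").begins_with("DOC#")` from `boto3.dynamodb.conditions` as the `KeyConditionExpression`.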

## Challenges we ran into

  • Getting Nova Sonic 2 tool calling to work reliably during live voice sessions required careful prompt engineering and context structuring
  • Budgeting the 1M token context window: fitting all document text plus system prompts, findings, and tool definitions without hitting the limit on large claim files
  • Our async document-processing pipeline needed to fall back gracefully to OCR (Textract) when PyMuPDF couldn't extract text from scanned PDFs
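The OCR fallback can be framed as "try the native text layer first, switch to OCR when it's missing or too thin". The sketch below injects both extractors as plain functions so it runs without PyMuPDF or boto3; in practice `native` might wrap PyMuPDF's `page.get_text()` per page and `ocr` a Textract call. The 20-characters-per-page threshold and the function names are hypothetical.

```python
from typing import Callable


def extract_text(
    pdf_bytes: bytes,
    native: Callable[[bytes], list[str]],  # e.g. a PyMuPDF wrapper returning text per page
    ocr: Callable[[bytes], list[str]],     # e.g. a Textract wrapper returning text per page
    min_chars_per_page: int = 20,          # hypothetical "has a real text layer" threshold
) -> tuple[list[str], str]:
    """Return (pages, method), using OCR when native extraction fails or yields near-empty pages."""
    try:
        pages = native(pdf_bytes)
    except Exception:
        pages = []  # treat a native extraction failure the same as "no text layer"
    # Scanned PDFs typically have pages with little or no embedded text.
    if pages and all(len(p.strip()) >= min_chars_per_page for p in pages):
        return pages, "native"
    return ocr(pdf_bytes), "ocr"
```

This sketch falls back for the whole document if any page looks scanned; a per-page fallback would save OCR cost on mixed PDFs but complicates page ordering and error handling.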

## Accomplishments that we're proud of

  • No RAG — Full document context injection with Nova Sonic 2's 1M token window eliminates retrieval latency entirely
  • Findings-first agent — The agent leads with what matters instead of waiting to be asked
  • Voice-native — Built for how professionals actually work, not how chatbots want them to work
  • Production deployed — Live at https://novasonic-hackathon.sumanpaudel.me with real AWS infrastructure

## What we learned

Nova Sonic 2's speech-to-speech architecture fundamentally changes what's possible with voice AI. Eliminating the STT → LLM → TTS pipeline means sub-second response latency with full reasoning capabilities. The 1M token context window means you can skip the entire RAG infrastructure for document-heavy use cases — simpler architecture, better accuracy, faster responses.
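Skipping RAG shifts the engineering problem to budgeting: before a session starts, check that every document plus the system prompt, findings, and tool definitions fits in the window with room left for the conversation itself. The sketch below uses a rough 4-characters-per-token estimate, which is only a heuristic for English text (a real tokenizer count is more accurate); the 50,000-token conversation reserve is also a hypothetical choice.

```python
CONTEXT_TOKENS = 1_000_000  # window size as stated in this writeup


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English, rounded up.
    return (len(text) + 3) // 4


def fits_in_context(
    documents: dict[str, str],
    system_prompt: str,
    findings: str,
    tool_defs: str,
    reserve_for_conversation: int = 50_000,  # hypothetical headroom for the live dialogue
) -> tuple[bool, int]:
    """Return (fits, remaining_tokens) for injecting everything into one session context."""
    used = sum(estimate_tokens(t) for t in (system_prompt, findings, tool_defs, *documents.values()))
    budget = CONTEXT_TOKENS - reserve_for_conversation
    return used <= budget, budget - used
```

When a claim file doesn't fit, the remaining-token figure tells you how much to trim (for example, dropping boilerplate policy pages) before falling back to anything retrieval-based.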

## What's next for DocuVoice

  • Multi-domain support — legal contract review, financial due diligence, HR compliance
  • Batch claim processing for high-volume adjusting teams
  • Report export for compliance and audit trails
  • Multi-language support leveraging Nova Sonic 2's multilingual capabilities
