-
-
Screenshot of App shows splash screen for Free Tier/Pro Plan with buttons to make payments
-
Screenshot of Pro Plan version of the OCR by California Vision app
-
Final Version Including LLM Analysis with Claude 3 Haiku in Bedrock!
-
CSV Output - Preview of Extracted File
-
Architecture Diagram
-
LLM Analysis in JSON format (Claude 3 Haiku)
💡 Inspiration
Traditional OCR solutions are frustrating. They're inconsistent, expensive (often $0.10+ per page), and offer zero customization. Businesses need intelligent document processing that doesn't break the bank.
I discovered Amazon Textract offers enterprise-grade OCR at $0.0015 per page with unlimited scalability. But raw text extraction isn't enough—documents need to be understood, not just read. That's where the magic happens: combining Textract's precision with Claude 3 Haiku's intelligence creates an autonomous AI agent that truly comprehends documents.
🚀 What it does
AI-Powered Document Intelligence Platform that transforms traditional OCR into an intelligent analysis system:
Core Features
- 📄 Smart OCR: AWS Textract extracts text from PDFs and images with 99%+ accuracy
- 🧠 AI Analysis: Claude 3 Haiku (via Amazon Bedrock) autonomously analyzes and structures document content
- 🎯 Specialized Processing: Automatically adapts analysis for invoices, contracts, forms, and general documents
- 💼 Freemium SaaS: Three tiers (Free, Pro $10/mo, Enterprise $99/mo) with usage-based limits
- 🔐 Enterprise API: RESTful API with Bearer token authentication for programmatic access
- 📊 Document History: Track all processed documents with metadata and re-download results
- 💳 Stripe Integration: Seamless subscription management and upgrades
The AI Agent Difference
Unlike basic OCR tools, this platform autonomously decides how to analyze each document:
- Detects document type without user input
- Selects specialized prompts for optimal extraction
- Structures unstructured data into JSON
- Identifies entities, dates, amounts, and key terms
- All without human intervention after upload
🛠️ How we built it
Technology Stack
- Backend: Flask (Python) with application factory pattern, deployed on Vercel serverless
- AWS Services:
- Textract for OCR text extraction
- Bedrock (Claude 3 Haiku) for intelligent analysis
- S3 for document and result storage
- IAM for security and access control
- Database: PostgreSQL (Neon) with SQLAlchemy ORM
- Auth: Google OAuth 2.0 for secure user authentication
- Payments: Stripe for subscription management
- Frontend: Jinja2 templates with vanilla JavaScript for real-time status updates
Development Process with Kiro AI
I used Kiro IDE's spec-driven development workflow to build this entire platform systematically:
- Requirements Phase: Defined user stories and acceptance criteria using EARS (Easy Approach to Requirements Syntax)
- Design Phase: Created comprehensive architecture with AWS service integration patterns
- Implementation Phase: Built features incrementally with 13 structured tasks
- Testing: Automated test scripts for AWS connectivity and Bedrock validation
Kiro's AI agent helped me:
- ✅ Design the three-tier freemium model with quota enforcement
- ✅ Implement LLM service with specialized prompts for different document types
- ✅ Create database schema with usage tracking and document history
- ✅ Build Enterprise API with rate limiting and authentication
- ✅ Write comprehensive documentation and troubleshooting guides
Architecture Highlights
User Upload → S3 Storage → Textract OCR → Claude 3 Haiku Analysis →
Structured JSON + CSV → S3 Storage → Presigned URLs → User Download
The system processes documents asynchronously with real-time status polling, ensuring a smooth UX even for large multi-page documents.
😅 Challenges we ran into
1. AWS Bedrock Payment Validation 🔥
Hit a wall with INVALID_PAYMENT_INSTRUMENT errors when invoking Claude 3 Haiku. Even with valid payment methods, AWS Marketplace requires separate validation. Solution: Discovered the issue was region-specific (us-west-1 doesn't support Claude 3 Haiku). Migrated to us-east-1 and everything worked perfectly.
2. Database Schema Evolution 🗄️
After implementing LLM features, existing users couldn't log in—the database was missing new columns (llm_analyses_this_month, api_key, document_history table). Solution: Created a migration script that safely adds columns to production databases without data loss.
3. Tier-Based Access Control 🎫
Implementing three subscription tiers with different quotas (documents + LLM analyses) while maintaining a smooth OAuth flow was complex. Solution: Built a decorator-based system that checks quotas before processing and provides clear upgrade prompts.
4. Cross-Region S3 and Bedrock 🌍
Initially had S3 bucket in us-west-1 but needed Bedrock in us-east-1. Solution: Documented that cross-region access works but recommended co-locating resources for optimal performance.
5. Presigned URL Security 🔒
Balancing security (short expiration) with UX (users need time to download). Solution: 5-minute presigned URLs with document history allowing re-generation of download links.
🏆 Accomplishments that we're proud of
Technical Achievements
- ✨ Fully Functional AI Agent: Autonomously processes documents end-to-end without human intervention
- 💰 Cost-Effective: $0.0016 per document with LLM analysis (96-99% profit margins on paid tiers)
- ⚡ Fast Processing: 3-8 seconds total (2-5s OCR + 1-3s LLM analysis)
- 🔄 Production-Ready: Deployed on Vercel with PostgreSQL, Stripe webhooks, and OAuth
- 📚 Comprehensive Documentation: 200+ line README with setup guides, troubleshooting, and architecture diagrams
Business Model
Built a real SaaS business with:
- Freemium acquisition strategy (5 docs/month free)
- Clear upgrade path ($10/mo Pro, $99/mo Enterprise)
- API access for Enterprise customers
- Automated billing and quota enforcement
Code Quality
- Clean architecture with separation of concerns
- Reusable LLM service with specialized prompts
- Automated testing scripts for AWS connectivity
- Database migration tools for schema updates
- Comprehensive error handling and retry logic
📚 What we learned
About AI Agents
True AI agents make autonomous decisions. This platform doesn't just follow instructions—it:
- Analyzes document content to determine type
- Selects optimal processing strategies
- Structures unstructured data intelligently
- Manages its own error recovery and retries
About AWS Bedrock
- Claude 3 Haiku is incredibly cost-effective: ~$0.0001 per analysis vs $0.0015 for Textract
- Regional availability matters: Not all models are available in all regions
- Payment validation is separate: AWS Marketplace requires its own validation beyond regular AWS billing
- Structured output is reliable: With proper prompting, Claude consistently returns valid JSON
About Kiro IDE
Kiro transformed my development process. The spec-driven workflow forced me to:
- Think through requirements before coding
- Design the system architecture comprehensively
- Break complex features into manageable tasks
- Document everything as I built
Without Kiro, this would have taken weeks of trial-and-error. Instead, I built a production-ready SaaS in days with clear requirements, solid architecture, and comprehensive documentation.
About Building SaaS
- Freemium models work: Free tier drives adoption, paid tiers drive revenue
- Usage-based pricing aligns incentives: Users pay for value received
- API access is a premium feature: Enterprise customers will pay for programmatic access
- Documentation is critical: Good docs reduce support burden and increase adoption
🔮 What's next for AI Document Intelligence Platform
Short-Term Enhancements (Next 3 Months)
- 📋 Template Library: Pre-built extraction templates for common document types (W-2s, 1099s, purchase orders)
- 🎨 Custom Prompts: Enterprise users can define their own analysis templates
- 📊 Analytics Dashboard: Usage trends, cost tracking, and document insights
- 🔔 Webhook Notifications: Real-time alerts when processing completes
- 🌐 Multi-Language Support: OCR and analysis for non-English documents
Medium-Term Features (6-12 Months)
- 🤖 Multi-Model Support: Let users choose between Claude, GPT-4, or Gemini
- 📦 Batch Processing: Upload and process multiple documents simultaneously
- 🔍 Document Comparison: Diff analysis between document versions
- 🔗 Integrations: Slack, email, Zapier, and business tool connectors
- 💾 Advanced Storage: Longer retention periods and archival options
Long-Term Vision (12+ Months)
- 🧠 Fine-Tuned Models: Custom Claude models trained on user-specific document types
- 🎯 Confidence Scores: Quality metrics for extracted data
- 🔗 Entity Linking: Connect extracted entities to knowledge bases (companies, people, products)
- 📑 Multi-Document Analysis: Summarize and compare across document sets
- 🌍 Global Deployment: Multi-region support for lower latency worldwide
Cost Optimization Architecture
The platform's dual-architecture approach (Textract + Bedrock) enables intelligent cost optimization:
- Simple documents: Use Textract only ($0.0015/page)
- Complex documents: Add Claude analysis ($0.0016/page total)
- Future: Auto-detect complexity and route accordingly
- Result: Optimal cost-to-value ratio for every document type
This architecture demonstrates that AI agents can make economic decisions, not just technical ones—choosing the most cost-effective processing path based on document characteristics.
Built for the AWS AI Agent Global Hackathon 2024 🚀
Demonstrating how autonomous AI agents can transform traditional services (OCR) into intelligent platforms that understand, analyze, and structure information without human intervention.
Built With
- amazon-web-services
- anthropic
- bedrock
- claude
- flask
- kiro
- neondb
- python
- stripe
- vercel

Log in or sign up for Devpost to join the conversation.