💡 Inspiration

Traditional OCR solutions are frustrating. They're inconsistent, expensive (often $0.10+ per page), and offer zero customization. Businesses need intelligent document processing that doesn't break the bank.

I discovered Amazon Textract offers enterprise-grade OCR at $0.0015 per page with unlimited scalability. But raw text extraction isn't enough—documents need to be understood, not just read. That's where the magic happens: combining Textract's precision with Claude 3 Haiku's intelligence creates an autonomous AI agent that truly comprehends documents.

🚀 What it does

AI-Powered Document Intelligence Platform that transforms traditional OCR into an intelligent analysis system:

Core Features

  • 📄 Smart OCR: AWS Textract extracts text from PDFs and images with 99%+ accuracy
  • 🧠 AI Analysis: Claude 3 Haiku (via Amazon Bedrock) autonomously analyzes and structures document content
  • 🎯 Specialized Processing: Automatically adapts analysis for invoices, contracts, forms, and general documents
  • 💼 Freemium SaaS: Three tiers (Free, Pro $10/mo, Enterprise $99/mo) with usage-based limits
  • 🔐 Enterprise API: RESTful API with Bearer token authentication for programmatic access
  • 📊 Document History: Track all processed documents with metadata and re-download results
  • 💳 Stripe Integration: Seamless subscription management and upgrades

The AI Agent Difference

Unlike basic OCR tools, this platform autonomously decides how to analyze each document:

  • Detects document type without user input
  • Selects specialized prompts for optimal extraction
  • Structures unstructured data into JSON
  • Identifies entities, dates, amounts, and key terms
  • All without human intervention after upload

🛠️ How we built it

Technology Stack

  • Backend: Flask (Python) with application factory pattern, deployed on Vercel serverless
  • AWS Services:
    • Textract for OCR text extraction
    • Bedrock (Claude 3 Haiku) for intelligent analysis
    • S3 for document and result storage
    • IAM for security and access control
  • Database: PostgreSQL (Neon) with SQLAlchemy ORM
  • Auth: Google OAuth 2.0 for secure user authentication
  • Payments: Stripe for subscription management
  • Frontend: Jinja2 templates with vanilla JavaScript for real-time status updates

Development Process with Kiro AI

I used Kiro IDE's spec-driven development workflow to build this entire platform systematically:

  1. Requirements Phase: Defined user stories and acceptance criteria using EARS (Easy Approach to Requirements Syntax)
  2. Design Phase: Created comprehensive architecture with AWS service integration patterns
  3. Implementation Phase: Built features incrementally with 13 structured tasks
  4. Testing: Automated test scripts for AWS connectivity and Bedrock validation

Kiro's AI agent helped me:

  • ✅ Design the three-tier freemium model with quota enforcement
  • ✅ Implement LLM service with specialized prompts for different document types
  • ✅ Create database schema with usage tracking and document history
  • ✅ Build Enterprise API with rate limiting and authentication
  • ✅ Write comprehensive documentation and troubleshooting guides

Architecture Highlights

User Upload → S3 Storage → Textract OCR → Claude 3 Haiku Analysis →
Structured JSON + CSV → S3 Storage → Presigned URLs → User Download

The system processes documents asynchronously with real-time status polling, ensuring a smooth UX even for large multi-page documents.

😅 Challenges we ran into

1. AWS Bedrock Payment Validation 🔥

Hit a wall with INVALID_PAYMENT_INSTRUMENT errors when invoking Claude 3 Haiku. Even with valid payment methods, AWS Marketplace requires separate validation. Solution: Discovered the issue was region-specific (us-west-1 doesn't support Claude 3 Haiku). Migrated to us-east-1 and everything worked perfectly.

2. Database Schema Evolution 🗄️

After implementing LLM features, existing users couldn't log in—the database was missing new columns (llm_analyses_this_month, api_key, document_history table). Solution: Created a migration script that safely adds columns to production databases without data loss.

3. Tier-Based Access Control 🎫

Implementing three subscription tiers with different quotas (documents + LLM analyses) while maintaining a smooth OAuth flow was complex. Solution: Built a decorator-based system that checks quotas before processing and provides clear upgrade prompts.

4. Cross-Region S3 and Bedrock 🌍

Initially had S3 bucket in us-west-1 but needed Bedrock in us-east-1. Solution: Documented that cross-region access works but recommended co-locating resources for optimal performance.

5. Presigned URL Security 🔒

Balancing security (short expiration) with UX (users need time to download). Solution: 5-minute presigned URLs with document history allowing re-generation of download links.

🏆 Accomplishments that we're proud of

Technical Achievements

  • Fully Functional AI Agent: Autonomously processes documents end-to-end without human intervention
  • 💰 Cost-Effective: $0.0016 per document with LLM analysis (96-99% profit margins on paid tiers)
  • Fast Processing: 3-8 seconds total (2-5s OCR + 1-3s LLM analysis)
  • 🔄 Production-Ready: Deployed on Vercel with PostgreSQL, Stripe webhooks, and OAuth
  • 📚 Comprehensive Documentation: 200+ line README with setup guides, troubleshooting, and architecture diagrams

Business Model

Built a real SaaS business with:

  • Freemium acquisition strategy (5 docs/month free)
  • Clear upgrade path ($10/mo Pro, $99/mo Enterprise)
  • API access for Enterprise customers
  • Automated billing and quota enforcement

Code Quality

  • Clean architecture with separation of concerns
  • Reusable LLM service with specialized prompts
  • Automated testing scripts for AWS connectivity
  • Database migration tools for schema updates
  • Comprehensive error handling and retry logic

📚 What we learned

About AI Agents

True AI agents make autonomous decisions. This platform doesn't just follow instructions—it:

  • Analyzes document content to determine type
  • Selects optimal processing strategies
  • Structures unstructured data intelligently
  • Manages its own error recovery and retries

About AWS Bedrock

  • Claude 3 Haiku is incredibly cost-effective: ~$0.0001 per analysis vs $0.0015 for Textract
  • Regional availability matters: Not all models are available in all regions
  • Payment validation is separate: AWS Marketplace requires its own validation beyond regular AWS billing
  • Structured output is reliable: With proper prompting, Claude consistently returns valid JSON

About Kiro IDE

Kiro transformed my development process. The spec-driven workflow forced me to:

  • Think through requirements before coding
  • Design the system architecture comprehensively
  • Break complex features into manageable tasks
  • Document everything as I built

Without Kiro, this would have taken weeks of trial-and-error. Instead, I built a production-ready SaaS in days with clear requirements, solid architecture, and comprehensive documentation.

About Building SaaS

  • Freemium models work: Free tier drives adoption, paid tiers drive revenue
  • Usage-based pricing aligns incentives: Users pay for value received
  • API access is a premium feature: Enterprise customers will pay for programmatic access
  • Documentation is critical: Good docs reduce support burden and increase adoption

🔮 What's next for AI Document Intelligence Platform

Short-Term Enhancements (Next 3 Months)

  • 📋 Template Library: Pre-built extraction templates for common document types (W-2s, 1099s, purchase orders)
  • 🎨 Custom Prompts: Enterprise users can define their own analysis templates
  • 📊 Analytics Dashboard: Usage trends, cost tracking, and document insights
  • 🔔 Webhook Notifications: Real-time alerts when processing completes
  • 🌐 Multi-Language Support: OCR and analysis for non-English documents

Medium-Term Features (6-12 Months)

  • 🤖 Multi-Model Support: Let users choose between Claude, GPT-4, or Gemini
  • 📦 Batch Processing: Upload and process multiple documents simultaneously
  • 🔍 Document Comparison: Diff analysis between document versions
  • 🔗 Integrations: Slack, email, Zapier, and business tool connectors
  • 💾 Advanced Storage: Longer retention periods and archival options

Long-Term Vision (12+ Months)

  • 🧠 Fine-Tuned Models: Custom Claude models trained on user-specific document types
  • 🎯 Confidence Scores: Quality metrics for extracted data
  • 🔗 Entity Linking: Connect extracted entities to knowledge bases (companies, people, products)
  • 📑 Multi-Document Analysis: Summarize and compare across document sets
  • 🌍 Global Deployment: Multi-region support for lower latency worldwide

Cost Optimization Architecture

The platform's dual-architecture approach (Textract + Bedrock) enables intelligent cost optimization:

  • Simple documents: Use Textract only ($0.0015/page)
  • Complex documents: Add Claude analysis ($0.0016/page total)
  • Future: Auto-detect complexity and route accordingly
  • Result: Optimal cost-to-value ratio for every document type

This architecture demonstrates that AI agents can make economic decisions, not just technical ones—choosing the most cost-effective processing path based on document characteristics.


Built for the AWS AI Agent Global Hackathon 2024 🚀

Demonstrating how autonomous AI agents can transform traditional services (OCR) into intelligent platforms that understand, analyze, and structure information without human intervention.

Built With

Share this project:

Updates