AWS Hackathon: Smart Document Intelligence (Freedom for Data)

What Inspired Me

Document digitization is a crisis in corporate environments. Companies process 100+ documents daily (invoices, POs, GARs, LRs, e-way bills, agreements, expense reports), each requiring unique binding numbers that vary per company, and they continue to grapple with inefficient workflows, rising operational costs, and poor decision-making because the data trapped in those documents isn't accessible or usable. Traditional OCR systems fail because:

  • Static technology - no learning capability from processed data
  • Complex layout handling failure - struggles with diverse document structures
  • Multi-sector terminology confusion - same concepts labeled differently across industries
  • Manual verification bottleneck - unreliable extraction requires human intervention

What I Learned

The digitization crisis isn't about simply moving paper to digital. It's about transforming how businesses capture, process, and reuse the data locked in documents, in any form. As AI models continue to evolve, they offer a promising way to overcome the limitations of traditional OCR systems.

Full-stack AI architecture implementation:

  • ICR with RAG model integration - architecting intelligent character recognition with retrieval-augmented generation
  • Local LLM optimization - LLAMA model fine-tuning for document parsing and embedding generation
  • AWS serverless ecosystem mastery - DynamoDB session management, S3-SQS integration via Lambda functions
  • Frontend development - React with Three.js for interactive document interface
  • Containerization expertise - Docker deployment, EC2 integration, Lambda layers management

Technical Architecture & Workflow

Core Components:

  • Frontend (20%) - React interface with Three.js
  • AWS Backend (80%) - Lambda, S3, DynamoDB, SQS, API Gateway

AWS Architecture Workflow (80% backend, 100% within Free Tier limits):

Document Processing Pipeline:

  1. Frontend (20%) → Post/Get requests to AWS Lambda via API Gateway
  2. Session Management → DynamoDB stores session ID with document metadata
  3. Document Storage → Presigned URLs enable direct S3 upload; uploads are queued for processing via SQS
  4. LLAMA LLM Processing → Parsing and document extraction mapped to session ID
  5. Data Persistence → Parsed data, extraction results, and embeddings stored in DynamoDB, keyed by session ID
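Steps 2–3 of the pipeline can be sketched as a single Lambda handler that records the session in DynamoDB and hands the frontend a presigned S3 upload URL. This is an illustrative sketch, not the project's actual code: the table name `Sessions`, bucket name `doc-bucket`, and key layout `uploads/<session>/<file>` are assumptions.

```python
import json
import uuid


def build_session_item(session_id, filename, bucket):
    """Construct the DynamoDB item mapping a session to its document.

    The attribute names and the uploads/<session>/<file> key scheme are
    hypothetical; the source doesn't specify the table schema.
    """
    return {
        "session_id": {"S": session_id},
        "s3_key": {"S": f"uploads/{session_id}/{filename}"},
        "status": {"S": "PENDING"},
    }


def handler(event, context):
    """Hypothetical API Gateway → Lambda entry point.

    boto3 clients are created inside the handler so the module stays
    importable (and testable) without AWS credentials.
    """
    import boto3

    body = json.loads(event["body"])
    session_id = body.get("session_id") or str(uuid.uuid4())
    item = build_session_item(session_id, body["filename"], "doc-bucket")

    # Record session metadata, then issue a short-lived presigned PUT URL
    # so the browser uploads the document straight to S3.
    boto3.client("dynamodb").put_item(TableName="Sessions", Item=item)
    url = boto3.client("s3").generate_presigned_url(
        "put_object",
        Params={"Bucket": "doc-bucket", "Key": item["s3_key"]["S"]},
        ExpiresIn=900,
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"session_id": session_id, "upload_url": url}),
    }
```

Creating clients lazily also keeps Lambda cold starts marginally cheaper when a code path doesn't touch AWS at all.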

Complete AWS Service Integration:

  • API Gateway → Routes frontend requests to Lambda functions
  • Lambda Functions → Handle LlamaGet, Llama_Sqs, Llama_Dynamic_Retrival, Llama_Embedding_Func
  • DynamoDB → Session-based data storage and retrieval
  • S3 Bucket → Document storage with presigned URL access
  • SQS → Parallel document processing queue management
  • AWS Billing → Operates entirely within the zero-budget (free-tier) limit
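The S3 → SQS → Lambda leg of this integration delivers S3 event notifications wrapped inside SQS message bodies, so a consumer Lambda (such as the `Llama_Sqs` function named above) first has to unwrap them. A minimal sketch, since the actual function isn't shown:

```python
import json


def parse_s3_records(sqs_event):
    """Extract (bucket, key) pairs from S3 notifications delivered via SQS.

    Each SQS record's body is itself a JSON S3 event envelope containing
    zero or more S3 records (test events carry none).
    """
    pairs = []
    for record in sqs_event["Records"]:
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            pairs.append(
                (
                    s3_record["s3"]["bucket"]["name"],
                    s3_record["s3"]["object"]["key"],
                )
            )
    return pairs
```

A real handler would loop over the returned pairs, download each object, and pass it to the LLAMA parsing step keyed by session ID.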

RAG Model Implementation:

User Query (Frontend) → API Gateway → Lambda (Llama_Dynamic_Retrival) → 
DynamoDB (Retrieve Session Data) → Lambda (Llama_Embedding_Func) → 
Embedding Processing → LLM Augmented Generation → Response with Parsed Chunks

RAG Processing Details:

  • User Queries → Frontend sends session ID and query
  • Dynamo Data Retrieval → Gets parsed chunks and embeddings for session
  • LLM Data Generation → Embeddings processing and augmentation
  • Response Generation → Contextual answers based on document content
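The "Dynamo Data Retrieval" step above amounts to ranking the session's stored chunk embeddings against the query embedding. A plain cosine-similarity sketch (the project's actual `Llama_Embedding_Func` logic isn't shown, and the chunk/embedding pairing here is assumed):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_chunks(query_emb, chunks, k=3):
    """Return the k chunk texts most similar to the query embedding.

    `chunks` is a list of (chunk_text, embedding) pairs, e.g. as
    retrieved from the session record in DynamoDB.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_emb, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The selected chunks are then concatenated into the LLM prompt for the augmented-generation step.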

Key Challenges Faced

Primary Technical Challenge: Running LLM through Docker containers and deploying to EC2 while maintaining AWS free-tier limits.

Specific Issues:

  • Container optimization - Pushing LLAMA model via Docker to EC2
  • Lambda layer management - Image container integration for LLM processing
  • Resource constraints - Bringing LLM functionality within free-tier architecture
  • Performance optimization - balancing accuracy with processing speed without auto scaling, while working around LLM-induced cold-start issues

Market Impact & Corporate Applications

Target Market: Corporations across multiple sectors handling extensive paperwork workflows.

Key Value Propositions:

  • Precise extraction accuracy vs traditional OCR failure rates
  • Intelligent prompting - extract specific data requirements through natural language
  • Session-based storage - all extracted data preserved for future reference
  • Natural language queries - clarify extracted content through chat interface
  • Workflow automation - replace manual verification with AI-powered processing

Use Cases:

  • Financial documents - Invoice processing, PO matching, expense management
  • Legal documentation - Agreement processing, contract analysis
  • Supply chain - GAR documents, LR processing, e-way bill management
  • Cross-sector adaptation - handles varying terminologies and formats

Competitive Advantage

ICR vs OCR Enhancement:

  • Adaptive learning - improves with processed data
  • Context understanding - handles complex layouts and multi-sector terminology
  • Prompt-based extraction - users specify exactly what data to extract
  • RAG-powered queries - natural language interaction with extracted data
  • Serverless scalability - AWS architecture handles variable document volumes

Built With

  • apigateway
  • dynamo
  • ec2
  • huggingface
  • lambda
  • llama
  • ollama
  • python
  • react
  • s3
  • sqs
  • torch