Case 2- Vision Creator

Benchmark
Accuracy

Inspiration:

We wanted to simulate a real-world document processing system where noisy scanned inputs are transformed into efficient, storage-ready representations. The goal was to combine deep learning for perception (OCR) with classical algorithms for compression, reflecting how modern pipelines balance intelligence and efficiency.

What it does:

The system takes a noisy document image, removes noise, extracts text using a CNN-based OCR model, and then compresses the text using a custom Adaptive Huffman algorithm. It also supports lossless decompression, ensuring the original text is perfectly recovered.

How we built it:

We designed the solution as a two-stage microservice pipeline:

Stage 1 (OCR Service): Median filtering + DnCNN for denoising → CNN (OCRNet) for text extraction Stage 2 (Compression Service): Custom Adaptive Huffman (FGK) encoder + decoder with metrics

Both stages are exposed via FastAPI APIs and connected through a lightweight orchestration script (pipeline_runner.py). The system was trained on MNIST with added noise profiles and evaluated using accuracy, compression ratio, and latency.

Clean Architecture Justification:

We adopted a modular microservice architecture to ensure:

Separation of concerns: OCR and compression are independent, making each component easier to develop, test, and optimize Scalability: Each service can be scaled or replaced independently (e.g., swapping OCR with a transformer-based model later) Maintainability: Clear boundaries between preprocessing, inference, and encoding logic Extensibility: New stages (e.g., language models, better compressors) can be added without breaking the pipeline Reusability: Each service can function as a standalone API in other applications

This mirrors production systems where ML inference and data processing are decoupled.

Challenges we ran into: Handling different noise types (Gaussian vs salt-and-pepper) required combining classical and deep learning denoisers Implementing Adaptive Huffman from scratch while ensuring correctness and efficiency Debugging microservice communication and startup issues (ports, imports, async behavior) Balancing accuracy vs latency under hackathon time constraints

Accomplishments that we're proud of: Achieved ~99% OCR accuracy even on noisy inputs Built a fully working end-to-end system, not just isolated models Implemented lossless compression with 0% failure rate Maintained low latency (~3–4 ms) for the full pipeline Designed a clean, production-like architecture

What we learned: The importance of preprocessing (denoising) in improving model performance How classical algorithms (Huffman) still play a critical role alongside deep learning Practical challenges of deploying ML as microservices How to measure systems beyond accuracy — including latency, efficiency, and reliability

Built With

fastapi
python

Updates

nithishsrirams Srinivasan started this project — Apr 18, 2026 04:53 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.