2-Stage Neural Compression Pipeline

A two-microservice pipeline that ingests a noisy scanned document image, extracts digits using a CNN-based OCR engine, and compresses the text output using a from-scratch Adaptive Huffman implementation.

Noisy Image → [Stage 1: CNN OCR] → Text → [Stage 2: Adaptive Huffman] → Compressed Bytes
                                                                        ↓
                                                              Decompressed Text ✓

Repository Structure

├── stage1_ocr/                  # CNN OCR microservice
│   ├── models/cnn.py            # EMNISTClassifier architecture
│   ├── train.py                 # Training script (MNIST + noise augmentation)
│   ├── inference.py             # Model loading and per-page inference
│   ├── segmentation.py          # Projection-profile character segmentation
│   ├── noise.py                 # Gaussian and salt-and-pepper noise utilities
│   ├── post_processing.py       # Text cleanup (digit/letter fixes, dedup)
│   ├── validate.py              # 95% accuracy gate (MNIST test set)
│   ├── app.py                   # FastAPI service (POST /ocr, GET /health)
│   ├── weights/                 # Saved model weights (after training)
│   └── tests/test_ocr.py        # pytest suite (10 tests)
│
├── stage2_compression/          # Adaptive Huffman compression microservice
│   ├── adaptive_huffman.py      # Vitter's algorithm (encode + decode)
│   ├── bit_io.py                # MSB-first bit stream I/O
│   ├── metrics.py               # Compression ratio, entropy, efficiency
│   ├── app.py                   # FastAPI service (POST /compress, POST /decompress)
│   └── tests/test_roundtrip.py  # 12 roundtrip tests
│
├── pipeline/
│   ├── orchestrator.py          # End-to-end 5-step pipeline runner
│   └── benchmark.py             # N-run latency benchmarker
│
├── demo/
│   └── pipeline_demo.html       # Single-file browser demo UI
│
└── README.md

Quick Start

Requirements

# Stage 1
cd stage1_ocr
pip install -r requirements.txt

# Stage 2
cd ../stage2_compression
pip install -r requirements.txt

Train the OCR Model

cd stage1_ocr
python train.py

Training takes about 10 minutes on CPU and saves the following files:

  • weights/mnist_best_model.h5
  • weights/mnist_class_mapping.json
  • weights/mnist_training_history.png

Start Both Services

# Terminal 1 — Stage 1 OCR (port 8001)
cd stage1_ocr
uvicorn app:app --host 0.0.0.0 --port 8001

# Terminal 2 — Stage 2 Compression (port 8002)
cd stage2_compression
uvicorn app:app --host 0.0.0.0 --port 8002

Run the Full Pipeline

cd pipeline
python orchestrator.py --image ../stage1_ocr/test_document.png \
                        --ocr-url http://localhost:8001 \
                        --compress-url http://localhost:8002

Open the Demo UI

Open demo/pipeline_demo.html in any browser. Upload an image and watch all pipeline stages animate live.


Stage 1 — CNN OCR Microservice

Model Architecture

Input: (28 × 28 × 1) greyscale character patch
│
├─ Block 1 ── Conv2D(32, 3×3) → BN → ReLU
│             Conv2D(32, 3×3) → ReLU
│             MaxPool(2×2) → Dropout(0.25)
│             Output: 14 × 14 × 32
│
├─ Block 2 ── Conv2D(64, 3×3) → BN → ReLU
│             Conv2D(64, 3×3) → ReLU
│             MaxPool(2×2) → Dropout(0.25)
│             Output: 7 × 7 × 64
│
├─ Block 3 ── Conv2D(128, 3×3) → BN → ReLU → Dropout(0.35)
│             Output: 7 × 7 × 128   [no pooling — preserves fine detail]
│
└─ Head ───── Flatten → Dense(512) → BN → Dropout(0.5)
              Dense(10, softmax)
              Output: class probabilities for digits 0–9
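
The output sizes in the diagram can be checked with a small shape tracer. This is only a sketch: it assumes every convolution uses 'same' padding and stride 1 (which the documented 14 × 14 and 7 × 7 sizes imply), and the helper names are illustrative rather than taken from models/cnn.py.

```python
def conv_same(shape, filters):
    """3x3 convolution with 'same' padding: height and width are unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_2x2(shape):
    """2x2 max pooling halves height and width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def trace_shapes(input_shape=(28, 28, 1)):
    """Follow a character patch through the three blocks and the flatten step."""
    shapes = {}
    x = input_shape

    # Block 1: two 32-filter convolutions, then pooling.
    x = max_pool_2x2(conv_same(conv_same(x, 32), 32))
    shapes["block1"] = x

    # Block 2: two 64-filter convolutions, then pooling.
    x = max_pool_2x2(conv_same(conv_same(x, 64), 64))
    shapes["block2"] = x

    # Block 3: one 128-filter convolution, no pooling.
    x = conv_same(x, 128)
    shapes["block3"] = x

    # Head: Flatten feeds height * width * channels values into Dense(512).
    height, width, channels = x
    shapes["flatten"] = height * width * channels
    return shapes


print(trace_shapes())
```

Under these assumptions the Dense(512) layer receives 7 × 7 × 128 = 6272 inputs, so it holds most of the model's weights.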

Design choices:

Decision                            Justification
3 conv blocks                       Hierarchical features: edges → curves → digit topology
3×3 kernels throughout              Captures local stroke geometry; two stacked 3×3 convolutions cover the same 5×5 receptive field as one 5×5 kernel, with an extra nonlinearity
BatchNorm in each conv block        Stabilises training on noisy inputs; allows a higher learning rate
No pooling in Block 3               Preserves the 7×7 spatial resolution so subtle shape differences (0 vs 6 vs 8) are not discarded
Dense(512)                          Provides enough capacity to linearly separate 10 digit classes after spatial compression
Graduated Dropout (0.25→0.35→0.5)   Milder regularisation in early blocks (preserves learned edges); strongest before the output (the most overfitting-prone layer)
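
The receptive-field claim for stacked 3×3 kernels can be verified with a few lines of arithmetic. The sketch below assumes stride 1 and no dilation; the function names are illustrative, and the weight count is per input/output channel pair with biases ignored.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer widens the field by (kernel size - 1)
    return field


def weights_per_channel_pair(kernel_sizes):
    """Kernel weights needed for one input channel and one output channel."""
    return sum(k * k for k in kernel_sizes)


print(stacked_receptive_field([3, 3]), weights_per_channel_pair([3, 3]))
print(stacked_receptive_field([5]), weights_per_channel_pair([5]))
```

Both configurations see a 5×5 window, but the stacked pair needs 18 weights per channel pair instead of 25.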

Noise Augmentation

Training data is tripled using two noise profiles:

x_clean      ─┐
x_gaussian   ─┼─ concatenated → 180k samples → shuffle → train
x_salt_pepper─┘

Gaussian:      N(0, σ=0.2) additive, clipped to [0,1]
Salt & Pepper: 2.5% pixels → 1.0, 2.5% pixels → 0.0
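
The two noise profiles and the tripling step might look roughly like the following. This is a standard-library sketch, not the contents of noise.py: it assumes each image is a flat list of floats in [0, 1], and the function names and the `rng` parameter are hypothetical.

```python
import random


def add_gaussian_noise(pixels, sigma=0.2, rng=random):
    """Add N(0, sigma) noise to every pixel and clip the result to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def add_salt_and_pepper(pixels, density=0.05, rng=random):
    """Set density/2 of the pixels to 1.0 (salt) and density/2 to 0.0 (pepper)."""
    noisy = list(pixels)
    half = round(len(noisy) * density / 2)
    # Sample without replacement so salt and pepper never hit the same pixel.
    chosen = rng.sample(range(len(noisy)), 2 * half)
    for index in chosen[:half]:
        noisy[index] = 1.0
    for index in chosen[half:]:
        noisy[index] = 0.0
    return noisy


def augment(images, rng=random):
    """Return clean + Gaussian + salt-and-pepper copies, shuffled together."""
    combined = list(images)
    combined += [add_gaussian_noise(image, rng=rng) for image in images]
    combined += [add_salt_and_pepper(image, rng=rng) for image in images]
    rng.shuffle(combined)
    return combined
```

In a full training script the labels would need to be tripled and shuffled in the same order as the images; that bookkeeping is omitted here.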

Accuracy Results

Noise Profile        Test Accuracy
Clean                ≥ 99.0% ✓
Gaussian (σ=0.2)     ≥ 97.5% ✓
Salt & Pepper 5%     ≥ 97.5% ✓

Accuracy gate: the service validates the model against the MNIST test set at startup and returns HTTP 503 on all /ocr requests unless accuracy reaches 95%.
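
The gate logic can be sketched independently of the web framework. The class and function below are hypothetical; the field names mirror the `validation` object returned by GET /health, and treating exactly 95% as a pass is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    accuracy: float
    threshold: float = 0.95

    @property
    def passed(self) -> bool:
        # Assumption: an accuracy equal to the threshold counts as a pass.
        return self.accuracy >= self.threshold


def ocr_status_code(validation: Optional[ValidationResult]) -> int:
    """Status the /ocr handler would return before doing any inference."""
    if validation is None or not validation.passed:
        return 503  # model not loaded, or it failed the startup validation
    return 200


print(ocr_status_code(ValidationResult(accuracy=0.991)))
print(ocr_status_code(ValidationResult(accuracy=0.90)))
```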

API Endpoints

POST /ocr
  Body:    multipart/form-data  { file: <image> }
  Returns: { text, char_count, noise_profile, num_patches,
             patches_accepted, patches_rejected, avg_confidence, inference_ms }

GET /health
  Returns: { status, model_loaded, validation: { accuracy, threshold, passed, message } }

GET /noise-profiles
  Returns: { profiles: [ { name, description, std | density } ] }

Stage 2 — Adaptive Huffman Compression Microservice

Algorithm: Vitter's Adaptive Huffman

Implemented from scratch — no compression libraries used (no zlib, gzip, bz2, etc.).

Key properties of the implementation:

  • Implicit numbering: nodes are numbered 0–512; root = MAX_NUMBER. NYT (Not Yet Transmitted) gets the lowest available number.
  • Sibling property: at all times, weights are non-decreasing in order of node number (nodes are numbered left to right, bottom to top), so siblings always hold adjacent numbers.
  • Block-leader swaps: before a node's weight is incremented, the node is swapped with the highest-numbered node of equal weight (the block leader), unless that leader is the node's own parent; this maintains the sibling property.
  • Wire format: 4-byte big-endian length prefix + Huffman-encoded payload.

encode("Hello"):
  H → NYT + code(H)    [new symbol: NYT escape, then the symbol itself]
  e → NYT + code(e)    [tree updates after each symbol]
  l → NYT + code(l)    [the first 'l' is also new, so it is escaped]
  l → tree code(l)     [different code than the first 'l': the tree has evolved]
  o → NYT + code(o)
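
To make the properties above concrete, here is a compact adaptive Huffman coder. It is a simplified sketch, not the project's adaptive_huffman.py: it uses the plain swap-with-block-leader update described in the bullets and leaves out the extra leaf/internal block ordering of Vitter's full algorithm. All class and function names are hypothetical, symbols are assumed to be bytes, a new symbol is assumed to be sent as 8 raw bits after the NYT code, and the decoder is assumed to know the symbol count.

```python
class Node:
    def __init__(self, number, symbol=None, parent=None):
        self.number = number      # implicit node number (root is highest)
        self.weight = 0
        self.symbol = symbol      # byte value for leaves, None otherwise
        self.parent = parent
        self.left = self.right = None


class AdaptiveTree:
    MAX_NUMBER = 512              # 256 byte symbols -> at most 513 nodes

    def __init__(self):
        self.root = self.nyt = Node(self.MAX_NUMBER)
        self.nodes = [self.root]
        self.leaves = {}          # symbol -> leaf node

    def code_for(self, node):
        """Bits on the path from the root to node (0 = left, 1 = right)."""
        bits = []
        while node.parent is not None:
            bits.append(1 if node.parent.right is node else 0)
            node = node.parent
        return bits[::-1]

    def _swap(self, a, b):
        """Exchange two subtrees' positions and their node numbers."""
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa
        a.number, b.number = b.number, a.number

    def update(self, symbol):
        node = self.leaves.get(symbol)
        if node is None:          # first occurrence: split the NYT node
            parent = self.nyt
            node = Node(parent.number - 1, symbol, parent)
            self.nyt = Node(parent.number - 2, None, parent)
            parent.left, parent.right = self.nyt, node
            self.nodes += [node, self.nyt]
            self.leaves[symbol] = node
        while node is not None:   # walk to the root, fixing each block
            leader = max((n for n in self.nodes if n.weight == node.weight),
                         key=lambda n: n.number)
            if leader is not node and leader is not node.parent:
                self._swap(node, leader)
            node.weight += 1
            node = node.parent


def encode(data):
    """Return the code bits (list of 0/1) for a bytes object."""
    tree, bits = AdaptiveTree(), []
    for symbol in data:
        leaf = tree.leaves.get(symbol)
        if leaf is None:          # NYT escape, then the raw 8-bit symbol
            bits += tree.code_for(tree.nyt)
            bits += [(symbol >> shift) & 1 for shift in range(7, -1, -1)]
        else:
            bits += tree.code_for(leaf)
        tree.update(symbol)
    return bits


def decode(bits, count):
    """Rebuild count symbols by mirroring the encoder's tree updates."""
    tree, out, pos = AdaptiveTree(), bytearray(), 0
    for _ in range(count):
        node = tree.root
        while node.left is not None:      # walk down to a leaf
            node = node.right if bits[pos] else node.left
            pos += 1
        if node is tree.nyt:              # escape: read the raw symbol
            symbol = 0
            for bit in bits[pos:pos + 8]:
                symbol = (symbol << 1) | bit
            pos += 8
        else:
            symbol = node.symbol
        out.append(symbol)
        tree.update(symbol)
    return bytes(out)


bits = encode(b"Hello")
assert decode(bits, 5) == b"Hello"
```

The linear scan for the block leader keeps the sketch short; a production implementation would index nodes by number or maintain explicit blocks to avoid scanning every node on each update.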

Compressed bytes → base64 for JSON transport
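
A sketch of the framing step, assuming the 4-byte prefix stores the number of encoded symbols (the source only calls it a length prefix) and that the final byte is zero-padded. The function names are illustrative, not those of bit_io.py.

```python
import base64
import struct


def pack_bits_msb_first(bits):
    """Pack a list of 0/1 values into bytes, first bit in the highest position."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)   # zero-pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def frame(bits, symbol_count):
    """4-byte big-endian length prefix followed by the packed payload."""
    return struct.pack(">I", symbol_count) + pack_bits_msb_first(bits)


def unframe(blob):
    """Split a frame back into (symbol_count, bits including any padding)."""
    (symbol_count,) = struct.unpack(">I", blob[:4])
    bits = [(byte >> shift) & 1 for byte in blob[4:] for shift in range(7, -1, -1)]
    return symbol_count, bits


def to_json_field(blob):
    """Base64 text suitable for a JSON string field such as compressed_b64."""
    return base64.b64encode(blob).decode("ascii")


blob = frame([1, 0, 1], symbol_count=5)
print(blob.hex(), to_json_field(blob))
```

Because the decoder stops after `symbol_count` symbols, the padding bits in the last byte are never interpreted as data.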

Metrics Reported

compression_ratio    = original_bytes / compressed_bytes
shannon_entropy      = -Σ p(c) log₂ p(c)   [bits per symbol]
encoding_efficiency  = entropy / avg_bits_per_symbol
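
The three formulas translate directly into code. In this sketch symbols are characters of the input text, `original_bytes` is the UTF-8 length, and `avg_bits_per_symbol` is taken over the whole compressed blob including the length prefix; those choices, and the function names, are assumptions rather than a description of metrics.py.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Entropy in bits per symbol: -sum(p(c) * log2 p(c))."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def compression_metrics(text, compressed):
    """Compute the reported metrics for a text and its compressed bytes."""
    original_bytes = len(text.encode("utf-8"))
    compressed_bytes = len(compressed)
    entropy = shannon_entropy(text)
    avg_bits_per_symbol = compressed_bytes * 8 / len(text) if text else 0.0
    return {
        "original_bytes": original_bytes,
        "compressed_bytes": compressed_bytes,
        "compression_ratio": original_bytes / compressed_bytes if compressed_bytes else 0.0,
        "entropy": entropy,
        "encoding_efficiency": entropy / avg_bits_per_symbol if avg_bits_per_symbol else 0.0,
    }


print(compression_metrics("aabb", b"\x00"))
```

An efficiency of 1.0 would mean the encoder spends exactly the entropy per symbol; on short inputs the NYT escapes and the prefix push the value well below that.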

API Endpoints

POST /compress
  Body:    { "text": "Hello World" }
  Returns: { compressed_b64, original_bytes, compressed_bytes,
             compression_ratio, entropy, encoding_efficiency }

POST /decompress
  Body:    { "compressed_b64": "..." }
  Returns: { text, compressed_bytes, decompressed_bytes }

GET /health
  Returns: { status, service }

Pipeline Orchestrator

Runs the full 5-step pipeline end-to-end:

Step 1: Health check — both services must be up
Step 2: OCR        — POST image to Stage 1, get text
Step 3: Compress   — POST text to Stage 2, get compressed bytes
Step 4: Decompress — POST compressed bytes back, verify text matches
Step 5: Summary    — print metrics table

python pipeline/orchestrator.py --image <path> [--save-output out.json] [--verbose]
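
The five steps can be outlined as a single function. In this sketch the HTTP calls are replaced by injected callables so the control flow is visible without a running server; `run_pipeline`, `PipelineResult`, and the callable parameters are hypothetical, while the response field names (`text`, `compressed_b64`, `compression_ratio`) come from the API sections.

```python
import base64
from dataclasses import dataclass


@dataclass
class PipelineResult:
    text: str
    compressed_b64: str
    compression_ratio: float
    lossless: bool


def run_pipeline(image_bytes, ocr, compress, decompress, is_healthy):
    # Step 1: both services must report healthy before any work starts.
    for service in ("ocr", "compression"):
        if not is_healthy(service):
            raise RuntimeError(f"{service} service is not healthy")
    # Step 2: send the image to Stage 1 and read the extracted text.
    text = ocr(image_bytes)["text"]
    # Step 3: send the text to Stage 2.
    compressed = compress(text)
    # Step 4: decompress and check that the round trip is lossless.
    restored = decompress(compressed["compressed_b64"])["text"]
    # Step 5: collect the values a summary table would print.
    return PipelineResult(
        text=text,
        compressed_b64=compressed["compressed_b64"],
        compression_ratio=compressed["compression_ratio"],
        lossless=(restored == text),
    )


# Stand-in services for a dry run; real ones would POST to ports 8001 and 8002.
result = run_pipeline(
    b"fake image",
    ocr=lambda image: {"text": "12345"},
    compress=lambda t: {"compressed_b64": base64.b64encode(t.encode()).decode(),
                        "compression_ratio": 1.0},
    decompress=lambda b64: {"text": base64.b64decode(b64).decode()},
    is_healthy=lambda service: True,
)
print(result)
```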

Latency Benchmark

python pipeline/benchmark.py --image <path> --runs 20

Reports: mean, median, p95, min, max latency across N full pipeline runs.
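
The reported statistics can be computed as below. The sketch assumes a nearest-rank definition of p95 (the source does not say which definition benchmark.py uses), and `summarize` and `benchmark` are illustrative names.

```python
import statistics
import time


def summarize(latencies_ms):
    """Mean, median, p95, min, and max of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: the value at rank ceil(0.95 * n), using integer maths.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "min": ordered[0],
        "max": ordered[-1],
    }


def benchmark(run_once, runs=20):
    """Time `runs` calls of run_once and summarize the latencies in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies)


print(summarize(list(range(1, 21))))
```

With only 10 or 20 runs the p95 is close to the maximum, so it is best read alongside the median rather than on its own.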

Typical latency (CPU, local):

Stage                Latency
OCR inference        150–400 ms
Huffman compress     5–20 ms
Huffman decompress   5–15 ms
End-to-end total     200–500 ms

Pipeline Flow Diagram

┌─────────────────────────────────────────────────────────────────┐
│                     INPUT: Noisy Document Image                 │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                    STAGE 1: OCR Microservice                    │
│                                                                 │
│  ┌─────────────┐   ┌───────────────┐   ┌────────────────────┐  │
│  │ Preprocess  │──▶│  Segmentation │──▶│   CNN Classifier   │  │
│  │ Greyscale   │   │  Projection   │   │  3-Block CNN       │  │
│  │ Resize      │   │  profiles     │   │  10-class softmax  │  │
│  │ Normalise   │   │  Word gaps    │   │  conf ≥ 0.65 gate  │  │
│  └─────────────┘   └───────────────┘   └────────────────────┘  │
│                                                   │             │
│                                        ┌──────────▼──────────┐  │
│                                        │  Post-Processing    │  │
│                                        │  Digit→letter fix   │  │
│                                        │  Duplicate collapse │  │
│                                        │  Space insertion    │  │
│                                        └──────────┬──────────┘  │
└─────────────────────────────────────────────────┼───────────────┘
                               │                   │
                               │    Extracted Text │
                               ▼                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                STAGE 2: Compression Microservice                │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Vitter's Adaptive Huffman                  │    │
│  │                                                         │    │
│  │  NYT node ──▶ symbol arrival ──▶ block-leader swap     │    │
│  │  ──▶ weight increment ──▶ tree rebalance ──▶ encode    │    │
│  │                                                         │    │
│  │  Output: 4-byte length prefix + Huffman bit stream     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  Metrics: compression ratio │ Shannon entropy │ efficiency      │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│               DECOMPRESSOR (lossless round-trip)                │
│  Compressed bytes ──▶ Vitter decode ──▶ Original text ✓        │
└─────────────────────────────────────────────────────────────────┘
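
Two boxes in the Stage 1 part of the diagram, projection-profile segmentation and the confidence gate, can be illustrated with a short sketch. It assumes a binarised image stored as a list of rows with 1 for ink, and it only splits characters along columns; the function names are hypothetical and segmentation.py may work differently. The 0.65 threshold is the one shown in the diagram.

```python
def column_profile(binary_image):
    """Count ink pixels in each column of a list-of-rows image."""
    return [sum(column) for column in zip(*binary_image)]


def segment_columns(binary_image, min_width=1):
    """Return (start, end) column spans separated by empty columns."""
    spans, start = [], None
    # The trailing 0 is a sentinel that closes a span touching the right edge.
    for x, ink in enumerate(column_profile(binary_image) + [0]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    return spans


def classify_with_gate(probabilities, threshold=0.65):
    """Return the most likely class, or None when confidence is too low."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
]
print(segment_columns(image))
print(classify_with_gate([0.1, 0.7, 0.2]))
```

Each span would then be cropped, resized to 28 × 28, and passed to the classifier; patches rejected by the gate correspond to the `patches_rejected` count in the /ocr response.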

Demo UI

Open demo/pipeline_demo.html in a browser.

Features:

  • Drag-and-drop image upload
  • Animated 4-stage pipeline flow
  • OCR text output panel
  • Compressed bytes (base64) panel
  • Huffman frequency explorer — top 10 symbols, estimated bits, bar chart
  • Noise profile badges (coffee / footprint / fold / wrinkle)
  • Service health indicators
  • Keyboard: Enter = run pipeline, R = reset

Reproducing Results

# 1. Train
cd stage1_ocr && python train.py

# 2. Start services
uvicorn app:app --port 8001 &          # Stage 1
cd ../stage2_compression
uvicorn app:app --port 8002 &          # Stage 2

# 3. Verify accuracy gate
curl http://localhost:8001/health | python3 -m json.tool

# 4. Run full pipeline
cd ../pipeline
python orchestrator.py --image ../stage1_ocr/test_document.png

# 5. Benchmark latency
python benchmark.py --image ../stage1_ocr/test_document.png --runs 10

# 6. Run tests
cd ../stage1_ocr && python -m pytest tests/ -v
cd ../stage2_compression && python -m pytest tests/ -v

Constraints Met

Requirement                                       Status
CNN built with TensorFlow                         ✓
No pre-built compression libraries                ✓ Vitter's algorithm from scratch
POST /ocr endpoint                                ✓
POST /compress + /decompress endpoints            ✓
Lossless decompression                            ✓ Verified in tests and orchestrator
≥ 2 noise profiles with measurable accuracy       ✓ Gaussian + Salt & Pepper
Compression ratio, entropy, efficiency metrics    ✓
End-to-end latency benchmarked                    ✓
CNN architecture documented                       ✓ This README

Built With

  • fastapi