2-Stage Neural Compression Pipeline
A two-microservice pipeline that ingests a noisy scanned document image, extracts digits using a CNN-based OCR engine, and compresses the text output using a from-scratch Adaptive Huffman implementation.
Noisy Image → [Stage 1: CNN OCR] → Text → [Stage 2: Adaptive Huffman] → Compressed Bytes
                                                                               ↓
                                                                       Decompressed Text ✓
Repository Structure
├── stage1_ocr/ # CNN OCR microservice
│ ├── models/cnn.py # EMNISTClassifier architecture
│ ├── train.py # Training script (MNIST + noise augmentation)
│ ├── inference.py # Model loading and per-page inference
│ ├── segmentation.py # Projection-profile character segmentation
│ ├── noise.py # Gaussian and salt-and-pepper noise utilities
│ ├── post_processing.py # Text cleanup (digit/letter fixes, dedup)
│ ├── validate.py # 95% accuracy gate (MNIST test set)
│ ├── app.py # FastAPI service (POST /ocr, GET /health)
│ ├── weights/ # Saved model weights (after training)
│ └── tests/test_ocr.py # pytest suite (10 tests)
│
├── stage2_compression/ # Adaptive Huffman compression microservice
│ ├── adaptive_huffman.py # Vitter's algorithm (encode + decode)
│ ├── bit_io.py # MSB-first bit stream I/O
│ ├── metrics.py # Compression ratio, entropy, efficiency
│ ├── app.py # FastAPI service (POST /compress, POST /decompress)
│ └── tests/test_roundtrip.py # 12 roundtrip tests
│
├── pipeline/
│ ├── orchestrator.py # End-to-end 5-step pipeline runner
│ └── benchmark.py # N-run latency benchmarker
│
├── demo/
│ └── pipeline_demo.html # Single-file browser demo UI
│
└── README.md
Quick Start
Requirements
# Stage 1
cd stage1_ocr
pip install -r requirements.txt
# Stage 2
cd ../stage2_compression
pip install -r requirements.txt
Train the OCR Model
cd stage1_ocr
python train.py
Training takes ~10 minutes on CPU. Saves:
- weights/mnist_best_model.h5
- weights/mnist_class_mapping.json
- weights/mnist_training_history.png
Start Both Services
# Terminal 1 — Stage 1 OCR (port 8001)
cd stage1_ocr
uvicorn app:app --host 0.0.0.0 --port 8001
# Terminal 2 — Stage 2 Compression (port 8002)
cd stage2_compression
uvicorn app:app --host 0.0.0.0 --port 8002
Run the Full Pipeline
cd pipeline
python orchestrator.py --image ../stage1_ocr/test_document.png \
--ocr-url http://localhost:8001 \
--compress-url http://localhost:8002
Open the Demo UI
Open demo/pipeline_demo.html in any browser. Upload an image and watch all pipeline stages animate live.
Stage 1 — CNN OCR Microservice
Model Architecture
Input: (28 × 28 × 1) greyscale character patch
│
├─ Block 1 ── Conv2D(32, 3×3) → BN → ReLU
│ Conv2D(32, 3×3) → ReLU
│ MaxPool(2×2) → Dropout(0.25)
│ Output: 14 × 14 × 32
│
├─ Block 2 ── Conv2D(64, 3×3) → BN → ReLU
│ Conv2D(64, 3×3) → ReLU
│ MaxPool(2×2) → Dropout(0.25)
│ Output: 7 × 7 × 64
│
├─ Block 3 ── Conv2D(128, 3×3) → BN → ReLU → Dropout(0.35)
│ Output: 7 × 7 × 128 [no pooling — preserves fine detail]
│
└─ Head ───── Flatten → Dense(512) → BN → Dropout(0.5)
Dense(10, softmax)
Output: class probabilities for digits 0–9
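The output sizes in the diagram can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions with "same" padding and 2×2 pooling with stride 2, which the diagram implies but does not state. `trace_shapes` is a hypothetical helper for illustration, not part of the repository.

```python
def trace_shapes(height=28, width=28, channels=1):
    """Follow one patch through the three conv blocks (assumes 'same' padding)."""
    shapes = []
    # (filters, pooled?) per block, read off the architecture diagram
    for filters, pooled in [(32, True), (64, True), (128, False)]:
        channels = filters                 # 'same'-padded 3x3 convs keep H and W
        if pooled:                         # 2x2 max-pool halves each spatial dim
            height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
h, w, c = shapes[-1]
flat = h * w * c                   # length of the Flatten output
dense_params = flat * 512 + 512    # Dense(512): weights plus biases
print(shapes, flat, dense_params)
```

Under these assumptions the Flatten layer feeds 6,272 values into Dense(512), so that single layer holds about 3.2 million parameters.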
Design choices:
| Decision | Justification |
|---|---|
| 3 conv blocks | Hierarchical features: edges → curves → digit topology |
| 3×3 kernels throughout | Captures local stroke geometry; two stacked 3×3 convs cover the same 5×5 receptive field as a single 5×5 conv, with an extra nonlinearity and fewer weights |
| BatchNorm after the first conv of each block | Stabilises training on noisy inputs; allows a higher learning rate |
| No pooling in Block 3 | Preserves the 7×7 spatial resolution so subtle shape differences (0 vs 6 vs 8) are not discarded |
| Dense(512) | Provides enough capacity to linearly separate 10 digit classes after spatial compression |
| Graduated Dropout (0.25→0.35→0.5) | Milder regularisation in early blocks (preserve learned edges); strongest before output (most overfitting-prone layer) |
Noise Augmentation
The training set is tripled by concatenating the clean images with two noisy copies of them (60k × 3 = 180k samples):
x_clean ─┐
x_gaussian ─┼─ concatenated → 180k samples → shuffle → train
x_salt_pepper─┘
Gaussian: N(0, σ=0.2) additive, clipped to [0,1]
Salt & Pepper: 2.5% of pixels → 1.0, 2.5% of pixels → 0.0 (5% density in total)
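A minimal sketch of the two noise profiles follows, written in plain Python on a flattened 28×28 patch so that it runs without NumPy. The function names and the seeded `random.Random` are assumptions for illustration; the repository's `noise.py` may be structured differently.

```python
import random


def add_gaussian(pixels, sigma=0.2, rng=random):
    """Add N(0, sigma) noise to every pixel, then clip to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def add_salt_pepper(pixels, density=0.05, rng=random):
    """Set density/2 of the pixels to 1.0 (salt) and density/2 to 0.0 (pepper)."""
    noisy = list(pixels)
    n_each = round(len(noisy) * density / 2)
    hit = rng.sample(range(len(noisy)), 2 * n_each)   # distinct pixel indices
    for i in hit[:n_each]:
        noisy[i] = 1.0
    for i in hit[n_each:]:
        noisy[i] = 0.0
    return noisy


rng = random.Random(0)               # fixed seed so the example is repeatable
clean = [0.5] * (28 * 28)            # stand-in for one normalised patch
augmented = [clean, add_gaussian(clean, rng=rng), add_salt_pepper(clean, rng=rng)]
```

Applying the same three-way expansion to every training image gives the clean, Gaussian, and salt-and-pepper thirds of the concatenated set.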
Accuracy Results
| Noise Profile | Test Accuracy |
|---|---|
| Clean | ≥ 99.0% ✓ |
| Gaussian (σ=0.2) | ≥ 97.5% ✓ |
| Salt & Pepper 5% | ≥ 97.5% ✓ |
Accuracy gate: at startup the service evaluates the model on the MNIST test set. Until that check passes the 95% threshold, every /ocr request returns HTTP 503.
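The gate logic can be sketched independently of the web framework. Only the 95% threshold and the 503 status come from this README; the `AccuracyGate` class and its method names are hypothetical.

```python
ACCURACY_THRESHOLD = 0.95   # from the README; everything else is illustrative


class AccuracyGate:
    """Tracks whether startup validation has passed."""

    def __init__(self, threshold=ACCURACY_THRESHOLD):
        self.threshold = threshold
        self.accuracy = None            # unknown until validate() has run

    def validate(self, predictions, labels):
        correct = sum(p == y for p, y in zip(predictions, labels))
        self.accuracy = correct / len(labels)
        return self.passed

    @property
    def passed(self):
        return self.accuracy is not None and self.accuracy >= self.threshold

    def status_code(self):
        """HTTP status an /ocr handler would return: 200 if open, 503 if closed."""
        return 200 if self.passed else 503


gate = AccuracyGate()
before = gate.status_code()                       # gate closed before validation
gate.validate([1] * 96 + [0] * 4, [1] * 100)      # 96 of 100 correct
after = gate.status_code()
```

A request handler would consult `gate.status_code()` before running inference, and the /health endpoint could report `gate.accuracy` and `gate.passed`.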
API Endpoints
POST /ocr
Body: multipart/form-data { file: <image> }
Returns: { text, char_count, noise_profile, num_patches,
patches_accepted, patches_rejected, avg_confidence, inference_ms }
GET /health
Returns: { status, model_loaded, validation: { accuracy, threshold, passed, message } }
GET /noise-profiles
Returns: { profiles: [ { name, description, std | density } ] }
Stage 2 — Adaptive Huffman Compression Microservice
Algorithm: Vitter's Adaptive Huffman
Implemented from scratch — no compression libraries used (no zlib, gzip, bz2, etc.).
Key properties of the implementation:
- Implicit numbering: nodes are numbered 0–512; root = MAX_NUMBER. NYT (Not Yet Transmitted) gets the lowest available number.
- Sibling property: at all times, node weights are non-decreasing in order of node number, and sibling nodes carry adjacent numbers.
- Block-leader swaps: before a node's weight is incremented, the node is swapped with the highest-numbered node of equal weight (the block leader), unless that leader is its parent. This keeps the sibling property intact.
- Wire format: 4-byte big-endian length prefix + Huffman-encoded payload.
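The MSB-first bit packing and the length-prefixed wire format can be sketched as follows. The README does not say what the 4-byte length counts, so this example assumes it is the number of valid payload bits, which lets the decoder ignore the zero padding in the last byte. `BitWriter`, `frame`, and `unframe` are hypothetical names.

```python
import struct


class BitWriter:
    """Accumulates bits MSB-first into bytes."""

    def __init__(self):
        self._bytes = bytearray()
        self._current = 0      # bits collected for the byte in progress
        self._filled = 0       # number of bits already in _current
        self.bit_count = 0

    def write_bit(self, bit):
        self._current = (self._current << 1) | (bit & 1)
        self._filled += 1
        self.bit_count += 1
        if self._filled == 8:
            self._bytes.append(self._current)
            self._current = self._filled = 0

    def getvalue(self):
        out = bytearray(self._bytes)
        if self._filled:       # left-align the final partial byte, zero-padded
            out.append(self._current << (8 - self._filled))
        return bytes(out)


def frame(writer):
    """Assumed layout: 4-byte big-endian bit count, then the packed payload."""
    return struct.pack(">I", writer.bit_count) + writer.getvalue()


def unframe(blob):
    (bit_count,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    return [(payload[i // 8] >> (7 - i % 8)) & 1 for i in range(bit_count)]


bits = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
writer = BitWriter()
for b in bits:
    writer.write_bit(b)
blob = frame(writer)
```

Here the ten bits pack into the two payload bytes `0xB1 0xC0`, preceded by the prefix `00 00 00 0A`.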
encode("Hello"):
H → NYT + code(H) [new symbol: NYT escape, then the raw symbol]
e → NYT + code(e) [tree updates after each symbol]
l → NYT + code(l) [first 'l' is also a new symbol]
l → tree code(l) [already in the tree, so only its current code is sent]
o → NYT + code(o)
Compressed bytes → base64 for JSON transport
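To make the trace concrete, here is a compact sketch of adaptive Huffman coding. It implements the simpler FGK update rule (swap with the block leader, then increment) rather than Vitter's full algorithm, emits a string of '0'/'1' characters instead of packed bytes, and sends new symbols as 8 raw bits after the NYT code. All names are hypothetical; this is not the repository's `adaptive_huffman.py`.

```python
class Node:
    def __init__(self, weight=0, symbol=None, parent=None):
        self.weight, self.symbol, self.parent = weight, symbol, parent
        self.left = self.right = None


class FGKTree:
    def __init__(self):
        self.root = self.nyt = Node()
        self.order = [self.root]    # descending node number; order[0] is the root
        self.leaves = {}            # symbol -> leaf node

    def code(self, node):
        """Bits on the path from the root down to `node`."""
        bits = []
        while node.parent is not None:
            bits.append("1" if node.parent.right is node else "0")
            node = node.parent
        return "".join(reversed(bits))

    def _swap(self, a, b):
        """Exchange two subtrees, both in the tree and in the numbering."""
        i, j = self.order.index(a), self.order.index(b)
        self.order[i], self.order[j] = b, a
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa

    def update(self, symbol):
        node = self.leaves.get(symbol)
        if node is None:            # first occurrence: split the NYT node
            old = self.nyt
            leaf = Node(1, symbol, old)
            self.nyt = Node(0, None, old)
            old.left, old.right, old.weight = self.nyt, leaf, 1
            self.order += [leaf, self.nyt]
            self.leaves[symbol] = leaf
            node = old.parent
        while node is not None:
            # Block leader: highest-numbered node with the same weight.
            leader = next(n for n in self.order if n.weight == node.weight)
            if leader is not node and leader is not node.parent:
                self._swap(node, leader)
            node.weight += 1
            node = node.parent


def encode(data: bytes) -> str:
    tree, out = FGKTree(), []
    for sym in data:
        leaf = tree.leaves.get(sym)
        if leaf is None:            # NYT escape followed by the raw byte
            out.append(tree.code(tree.nyt) + format(sym, "08b"))
        else:
            out.append(tree.code(leaf))
        tree.update(sym)
    return "".join(out)


def decode(bits: str) -> bytes:
    tree, out, pos = FGKTree(), bytearray(), 0
    while pos < len(bits):
        node = tree.root
        while node.left is not None:            # walk down to a leaf or NYT
            node = node.right if bits[pos] == "1" else node.left
            pos += 1
        if node is tree.nyt:
            sym = int(bits[pos:pos + 8], 2)
            pos += 8
        else:
            sym = node.symbol
        out.append(sym)
        tree.update(sym)
    return bytes(out)
```

In this sketch the second `l` of `Hello` costs 3 bits, against 10 bits (a 2-bit NYT code plus 8 raw bits) for the first. Because the decoder applies the same `update` after every symbol, both sides always hold identical trees and no code table needs to be transmitted.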
Metrics Reported
compression_ratio = original_bytes / compressed_bytes
shannon_entropy = -Σ p(c) log₂ p(c) [bits per symbol]
encoding_efficiency = entropy / avg_bits_per_symbol
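The three formulas translate directly into code. This sketch assumes symbols are UTF-8 bytes, that the input is non-empty, and that `avg_bits_per_symbol` is computed over the whole compressed blob (so any header bytes count against efficiency). `compression_metrics` is a hypothetical name.

```python
import math
from collections import Counter


def compression_metrics(text: str, compressed: bytes) -> dict:
    original = text.encode("utf-8")
    n = len(original)                          # assumes non-empty input
    counts = Counter(original)
    # Shannon entropy in bits per symbol
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    avg_bits_per_symbol = 8 * len(compressed) / n
    return {
        "compression_ratio": n / len(compressed),
        "entropy": entropy,
        "encoding_efficiency": entropy / avg_bits_per_symbol,
    }


# "aabb" has two equally likely symbols, so its entropy is 1 bit per symbol.
metrics = compression_metrics("aabb", b"\x00\x00")
```

With a 2-byte output for the 4-byte input, the ratio is 2.0 and the code spends 4 bits per symbol, giving an efficiency of 0.25.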
API Endpoints
POST /compress
Body: { "text": "Hello World" }
Returns: { compressed_b64, original_bytes, compressed_bytes,
compression_ratio, entropy, encoding_efficiency }
POST /decompress
Body: { "compressed_b64": "..." }
Returns: { text, compressed_bytes, decompressed_bytes }
GET /health
Returns: { status, service }
Pipeline Orchestrator
Runs the full 5-step pipeline end-to-end:
Step 1: Health check — both services must be up
Step 2: OCR — POST image to Stage 1, get text
Step 3: Compress — POST text to Stage 2, get compressed bytes
Step 4: Decompress — POST compressed bytes back, verify text matches
Step 5: Summary — print metrics table
python pipeline/orchestrator.py --image <path> [--save-output out.json] [--verbose]
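The five steps can be sketched with the HTTP calls hidden behind two client objects, so the control flow is visible and can be exercised without running the services. `run_pipeline` and the fake clients are hypothetical; the fake Stage 2 only base64-encodes the text and performs no real compression.

```python
import base64


def run_pipeline(image_bytes, stage1, stage2):
    # Step 1: both services must report healthy
    if not (stage1.health() and stage2.health()):
        raise RuntimeError("a service is not healthy")
    # Step 2: OCR
    text = stage1.ocr(image_bytes)["text"]
    # Step 3: compress
    packed = stage2.compress(text)
    # Step 4: decompress and verify the round trip is lossless
    restored = stage2.decompress(packed["compressed_b64"])["text"]
    if restored != text:
        raise ValueError("round-trip mismatch")
    # Step 5: summary
    return {"text": text,
            "compression_ratio": packed["compression_ratio"],
            "lossless": True}


class FakeStage1:
    def health(self):
        return True

    def ocr(self, image_bytes):
        return {"text": "0123456789"}


class FakeStage2:
    def health(self):
        return True

    def compress(self, text):
        blob = text.encode("utf-8")            # stand-in: no real compression
        return {"compressed_b64": base64.b64encode(blob).decode("ascii"),
                "compression_ratio": 1.0}

    def decompress(self, compressed_b64):
        return {"text": base64.b64decode(compressed_b64).decode("utf-8")}


summary = run_pipeline(b"fake image bytes", FakeStage1(), FakeStage2())
```

Real clients would wrap POST /ocr, POST /compress, POST /decompress, and GET /health behind the same four methods.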
Latency Benchmark
python pipeline/benchmark.py --image <path> --runs 20
Reports: mean, median, p95, min, max latency across N full pipeline runs.
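A minimal sketch of how such a report can be computed is shown below. It uses the nearest-rank definition of p95, which is an assumption; `benchmark.py` may use a different percentile method. `summarize` and `benchmark` are hypothetical names.

```python
import math
import statistics
import time


def summarize(latencies_ms):
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)      # nearest-rank 95th percentile
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "min": ordered[0],
        "max": ordered[-1],
    }


def benchmark(run_once, runs=20):
    """Time `runs` calls of `run_once` and summarise the latencies in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)


report = summarize(list(range(1, 21)))     # 20 synthetic latencies: 1..20 ms
```

With only 10 or 20 runs, p95 is determined by one or two of the slowest samples, so it is best read alongside the max rather than on its own.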
Typical latency (CPU, local):
| Stage | Latency |
|---|---|
| OCR inference | 150–400 ms |
| Huffman compress | 5–20 ms |
| Huffman decompress | 5–15 ms |
| End-to-end total | 200–500 ms |
Pipeline Flow Diagram
┌─────────────────────────────────────────────────────────────────┐
│ INPUT: Noisy Document Image │
└──────────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 1: OCR Microservice │
│ │
│ ┌─────────────┐ ┌───────────────┐ ┌────────────────────┐ │
│ │ Preprocess │──▶│ Segmentation │──▶│ CNN Classifier │ │
│ │ Greyscale │ │ Projection │ │ 3-Block CNN │ │
│ │ Resize │ │ profiles │ │ 10-class softmax │ │
│ │ Normalise │ │ Word gaps │ │ conf ≥ 0.65 gate │ │
│ └─────────────┘ └───────────────┘ └────────────────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Post-Processing │ │
│ │ Digit→letter fix │ │
│ │ Duplicate collapse │ │
│ │ Space insertion │ │
│ └──────────┬──────────┘ │
└─────────────────────────────────────────────────┼───────────────┘
│ │
│ Extracted Text │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ STAGE 2: Compression Microservice │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Vitter's Adaptive Huffman │ │
│ │ │ │
│ │ NYT node ──▶ symbol arrival ──▶ block-leader swap │ │
│ │ ──▶ weight increment ──▶ tree rebalance ──▶ encode │ │
│ │ │ │
│ │ Output: 4-byte length prefix + Huffman bit stream │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Metrics: compression ratio │ Shannon entropy │ efficiency │
└──────────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DECOMPRESSOR (lossless round-trip) │
│ Compressed bytes ──▶ Vitter decode ──▶ Original text ✓ │
└─────────────────────────────────────────────────────────────────┘
Demo UI
Open demo/pipeline_demo.html in a browser.
Features:
- Drag-and-drop image upload
- Animated 4-stage pipeline flow
- OCR text output panel
- Compressed bytes (base64) panel
- Huffman frequency explorer — top 10 symbols, estimated bits, bar chart
- Noise profile badges (coffee / footprint / fold / wrinkle)
- Service health indicators
- Keyboard: Enter = run pipeline, R = reset
Reproducing Results
# 1. Train
cd stage1_ocr && python train.py
# 2. Start services
uvicorn app:app --port 8001 & # Stage 1
cd ../stage2_compression
uvicorn app:app --port 8002 & # Stage 2
# 3. Verify accuracy gate
curl http://localhost:8001/health | python3 -m json.tool
# 4. Run full pipeline
cd ../pipeline
python orchestrator.py --image ../stage1_ocr/test_document.png
# 5. Benchmark latency
python benchmark.py --image ../stage1_ocr/test_document.png --runs 10
# 6. Run tests
cd ../stage1_ocr && python -m pytest tests/ -v
cd ../stage2_compression && python -m pytest tests/ -v
Constraints Met
| Requirement | Status |
|---|---|
| CNN built with TensorFlow | ✓ |
| No pre-built compression libraries | ✓ Vitter's algorithm from scratch |
| POST /ocr endpoint | ✓ |
| POST /compress + /decompress endpoints | ✓ |
| Lossless decompression | ✓ Verified in tests and orchestrator |
| ≥ 2 noise profiles with measurable accuracy | ✓ Gaussian + Salt & Pepper |
| Compression ratio, entropy, efficiency metrics | ✓ |
| End-to-end latency benchmarked | ✓ |
| CNN architecture documented | ✓ This README |
Built With
- fastapi