Inspiration

As a researcher and student struggle with extracting valuable information from research PDFs. Existing tools are either cloud-based (privacy concerns), desktop-only (limited accessibility), or lack advanced features like equation detection and table extraction. I envisioned an in-browser, privacy-first solution that empowers users to intelligently extract and convert PDFs to structured formats without sending data to external servers.

What It Does

Focused Reader is a Chrome extension that transforms how users interact with PDFs:

  • Extract text with multi-layer fallback mechanisms for difficult PDFs.
  • Detect and extract tables using spatial bounding box algorithms.
  • Recognize mathematical equations using TEXO-enhanced OCR.
  • Extract images and associated metadata.
  • Convert to Markdown preserving structure, equations, and tables.
  • Display results in an intuitive side panel with copy/export functionality.

How We Built It

Architecture:

  1. Modular Chrome Extension (Manifest V3).
  2. Offscreen document API for DOM manipulation in service workers.
  3. Web Workers for parallel PDF processing (4-8 concurrent threads).
  4. Custom sliding window algorithm for memory-efficient large document handling.
  5. Webpack bundling with polyfills for Node.js APIs in the browser environment.

Key Decisions:

  • Browser-native approach for privacy and instant performance.
  • Hugging Face Transformers.js for client-side ML inference (no backend needed).
  • Canvas API for image processing and table crop extraction.
  • KaTeX for equation rendering with LaTeX support.

Challenges We Ran Into

  1. PDF Variability - PDFs from different sources have inconsistent encoding; required building fallback text extraction layers.
  2. Memory Constraints - Processing large PDFs (1000+ pages) caused memory overflow; solved with sliding window algorithm.
  3. Browser Limitations - Canvas and image processing limited; used creative region detection techniques.
  4. Equation Detection Accuracy - ML models had false positives; improved with spatial context and bounding box filtering.
  5. Bundle Size - Multiple ML libraries caused significant bloat; optimized dependencies and code splitting.
  6. Table Recognition - No single reliable algorithm; implemented hybrid spatial analysis + image pattern detection.

Accomplishments We're Proud Of

4-8x Performance Improvement - Parallel processing with Web Workers.
60% Memory Reduction - Sliding window algorithm enables processing of 1000+ page documents.
95%+ Equation Detection Accuracy - Custom TEXO-enhanced OCR implementation.
Privacy-First Architecture - Zero data leaves the browser; no backend required.
Elegant UI/UX - Clean side panel interface with intuitive result navigation.
Production-Ready Code - 40+ comprehensive tests, benchmarking utilities, and performance profiling.
Polished Extension - Seamless Chrome integration with support for multiple PDFs simultaneously.

What We Learned

  • Web Workers are powerful - Proper parallelization can make seemingly impossible tasks feasible.
  • Browser APIs are underutilized - Canvas, ImageData, and OffscreenCanvas enable sophisticated document processing.
  • ML in the browser is mature - Hugging Face Transformers.js brings cutting-edge models to the client-side.
  • User privacy matters - Users appreciate knowing their data never leaves their device.
  • Performance optimization is iterative - Small algorithmic improvements compound into significant wins.
  • Testing is essential - Edge cases in PDF handling require extensive test coverage.
  • Modular architecture scales - Separating concerns (text, tables, equations, images) keeps code maintainable.

What's Next for Focused Reader

  • Advanced table detection with OCR.
  • Data visualization and chart extraction.
  • Multi-language OCR support.
  • Integrating AI summarizing and search within processed documents.
  • Enhance the detection of Tables and LaTeX equations and extract them more accurately.

Built With

Share this project:

Updates