Inspiration

Many organizations struggle to convert static PDF documents into accessible, interactive web content. AI models such as Baidu's ERNIE and PaddleOCR-VL made it practical to build an intelligent, automated solution.

What it does

DocWeb is a web application that transforms PDF documents into responsive webpages using artificial intelligence. The platform:

  • Extracts text from multi-page PDFs using PaddleOCR-VL for accurate optical character recognition
  • Converts content into structured Markdown format
  • Generates HTML webpages with AI-powered styling via ERNIE 4.5
  • Exports multiple formats (HTML, Markdown, JSON) for maximum flexibility
  • Provides real-time previews of generated content
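To make the multi-format export concrete, here is a minimal sketch of how one set of extracted pages could be turned into the three downloadable files. The `PageText` structure and the `build_exports` helper are hypothetical names used for illustration, and the plain HTML template is only a stand-in for the ERNIE-generated styling, not the actual DocWeb code.

```python
import json
from dataclasses import dataclass, asdict
from html import escape


@dataclass
class PageText:
    """Text recovered from one PDF page (hypothetical structure)."""
    page: int
    text: str


def build_exports(title: str, pages: list[PageText]) -> dict[str, str]:
    """Return the three downloadable artifacts keyed by file extension."""
    # Markdown: one second-level heading per page
    markdown = f"# {title}\n\n" + "\n\n".join(
        f"## Page {p.page}\n\n{p.text}" for p in pages
    )
    # HTML: plain template standing in for the AI-styled page; text is escaped
    body = "\n".join(
        f"<section><h2>Page {p.page}</h2><p>{escape(p.text)}</p></section>"
        for p in pages
    )
    html = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>"
    )
    # JSON: structured form of the same content for downstream tools
    payload = {"title": title, "pages": [asdict(p) for p in pages]}
    return {
        "md": markdown,
        "html": html,
        "json": json.dumps(payload, ensure_ascii=False, indent=2),
    }
```

Building all three outputs from the same page list keeps the formats consistent with one another, whichever one the user downloads.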

How I built it

DocWeb was built using a modern, modular architecture:

  • Frontend: Streamlit for an intuitive, responsive user interface
  • OCR Engine: PaddleOCR-VL for document text extraction
  • AI Models: Baidu's ERNIE 4.5 for intelligent HTML generation and styling
  • Processing Pipeline: Custom Python modules for PDF extraction, Markdown conversion, and HTML generation
  • Styling: CSS-based theming for a clean, professional interface with full customization

The workflow follows five steps: Upload → Extract → Convert → Generate → Download.
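The five steps can be sketched as a small function that takes the OCR and HTML-generation stages as plain callables. This is an illustrative outline under assumptions: `OcrFn`, `HtmlFn`, `to_markdown`, and `run_pipeline` are hypothetical names, and the real calls to PaddleOCR-VL and ERNIE 4.5 are deliberately left out and would be wrapped behind the two callables.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real app would wrap its OCR and LLM calls here.
OcrFn = Callable[[bytes], list[str]]   # PDF bytes -> one text string per page
HtmlFn = Callable[[str], str]          # Markdown -> styled HTML


@dataclass
class Result:
    markdown: str
    html: str


def to_markdown(pages: list[str]) -> str:
    """Convert step: join non-empty page texts, separated by horizontal rules."""
    return "\n\n---\n\n".join(p.strip() for p in pages if p.strip())


def run_pipeline(pdf_bytes: bytes, ocr: OcrFn, generate_html: HtmlFn) -> Result:
    if not pdf_bytes:                   # Upload: reject empty input early
        raise ValueError("empty upload")
    pages = ocr(pdf_bytes)              # Extract
    markdown = to_markdown(pages)       # Convert
    html = generate_html(markdown)      # Generate
    return Result(markdown, html)       # Download: caller offers these as files


# Stub stages make the flow testable without any model access.
demo = run_pipeline(
    b"%PDF-1.7",
    ocr=lambda data: ["Page one text", "  ", "Page two text"],
    generate_html=lambda md: f"<article>{md}</article>",
)
```

Passing the stages in as arguments keeps each step replaceable, which fits the modular architecture described above.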

Challenges we ran into

Several technical challenges shaped our development:

  • OCR Accuracy: Ensuring reliable text extraction from PDFs with varying quality, layouts, and fonts
  • Markdown Conversion: Preserving document structure and formatting during conversion
  • UI/UX Design: Creating an intuitive interface while maintaining performance with large files

Accomplishments that we're proud of

  • Multiple Export Formats: Users can download results as HTML, Markdown, or JSON
  • Real-time Preview: Implemented live preview functionality so users see results instantly
  • Error Handling: Robust error management with user-friendly feedback messages
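As an illustration of the error-handling idea, one possible pattern is to wrap each pipeline stage so that low-level exceptions become a message that is safe to show to users. `StageError`, `FRIENDLY`, and `run_stage` are hypothetical names, and the message wording is invented for the example; this is a sketch, not the project's actual implementation.

```python
class StageError(Exception):
    """Raised when a pipeline stage fails; carries a message safe to show users."""

    def __init__(self, stage: str, user_message: str):
        super().__init__(f"{stage}: {user_message}")
        self.stage = stage
        self.user_message = user_message


# Hypothetical mapping from stage name to friendly wording
FRIENDLY = {
    "extract": "We could not read text from this PDF. Try a clearer scan.",
    "generate": "The page could not be generated. Please try again shortly.",
}


def run_stage(stage, fn, *args):
    """Call fn(*args); convert unexpected failures into a StageError."""
    try:
        return fn(*args)
    except StageError:
        raise  # already user-friendly, pass through unchanged
    except Exception as exc:
        message = FRIENDLY.get(stage, "Something went wrong. Please try again.")
        # Keep the original exception as __cause__ for logs and debugging
        raise StageError(stage, message) from exc
```

In a Streamlit front end, the caught `user_message` could then be passed to `st.error`, while the chained original exception goes to the logs.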

What I learned

This project taught me valuable lessons:

  • How to integrate multiple AI services (OCR + LLM) into a cohesive workflow
  • The balance between automation and user control in AI applications
  • Document processing challenges: PDFs are not uniformly structured, requiring flexible approaches
  • The power of combining multiple specialized AI models for superior results

What's next for DocWeb

Future enhancements we're planning:

  • Batch Processing: Enable users to convert multiple PDFs simultaneously
  • Advanced Styling: Give users templates and customization options for generated HTML
  • API Endpoint: Expose DocWeb as an API for enterprise integration
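For the planned batch processing, a thread pool is one way it might work. The sketch below assumes that a single-file conversion function already exists (passed in as the hypothetical `convert_one`) and that most of its time is spent waiting on OCR and model calls, which is where threads tend to help. `convert_batch` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def convert_batch(
    files: dict[str, bytes],
    convert_one: Callable[[bytes], str],
    max_workers: int = 4,
) -> tuple[dict[str, str], dict[str, str]]:
    """Convert several PDFs concurrently; returns (results, errors) keyed by filename."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its filename so failures can be reported per file
        futures = {pool.submit(convert_one, data): name for name, data in files.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One bad file should not abort the rest of the batch
                errors[name] = str(exc)
    return results, errors
```

Collecting errors per file lets the interface show which uploads succeeded and which need attention.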

Built With

  • ernie
  • paddleocr-vl
  • streamlit