Inspiration
Many organizations struggle with converting static PDF documents into accessible, interactive web content. The rise of AI models like Baidu's ERNIE and PaddleOCR-VL provided the perfect opportunity to build an intelligent, automated solution.
What it does
DocWeb is a web application that transforms PDF documents into responsive webpages using artificial intelligence. The platform:
- Extracts text from multi-page PDFs using PaddleOCR-VL for accurate optical character recognition
- Converts content into structured Markdown format
- Generates HTML webpages with AI-powered styling via ERNIE 4.5
- Exports multiple formats (HTML, Markdown, JSON) for maximum flexibility
- Provides real-time previews of generated content
How I built it
DocWeb was built using a modern, modular architecture:
- Frontend: Streamlit for an intuitive, responsive user interface
- OCR Engine: PaddleOCR-VL for document text extraction
- AI Models: Baidu's ERNIE 4.5 for intelligent HTML generation and styling
- Processing Pipeline: Custom Python modules for PDF extraction, Markdown conversion, and HTML generation
- Styling: CSS-based theming for a clean, professional interface with full customization
The workflow follows a logical five-step process: Upload → Extract → Convert → Generate → Download
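As a rough illustration of the Extract → Convert → Generate steps above, here is a minimal Python sketch. It is not DocWeb's actual code: `run_pipeline`, `pages_to_markdown`, `fake_extract_pages`, and `fake_generate_html` are hypothetical names, and the two `fake_*` functions are stand-ins for the PaddleOCR-VL and ERNIE 4.5 calls, which are passed in as plain callables so the sketch runs without either service.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class PipelineResult:
    pages: List[str]   # raw text per PDF page (Extract)
    markdown: str      # structured Markdown (Convert)
    html: str          # generated webpage (Generate)


def pages_to_markdown(pages: List[str]) -> str:
    """Join per-page text into Markdown, one section heading per page."""
    sections = [
        f"## Page {number}\n\n{text.strip()}"
        for number, text in enumerate(pages, start=1)
    ]
    return "\n\n".join(sections)


def run_pipeline(
    pdf_bytes: bytes,
    extract_pages: Callable[[bytes], List[str]],
    generate_html: Callable[[str], str],
) -> PipelineResult:
    """Run Extract -> Convert -> Generate with injected OCR and LLM steps."""
    pages = extract_pages(pdf_bytes)
    markdown = pages_to_markdown(pages)
    html = generate_html(markdown)
    return PipelineResult(pages=pages, markdown=markdown, html=html)


# Hypothetical stand-in for the OCR step (assumption: one string per page).
def fake_extract_pages(pdf_bytes: bytes) -> List[str]:
    return ["Hello world", "Second page"]


# Hypothetical stand-in for the LLM step: wraps escaped Markdown in HTML.
def fake_generate_html(markdown: str) -> str:
    return "<html><body><pre>" + escape(markdown) + "</pre></body></html>"


if __name__ == "__main__":
    result = run_pipeline(b"%PDF-1.7", fake_extract_pages, fake_generate_html)
    print(result.markdown)
```

Passing the OCR and LLM steps in as callables keeps each stage independently testable, which is one way to get the modular structure described above.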
Challenges we ran into
Several technical challenges shaped development:
- OCR Accuracy: Ensuring reliable text extraction from PDFs with varying quality, layouts, and fonts
- Markdown Conversion: Preserving document structure and formatting during conversion
- UI/UX Design: Creating an intuitive interface while maintaining performance with large files
Accomplishments that we're proud of
- Multiple Export Formats: Users can download the result as HTML, Markdown, or JSON
- Real-time Preview: Implemented live preview functionality so users see results instantly
- Error Handling: Robust error management with user-friendly feedback messages
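The export and error-handling items above could be sketched roughly as follows. This is an assumption about how such a step might look, not DocWeb's implementation: `build_exports`, `ExportError`, the file names, and the JSON layout are all hypothetical.

```python
import json
from typing import Dict, List


class ExportError(Exception):
    """Raised with a message that is safe to show directly to the user."""


def build_exports(pages: List[str], markdown: str, html: str) -> Dict[str, bytes]:
    """Return a mapping of file name to file content for the three formats."""
    if not html.strip():
        # Fail early with a readable message instead of offering an empty file.
        raise ExportError("No HTML was generated. Try converting the document again.")

    # Hypothetical JSON layout: page count, per-page text, and the Markdown.
    payload = {
        "page_count": len(pages),
        "pages": [
            {"number": number, "text": text}
            for number, text in enumerate(pages, start=1)
        ],
        "markdown": markdown,
    }
    return {
        "document.html": html.encode("utf-8"),
        "document.md": markdown.encode("utf-8"),
        "document.json": json.dumps(payload, ensure_ascii=False, indent=2).encode("utf-8"),
    }
```

In a Streamlit app, each entry could be handed to `st.download_button(label, data, file_name=...)`, and the message of a caught `ExportError` shown with `st.error`.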
What I learned
This project taught me valuable lessons:
- How to integrate multiple AI services (OCR + LLM) into a cohesive workflow
- The balance between automation and user control in AI applications
- Document processing challenges: PDFs are not uniformly structured, requiring flexible approaches
- The power of combining multiple specialized AI models for superior results
What's next for DocWeb
Planned enhancements:
- Batch Processing: Enable users to convert multiple PDFs simultaneously
- Advanced Styling: Give users templates and customization options for generated HTML
- API Endpoint: Expose DocWeb as an API for enterprise integration
Built With
- ernie
- paddleocr-vl
- streamlit