Inspiration
PDFs are still one of the most common ways to share information, but they are often difficult to read, navigate, and publish online, especially on mobile devices. We wanted to explore how AI could bridge the gap between static documents and modern web content by automatically transforming PDFs into clean, accessible websites.
What it does
Doc2Web automatically converts a PDF into a fully deployable static website. It extracts text and layout information from the PDF, restructures the content, and generates a clean, responsive web page that can be hosted on GitHub Pages with no manual editing required.
How we built it
PaddleOCR-VL is used to extract text and layout information from the input PDF.
The extracted content is converted into structured Markdown while preserving headings and sections.
The Markdown is sent to ERNIE via API to improve readability and generate semantically structured HTML.
The final output is assembled into a static website using HTML and CSS, ready for deployment on GitHub Pages.
The entire pipeline runs end to end without manual editing and is simple enough for beginners to follow.
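The steps above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the `Block` shape, the function names, and the `docs` output folder are assumptions made for the example, and the ERNIE call is represented by a plain `to_html` callable rather than any real client API.

```python
from dataclasses import dataclass
from html import escape
from pathlib import Path
from typing import Callable, List


@dataclass
class Block:
    """One layout element from the OCR step (hypothetical shape)."""
    kind: str  # "title", "heading", or "paragraph"
    text: str


def blocks_to_markdown(blocks: List[Block]) -> str:
    """Turn layout blocks into Markdown while keeping headings and sections."""
    prefixes = {"title": "# ", "heading": "## "}
    parts = [
        prefixes.get(block.kind, "") + block.text.strip()
        for block in blocks
        if block.text.strip()  # skip empty OCR blocks
    ]
    return "\n\n".join(parts) + "\n"


def build_page(title: str, body_html: str) -> str:
    """Wrap an HTML fragment in a minimal responsive page."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n'
        '<meta charset="utf-8">\n'
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        f"<title>{escape(title)}</title>\n"
        '<link rel="stylesheet" href="style.css">\n'
        "</head>\n<body>\n<main>\n"
        f"{body_html}\n"
        "</main>\n</body>\n</html>\n"
    )


def run_pipeline(
    blocks: List[Block],
    to_html: Callable[[str], str],  # stand-in for the ERNIE API call
    out_dir: Path,
    title: str,
) -> Path:
    """Markdown -> HTML fragment -> index.html in the output folder."""
    markdown = blocks_to_markdown(blocks)
    body_html = to_html(markdown)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = out_dir / "index.html"
    index.write_text(build_page(title, body_html), encoding="utf-8")
    return index
```

Passing the model call in as a function keeps the rest of the pipeline testable offline. Writing to a `docs` folder is one option, since GitHub Pages can publish from that folder on a branch.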
Challenges we ran into
Preserving document structure from PDFs with inconsistent layouts
Ensuring the generated HTML followed a logical heading hierarchy
Balancing automation with readability so the output didn’t feel “AI-generated”
Keeping the pipeline simple while meeting all warm-up task requirements
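One way to catch heading-hierarchy problems automatically is a small check over the generated HTML. The sketch below is an illustration using only Python's standard library; `heading_problems` and the specific rules it applies (start at h1, a single h1, no skipped levels) are our assumptions about what "logical" means here, not part of the submitted pipeline.

```python
import re
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Records the level of every h1-h6 start tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self.levels.append(int(match.group(1)))


def heading_problems(html: str) -> List[str]:
    """Return human-readable issues with the heading order; empty if none."""
    parser = HeadingCollector()
    parser.feed(html)
    levels = parser.levels
    problems: List[str] = []
    if levels and levels[0] != 1:
        problems.append(f"first heading is h{levels[0]}, expected h1")
    if levels.count(1) > 1:
        problems.append("more than one h1")
    # A heading may go at most one level deeper than the one before it.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} followed by h{cur} skips a level")
    return problems
```

A check like this could run after the model step and trigger a retry when the returned list is not empty.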
Accomplishments that we're proud of
Successfully built a full PDF → Website pipeline using PaddleOCR-VL and ERNIE
Generated a deployable website with no manual content cleanup
Created a clear, reproducible workflow suitable for beginners
Completed the official Warm-Up Task requirements end-to-end
What we learned
OCR quality strongly affects downstream AI generation
Prompt design plays a major role in turning raw text into clean HTML
ERNIE is effective at restructuring and improving extracted document content
Clear documentation and simple architecture matter as much as technical depth
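To make the point about prompt design concrete, here is a hypothetical prompt builder. The wording of the rules and the `build_prompt` name are illustrative assumptions, not the exact prompt we sent to ERNIE; the idea is only that explicit, numbered constraints tend to produce more predictable HTML than a bare "convert this" request.

```python
def build_prompt(markdown: str) -> str:
    """Compose an instruction prompt around the extracted Markdown."""
    # Illustrative rules; tune these to the documents being converted.
    rules = [
        "Return only an HTML fragment; no <html>, <head>, or <body> tags.",
        "Use exactly one <h1>, and never skip a heading level.",
        "Keep every fact from the source; do not add new content.",
        "Fix OCR artifacts such as broken hyphenation and stray line breaks.",
    ]
    rule_text = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Convert the Markdown document below into clean, semantic HTML.\n\n"
        f"Rules:\n{rule_text}\n\n"
        f"Markdown:\n{markdown.strip()}\n"
    )
```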
What's next for Doc2Web: PDF to Website Generator
Support for multi-page navigation instead of a single page
Better table and image handling
Theme customization for generated websites
Optional multilingual output using ERNIE