Inspiration

The idea behind this project came from wanting to understand how AI tools like PaddleOCR and ERNIE can automate simple but time-consuming tasks. PDF documents, especially CVs, are commonly used but not very interactive. I wanted to transform a static PDF into a clean, responsive webpage entirely through an AI pipeline. The warm-up task felt like a great way to learn how OCR and LLMs can work together to build something useful in minutes.

What it does

The project converts a PDF CV into a fully generated HTML webpage using PaddleOCR-VL and ERNIE.
It:

  • Extracts text and structure from a PDF using OCR
  • Converts the extracted content into Markdown
  • Uses ERNIE to turn the Markdown into a modern HTML webpage
  • Publishes the page automatically using GitHub Pages

The final result is a clean, responsive, AI-generated personal website.

How we built it

  1. PaddleOCR-VL was used to extract text and layout from the CV PDF.
  2. The OCR output was cleaned and formatted into Markdown so ERNIE could understand the structure.
  3. Using the Markdown as input, ERNIE generated a fully styled, responsive HTML webpage.
  4. The generated HTML was saved as index.html and uploaded to a GitHub repository.
  5. GitHub Pages was enabled on the repository to publish the website.
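
As a rough illustration of steps 2 to 4, the sketch below cleans OCR-derived Markdown, wraps it in a prompt, and saves the model's reply as index.html. It is a minimal sketch under stated assumptions, not the code used in this project: clean_markdown, build_prompt, write_site, and the prompt wording are hypothetical, and generate_html stands in for whatever way the prompt reaches ERNIE (for example, pasting it into the ERNIE interface). No PaddleOCR or ERNIE API calls are shown.

```python
import re
from pathlib import Path
from typing import Callable


def clean_markdown(raw: str) -> str:
    """Normalize OCR-derived Markdown before sending it to the model."""
    lines = []
    for line in raw.splitlines():
        line = line.rstrip()  # drop trailing whitespace left by OCR
        # Map common bullet glyphs to a Markdown dash, keeping indentation.
        line = re.sub(r"^(\s*)[•·▪●]\s*", r"\1- ", line)
        lines.append(line)
    text = "\n".join(lines)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def build_prompt(markdown: str) -> str:
    """Wrap the Markdown in an instruction (hypothetical wording)."""
    return (
        "Turn the following CV, written in Markdown, into a single "
        "responsive HTML page with inline CSS. Return only the HTML.\n\n"
        + markdown
    )


def write_site(markdown: str, generate_html: Callable[[str], str], out_dir: str) -> Path:
    """Generate HTML via the supplied callable and save it as index.html."""
    html = generate_html(build_prompt(clean_markdown(markdown)))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "index.html"
    target.write_text(html, encoding="utf-8")
    return target
```

Passing the model call in as a plain function keeps the cleaning and saving steps testable without any network access or model credentials.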

Challenges we ran into

  • Character limits in the ERNIE interface required condensing the Markdown and restructuring the content.
  • Markdown formatting needed to be carefully cleaned so the final webpage would look professional.
  • GitHub Pages initially served a blank page because the wrong folder was selected as the publishing source. Switching Pages to serve from the repository root fixed the issue.
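
For the character-limit problem, an alternative to condensing by hand is to split the Markdown at headings and pack whole sections into chunks that each stay under the limit. This is a different technique from the manual condensing described above, shown only as a sketch: split_sections and pack_sections are hypothetical helpers, and the limit is a parameter because the actual limit of the ERNIE interface is not stated here.

```python
import re


def split_sections(markdown: str) -> list[str]:
    """Split Markdown into sections, each beginning at a heading line."""
    # Zero-width split just before every line that starts with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [p for p in parts if p.strip()]


def pack_sections(sections: list[str], limit: int) -> list[str]:
    """Greedily pack whole sections into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for section in sections:
        if len(section) > limit:
            # A single oversized section still has to be shortened by hand.
            raise ValueError(f"section exceeds limit: {section[:30]!r}")
        if current and len(current) + len(section) > limit:
            chunks.append(current)
            current = section
        else:
            current += section
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be submitted separately, at the cost of the model not seeing the whole CV at once, which may affect how consistent the generated styling is.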

Accomplishments that we're proud of

  • Building a complete AI pipeline: OCR → Markdown → HTML → deployment.
  • Creating a polished, responsive webpage generated entirely by AI.
  • Finishing the warm-up task with a smooth workflow that can be reused for more complex applications.

What we learned

  • How OCR models extract and structure PDF content.
  • How to prepare Markdown to maximize the quality of HTML generated by ERNIE.
  • How LLMs can automate UI layout and styling.
  • How to deploy static websites easily using GitHub Pages.
  • The importance of input cleaning, prompt design, and handling tool limitations.
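
On the deployment side, a small local check before pushing can catch the wrong-folder mistake early. The sketch below assumes the site is a single index.html that must sit directly inside the folder GitHub Pages is configured to serve (the repository root in this project). check_pages_source is a hypothetical helper and only inspects local files; it does not read the repository's Pages settings.

```python
from pathlib import Path


def check_pages_source(publish_dir: str) -> list[str]:
    """Return a list of problems found in the folder meant to be published."""
    problems: list[str] = []
    index = Path(publish_dir) / "index.html"
    if not index.is_file():
        problems.append(f"index.html not found in {publish_dir}")
    elif index.stat().st_size == 0:
        problems.append("index.html is empty")
    return problems
```

An empty list means the folder contains a non-empty index.html; anything else is worth fixing before enabling or reconfiguring Pages.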

What's next for Web Builder: Build a Web Page with PaddleOCR & ERNIE

  • Adding customizable themes so users can choose different styles.
  • Allowing users to upload any PDF (reports, articles, resumes) and auto-generate a website.
  • Integrating multilingual support using ERNIE to translate content before generating the webpage.
  • Expanding the pipeline into a full “PDF-to-Website AI Builder” platform for non-technical users.
