Inspiration

I have always wanted to explore Large Language Models (LLMs) to automate real-world tasks and better understand how they work internally. This interest aligned with my professor’s suggestion to investigate techniques for identifying and summarizing key findings from monthly developmental plans published on city council websites. These documents are usually released as PDFs with complex layouts, images, and tables, making them difficult to process automatically. This challenge inspired me to experiment with PaddleOCRVL for document understanding and ERNIE 4.5 for intelligent content generation.


What it does

Website Builder using PaddleOCRVL & ERNIE 4.5 converts PDF documents into fully functional web pages. The pipeline extracts text, layout information, and images from PDFs, converts them into Markdown, uses an LLM to generate clean HTML, and deploys the result as a live website using GitHub Pages.


How I built it

I created a Python virtual environment and installed the required Paddle and OCR libraries. Using PaddleOCRVL, I implemented a pipeline to extract structured content and images from PDF documents and convert them into Markdown format. I initially tested this workflow on my resume and successfully transformed it into a polished webpage using ERNIE 4.5.

Next, I processed a four-page visualization report containing images and structured text. PaddleOCRVL accurately extracted the elements, although the conversion took around 45 minutes since it was run locally on CPU. For webpage generation, I used the ERNIE 4.5 Chat Completion API to convert the Markdown content into HTML. Finally, I deployed both the resume and the visualization report as live websites using GitHub Pages.


Challenges I ran into

I faced multiple environment and compatibility issues while building this project. A safetensors/framework paddle is invalid error occurred during OCR processing, which I resolved by uninstalling the safetensors library and installing the full paddlex[ocr] dependencies. I also attempted to run PaddleOCRVL on Google Colab using a GPU, but version incompatibilities between Paddle, PaddleX, and Colab’s environment made this approach infeasible.

Additionally, I tried to run ERNIE 4.5 locally using FastDeploy, but this was not possible on Windows due to Unix-only dependencies such as the resource module. To overcome this, I switched to using the ERNIE 4.5 API via Novita.ai for reliable inference.


Accomplishments that I am proud of

I successfully built an end-to-end pipeline that transforms PDF documents into deployed websites. Despite multiple technical challenges, I was able to adapt and find alternative solutions, such as switching from local inference to API-based inference. Completing the project while exploring a new ecosystem and unfamiliar models is something I am particularly proud of.


What I learned

Through this project, I gained hands-on experience with the Paddle ecosystem, including PaddleOCR and PaddleX. I learned how OCR models handle layout detection, image extraction, and multilingual text recognition. I also developed a deeper understanding of the trade-offs between local and API-based LLM inference, environment setup challenges, and performance considerations when running large models on CPU and Windows systems.


What's next for Website Builder using PaddleOCRVL & Ernie 4.5

As a next step, I plan to deploy this pipeline as a Streamlit application on Streamlit Cloud so that users can upload PDFs and automatically convert them into visually appealing websites. I also aim to reduce the processing time by exploring GPU-backed OCR APIs and further optimize the PDF-to-Markdown conversion process. Ultimately, I would like to extend this system to summarize and extract key insights from city council development plans automatically.

Built With

  • ernie4.5
  • novita
  • paddleocrvl
  • paddlex
  • python
Share this project:

Updates