This project started as a warm-up task to explore the end-to-end capabilities of PaddleOCR-VL and ERNIE for real-world document understanding and publishing. The initial inspiration was to see how far a static document like a PDF could be transformed into a modern, web-ready format using AI, without relying on complex backend infrastructure.
During the warm-up phase, I built a complete pipeline that extracts layout-aware text and images from PDFs using PaddleOCR-VL, converts the output into structured Markdown, and then uses ERNIE to generate a clean, professional HTML web page. The final output is deployed using GitHub Pages, demonstrating how legacy documents can be modernized and published with minimal operational overhead.
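The Markdown and prompt-building steps of this pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Block` type, its `kind` labels, and both function names are hypothetical stand-ins for whatever structure the OCR step returns, and the calls to the PaddleOCR-VL and ERNIE APIs are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One layout region from the OCR step (hypothetical shape)."""
    kind: str      # assumed labels: "title", "heading", "text", "image"
    content: str   # the text, or an image path for "image" blocks


def blocks_to_markdown(blocks):
    """Render layout blocks, already in reading order, as Markdown."""
    parts = []
    for block in blocks:
        if block.kind == "title":
            parts.append(f"# {block.content}")
        elif block.kind == "heading":
            parts.append(f"## {block.content}")
        elif block.kind == "image":
            parts.append(f"![figure]({block.content})")
        else:
            # Anything unrecognized is kept as a plain paragraph.
            parts.append(block.content)
    return "\n\n".join(parts) + "\n"


def build_html_prompt(markdown):
    """Wrap the Markdown in an instruction for the page-generation model."""
    return (
        "Convert the following Markdown into a single, self-contained "
        "HTML page. Keep every heading, paragraph, and image.\n\n"
        + markdown
    )


if __name__ == "__main__":
    demo = [
        Block("title", "Annual Report"),
        Block("text", "Revenue grew this year."),
        Block("image", "images/chart.png"),
    ]
    print(build_html_prompt(blocks_to_markdown(demo)))
```

Keeping the Markdown step separate from the prompt step means the intermediate Markdown can be inspected or corrected before any HTML is generated.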
While completing the warm-up task, I began extending the idea into a more ambitious Application-Building task: an Intelligent Purchase Order Validation System. In this concept, PaddleOCR-VL parses purchase order PDFs with complex layouts, and ERNIE is used not just for text generation but also for reasoning and validation. ERNIE evaluates the extracted purchase order data against predefined business rules to determine approval or rejection. For approved cases, the system is designed to generate structured outputs such as validation explanations and audit summaries, and to convert purchase orders into secure EDI-style formats, with ERNIE assisting in the transformation and communication steps (such as drafting approval emails).
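To make the idea of predefined business rules concrete, here is a minimal sketch of the deterministic part of such a check. Everything in it is an assumption for illustration: the `PurchaseOrder` fields, the approved-vendor list, the spending limit, and the specific rules are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class PurchaseOrder:
    """Fields assumed to come out of the OCR step (hypothetical schema)."""
    po_number: str
    vendor: str
    total: float
    line_items: list = field(default_factory=list)  # dicts with qty, unit_price


# Hypothetical rule parameters, chosen only for this example.
APPROVED_VENDORS = {"Acme Supplies"}
MAX_TOTAL = 10_000.00


def validate(po):
    """Apply each rule and collect a human-readable reason per failure."""
    reasons = []
    if not po.po_number:
        reasons.append("missing PO number")
    if po.vendor not in APPROVED_VENDORS:
        reasons.append(f"vendor not approved: {po.vendor}")
    if po.total > MAX_TOTAL:
        reasons.append(f"total {po.total:.2f} exceeds limit {MAX_TOTAL:.2f}")
    computed = sum(item["qty"] * item["unit_price"] for item in po.line_items)
    if abs(computed - po.total) > 0.01:
        reasons.append("line items do not sum to the stated total")
    return {"approved": not reasons, "reasons": reasons}


if __name__ == "__main__":
    po = PurchaseOrder(
        "PO-1001", "Acme Supplies", 150.0,
        [{"qty": 3, "unit_price": 50.0}],
    )
    print(validate(po))
```

Returning the list of reasons alongside the decision gives a later explanation or audit-summary step something concrete to work from.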
Through this process, I learned how critical layout-aware OCR is for downstream reasoning tasks, how prompt design directly impacts structured outputs from large language models, and how combining deterministic rule checks with LLM-based reasoning leads to more robust enterprise applications.
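One way to combine deterministic checks with model reasoning is sketched below, under stated assumptions: the JSON reply format, the `decision` and `explanation` keys, and the `ask_model` callable (a placeholder for a call to an LLM such as ERNIE) are all hypothetical. The sketch treats rule failures as final and only trusts a model reply that parses into the expected structure.

```python
import json

# Keys the prompt is assumed to ask the model to return (hypothetical).
REQUIRED_KEYS = {"decision", "explanation"}


def parse_model_reply(reply):
    """Return the reply as a dict if it is well-formed, else None."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    if data["decision"] not in ("approve", "reject"):
        return None
    return data


def decide(rule_failures, ask_model):
    """Rules decide first; the model is consulted only if they all pass.

    ask_model is any zero-argument callable returning the model's reply
    as a string, which keeps this logic testable without a network call.
    """
    if rule_failures:
        return {
            "decision": "reject",
            "explanation": "; ".join(rule_failures),
            "source": "rules",
        }
    parsed = parse_model_reply(ask_model())
    if parsed is None:
        # Malformed output is routed to a human instead of being guessed at.
        return {
            "decision": "review",
            "explanation": "model reply was not valid structured output",
            "source": "fallback",
        }
    return {**parsed, "source": "model"}


if __name__ == "__main__":
    reply = '{"decision": "approve", "explanation": "within policy"}'
    print(decide([], lambda: reply))
```

Because the model is passed in as a callable, the same logic can be exercised with canned replies while iterating on the prompt.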
Overall, this project reflects both a completed, production-quality warm-up solution and a forward-looking application concept that showcases the strengths of PaddleOCR-VL and ERNIE in document understanding, reasoning, and intelligent automation.
Built With
- APIs: PaddleOCR-VL API (document layout parsing and OCR)
- ERNIE
- ERNIE API (content transformation and web page generation)
- GitHub Pages
- Google Colab
- PaddleOCR
- Python