ERNIE & PaddleOCR Implementation Proof

Live Demo: https://sanketnawale.github.io/ernie-warmup-task/

The deployed webpage clearly shows our AI-powered pipeline:

PaddleOCR-VL Usage:

  • Tool: Baidu AI Studio PaddleOCR-VL API
  • Input: z/OS TSO/E Command Reference PDF (448 pages)
  • Output: 1,031,508 extracted characters
  • Code: See step1_extract_pdf_v2.py in repository

ERNIE 4.0 API Usage:

  • API: erniebot Python SDK with ernie-4.0-turbo-8k model
  • Input: Extracted PDF content
  • Output: Generated HTML structure + CSS styling (15,253 characters)
  • Code: See step2_generate_webpage.py in repository
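The ERNIE step above is, in outline, a single chat-completion call. A minimal sketch, assuming the erniebot SDK's documented call shape; the prompt wording and the `generate_html` helper are illustrative, not the repository's actual code:

```python
def build_prompt(extracted_text: str) -> str:
    """Ask ERNIE for a standalone HTML page, with no markdown wrappers."""
    return (
        "Convert the following technical documentation into a single, "
        "self-contained HTML page with embedded CSS. Return only raw HTML, "
        "with no markdown code fences:\n\n" + extracted_text
    )


def generate_html(extracted_text: str, access_token: str) -> str:
    """Hypothetical wrapper around the erniebot SDK (assumed call shape)."""
    import erniebot  # lazy import: pip install erniebot

    erniebot.api_type = "aistudio"
    erniebot.access_token = access_token
    response = erniebot.ChatCompletion.create(
        model="ernie-4.0-turbo-8k",  # model name used in this project
        messages=[{"role": "user", "content": build_prompt(extracted_text)}],
    )
    return response.get_result()
```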

Processing Pipeline: PDF → PaddleOCR-VL → ERNIE 4.0 → GitHub Pages

Visible Proof: The live webpage displays a "Built with AI" section with full attribution to both technologies.

Inspiration

Working with IBM mainframe documentation, I noticed how much valuable technical content is locked away in non-searchable, non-interactive PDFs. I wanted to leverage ERNIE's AI capabilities to automate the transformation of these resources into modern, accessible web pages.

What it does

This project automatically converts complex technical PDFs into beautiful, responsive web pages using a two-step AI pipeline:

  • PaddleOCR-VL intelligently extracts text from the 448-page IBM z/OS Command Reference
  • ERNIE 4.0 generates clean, structured HTML with modern CSS styling
  • GitHub Pages hosts the final result at https://sanketnawale.github.io/ernie-warmup-task/
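Conceptually, the whole pipeline is a composition of three stages. A toy sketch with stand-in stage functions (the injected lambdas are illustrative; the real stages live in step1_extract_pdf_v2.py and step2_generate_webpage.py):

```python
def run_pipeline(pdf_path, extract, generate, deploy):
    """Chain the three stages: PDF -> OCR text -> HTML -> published URL.

    The stages are injected as callables so each one (PaddleOCR-VL,
    ERNIE 4.0, GitHub Pages) can be swapped or mocked independently.
    """
    text = extract(pdf_path)  # step 1: PaddleOCR-VL extraction
    html = generate(text)     # step 2: ERNIE 4.0 HTML generation
    return deploy(html)       # step 3: GitHub Pages deployment


# Toy run with stand-in stages:
url = run_pipeline(
    "zos_tsoe_command_reference.pdf",
    extract=lambda p: f"text from {p}",
    generate=lambda t: f"<html><body>{t}</body></html>",
    deploy=lambda h: "https://sanketnawale.github.io/ernie-warmup-task/",
)
```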

How I built it

Tech Stack:

PaddleOCR-VL (via Baidu AI Studio) for intelligent document parsing

ERNIE 4.0 API for AI-powered HTML generation

Python 3 with PyPDF2 for fallback text extraction

GitHub Pages for zero-cost hosting
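The PyPDF2 fallback mentioned above is essentially a page loop. A minimal sketch using PyPDF2's `PdfReader`; the `join_pages` helper is illustrative, not from the repository:

```python
def join_pages(pages):
    """Join per-page text with form feeds so page boundaries survive."""
    return "\f".join(p or "" for p in pages)


def extract_with_pypdf2(pdf_path: str) -> str:
    """Fallback extractor for when the OCR API is unavailable (sketch)."""
    from PyPDF2 import PdfReader  # lazy import: pip install PyPDF2

    reader = PdfReader(pdf_path)
    return join_pages(page.extract_text() for page in reader.pages)
```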

Pipeline:

```shell
# Step 1: Extract PDF content
python step1_extract_pdf_v2.py
# Processes 448 pages → 1,031,508 characters

# Step 2: Generate webpage
python step2_generate_webpage.py
# ERNIE 4.0 creates HTML/CSS → 15,253 characters

# Step 3: Deploy
git push origin main
# GitHub Pages auto-deploys
```

Challenges I ran into

  • Large PDF processing: initial OCR attempts hit API limits; solved by implementing chunk-based processing
  • ERNIE prompt engineering: required iteration to get clean HTML output without unnecessary wrappers
  • GitHub Pages deployment: fought with Jekyll workflow errors; switched to simple branch deployment
  • Character encoding: handled special mainframe characters and preserved formatting
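The chunk-based processing mentioned above can be sketched as follows; the chunk size and overlap are illustrative defaults, not the project's actual limits:

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200):
    """Split a long document into overlapping chunks that fit an API limit.

    The overlap keeps sentences that straddle a boundary visible in
    both neighboring chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

At these defaults, the 1,031,508-character extraction would yield 272 chunks (one chunk per 3,800 fresh characters).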

Accomplishments

  • Successfully processed 1M+ characters from complex technical documentation
  • Generated production-ready responsive HTML in under 2 minutes
  • Created a fully automated pipeline requiring zero manual intervention
  • IBM-themed professional design with mobile responsiveness

What I learned

  • Advanced prompt engineering for ERNIE 4.0 to generate structured output
  • PaddleOCR-VL's capabilities for technical document understanding
  • GitHub Pages deployment strategies and troubleshooting
  • Efficient chunking strategies for large document processing

What's next

  • Add search functionality to the generated webpage
  • Support for multiple PDF formats and languages
  • Interactive table of contents generation
  • Batch processing for documentation libraries
  • Integration with CI/CD pipelines for automated doc publishing

Built With

  • python
  • paddleocr-vl
  • ernie-4.0
  • github-pages
