Inspiration

We were frustrated with the limitations of traditional PDF-to-HTML converters. They produce static, inaccessible output that loses all the semantic meaning of the original document. Tables remain as boring text, data stays locked away, and sensitive information gets sent to cloud services without any privacy protection.

When we discovered ERNIE's multimodal vision capabilities on Novita AI, we saw an opportunity to build something revolutionary: a system that doesn't just extract text, but actually sees and understands document structure — detecting tables that should become charts, Q&A sections that could be interactive quizzes, and timelines that deserve visualization.

We also wanted to solve the privacy problem. Why should sensitive documents be sent to cloud AI services? Our solution: process locally first, scrub PII, then only send sanitized content to the cloud.


What it does

DocuMorph AI transforms PDF documents into interactive, responsive HTML webpages through an intelligent pipeline:

1. Privacy-First Local Processing

  • Extracts text using PaddleOCR (runs entirely on your machine)
  • Detects and redacts PII (emails, phone numbers, SSNs, credit cards) using Presidio
  • Nothing sensitive ever leaves your computer

2. ERNIE Vision Analysis

  • Sends page images to ERNIE-4.5-VL (vision model)
  • AI "sees" the document layout and detects:
    • Tables → Suggests bar/line/pie charts
    • Q&A sections → Suggests interactive quizzes
    • Date sequences → Suggests timeline widgets
    • Locations → Suggests interactive maps

3. Human-in-the-Loop Co-Design

  • Review all AI suggestions before generation
  • Edit OCR errors, approve/reject PII redactions
  • Choose themes, toggle widgets on/off
  • You're in control, not the AI

4. Interactive HTML Generation

  • ERNIE-4.5 generates semantic HTML with proper structure
  • Chart.js visualizations for data
  • Interactive quiz widgets
  • Responsive design (desktop/tablet/mobile)
  • WCAG AA accessible

5. One-Click Deployment

  • Deploy to GitHub Pages, Netlify, Vercel, or AWS S3

How we built it

Architecture

┌─────────────────────────────────────────────────────────────┐
│                 LOCAL PROCESSING (Privacy Zone)              │
│  PDF → PaddleOCR → Presidio PII Detection → Content Blocks  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              CO-DESIGN LAYER (Human Review)                  │
│  Edit blocks • Approve PII • Select theme • Choose widgets  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                CLOUD AI (Novita AI - ERNIE)                  │
│  ERNIE-4.5 (Text) → Theme analysis, HTML generation         │
│  ERNIE-4.5-VL (Vision) → Page image analysis, detection     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                      OUTPUT                                  │
│  Interactive HTML • Charts • Quizzes • Responsive Design    │
└─────────────────────────────────────────────────────────────┘

Tech Stack

Layer Technologies
Backend FastAPI, Python 3.11, Uvicorn
OCR PaddleOCR (local processing)
PII Detection Microsoft Presidio + spaCy
AI Models ERNIE-4.5 + ERNIE-4.5-VL via Novita AI
Frontend React 18, TypeScript, Vite, Tailwind CSS
Real-time WebSocket for live progress updates
Visualization Chart.js, Recharts

ERNIE Model Integration

We use two ERNIE models for true multimodal processing:

ERNIE-4.5-21B (Text Model)

  • Analyzes document content for theme suggestions
  • Generates semantic HTML structure
  • Creates accessible, responsive output

ERNIE-4.5-VL-28B (Vision Model)

  • Analyzes PDF page images (not just text!)
  • Detects visual patterns: tables, charts, Q&A layouts
  • Suggests appropriate widget transformations

Real API response from our vision analysis: Page 0 - Quiz Detection:

{
  "tables": [],
  "quizzes": [{"question_count": 2, "type": "multiple_choice|true_false"}],
  "timelines": [],
  "maps": [],
  "confidence": 1.0,
  "description": "A quiz document with multiple choice and true/false questions about CPU and Python."
}

Page 1 - Timeline Detection:

{
  "tables": [{"chart_type": "line", "data_summary": "The data shows chronological project milestones over time, from project kickoff to final release."}],
  "quizzes": [],
  "timelines": [{"event_count": 5}],
  "maps": [],
  "confidence": 1.0,
  "description": "The page contains a project timeline with dates and milestones for a project from kickoff to final release."
}

Page 2 - Geographic Location Detection:

{
  "tables": [],
  "quizzes": [],
  "timelines": [],
  "maps": [{"location_count": 4}],
  "confidence": 1.0,
  "description": "The page lists geographic locations and addresses of offices in Bengaluru, Mumbai, New York, and London."
}

Challenges we ran into

1. PaddleOCR Compatibility The newer PaddleX version changed APIs significantly. We had to implement fallback methods and timeout protection to handle different versions gracefully.

2. Real-Time Processing Coordinating WebSocket updates across OCR, PII detection, and AI analysis required careful async handling to provide smooth progress feedback.

3. Vision Model Prompt Engineering Getting ERNIE-VL to return structured JSON for component detection took iteration. We needed specific prompts to identify tables vs. text blocks reliably.

4. Privacy vs. Functionality Balance We wanted maximum privacy (local processing) but also needed cloud AI capabilities. The solution: process and sanitize locally, only send cleaned content to ERNIE.


Accomplishments that we're proud of

✅ True Multimodal AI Usage Not just text — we actually use ERNIE Vision to analyze page images and detect visual patterns.

Privacy-First Architecture Sensitive documents stay local. PII is scrubbed before any cloud calls.

** Complete Working Pipeline** End-to-end: Upload → OCR → PII → Vision Analysis → HTML Generation → Deployment

** Human-in-the-Loop Design** AI suggests, humans decide. No black-box automation.

** Real-Time Feedback** WebSocket updates show exactly what's happening at each step.


🧠 Auto-Generated Knowledge Graph

This feature alone separates DocuMorph AI from 90% of PDF converters. Instead of just extracting text, we build an interactive knowledge graph that shows how concepts, sections, and entities in your document relate to each other. Every knowledge graph is generated live using ERNIE AI. Here's actual server logs from a real document processing:

2026-01-01 22:16:07 | INFO | Generating knowledge graph for document 2bb7c96c-6602-4684-b54b-ebcb53438b97
2026-01-01 22:16:24 | INFO | Generated graph with 23 nodes and 16 edges
POST /api/knowledge-graph/2bb7c96c-6602-4684-b54b-ebcb53438b97/generate HTTP/1.1 200 OK
POST /api/knowledge-graph/2bb7c96c-6602-4684-b54b-ebcb53438b97/simplify HTTP/1.1 200 OK

The AI extracts entities and relationships in ~17 seconds, creating a fully interactive graph visualization.

How It Works

┌─────────────────────────────────────────────────────────────────────────────┐
│                    KNOWLEDGE GRAPH PIPELINE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  PDF Document                                                               │
│      │                                                                      │
│      ▼                                                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  ENTITY EXTRACTION (ERNIE AI)                                       │   │
│  │  ─────────────────────────────                                      │   │
│  │  • Sections: Chapter 1, Section 2.1, etc.                           │   │
│  │  • Concepts: Key terms, theories, definitions                       │   │
│  │  • People: Authors, researchers, historical figures                 │   │
│  │  • Dates: Important dates, time periods                             │   │
│  │  • Locations: Places, countries, cities                             │   │
│  │  • Tables/Figures: Data summaries                                   │   │
│  │  • Organizations: Companies, institutions                           │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                              │
│                              ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  RELATIONSHIP DETECTION (AI-Powered)                                │   │
│  │  ─────────────────────────────────────                              │   │
│  │  • references: "See Section 3 for details"                          │   │
│  │  • builds_on: "Building on concepts from Chapter 1"                 │   │
│  │  • summarizes: "Table X summarizes data from Section Y"             │   │
│  │  • defines: Section defines a concept                               │   │
│  │  • contains: Parent-child hierarchy                                 │   │
│  │  • related_to: Semantic similarity                                  │   │
│  │  • contrasts / supports: Argument relationships                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                              │                                              │
│                              ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  INTERACTIVE VISUALIZATION (vis.js Compatible)                      │   │
│  │  ─────────────────────────────────────────────                      │   │
│  │  • Color-coded nodes by entity type                                 │   │
│  │  • Force-directed layout for natural clustering                     │   │
│  │  • Click-to-navigate: Jump to any section instantly                 │   │
│  │  • Hover tooltips with full context                                 │   │
│  │  • Simplify mode for co-design preview                              │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why This Matters

Traditional PDF converters give you a flat Table of Contents. We give you a semantic map of your document:

Traditional ToC DocuMorph Knowledge Graph
Linear list of sections Interactive node-edge visualization
No relationships shown AI-detected references, dependencies
Static navigation Click any node to jump to content
Text-only Color-coded by entity type
Manual creation Auto-generated by ERNIE AI

Real API Response

When we process a research paper, the knowledge graph API returns real data. Here's actual server output:

2026-01-01 22:16:07 | INFO | Generating knowledge graph for document 2bb7c96c-...
2026-01-01 22:16:24 | INFO | Generated graph with 23 nodes and 16 edges
{
  "document_id": "abc123",
  "nodes": [
    {"id": "entity_section_1", "label": "Introduction", "type": "section", "color": "#4e79a7"},
    {"id": "entity_ai_0", "label": "Machine Learning", "type": "concept", "color": "#f28e2c"},
    {"id": "entity_ai_1", "label": "Neural Networks", "type": "concept", "color": "#f28e2c"},
    {"id": "entity_date_1234", "label": "January 2024", "type": "date", "color": "#76b7b2"}
  ],
  "edges": [
    {"from": "entity_section_1", "to": "entity_ai_0", "type": "defines", "label": "defines"},
    {"from": "entity_ai_1", "to": "entity_ai_0", "type": "builds_on", "label": "builds on"}
  ],
  "metadata": {
    "total_nodes": 24,
    "total_edges": 31,
    "entity_types": ["section", "concept", "person", "date", "table"],
    "relationship_types": ["contains", "precedes", "references", "builds_on", "defines"]
  }
}

Entity Types & Visual Encoding

Entity Type Color Shape Example
Section Blue (#4e79a7) Box "Chapter 1: Introduction"
Concept Orange (#f28e2c) Ellipse "Machine Learning"
Person Red (#e15759) Circle "Alan Turing"
Date Teal (#76b7b2) Diamond "January 2024"
Location Green (#59a14f) Triangle "San Francisco"
Table Yellow (#edc949) Square "Table 3: Results"
Organization Brown (#9c755f) Database "OpenAI"

Human-in-the-Loop: Simplify Before Export

Users can preview and simplify the graph in the co-design layer:

# Simplify to top 15 nodes, focusing on sections and concepts
POST /api/knowledge-graph/{id}/simplify
{
  "max_nodes": 15,
  "entity_types": ["section", "concept"]
}

This ensures the final HTML doesn't overwhelm readers with a 100-node graph — they see a clean, navigable overview.

Frontend Integration

The React frontend renders the knowledge graph using vis.js with click-to-navigate:

// Click any node to jump to that section in the document
network.on('click', (params) => {
  if (params.nodes.length > 0) {
    const node = graph.nodes.find(n => n.id === params.nodes[0]);
    if (node.data.block_id) {
      document.getElementById(node.data.block_id).scrollIntoView({ behavior: 'smooth' });
    }
  }
});

What we learned

  • ERNIE Vision is powerful: It can understand document layouts, not just read text
  • Privacy and AI can coexist: Local preprocessing + cloud AI is a viable pattern
  • Human oversight matters: AI suggestions with human approval beats full automation
  • Novita AI makes ERNIE accessible: Easy API access with generous free credits

What's next for DocuMorph AI

  • More widget types: Flowcharts, org charts, Gantt charts
  • Batch processing: Handle multiple PDFs at once
  • Template marketplace: Share and download HTML themes
  • Collaboration: Multiple users co-designing the same document
  • Mobile app: Process PDFs directly from phone camera

Built With

  • ernie
  • ernie-4.5-vl
  • novita-ai
  • fastapi
  • python
  • react
  • typescript
  • paddleocr
  • presidio
  • websocket
  • tailwindcss
  • chartjs

Built With

Share this project:

Updates