Inspiration

I was inspired by the Metabolism First: Iron–Sulfur World hypothesis, which proposes that life may have originated from inorganic molecules near hydrothermal vents. I wanted to make this complex scientific content accessible online in a simple, interactive way.

What it does

This project converts a scientific PDF about the Metabolism First: Iron–Sulfur World hypothesis into a clean, interactive webpage. It extracts text, images, and layout from the PDF, formats the result as Markdown, and publishes it online so anyone can easily read and explore the content. Users can view the webpage on any device without needing to open the PDF.

How I Built It

Used PaddleOCR-VL to extract text and layout from the PDF.

Converted the extracted content into Markdown format.

Built a static webpage using the extracted Markdown.

Deployed the webpage to GitHub Pages for public access.
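The last two steps (Markdown to static page) can be sketched in Python as below. This is a minimal sketch, not the project's actual code: it assumes the OCR stage has already written one Markdown file per page with zero-padded names such as `page_001.md` (a hypothetical layout), and `merge_pages`, `markdown_to_html`, and `build_site` are illustrative names. It only understands headings and plain paragraphs; a real build would likely use a full Markdown converter that also handles tables and math.

```python
import html
import re
from pathlib import Path

# Matches ATX headings such as "# Title" or "## Section"
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

# Minimal page shell; the viewport tag helps readability on phones
PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title}</title>
</head>
<body>
<main>
{body}
</main>
</body>
</html>
"""


def merge_pages(pages: list[str]) -> str:
    """Join per-page Markdown into one document, dropping empty pages."""
    return "\n\n".join(p.strip() for p in pages if p.strip())


def markdown_to_html(md: str, title: str) -> str:
    """Render a tiny Markdown subset (ATX headings, paragraphs) as a full HTML page."""
    body = []
    # Blank lines separate blocks, as in Markdown
    for block in re.split(r"\n\s*\n", md.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        match = HEADING.match(lines[0])
        if match:
            level = len(match.group(1))
            body.append(f"<h{level}>{html.escape(match.group(2).strip())}</h{level}>")
            lines = lines[1:]
        if lines:
            # Line breaks inside a paragraph become spaces
            body.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return PAGE.format(title=html.escape(title), body="\n".join(body))


def build_site(page_dir: Path, out_file: Path, title: str) -> None:
    """Merge page_*.md files (zero-padded names sort in page order) into one HTML file."""
    pages = [p.read_text(encoding="utf-8") for p in sorted(page_dir.glob("page_*.md"))]
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(markdown_to_html(merge_pages(pages), title), encoding="utf-8")
```

For example, `build_site(Path("ocr_output"), Path("docs/index.html"), "Metabolism First")` (directory names are placeholders) writes a single page that GitHub Pages can serve when the repository is configured to publish from the `docs/` folder.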

Challenges Faced

Ensuring that the formatting of equations and references in the PDF was preserved.

Converting complex text with footnotes and scientific terms into readable HTML.

Optimizing the webpage for readability across devices.

Accomplishments that we're proud of

Successfully converted a detailed scientific PDF into a clean, readable webpage.

Preserved the structure, headings, and references from the original document.

Created a webpage that is easy to access and share without needing the original PDF.

What we learned

How to use OCR (PaddleOCR-VL) to extract text and layout from a PDF.

How to structure scientific content for the web while maintaining clarity.

The basics of turning static documents into interactive, shareable webpages.

What's next for Metabolism First Webpage

Add interactive features like collapsible sections and search functionality.

Include diagrams, images, or animations to better explain the content.

Expand the tool to convert more PDFs on scientific topics into webpages automatically.
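One possible build-time approach to the collapsible sections and search ideas is sketched below, assuming the merged Markdown uses `##` for section headings (only level-2 headings start a section). `split_sections`, `slugify`, `collapsible_html`, and `search_index` are hypothetical helpers, not part of the current project. The HTML `<details>`/`<summary>` elements collapse natively in browsers without JavaScript, while the JSON word index would still need a small client-side script to query it.

```python
import html
import json
import re

SECTION = re.compile(r"^##\s+(.*)$")  # level-2 headings only


def split_sections(md: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs at '## ' headings.

    Text before the first '## ' heading is kept with an empty heading.
    """
    sections: list[tuple[str, str]] = []
    heading, body = "", []
    for line in md.splitlines():
        match = SECTION.match(line)
        if match:
            if heading or "".join(body).strip():
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or "".join(body).strip():
        sections.append((heading, "\n".join(body).strip()))
    return sections


def slugify(text: str) -> str:
    """Turn a heading into an HTML id such as 'iron-sulfur-world' (duplicates not handled)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"


def collapsible_html(sections: list[tuple[str, str]]) -> str:
    """Wrap each headed section in <details>/<summary> so readers can collapse it."""
    parts = []
    for heading, body in sections:
        paras = "".join(
            f"<p>{html.escape(' '.join(p.split()))}</p>"
            for p in re.split(r"\n\s*\n", body)
            if p.strip()
        )
        if heading:
            parts.append(
                f'<details id="{slugify(heading)}"><summary>{html.escape(heading)}</summary>{paras}</details>'
            )
        else:
            parts.append(paras)
    return "\n".join(parts)


def search_index(sections: list[tuple[str, str]]) -> str:
    """Build a JSON map of word -> [section ids] for a client-side search box."""
    index: dict[str, set[str]] = {}
    for heading, body in sections:
        if not heading:
            continue
        for word in re.findall(r"[a-z0-9]+", f"{heading} {body}".lower()):
            index.setdefault(word, set()).add(slugify(heading))
    return json.dumps({word: sorted(ids) for word, ids in sorted(index.items())})
```

Generating both outputs at build time keeps the site fully static, which fits the existing GitHub Pages deployment.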

Built With

  • browser
  • css
  • devpost
  • github
  • html
  • paddleocr-vl-for-text-extraction
  • pdf-file-as-source-document