Inspiration

The idea for DocLingo came from the frustration of seeing Mandarin–English PDF translations lose their structure, formatting, and visual clarity. Existing tools break the layout, distort tables, or return plain text instead of a usable PDF. We wanted a system that respects the original design, so the result feels like the same document in a different language.

What it does

DocLingo is an end-to-end PDF translation system that extracts text, understands layout, translates content using transformer-based models, and reconstructs a new PDF that mirrors the original formatting. It preserves fonts, spacing, tables, images, and the overall geometry by tracking each element's bounding box (x_0, y_0, x_1, y_1). The output is a clean, fully translated PDF with the same structure as the original.
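As a rough illustration of the bounding-box idea, the sketch below swaps the text of a block while leaving its geometry untouched. It is not DocLingo's actual code: the `TextBlock` class, the `translate_blocks` helper, and the stub translator are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class TextBlock:
    """A piece of text plus its bounding box (x0, y0, x1, y1) on the page."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def translate_blocks(
    blocks: List[TextBlock], translate: Callable[[str], str]
) -> List[TextBlock]:
    """Return new blocks with translated text and unchanged coordinates."""
    return [replace(block, text=translate(block.text)) for block in blocks]


# Stub translator standing in for a real translation model (assumption).
def stub_translate(text: str) -> str:
    return {"你好": "Hello"}.get(text, text)


source = [TextBlock("你好", 72.0, 700.0, 200.0, 720.0)]
translated = translate_blocks(source, stub_translate)
```

Keeping the block immutable and copying it with `replace` makes it harder to shift coordinates by accident while the text changes.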

Challenges we ran into

Our biggest challenge was maintaining layout fidelity after translation. The same content takes up very different amounts of space in English and Mandarin, so translated sentences often came out wider or narrower than the originals, causing misalignment. Ensuring correct font rendering, scaling translated text to fit the original bounding boxes, and handling complex objects like tables or embedded images required careful engineering. OCR accuracy on low-quality PDFs also introduced inconsistencies that needed correction.
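One way to picture the text-scaling problem is the sketch below, which shrinks the font size until an estimated text width fits the original box. It is a simplified illustration, not the method DocLingo uses: the function names are hypothetical, and the width estimate assumes wide East Asian characters take about one em and all other characters about half an em, where a real renderer would use actual font metrics.

```python
import unicodedata


def estimated_width(text: str, font_size: float) -> float:
    """Estimate rendered width, assuming 1.0 em per wide CJK glyph, 0.5 em otherwise."""
    ems = 0.0
    for ch in text:
        is_wide = unicodedata.east_asian_width(ch) in ("W", "F")
        ems += 1.0 if is_wide else 0.5
    return ems * font_size


def fit_font_size(
    text: str, box_width: float, original_size: float, min_size: float = 6.0
) -> float:
    """Return a font size at which the text is estimated to fit box_width.

    The size is never increased above original_size and never reduced
    below min_size, so very long text may still overflow the box.
    """
    width = estimated_width(text, original_size)
    if width <= box_width:
        return original_size
    return max(min_size, original_size * box_width / width)


# "你好世界" at 12 pt is estimated at 48 pt wide; the English text is wider.
box_width = estimated_width("你好世界", 12.0)
new_size = fit_font_size("Hello, world", box_width, 12.0)
```

The `min_size` floor reflects a trade-off: below some size the text stops being readable, so a fuller system would need another strategy there, such as wrapping onto more lines.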

Accomplishments that we're proud of

We’re proud that DocLingo successfully produces structurally identical translated PDFs—something many mainstream tools fail to do. Achieving stable layout reconstruction, integrating OCR with structured translation, and automating the entire pipeline from input PDF to final output felt like a major milestone. Seeing the final translated PDF visually match the original was especially rewarding.

What we learned

We gained a deep understanding of how PDFs work internally—not as text documents but as coordinate-based canvases. We learned to extract text using OCR, parse layout trees, and use transformer models for accurate translation. We also learned spacing management, font handling, and layout reconstruction using geometric calculations. This project significantly strengthened our understanding of NLP, OCR, and document engineering.

What's next for DocLingo

Next, we plan to integrate more language pairs, support handwritten OCR, and add an AI-based layout correction module to automatically fix distorted or noisy documents. We also aim to build a web interface, enable batch translation for enterprises, and incorporate a quality-check model that flags translation inconsistencies. Ultimately, our goal is for DocLingo to become a universal, layout-aware multilingual document translation platform.
