Inspiration

Professional translation tools choke on real documents: tables break, formatting vanishes, and low-resource languages like Mongolian are barely supported. We wanted to prove that Gemini's multilingual power could deliver production-grade document translation across 154 languages, for free, while keeping every style and layout intact.

What it does

Upload a file in any of 17 formats (PDF, DOCX, PPTX, XLSX, images, subtitles, and more), pick a target language, and get back a translated document with formatting preserved. It supports 154 languages, auto-detects the source language, and lets you lock domain terminology with a custom glossary, all with a free Google AI Studio key and zero accounts.

How we built it

A 5-stage Python/Flask pipeline:

  1. Preprocessor: Sanitizes OOXML, strips malformed elements, normalizes structure.
  2. Segmenter: Extracts translatable text from the XML tree while mapping each segment back to its source node.
  3. Translator: Batches segments to Gemini with constrained prompting and async concurrency (5 parallel requests via httpx). Uses surrogate markers to prevent hallucinated markup.
  4. Reintegrator: Injects translations back into the original XML skeleton, modifying only text nodes to preserve all formatting.
  5. Consistency Checker: Validates cross-segment consistency, numeric formats, and glossary compliance.
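
The Segmenter and Reintegrator stages can be pictured with a minimal sketch. It assumes the translatable text lives in WordprocessingML w:t elements and uses the standard-library ElementTree parser; the function names segment and reintegrate, the sample document, and the upper-casing stand-in for the Gemini call are all hypothetical, not the project's actual code.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"  # fully qualified tag name of a text run's text element

def segment(root):
    """Return the w:t elements that carry non-empty text, in document order."""
    return [el for el in root.iter(W_T) if el.text and el.text.strip()]

def reintegrate(nodes, translations):
    """Overwrite only the text of each node; attributes and siblings stay untouched."""
    if len(nodes) != len(translations):
        raise ValueError("segment/translation count mismatch")
    for node, text in zip(nodes, translations):
        node.text = text

# Hypothetical two-run paragraph: the first run is bold.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> world</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

def demo():
    root = ET.fromstring(SAMPLE)
    nodes = segment(root)
    # Stand-in for the Gemini call: upper-case each segment.
    reintegrate(nodes, [n.text.upper() for n in nodes])
    return root
```

Because the element objects themselves serve as the segment-to-node mapping, the run properties (here the bold flag) are never rewritten; only the text changes.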

Non-DOCX formats use Gemini's multimodal capabilities directly. The frontend is vanilla HTML/JS. Deployed on Fly.io.

Challenges we ran into

  • Formatting preservation: Treating the DOCX as an XML skeleton and replacing only text nodes was the key breakthrough after early attempts corrupted styles and tables.
  • LLM hallucination: Gemini would inject HTML tags into outputs. Solved with aggressive constrained prompting and retry-on-failure validation.
  • Async ordering: Index-based reassembly ensures segments return in the correct order despite concurrent API calls.
  • Glossary enforcement: Terms are injected into every system prompt and validated post-translation across all batches.
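
The ordering, retry, and glossary checks above can be combined in one rough sketch. It uses asyncio with a semaphore capped at five, and a hypothetical fake_translate coroutine stands in for the real httpx request to Gemini; the tag regex, the function names, and the retry count are illustrative assumptions rather than the project's implementation.

```python
import asyncio
import re

# Heuristic: anything shaped like an opening or closing HTML tag counts as hallucinated markup.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")

def is_valid(source, translation, glossary):
    """Reject output with HTML-like tags or a missing glossary target term."""
    if TAG_RE.search(translation):
        return False
    return all(
        target in translation
        for term, target in glossary.items()
        if term in source
    )

async def translate_one(index, text, translate, glossary, sem, retries=2):
    """Translate one segment, retrying when validation fails; keep its index."""
    async with sem:
        for _ in range(retries + 1):
            result = await translate(text)
            if is_valid(text, result, glossary):
                return index, result
    raise RuntimeError(f"segment {index} failed validation")

async def translate_all(segments, translate, glossary, concurrency=5):
    """Run segments concurrently and reassemble them by original index."""
    sem = asyncio.Semaphore(concurrency)
    tasks = [
        translate_one(i, s, translate, glossary, sem)
        for i, s in enumerate(segments)
    ]
    out = [None] * len(segments)
    for finished in asyncio.as_completed(tasks):
        index, result = await finished
        out[index] = result  # slot by index, not by completion order
    return out

async def fake_translate(text):
    # Hypothetical stand-in for the Gemini request: longer inputs finish later,
    # so completion order differs from submission order.
    await asyncio.sleep(0.01 * len(text))
    return f"[{text}]"
```

Calling asyncio.run(translate_all(["aaaa", "bbb", "cc", "d"], fake_translate, {})) returns the results in submission order even though the shortest segment completes first.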

Accomplishments that we're proud of

  • 154 languages, 17 file formats, a single codebase.
  • Translated DOCX files pass round-trip validation with formatting intact.
  • Fully stateless pipeline: every component is pure input → output.
  • Zero cost for users: BYOK (bring your own key), no accounts, nothing stored server-side.

What we learned

  • The OOXML spec is full of edge cases (mc:AlternateContent, nested drawings, numbering references) that demand careful handling.
  • Constrained prompting beats output parsing: prevent bad output rather than fixing it.
  • Async concurrency cuts a 2-minute translation to ~25 seconds, close to the 5x ceiling that five parallel requests allow.
  • Browser localStorage is a viable alternative to user accounts for privacy-sensitive tools.

What's next for Gemini Translator

  • Translation memory: cache repeated segments for instant, free re-translation.
  • Batch mode: translate entire folders at once.
  • Side-by-side review: source vs. translation split view for editing before download.
  • More formats: HTML, .po/.xliff localization files, InDesign .idml.
  • Browser extension: right-click any document on the web and translate in place.
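
As a rough illustration of the translation memory idea, the sketch below caches results keyed by a hash of the language pair and the segment text. The TranslationMemory class, its in-memory dict, and the translate callable are hypothetical; where such a cache would live (for example in the browser, to keep the server stateless) is left open.

```python
import hashlib

class TranslationMemory:
    """In-memory cache of translated segments (hypothetical design)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(text, source_lang, target_lang):
        # Join with a unit separator so field boundaries cannot collide.
        raw = "\x1f".join((source_lang, target_lang, text))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_translate(self, text, source_lang, target_lang, translate):
        """Return the cached translation, calling translate only on a miss."""
        key = self._key(text, source_lang, target_lang)
        if key not in self._store:
            self._store[key] = translate(text)
        return self._store[key]
```

A repeated segment with the same language pair is served from the cache without a second API call, while the same text with a different target language is treated as a new entry.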

Built With

  • fastapi
  • fly.io
  • nextjs
  • supabase
  • vercel