Project Showcase: Gemator

Gemator was born out of watching a friend suffer through the absolute manual grind of part-time manhwa translation. If you’ve ever been in that workflow, you know it’s a chaotic mess of context-switching, cleaning raw scans, and repetitive data entry.

We realized that while "live-screen" translation exists on modern mobile OSs, it’s largely useless for manga because it doesn't handle scrolling. You lose the flow, the art gets obscured, and the context breaks. We built Gemator to bridge that gap—combining Web-App flexibility with Computer Vision to create a tool that actually "gets" the medium.


How We Built It

We didn’t want Gemator to just "read" text; we wanted it to understand the scene.

  • The Brains: We used Google AI Studio for prototyping and Gemini LLMs for the heavy lifting.
  • The Workflow: To keep the art clean, Gemator uses an in-painting feature to scrub original Korean text before slapping the new translation on top. This maintains the visual integrity of the original panel.
  • The Stack: The frontend is built with ReactJS, while the backend is a Python/Flask powerhouse running PyTorch and EasyOCR.
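The read-and-replace flow above can be sketched in a few lines. This is a toy version, not Gemator's actual code: the image is a plain 2D list, `inpaint_region` just fills the region with a background colour where the real tool runs an in-painting model, and the translator is a stub standing in for a Gemini call.

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    # Bounding box of a detected speech bubble, in pixel coordinates.
    x: int
    y: int
    w: int
    h: int
    source_text: str

def inpaint_region(image, region, fill=255):
    """Naive stand-in for the in-painting step: overwrite the region's
    pixels with the background colour. Gemator uses a learned
    in-painting model here instead."""
    for row in range(region.y, region.y + region.h):
        for col in range(region.x, region.x + region.w):
            image[row][col] = fill
    return image

def translate_panel(image, regions, translate):
    """Scrub each detected bubble, then return the cleaned image plus
    (region, translated text) pairs ready for typesetting."""
    placements = []
    for r in regions:
        image = inpaint_region(image, r)
        placements.append((r, translate(r.source_text)))
    return image, placements

# Usage with a tiny 4x6 greyscale "panel" and a stub translator:
panel = [[0] * 6 for _ in range(4)]
bubble = TextRegion(x=1, y=1, w=3, h=2, source_text="안녕")
clean, out = translate_panel(panel, [bubble], lambda s: {"안녕": "Hi"}[s])
```

The real pipeline swaps the stub for a Gemini request and the fill for an in-painting pass, but the control flow is the same: scrub first, place text second.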

Development & Prototyping with Gemini-CLI

To speed up our development cycle, we integrated Gemini-CLI. This was a game-changer for testing prompt engineering and translation nuances without leaving the terminal. Instead of jumping back and forth between the web UI and our IDE, we could pipe OCR outputs directly into the CLI to see how different Gemini models handled specific dialogue strings in real-time. This kept our feedback loop tight and our terminal-centric workflow uninterrupted.
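A minimal version of that loop looks something like the following. This is a sketch: `ocr_out.txt` is a hypothetical dump of extracted dialogue, and the exact `-m` (model) and `-p` (prompt) flags may vary between Gemini CLI versions.

```shell
# Pipe raw OCR output straight into the Gemini CLI to eyeball a
# model's handling of a dialogue string without leaving the terminal.
cat ocr_out.txt | gemini -m gemini-2.5-flash \
  -p "Translate this Korean manhwa dialogue into natural English, keeping the tone:"
```

Swapping the `-m` argument made it cheap to A/B different Gemini models on the same tricky lines.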


The Technical Hurdle: Coordinate Mapping

Precision is everything in manhwa. We rely on strict coordinate mapping to ensure that the in-painting mask and the new text placement align perfectly with the original scan. For every detected text bubble, we record its spatial boundaries as a bounding box in the scan's pixel coordinates.
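Concretely, EasyOCR returns a four-point polygon for each detection, which can be collapsed into an axis-aligned box and padded so the mask fully covers the glyphs. A minimal sketch (the `pad` value is illustrative, not Gemator's tuned threshold):

```python
def to_bbox(points, pad=4, width=None, height=None):
    """Collapse an OCR quadrilateral [(x, y), ...] into a padded
    axis-aligned box (x, y, w, h), clamped to the image bounds
    when width/height are given."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs) - pad, min(ys) - pad
    x1, y1 = max(xs) + pad, max(ys) + pad
    if width is not None:
        x0, x1 = max(x0, 0), min(x1, width)
    if height is not None:
        y0, y1 = max(y0, 0), min(y1, height)
    return x0, y0, x1 - x0, y1 - y0

# A slightly skewed detection quad collapses to one padded box:
box = to_bbox([(10, 20), (110, 18), (112, 60), (12, 62)], pad=4)
```

Too little padding leaves ghost strokes around the in-painted area; too much eats into the art, which is exactly the threshold we spent those late nights tuning.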

Getting the OCR and in-painting to output something satisfactory within a tight deadline required some serious late-night fine-tuning of these bounding box thresholds.

Challenges & What’s Next

The biggest hurdle wasn't a specific bug, but the classic balancing act: accuracy vs. latency. Making the vision analysis fast enough to feel "live" while maintaining translation quality is an ongoing battle.

We’re still refining the core product. Right now, we haven't done deep testing on complex cultural nuances or honorifics, and we’re focusing on perfecting the basic "read-and-replace" functionality before moving on to fancy font support or typesetting. It’s a work in progress, but we’re stoked about how much the Google AI suite—especially the CLI tools—helped us move from concept to a working prototype in record time.

Built With

  • easyocr
  • flask
  • gemini-cli
  • gemini-llms
  • google-ai-studio
  • python
  • pytorch
  • react