Manhwa translator

Inspiration

We like reading manhwa. The official English translation is often a week or more behind the Korean "raws," and we don't want to wait for it to come out. This browser extension is meant to bridge that gap.

What it does

Our browser extension parses Korean comics on toptoon.com (the most popular website for raw Korean manhwa) and translates speech bubbles into English.

How we built it

We built the frontend web extension component in HTML/CSS/JS. We built our backend API with FastAPI and containerized it with Docker. To translate from Korean to English, we used the Claude API along with structured outputs.
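As an illustration of why structured outputs help here, the sketch below validates a model reply that is expected to be a JSON array with one object per speech bubble. The field names (bubble_id, korean, english) and the helper parse_translations are hypothetical, chosen for this example; they are not taken from our actual schema, and no Claude API call is shown.

```python
import json
from dataclasses import dataclass


@dataclass
class BubbleTranslation:
    # Hypothetical schema: one record per detected speech bubble.
    bubble_id: int
    korean: str
    english: str


def parse_translations(raw_json: str) -> list[BubbleTranslation]:
    """Parse a reply that should be a JSON array of translation objects."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of translations")
    result = []
    for item in items:
        # A missing key raises KeyError, so malformed replies fail loudly.
        result.append(
            BubbleTranslation(
                bubble_id=int(item["bubble_id"]),
                korean=str(item["korean"]),
                english=str(item["english"]),
            )
        )
    return result
```

With a fixed shape like this, the backend can match each English string back to the bounding box it came from instead of parsing free-form text.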

For image processing:

EasyOCR: used to detect bounding boxes for Korean text in the manhwa images
Pillow: used to draw the translated text on top of the original bounding boxes
YOLO: used to detect speech bubbles
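The Pillow step can be sketched roughly as follows. This is a minimal example under stated assumptions, not our exact implementation: the function name draw_translation is hypothetical, the box is assumed to be (left, top, right, bottom) in pixels, the bubble background is assumed to be white, and Pillow's built-in default font stands in for whatever font a real build would load.

```python
from PIL import Image, ImageDraw, ImageFont


def draw_translation(image, box, text):
    """Cover the region `box` and draw word-wrapped `text` inside it."""
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()  # assumption: placeholder font
    left, top, right, bottom = box

    # Blank out the original Korean text (assumes a white bubble).
    draw.rectangle(box, fill="white")

    # Greedy word wrap so each line fits within the box width.
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and draw.textlength(candidate, font=font) > right - left:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)

    draw.multiline_text((left, top), "\n".join(lines), fill="black", font=font)
    return image
```

This sketch does not shrink the font or clip text that overflows the bottom of the box; long translations would need one of those.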

Challenges we ran into

Figuring out how to integrate our extension into toptoon.com presented several challenges:

Toptoon detects when a user opens their browser's developer tools (e.g., via "Inspect Element") and automatically redirects them to a page that says "please turn off your browser devtools." To get around this, we loaded the page source in a different tab (to bypass the detection) and found that the site was loading a script called "devtools-detector" that uses some heuristics to detect when the user has their devtools open. We added a rule to uBlock that prevents this script from loading.
If we naively run an OCR model on the manhwa images, it detects every piece of Korean text, including sound effects. It's relatively easy to translate and replace text that's inside a speech bubble, since bubbles usually have a white background. However, text that's outside a bubble or set in a nontraditional font can be hard to detect and also hard to replace. To combat this, we introduced a YOLO model to detect speech bubbles, which lets us run translation only on text inside them. While this means that some Korean will be left untranslated, we believe that's preferable to translated gibberish.
Containerization with Docker was quite painful. We initially wanted to use another OCR library called "PaddleOCR." This presented significant challenges because the library has non-traditional Python dependencies that have to be pulled from an index hosted in China (rather than PyPI). Additionally, the library requires that the underlying hardware is x86, but most of us develop on M1 Macs (based on ARM).
It was particularly challenging to figure out how to replace the images in a toptoon.com webpage. Rather than loading the manhwa images into HTML <img> tags, toptoon dynamically loads the images into HTML <canvas> tags as base64-encoded strings. To actually grab the images (so we can send them to our backend), we needed to override a specific event listener that pulls their images from a CDN and redirect those requests to our backend instead, which returns the translated image.
The image models we use run much faster with GPU acceleration. However, Docker Desktop for Mac cannot pass the GPU through to a container, so we had to run the models in CPU-only mode.
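The speech-bubble filter described above can be sketched as plain box geometry. This is a hedged illustration rather than our exact code: it assumes both detectors return axis-aligned boxes as (left, top, right, bottom), and it uses a simple rule (keep a text box if its center falls inside any bubble) that is an assumption made for this example. The names Box, center_inside, and texts_in_bubbles are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def center_inside(text_box: Box, bubble: Box) -> bool:
    """Return True if the center of `text_box` lies within `bubble`."""
    cx = (text_box.left + text_box.right) / 2
    cy = (text_box.top + text_box.bottom) / 2
    return bubble.left <= cx <= bubble.right and bubble.top <= cy <= bubble.bottom


def texts_in_bubbles(text_boxes: list[Box], bubbles: list[Box]) -> list[Box]:
    """Keep only OCR boxes that fall inside a detected speech bubble."""
    return [t for t in text_boxes if any(center_inside(t, b) for b in bubbles)]


# Example: dialogue inside a bubble is kept, a stray sound effect is dropped.
bubbles = [Box(0, 0, 100, 100)]
dialogue = Box(10, 10, 90, 40)
sound_effect = Box(150, 150, 220, 200)
kept = texts_in_bubbles([dialogue, sound_effect], bubbles)
```

Checking the center rather than requiring full containment tolerates OCR boxes that slightly overshoot the bubble outline, at the cost of occasionally keeping text that straddles a bubble edge.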

Accomplishments that we're proud of

Although our product isn't completely finished, we built a working MVP that can translate roughly 85% of the speech bubbles in toptoon.com manhwa.

What we learned

Manhwa and online manga sites often use tricks that make it harder to reverse-engineer their code. They probably don't want people figuring out their CDN URLs.
Image-processing models run much faster on a GPU, which makes apps that rely on them harder to containerize. The NVIDIA Container Toolkit can pass host GPUs through to Docker containers, but it targets Linux hosts (and Windows through WSL 2), not macOS.
uv (a Python package and project manager) doesn't play well with some older Python libraries.
How to set up a Python project with uv alongside Docker.