Inspiration
We all like reading webcomics, but they're usually not natively written in English and take weeks to get translated. To keep our binge sessions from getting cut short, we made Comic Buddy, a Chrome extension that translates detected comic pages into English in real time.
What it does
Comic Buddy watches for large webcomic images in the active tab, sends them to a FastAPI service for OCR and translation, and drops styled overlays back on the page. It caches results so panels stay translated across scrolls and lets you re-run the translation for the current, last, or all detected images from the popup UI.
How we built it
The Chrome extension (Manifest v3) uses a content script to track candidate images, manage overlay rendering, and coordinate cache state via chrome.storage. A service worker fetches the binary image, converts it to base64, and calls the /analyze endpoint. On the backend, Google Vision OCR extracts word polygons, KD-tree-style grouping heuristics cluster them into speech bubbles, and FastAPI routes the grouped text to translators (Cerebras or Gemini) with batching, retry, and rate-limit handling. A lightweight context store persists recent bubbles so the translator can keep dialogue consistent.
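The grouping step can be sketched roughly like this: cluster word boxes whose centers sit closer together than a multiple of their average height. This sketch uses union-find over pairwise distances as a stand-in for the KD-tree neighbor queries in the real pipeline; `group_words` and `gap_ratio` are illustrative names, not our actual implementation.

```python
import math

def group_words(boxes, gap_ratio=1.5):
    """Cluster OCR word boxes into bubble-like groups.

    boxes: list of (x, y, w, h) word bounding boxes.
    Two words land in the same group when the distance between their
    centers is under gap_ratio * their average word height.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            limit = gap_ratio * (boxes[i][3] + boxes[j][3]) / 2
            if math.dist(centers[i], centers[j]) <= limit:
                union(i, j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

A real KD-tree replaces the O(n²) pair loop with near-neighbor queries, but the clustering criterion is the same.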
Challenges we ran into
Manifest v3’s service-worker sandbox meant reimplementing image fetching (including referer handling) instead of relying on content-script XHR. OCR outputs are fragmented words, so we built geometry-based grouping plus merge heuristics to keep speech bubbles intact. Translation APIs came with strict rate limits and flaky JSON responses, so we added batching, throttling, and schema-validation layers. Keeping overlays aligned required careful scaling between the intrinsic image size and layout changes as the user resized the viewport.
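The retry-plus-validation layer looks roughly like the sketch below: parse the response, check it against a minimal schema, and back off exponentially on failure. `send_fn` stands in for the real API client, and the "every bubble has a text field" check is a simplified version of our schema validation, not the exact one.

```python
import json
import time

def call_translator(send_fn, payload, retries=3, base_delay=1.0):
    """Call a translation endpoint, retrying on malformed JSON or
    transient errors with exponential backoff.

    send_fn(payload) should return the raw response text.
    """
    for attempt in range(retries):
        try:
            raw = send_fn(payload)
            data = json.loads(raw)
            # Minimal schema check: a list of bubbles, each with text.
            if isinstance(data, list) and all("text" in b for b in data):
                return data
        except (json.JSONDecodeError, RuntimeError):
            pass  # fall through to backoff and retry
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("translator failed after retries")
```

In practice the backoff delay also honors any Retry-After hint from the rate limiter.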
Accomplishments that we’re proud of
The dynamic overlay and font autosizing make text feel like it belongs in the comic's style. To reduce clutter and API usage, we detect only comic panels for translation based on their display size, and we assign each candidate a stable data attribute so intersection observers can follow lightbox or infinite-scroll layouts. We also added a context-aware translation pipeline that threads previous lines into new requests, improving tone and pronoun consistency.
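Font autosizing boils down to a binary search for the largest size whose wrapped text still fits the bubble's box. The sketch below uses a crude monospace-style width model (character width proportional to font size); the real overlay measures rendered text instead, and `char_aspect` and `line_gap` are illustrative constants.

```python
def autosize_font(text, box_w, box_h, char_aspect=0.6, line_gap=1.2):
    """Binary-search the largest font size (px) whose wrapped text
    fits inside a box of box_w x box_h pixels."""
    lo, hi = 1, int(box_h)
    best = lo
    while lo <= hi:
        size = (lo + hi) // 2
        # Estimated characters per line at this size.
        chars_per_line = max(1, int(box_w / (char_aspect * size)))
        lines = -(-len(text) // chars_per_line)  # ceiling division
        if lines * size * line_gap <= box_h:
            best = size       # fits: try a larger size
            lo = size + 1
        else:
            hi = size - 1     # overflows: try a smaller size
    return best
```

Swapping the width model for actual text measurement (e.g. canvas `measureText`) keeps the same search structure.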
What we learned
Bridging browser surfaces with backend AI, especially for comics where aesthetics really do matter, means stamping out even tiny visual artifacts while also refining prompts and context windows to preserve the impact of the narrative. Lightweight persistence (JSON + locks) can be enough for hackathon-scale memory without dragging in a database. Most importantly, user perception hinges on latency and reliability, so fallbacks and retries have as much UX impact as the core AI model.
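The JSON-plus-locks pattern mentioned above can be as small as this sketch: a capped list of recent bubbles in a file, guarded by a lock. The class and field names here are illustrative, not our exact code.

```python
import json
import threading
from pathlib import Path

class ContextStore:
    """Hackathon-scale memory: recent dialogue lines in a JSON file,
    serialized through a lock instead of a database."""

    def __init__(self, path="context.json", max_lines=20):
        self._path = Path(path)
        self._lock = threading.Lock()
        self._max = max_lines

    def append(self, line):
        with self._lock:
            lines = self._load()
            lines.append(line)
            # Keep only the most recent lines to bound the context window.
            self._path.write_text(json.dumps(lines[-self._max:]))

    def recent(self):
        with self._lock:
            return self._load()

    def _load(self):
        if self._path.exists():
            return json.loads(self._path.read_text())
        return []
```

The `max_lines` cap doubles as the translator's context-window budget, so the store never grows past what a single prompt can carry.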
What’s next for Comic Buddy
Next up: richer language coverage, smarter bubble detection (likely via lightweight vision models), and automated tests for the grouping/translation pipeline. We also want to expose translation history, polish the popup UX, and explore on-device or edge caching so the buddy feels instant even on longer chapters.