VisionTranslate Extension

Inspiration

The idea for this project grew out of brainstorming among several group members. Those ideas condensed into a project that lets users translate text in images based on whatever API access they have (or lack thereof). It was inspired by our prior experience trying to use paywalled OCR extensions to translate images such as manga panels.

What it does

VisionTranslate is a Chrome extension that lets users select an image containing text on the screen and translate it using one of four methods. PaddleOCR and MangaOCR run through a backend server, Tesseract.js runs directly in the browser, and Google Cloud Vision is available with the user's own API key. Once the user has chosen a method, entered their API key, and set any language preferences, a button appears in the upper-right corner of each image that contains text. Clicking it translates the text into the selected language and overlays the translation on the image where the original text appeared.
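The method selection described above can be sketched as a small routing table. This is a minimal sketch under stated assumptions, not the extension's actual code: the identifiers (`OcrMethod`, `runtimeFor`, `validateSettings`) and the settings fields are hypothetical. It encodes only two facts from the description: where each method runs, and that Google Cloud Vision needs the user's API key.

```typescript
// Hypothetical identifiers for the four OCR methods described above.
type OcrMethod = "paddleocr" | "mangaocr" | "tesseract" | "google-vision";

// Where each method runs: a backend server, the browser, or Google Cloud.
type OcrRuntime = "backend" | "browser" | "cloud";

// Hypothetical shape of the user's saved settings.
interface TranslateSettings {
  method: OcrMethod;
  apiKey?: string;     // only required for Google Cloud Vision in this sketch
  sourceLang: string;  // hypothetical language code, e.g. "ja"
  targetLang: string;  // hypothetical language code, e.g. "en"
}

// Maps each OCR method to the place it executes.
function runtimeFor(method: OcrMethod): OcrRuntime {
  switch (method) {
    case "paddleocr":
    case "mangaocr":
      return "backend";
    case "tesseract":
      return "browser";
    case "google-vision":
      return "cloud";
  }
}

// Returns a list of human-readable problems; an empty list means the settings are usable.
function validateSettings(s: TranslateSettings): string[] {
  const problems: string[] = [];
  if (s.method === "google-vision" && !s.apiKey?.trim()) {
    problems.push("Google Cloud Vision requires an API key.");
  }
  if (!s.targetLang) {
    problems.push("A target language must be selected.");
  }
  return problems;
}
```

Keeping the routing in one function would let the settings UI and the per-image button handler share a single source of truth about which methods need a server, which run locally, and which need a key.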

How we built it

We started from an extension scaffold generated with Claude, which provided the framework for a translation-based Chrome extension. After initializing the repository with this framework code, we edited it to fix bugs and tailor it to our specific goals. We first debugged the extension UI until it appeared, then debugged the OCR calls, the translation calls, and the overlay process.

Challenges we ran into

We ran into two primary challenges. The first was API access during testing: we had to find reliable sources of API keys before we could test the program at all. The second was resolving overlay issues. Differences in how pixel dimensions were counted, along with the ways some websites rejected our canvas or overlay formatting, caused a variety of visual bugs.
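One common cause of overlay misalignment, offered here as an assumption rather than a diagnosis of our specific bugs, is that OCR engines report text boxes in the image's natural (intrinsic) pixel dimensions, while the page may render the image at a different size. The hypothetical helper `toDisplayBox` below rescales a box from natural pixels to displayed CSS pixels. It assumes the image fills its element with no padding, border, or `object-fit` letterboxing.

```typescript
// A text region as reported by an OCR engine, in the image's natural pixels.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// The image size as OCR saw it versus how the page currently renders it.
interface ImageSizes {
  naturalWidth: number;   // e.g. HTMLImageElement.naturalWidth
  naturalHeight: number;  // e.g. HTMLImageElement.naturalHeight
  displayWidth: number;   // rendered width in CSS pixels (assumes no padding/border)
  displayHeight: number;  // rendered height in CSS pixels
}

// Hypothetical helper: maps an OCR box from natural pixels to displayed CSS pixels
// so an overlay positioned relative to the <img> lines up with the original text.
function toDisplayBox(box: Box, sizes: ImageSizes): Box {
  if (sizes.naturalWidth <= 0 || sizes.naturalHeight <= 0) {
    throw new Error("Image has not loaded or has zero natural size.");
  }
  // Independent factors per axis, so non-uniform stretching is also handled.
  const sx = sizes.displayWidth / sizes.naturalWidth;
  const sy = sizes.displayHeight / sizes.naturalHeight;
  return {
    x: box.x * sx,
    y: box.y * sy,
    width: box.width * sx,
    height: box.height * sy,
  };
}
```

For example, a 2000×1000 image rendered at 500×250 has a scale factor of 0.25 on both axes, so a box at (400, 200) with size 800×100 maps to (100, 50) with size 200×25. If the overlay is drawn on a `<canvas>`, its backing store may also need to be scaled by `window.devicePixelRatio` to stay sharp on high-density displays.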

Accomplishments that we're proud of

The accomplishment we're most proud of is building a high-quality, versatile image translator in the limited time we had. We carefully analyzed and overcame many distinct JavaScript and HTML issues to deliver an advanced project, especially considering that this was the first hackathon for three of our four team members.

What we learned

The main lessons we took from working together were the importance of file management, careful use of the terminal, and thorough communication. Many of our setbacks came from GitHub merge conflicts. Over the course of the project, however, we learned to work more carefully as a team and to take full advantage of our resources so we could work in parallel with each other.

What's next for VisionTranslate Extension

The next steps for VisionTranslate are to broaden its use cases and improve its consistency, particularly in image contexts such as comics. We also plan to add a context system and to build a Firefox version.