Inspiration
The inspiration for this project stems from a personal "survival crisis" on the streets of Tokyo.
As an engineer who moved to Japan just last year, my limited Japanese proficiency became a significant barrier to daily life. Whether it was navigating procedures at the ward office, describing symptoms at a clinic, or simply ordering at a restaurant, the helplessness caused by communication gaps was overwhelming. I found traditional translation apps almost useless in these real-world scenarios: they were either too slow—making staff wait while I typed—or the translation quality was too robotic to facilitate actual conversation.
To survive, I paid 10,000 JPY per session for accompanying human interpreters and tried subscription-based apps costing 6,000 JPY per month. When I discovered the Google Gemini Live API, I realized its real-time interaction and deep contextual understanding could finally replace that human intervention. I decided to use my engineering skills to build a tool that could solve my own life-altering difficulties.
What it does
Flash Translation is a real-time voice translation tool designed for extreme speed and human-level quality.
It breaks the outdated "record-wait-translate-play" cycle. Leveraging the full-duplex communication of Gemini Live, it acts as a private interpreter that listens and translates in real-time. It doesn't just convert words; it understands the complexities of Japanese Keigo (honorifics) and nuanced social contexts. Whether it's a professional consultation at a government office or a casual chat at a diner, it provides "lightning-fast" feedback, enabling a barrier-free life in a foreign country.
How we built it
As a developer based in Tokyo, I adopted a hybrid architecture combining native mobile technology with cloud-based AI:
- iOS (Swift) Native Development: Utilized AVAudioEngine for deep-level audio capture customization, ensuring stability during high-concurrency mobile usage.
- Gemini Live API: Functions as the "core brain" of the app, providing millisecond-level streaming translation responses.
- Gemini 1.5 Flash Correction: I discovered that initial transcriptions could sometimes be inaccurate. To solve this, I implemented a secondary loop where the translated output is passed back through the Gemini 1.5 Flash API to cross-reference and correct the original input transcription, resulting in significantly higher accuracy.
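The two-stage flow above (Gemini Live for millisecond streaming, a second Gemini 1.5 Flash pass that cross-references the translation against the raw transcript) can be sketched roughly as follows. This is an illustrative sketch only: the names `CorrectionInput` and `buildCorrectionPrompt`, and the prompt wording, are assumptions, not the app's actual code.

```typescript
// Sketch of the self-correction loop: Gemini Live returns a fast but
// occasionally imperfect transcript; a second Gemini Flash request is given
// both the transcript and the translation produced from it, and asked to
// return a corrected transcript. All names below are illustrative.

interface CorrectionInput {
  rawTranscript: string; // transcript streamed back by Gemini Live
  translation: string;   // translation produced from that transcript
  sourceLang: string;    // e.g. "Japanese"
  targetLang: string;    // e.g. "English"
}

// Pure helper: assembles the prompt for the correction model.
function buildCorrectionPrompt(input: CorrectionInput): string {
  return [
    `You transcribed ${input.sourceLang} speech as:`,
    `"${input.rawTranscript}"`,
    `and translated it into ${input.targetLang} as:`,
    `"${input.translation}"`,
    `If the transcript contains speech-recognition errors, return a`,
    `corrected ${input.sourceLang} transcript; otherwise return it unchanged.`,
  ].join("\n");
}

// Example: a slightly garbled ward-office request.
const prompt = buildCorrectionPrompt({
  rawTranscript: "区役所で住民票をもらいたいです",
  translation: "I would like to get a residence certificate at the ward office",
  sourceLang: "Japanese",
  targetLang: "English",
});
console.log(prompt);
```

In the real app, a prompt like this would be sent as an ordinary Gemini `generateContent` request, and the corrected transcript swapped back into the session before display.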
Challenges we ran into
- Environmental Noise Interference: Tokyo's streets and subways are notoriously noisy. To improve recognition, I integrated iOS-native echo cancellation and gain controls to ensure the AI could precisely extract human voices even amidst the background clamor of a busy ramen shop.
- Input Transcription Drifting: When the "Live" transcription faltered, the meaning would be lost. By using the Gemini 1.5 Flash API to perform post-translation "back-correction," I successfully stabilized the input quality, turning a technical flaw into a robust self-healing feature.
Accomplishments that we're proud of
- Practical Self-Rescue: What I am most proud of isn't the lines of code, but the fact that this project genuinely solved my life’s difficulties. I no longer feel anxious when going out to handle official business, and I've eliminated the high cost of human translators.
- Professional-Grade Accuracy: By fine-tuning the interaction between different Gemini models, the app handles administrative and formal Japanese with a level of precision that general-purpose apps simply cannot match.
What we learned
- Technology for Survival: I've realized that the best innovations come from the most acute pain points. An engineer's value lies not just in writing beautiful code, but in building tools that bridge human gaps.
- The Power of AI-Driven Programming: This was my first time using the Swift language. The entire project was built with the help of AI-assisted coding (Antigravity/Gemini). It was a transformative experience to realize that, with the right AI partners, I could build a sophisticated native app almost without writing the boilerplate code by hand.
What's next for Flash Translation
- Offline Mode Exploration: I plan to integrate the open-source TranslateGemma model. This would allow "Pro" users to connect to their own private translation servers.
- On-Device Translation: I am exploring running smaller versions of TranslateGemma directly on the iPhone to achieve high-quality translation even when offline (such as in underground subways).
- Privacy-First Translation: By moving toward local models, I aim to provide a completely private environment for users to handle sensitive documents like Permanent Residency (PR) applications.
Built With
- cloudflare-workers
- gemini-3-flash
- gemini-live
- swift
- typescript