Inspiration
The inspiration for this project stems from a personal "survival crisis" on the streets of Tokyo.
As an engineer who moved to Japan just last year, my limited Japanese proficiency became a significant barrier to daily life. Whether it was navigating procedures at the ward office, describing symptoms at a clinic, or simply ordering at a restaurant, the helplessness caused by communication gaps was overwhelming. I found traditional translation apps almost useless in these real-world scenarios: they were either too slow—making staff wait while I typed—or the translation quality was too robotic to facilitate actual conversation.
To survive, I paid 10,000 JPY per session for accompanying human interpreters and tried subscription-based apps costing 6,000 JPY per month. When I discovered the Google Gemini Live API, I realized its real-time interaction and deep contextual understanding could finally replace that human intervention. I decided to use my engineering skills to build a tool that could solve my own life-altering difficulties.
What it does
Flash Translation is a real-time voice translation tool designed for extreme speed and human-level quality.
It breaks the outdated "record-wait-translate-play" cycle. Leveraging the full-duplex communication of Gemini Live, it acts as a private interpreter that listens and translates in real-time. It doesn't just convert words; it understands the complexities of Japanese Keigo (honorifics) and nuanced social contexts. Whether it's a professional consultation at a government office or a casual chat at a diner, it provides "lightning-fast" feedback, enabling a barrier-free life in a foreign country.
How we built it
As a developer based in Tokyo, I adopted a hybrid architecture combining native mobile technology with cloud-based AI:
- iOS (Swift) Native Development: Utilized AVAudioEngine for deep-level audio capture customization, ensuring stability during high-concurrency mobile usage.
- Gemini Live API: Functions as the "core brain" of the app, providing millisecond-level streaming translation responses.
- Gemini 1.5 Flash Correction: I discovered that initial transcriptions could sometimes be inaccurate. To solve this, I implemented a secondary loop where the translated output is passed back through the Gemini 1.5 Flash API to cross-reference and correct the original input transcription, resulting in significantly higher accuracy.
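The two-stage flow above (Gemini Live for millisecond streaming, a second Gemini 1.5 Flash pass that cross-references the translation against the raw transcript) can be sketched roughly as follows. This is an illustrative sketch only: the names `CorrectionInput` and `buildCorrectionPrompt`, and the prompt wording, are assumptions, not the app's actual code.

```typescript
// Sketch of the self-correction loop: Gemini Live returns a fast but
// occasionally imperfect transcript; a second Gemini Flash request is given
// both the transcript and the translation produced from it, and asked to
// return a corrected transcript. All names below are illustrative.

interface CorrectionInput {
  rawTranscript: string; // transcript streamed back by Gemini Live
  translation: string;   // translation produced from that transcript
  sourceLang: string;    // e.g. "Japanese"
  targetLang: string;    // e.g. "English"
}

// Pure helper: assembles the prompt for the correction model.
function buildCorrectionPrompt(input: CorrectionInput): string {
  return [
    `You transcribed ${input.sourceLang} speech as:`,
    `"${input.rawTranscript}"`,
    `and translated it into ${input.targetLang} as:`,
    `"${input.translation}"`,
    `If the transcript contains speech-recognition errors, return a`,
    `corrected ${input.sourceLang} transcript; otherwise return it unchanged.`,
  ].join("\n");
}

// Example: a slightly garbled ward-office request.
const prompt = buildCorrectionPrompt({
  rawTranscript: "区役所で住民票をもらいたいです",
  translation: "I would like to get a residence certificate at the ward office",
  sourceLang: "Japanese",
  targetLang: "English",
});
console.log(prompt);
```

In the real app, a prompt like this would be sent as an ordinary Gemini `generateContent` request, and the corrected transcript swapped back into the session before display.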
Challenges we ran into
- Environmental Noise Interference: Tokyo's streets and subways are notoriously noisy. To improve recognition, I integrated iOS-native echo cancellation and gain controls to ensure the AI could precisely extract human voices even amidst the background clamor of a busy ramen shop.
- Input Transcription Drifting: When the "Live" transcription faltered, the meaning would be lost. By using the Gemini 1.5 Flash API to perform post-translation "back-correction," I successfully stabilized the input quality, turning a technical flaw into a robust self-healing feature.
Accomplishments that we're proud of
- Practical Self-Rescue: What I am most proud of isn't the lines of code, but the fact that this project genuinely solved my life’s difficulties. I no longer feel anxious when going out to handle official business, and I've eliminated the high cost of human translators.
- Professional-Grade Accuracy: By fine-tuning the interaction between different Gemini models, the app handles administrative and formal Japanese with a level of precision that general-purpose apps simply cannot match.
What we learned
- Technology for Survival: I've realized that the best innovations come from the most acute pain points. An engineer's value lies not just in writing beautiful code, but in building tools that bridge human gaps.
- The Power of AI-Driven Programming: This was my first time using the Swift language. The entire project was built with the help of AI-assisted coding (Antigravity/Gemini). It was a transformative experience to realize that, with the right AI partners, I could build a sophisticated native app almost without writing the boilerplate code by hand.
What's next for Flash Translation
- Offline Mode Exploration: I plan to integrate the open-source TranslateGemma model. This would allow "Pro" users to connect to their own private translation servers.
- On-Device Translation: I am exploring running smaller versions of TranslateGemma directly on the iPhone to achieve high-quality translation even when offline (such as in underground subways).
- Privacy-First Translation: By moving toward local models, I aim to provide a completely private environment for users to handle sensitive documents like Permanent Residency (PR) applications.
Built With
- cloudflare-workers
- gemini-3-flash
- gemini-live
- swift
- typescript