Inspiration

My inspiration came from the language barriers that prevent effective communication across different regions, especially given the cost of using existing translation services. I wanted to create a tool that harnesses on-device AI to make communication seamless and accessible to everyone, regardless of the language they speak. Gemini Nano provided an excellent offline AI capability that allowed me to realize this vision, and having everything run on-device is the cherry on top.

What it does

Zubaan is a multilingual translation assistant that lets users get real-time translations from text and voice input, and even generate spoken responses in their preferred language (currently English only). The core idea is to make conversations smoother through instant translation and speech synthesis, all powered by Gemini Nano and Transformers.js.

How we built it

I built Zubaan using Google Gemini Nano for translation, Transformers.js with the Hugging Face model Xenova/speecht5_tts for speech generation, and a web worker for faster processing. The frontend is a web app built in React and integrated with Chrome's built-in AI (Gemini Nano) APIs. I also used a custom worker to handle speech-to-text and text-to-speech conversion, which keeps the user experience smooth even offline. I implemented efficient state management and focused on a modular architecture to make future expansion easy.
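The worker side of this architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not Zubaan's actual code:

- The message shapes, `createSpeechHandler`, and `lazy` are hypothetical names.
- The models are injected as plain async functions, so the sketch runs without downloading anything.
- In a real worker, `loadTts` might wrap Transformers.js's `pipeline('text-to-speech', 'Xenova/speecht5_tts')`, whose output includes the generated `audio` samples. Check the Transformers.js documentation for the options that model needs, such as speaker embeddings.

```typescript
// Hypothetical message protocol between the React UI thread and the speech worker.
type SpeechRequest =
  | { id: number; kind: "tts"; text: string }
  | { id: number; kind: "stt"; audio: Float32Array };

type SpeechResponse =
  | { id: number; ok: true; result: Float32Array | string }
  | { id: number; ok: false; error: string };

// Models are injected as plain async functions. In a real worker they would
// wrap Transformers.js pipelines (assumption, not shown here).
type Synthesizer = (text: string) => Promise<Float32Array>;
type Transcriber = (audio: Float32Array) => Promise<string>;

// Load something at most once: concurrent callers share the same in-flight
// promise, so model weights are fetched and held in memory a single time.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load().catch((err: unknown) => {
        cached = undefined; // a failed load can be retried on the next request
        throw err;
      });
    }
    return cached;
  };
}

// Build the worker's request handler; each model is loaded on first use only.
function createSpeechHandler(
  loadTts: () => Promise<Synthesizer>,
  loadStt: () => Promise<Transcriber>,
): (req: SpeechRequest) => Promise<SpeechResponse> {
  const getTts = lazy(loadTts);
  const getStt = lazy(loadStt);
  return async (req: SpeechRequest): Promise<SpeechResponse> => {
    try {
      if (req.kind === "tts") {
        const synthesize = await getTts();
        return { id: req.id, ok: true, result: await synthesize(req.text) };
      }
      const transcribe = await getStt();
      return { id: req.id, ok: true, result: await transcribe(req.audio) };
    } catch (err) {
      // Report failures back to the UI instead of letting the worker crash.
      return { id: req.id, ok: false, error: String(err) };
    }
  };
}
```

Inside the worker file, the handler would be wired to incoming messages with something like `self.onmessage = async (e) => self.postMessage(await handle(e.data));`. Caching the in-flight promise means a burst of requests at startup triggers one model load rather than several, which also keeps memory usage bounded.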

Challenges we ran into

Since Gemini Nano is still in early preview, the challenges I faced included constantly changing APIs and limitations around generating responses in languages other than English. The Chrome team was very helpful in addressing these issues. I also had to balance memory usage against delivering a polished UI/UX, while adapting the application to work effectively offline. Another key challenge was speech-to-text with Transformers.js: working from the resources available online, it took quite some time before I got it running smoothly :)
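One way to cope with an API that keeps changing during a preview is to hide it behind a small adapter that probes for each known shape. The sketch below rests on these assumptions:

- The app only needs an async `prompt(text)` method from a session.
- The global names it checks (`LanguageModel`, `ai.languageModel`, `ai.assistant`, `ai.createTextSession`) reflect shapes Chrome's Prompt API has used during the preview. They should be verified against current documentation.
- `findSessionFactory` and `translateWithPrompt` are hypothetical helpers.
- Translating through a prompt is an illustrative choice, not necessarily how Zubaan does it.

```typescript
// The only session capability the app relies on (assumption: every API
// generation exposes an async prompt(text) that resolves to the reply).
interface PromptSession {
  prompt(input: string): Promise<string>;
}

type SessionFactory = () => Promise<PromptSession>;

// Probe a global object for known shapes of Chrome's built-in Prompt API,
// newest first. These names are assumptions; check the current Chrome docs.
function findSessionFactory(g: any): SessionFactory | null {
  if (typeof g?.LanguageModel?.create === "function") {
    return () => g.LanguageModel.create();
  }
  if (typeof g?.ai?.languageModel?.create === "function") {
    return () => g.ai.languageModel.create();
  }
  if (typeof g?.ai?.assistant?.create === "function") {
    return () => g.ai.assistant.create();
  }
  if (typeof g?.ai?.createTextSession === "function") {
    return () => g.ai.createTextSession();
  }
  return null; // built-in AI unavailable: the UI can show a fallback message
}

// Hypothetical helper: translate through a prompt rather than a dedicated API.
async function translateWithPrompt(
  factory: SessionFactory,
  text: string,
  targetLanguage: string,
): Promise<string> {
  const session = await factory();
  return session.prompt(
    `Translate the following text into ${targetLanguage}. Reply with the translation only.\n\n${text}`,
  );
}
```

In the browser this would be called as `findSessionFactory(globalThis)`. The rest of the app depends only on `SessionFactory`, so the next API rename means adding one branch to the probe instead of touching every call site.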

Accomplishments that we're proud of

I am incredibly proud of having created an AI-driven language tool that works even offline, thanks to Gemini Nano and Transformers.js. Combining real-time translation with voice input, speech generation (English only), and a seamless end-user experience is a significant milestone. I also implemented an intuitive user interface that lets users switch languages effortlessly, making the tool accessible to non-technical users.

What we learned

I learned a great deal about the intricacies of language translation and speech-to-text, and about the challenges of working with Transformers.js and Hugging Face models. Building Zubaan taught me how to handle edge cases in language detection, optimize API usage for performance, and design user interactions for a global audience.

What's next for Zubaan - Gemini Nano

I plan to expand Zubaan by adding more languages and experimenting with language-specific models available for Transformers.js, particularly for languages that are underrepresented in mainstream translation tools. Another next step is the ability to summarize long conversations, which would be especially helpful for users in business meetings or customer support. I also intend to improve the AI's ability to adapt to accents dynamically and to add more personalization options, so users can make Zubaan truly their own multilingual assistant.
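Summarizing long conversations on-device runs into the model's limited input size. A common workaround is map-reduce summarization: split the transcript into chunks, summarize each chunk, then summarize the summaries. The sketch below is a hypothetical design for this planned feature, not existing Zubaan code. `Prompt` stands in for any async text-to-text call, such as a built-in AI session's `prompt`. The 4,000-character budget is an arbitrary placeholder, not Gemini Nano's actual limit.

```typescript
// Any async text-in/text-out model call, e.g. a built-in AI session's prompt().
type Prompt = (input: string) => Promise<string>;

// Greedily pack conversation turns into chunks of at most maxChars characters.
// A single turn longer than maxChars becomes its own oversized chunk.
function chunkTurns(turns: string[], maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const turn of turns) {
    const candidate = current ? `${current}\n${turn}` : turn;
    if (candidate.length > maxChars && current) {
      chunks.push(current); // close the full chunk and start a new one
      current = turn;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Map-reduce summarization: summarize each chunk, regroup the partial
// summaries, and repeat until one piece remains, then summarize that piece.
async function summarizeConversation(
  turns: string[],
  prompt: Prompt,
  maxChars = 4000, // placeholder budget, not Gemini Nano's real limit
): Promise<string> {
  let pieces = chunkTurns(turns, maxChars);
  while (pieces.length > 1) {
    const partials: string[] = [];
    for (const piece of pieces) {
      // Sequential calls keep only one request in flight on the device.
      partials.push(await prompt(`Summarize this part of a conversation:\n${piece}`));
    }
    const next = chunkTurns(partials, maxChars);
    if (next.length >= pieces.length) {
      // Summaries did not shrink enough to merge; stop instead of looping forever.
      pieces = [partials.join("\n")];
      break;
    }
    pieces = next;
  }
  return prompt(`Summarize this conversation:\n${pieces[0] ?? ""}`);
}
```

The chunk summaries run one at a time, which keeps a single request in flight and suits a device-local model. The shrink check guards against an endless loop when summaries come back no shorter than their inputs.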
