ECHO | Devpost


Inspiration

ECHO was born from a personal, pressing need. After my roommate Sunday and I relocated from Nigeria to Kigali, Rwanda, to study at Carnegie Mellon University, we were immediately confronted with a significant communication barrier. Everyday tasks, from going to the market to hiring a plumber, were a constant challenge because we didn't speak Kinyarwanda.

We realized that if we were facing this problem, countless other students, tourists, and expatriates were too. Lucia, our team member, rigorously validated that assumption. We weren't content with just getting by; we wanted to build a real solution. That's why we created ECHO: to solve a problem we lived every single day.

What it does

ECHO is a voice translation application designed for seamless, real-time conversations. It breaks down language barriers by allowing users to speak in their native tongue and have it instantly translated and spoken aloud in another language.

Our current version focuses on the languages most relevant to our initial use case:

English
Kinyarwanda
Swahili

The goal is to make cross-lingual conversations as natural and fluid as speaking to a friend.

How we built it

We built ECHO as a mobile-first Progressive Web App (PWA) with a focus on speed and user experience.

Frontend: React, TypeScript, Tailwind CSS
Backend: Express.js, PostgreSQL
AI & Core Logic: Google's Gemini API, which we use for speech-to-text, translation, and text-to-speech.

To tackle latency, we engineered an innovative pipeline that, in some cases, performs direct audio-to-audio translation, skipping the intermediate text transcription step entirely to make the conversation faster.
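The "in some cases" routing can be pictured as a pipeline that tries the direct path first and falls back to the transcribe-then-translate cascade. The sketch below is a minimal illustration under stated assumptions, not ECHO's actual code: the SpeechModels interface, its method names, and the stub implementations are all hypothetical, no real Gemini API calls are made, and the direct path is assumed to return translated text that is then synthesized to speech.

```typescript
// Hypothetical types for illustration only; these are not a real SDK.
type Lang = "en" | "rw" | "sw"; // English, Kinyarwanda, Swahili (ISO 639-1)

interface SpeechModels {
  transcribe(audio: Uint8Array, from: Lang): Promise<string>;
  translateText(text: string, from: Lang, to: Lang): Promise<string>;
  // Direct path: audio in, translated text out. null when unavailable.
  translateAudio:
    | ((audio: Uint8Array, from: Lang, to: Lang) => Promise<string>)
    | null;
  synthesize(text: string, lang: Lang): Promise<Uint8Array>;
}

interface TranslationResult {
  path: "direct" | "cascade";
  text: string;
  audio: Uint8Array;
}

async function translateSpeech(
  models: SpeechModels,
  audio: Uint8Array,
  from: Lang,
  to: Lang
): Promise<TranslationResult> {
  let text: string | null = null;
  let path: "direct" | "cascade" = "cascade";

  // Try the direct path first; it skips the transcription step.
  if (models.translateAudio) {
    try {
      text = await models.translateAudio(audio, from, to);
      path = "direct";
    } catch {
      text = null; // Direct path failed: fall through to the cascade.
    }
  }

  // Fallback: speech-to-text, then text-to-text translation.
  if (text === null) {
    const transcript = await models.transcribe(audio, from);
    text = await models.translateText(transcript, from, to);
  }

  return { path, text, audio: await models.synthesize(text, to) };
}

// Stub models that stand in for real API calls (assumed, for demonstration).
const stubModels: SpeechModels = {
  transcribe: async () => "hello",
  translateText: async (text, _from, to) => `[${to}] ${text}`,
  translateAudio: null, // Simulates a case where the direct path is unavailable.
  synthesize: async (text) => new Uint8Array(text.length),
};
```

Calling translateSpeech(stubModels, new Uint8Array(4), "en", "rw") resolves with path set to "cascade", because the stub has no direct path. Supplying a translateAudio function switches the same call to the direct path without changing the caller.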

Challenges we ran into

Our single biggest challenge has been latency. For a conversation to feel natural, translation needs to be almost instantaneous. Any delay creates an awkward pause, defeating the purpose of a seamless communication tool.

We've been obsessed with minimizing this delay. So far, we've cut latency in two ways:

Strategically using faster, less computationally intensive models.
Building a direct audio-to-translated text flow that bypasses the traditional speech-to-text -> text-to-text cascade.
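To see where the time goes in each approach, it can help to time every stage separately. The sketch below is a rough illustration under assumptions, not our production code: createRecorder, the stage names, and the fake model functions are hypothetical stand-ins that resolve immediately, so it shows the structure of the measurement rather than real numbers.

```typescript
interface StageTiming {
  stage: string;
  ms: number;
}

// Records how long each named stage takes. The clock is injectable so the
// recorder can be driven by a fake clock in tests (an assumption of this sketch).
function createRecorder(now: () => number = () => Date.now()) {
  const timings: StageTiming[] = [];
  return {
    timings,
    async time<T>(stage: string, run: () => Promise<T>): Promise<T> {
      const start = now();
      try {
        return await run();
      } finally {
        // Record the duration even if the stage throws.
        timings.push({ stage, ms: now() - start });
      }
    },
    totalMs(): number {
      return timings.reduce((sum, t) => sum + t.ms, 0);
    },
  };
}

type Recorder = ReturnType<typeof createRecorder>;

// Fake model calls (hypothetical); real ones would hit a speech/translation API.
const fakeTranscribe = async (_audio: Uint8Array) => "hello";
const fakeTranslateText = async (text: string) => `translated(${text})`;
const fakeTranslateAudio = async (_audio: Uint8Array) => "translated(hello)";

// Traditional cascade: two model calls before any translated text exists.
async function cascade(rec: Recorder, audio: Uint8Array): Promise<string> {
  const transcript = await rec.time("speech-to-text", () => fakeTranscribe(audio));
  return rec.time("text-to-text", () => fakeTranslateText(transcript));
}

// Direct flow: one model call from audio to translated text.
async function direct(rec: Recorder, audio: Uint8Array): Promise<string> {
  return rec.time("audio-to-translated-text", () => fakeTranslateAudio(audio));
}
```

Running cascade and direct with separate recorders and comparing totalMs() gives a per-request view of how much the extra stage costs. The direct flow only wins if its single call is faster than the two cascade calls combined, which is worth measuring rather than assuming.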

Achieving a truly real-time experience remains our top priority and our most exciting technical challenge.

Accomplishments that we're proud of

Beyond the technical implementation, our proudest accomplishment is seeing the human impact. There is nothing more rewarding than demoing the app to a native Kinyarwanda speaker and seeing them nod their head and smile when they hear the clear, accurate translation.

That moment of understanding, powered by something we built, is the ultimate validation. It proves that ECHO isn't just code; it's a bridge between people.

What we learned

This project has been a journey of immense learning.

The Need is Real: We validated that this tool solves a critical problem, not just for us, but for a whole community of expatriates, students, and tourists in Rwanda.
"Low-Resource" Doesn't Mean "No-Resource": While Kinyarwanda is considered a low-resource language in the AI world, we discovered that modern LLMs like Gemini can handle it with surprising accuracy, even if they haven't been explicitly fine-tuned for it. This was a game-changing realization.
Constraints Drive Innovation: Our battle with latency forced us to think outside the box and develop more efficient solutions than the standard, cascaded approach.

What's next for ECHO

We're just getting started. Our vision is to evolve ECHO into a comprehensive, multi-modal communication platform. Our roadmap includes:

SMS Translation: Integrate with your phone to translate incoming text messages on the fly.
Image Translation: Use your camera to translate text on signs, menus, and documents in real time.
Live Call Translation: Our ultimate goal is to enable a phone call where you speak your language and the person on the other end hears it in theirs, creating a truly barrier-free conversation.