About the project

Janavaani — meaning “The People’s Voice” — is a multilingual AI voice assistant designed to bridge communication gaps across languages, emotions, and geographies. Built for enterprises, governments, and the public, Janavaani offers real-time transcription, translation, emotion detection, and summarization of spoken conversations.
Inspiration
We were inspired by the growing need for inclusive voice technologies that work not just in English, but in regional languages like Telugu, Hindi, and Tamil. From customer support to citizen services, millions are left behind because machines can’t understand their native speech or emotions. Janavaani aims to fix that.
What it does
- Speech-to-Text: real-time transcription in English, Hindi, Telugu, and more.
- Emotion Detection: identifies speaker emotions (anger, joy, sadness, etc.).
- Language Translation: converts regional speech to English and vice versa.
- Text Summarization: generates concise summaries of long conversations.
- Speaker Diarization (optional): identifies “who said what” in multi-speaker conversations.
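The features above form a simple chain: audio in, enriched record out. A minimal sketch of that flow, with stub stages standing in for the actual models (all names here are illustrative, not Janavaani’s real API):

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    """Everything the pipeline learns about one piece of audio."""
    audio_path: str
    transcript: str = ""
    language: str = ""
    emotion: str = ""
    translation: str = ""
    summary: str = ""

# Stub stages: real versions would wrap the ASR, emotion, translation,
# and summarization models. Placeholder values only.
def transcribe(u: Utterance) -> Utterance:
    u.transcript, u.language = "<transcribed text>", "hi"
    return u

def detect_emotion(u: Utterance) -> Utterance:
    u.emotion = "neutral"
    return u

def translate(u: Utterance) -> Utterance:
    u.translation = u.transcript if u.language == "en" else "<english text>"
    return u

def summarize(u: Utterance) -> Utterance:
    u.summary = u.translation[:60]
    return u

def analyze(audio_path: str) -> Utterance:
    """Run every stage in order over a single audio file."""
    u = Utterance(audio_path)
    for stage in (transcribe, detect_emotion, translate, summarize):
        u = stage(u)
    return u
```

Each stage reads and enriches the same record, so stages like diarization can be added or dropped without touching the others.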
How we built it
- Python 3.10 with FastAPI for the backend APIs.
- Whisper for multilingual speech-to-text.
- Hugging Face Transformers for translation (MarianMT), summarization, and emotion detection.
- PyTorch and open-source models only; no paid APIs.
- Modular architecture built on our mani-ai framework.
- Tested with sample audio files across three languages.
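A rough sketch of how this stack wires together with the `transformers` pipeline API. The checkpoint names and audio path are illustrative examples from the Hugging Face Hub, not necessarily the exact models we shipped:

```python
from typing import Dict

# Hypothetical mapping from a detected language code to a MarianMT
# checkpoint; "opus-mt-mul-en" is a many-to-English fallback.
MARIAN_MODELS: Dict[str, str] = {
    "hi": "Helsinki-NLP/opus-mt-hi-en",
}
FALLBACK_MODEL = "Helsinki-NLP/opus-mt-mul-en"

def marian_model_for(lang: str) -> str:
    """Pick a translation checkpoint for a detected language code."""
    return MARIAN_MODELS.get(lang, FALLBACK_MODEL)

if __name__ == "__main__":
    # Model loading sits under this guard because the checkpoints are
    # large downloads; requires `pip install transformers torch`.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    text = asr("sample_hi.wav")["text"]  # hypothetical audio file

    translator = pipeline("translation", model=marian_model_for("hi"))
    english = translator(text)[0]["translation_text"]

    # Example emotion checkpoint; swap in one suited to your languages.
    emotion = pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base")
    print(emotion(english)[0]["label"])

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    print(summarizer(english, max_length=60)[0]["summary_text"])
```

Keeping model selection in a plain function like `marian_model_for` is what lets the rest of the pipeline stay language-agnostic.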
Challenges we ran into
- Language detection errors when translating mixed-language audio.
- Ensuring emotion-detection accuracy in low-resource languages.
- Managing resource constraints (RAM/VRAM) while running large models locally.
- Handling real-time audio processing and streaming.
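One common way to keep RAM/VRAM bounded when running large models locally is to feed audio in fixed-length windows with a small overlap, so the model never sees more than a few seconds at a time. A minimal sketch (the function and parameters are illustrative, not Janavaani’s exact code):

```python
from typing import Iterator, List

def chunk_samples(samples: List[float], sample_rate: int,
                  window_s: float = 30.0,
                  overlap_s: float = 1.0) -> Iterator[List[float]]:
    """Yield fixed-length windows of PCM samples with a small overlap,
    so a large model only ever processes a bounded slice of audio."""
    size = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be smaller than the window")
    for start in range(0, len(samples), step):
        chunk = samples[start:start + size]
        if chunk:
            yield chunk
        if start + size >= len(samples):
            break
```

The one-second overlap also gives the transcriber context across chunk boundaries, which reduces words being cut in half mid-window.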
Accomplishments that we're proud of
- Built a complete end-to-end voice AI system using only open-source tools.
- Supported multilingual transcription alongside emotion analysis.
- Created a modular project that can be extended or productized.
- Aligned with real-world enterprise needs such as call analytics, support bots, and smart-city citizen feedback.
What we learned
- How to integrate and optimize multiple open-source AI models.
- Best practices for handling multilingual and emotionally rich speech.
- The importance of fallback handling and user-centric design in voice AI systems.
What's next for Janavaani
- Add speaker diarization and a real-time streaming UI.
- Train custom emotion models for Indian languages.
- Build a mobile/web frontend for public use.
- Partner with smart-city initiatives and enterprises for pilots.
- Open-source Janavaani to empower local voice AI innovation.