Inspiration
In today’s fast-paced world of remote work, client calls, and team sync-ups, it’s easy to miss key information during meetings. Whether it's action items, decisions, or just remembering who said what — manually taking notes is tedious and inefficient. We wanted to create a smart assistant that not only transcribes conversations but also helps teams summarize, understand sentiments, and extract actionable tasks — all from a single audio recording. That’s how Vaani was born — a multilingual AI meeting companion for the modern team.
What it does
Vaani is an AI-powered meeting assistant that:
- Transcribes live audio or uploaded audio/video files.
- Performs speaker identification to differentiate participants.
- Analyzes the sentiment of each speaker’s contributions.
- Generates a TL;DR summary, bullet-point insights, and action items from the conversation.
- Supports multiple languages using Whisper's multilingual capabilities.
- Lets users ask questions about the meeting and get context-aware answers.
- Displays everything in a simple Gradio web interface.
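The summary and action-item generation boils down to assembling a structured prompt from the speaker-attributed transcript. A minimal sketch of that step (the function name and prompt wording are illustrative, not our exact prompt):

```python
def build_meeting_prompt(segments):
    """Assemble an LLM prompt asking for a TL;DR, key insights,
    and action items from speaker-attributed transcript segments.

    `segments` is a list of (speaker, text) tuples.
    """
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in segments)
    return (
        "You are a meeting assistant. Given the transcript below, produce:\n"
        "1. A one-paragraph TL;DR.\n"
        "2. Bullet-point key insights.\n"
        "3. A list of action items with owners, if stated.\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_meeting_prompt([
    ("Speaker 1", "Let's ship the beta on Friday."),
    ("Speaker 2", "I'll prepare the release notes."),
])
print(prompt)
```

The same transcript string is reused for the Q&A feature, with the user's question appended in place of the summary instructions.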
How we built it
We combined the power of open-source AI models and simple web technologies:
- 🎙️ Whisper (medium) from OpenAI for high-accuracy transcription.
- 👥 pyannote.audio for real-time speaker diarization.
- 💬 VADER for sentiment analysis of each speaker’s statements.
- 📄 LLM-based text summarization & Q&A, using prompt engineering to extract summaries and action items.
- 🧠 MFCC (Mel-Frequency Cepstral Coefficients) for speaker feature extraction.
- 🖥️ Gradio for building the interactive frontend with live and upload-based workflows.
- 🚀 Deployed to Heroku for quick and easy access across devices.
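Once each transcript segment carries a VADER compound score and a speaker label, per-speaker sentiment is just an average over that speaker's segments. A simplified sketch, assuming scores already computed by VADER's `polarity_scores()["compound"]`:

```python
from collections import defaultdict


def sentiment_per_speaker(scored_segments):
    """Average VADER-style compound scores (range [-1, 1]) per speaker.

    `scored_segments` is a list of (speaker, compound_score) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for speaker, score in scored_segments:
        totals[speaker][0] += score
        totals[speaker][1] += 1
    return {speaker: total / count for speaker, (total, count) in totals.items()}


scores = sentiment_per_speaker([
    ("Speaker 1", 0.6),
    ("Speaker 1", 0.2),
    ("Speaker 2", -0.4),
])
print(scores)  # Speaker 1 averages 0.4, Speaker 2 averages -0.4
```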
Challenges we ran into
- Real-time processing: balancing Whisper's accuracy with performance during live transcription.
- Speaker identification: combining diarization results with transcribed chunks required careful timestamp alignment.
- Audio preprocessing: chunking long audio files without losing context or speaker identity.
- Deployment: managing dependencies like ffmpeg, PyTorch, and large model files on Heroku within limited memory and build time.
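The timestamp-alignment problem above reduces to assigning each transcribed segment to the diarization turn it overlaps the most. A simplified illustration of that idea (not our exact code; Whisper segments and pyannote turns are modeled as plain `(start, end)` tuples):

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def assign_speakers(transcript_segments, diarization_turns):
    """Label each transcript segment with the speaker whose diarization
    turn overlaps it the most.

    transcript_segments: list of (start, end, text)
    diarization_turns:   list of (start, end, speaker)
    """
    labeled = []
    for start, end, text in transcript_segments:
        best = max(
            diarization_turns,
            key=lambda turn: overlap((start, end), (turn[0], turn[1])),
        )
        labeled.append((best[2], text))
    return labeled


labeled = assign_speakers(
    [(0.0, 4.0, "Hello everyone."), (4.2, 9.0, "Thanks, let's begin.")],
    [(0.0, 4.1, "SPEAKER_00"), (4.1, 9.5, "SPEAKER_01")],
)
print(labeled)
# [('SPEAKER_00', 'Hello everyone.'), ('SPEAKER_01', "Thanks, let's begin.")]
```

Maximal overlap rather than exact boundary matching is what makes this tolerant of the small timestamp drift between the two models.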
Accomplishments that we're proud of
- Built a fully functional end-to-end pipeline for live and file-based meeting analysis.
- Integrated transcription, speaker ID, sentiment analysis, and summarization seamlessly.
- Developed a clean, intuitive UI that makes the tool usable for non-technical users.
- Deployed the entire app to Heroku, making it portable and accessible.
- Created something that’s actually useful and scalable beyond a hackathon!
What we learned
- How to work with audio streams and chunked processing.
- Real-world applications of Whisper, PyAnnote, and VADER.
- Designing and deploying LLM-powered NLP pipelines.
- The importance of user experience when building technical tools.
- How to collaborate under time pressure and build something from scratch in a limited timeframe.
What's next for Vaani AI
- Add support for video meetings (e.g., extract audio from Zoom/Meet recordings).
- Integrate calendar + task manager APIs (e.g., Notion, Google Tasks) to sync action items.
- Build a browser extension for in-meeting transcription.
- Train a custom sentiment model for more nuanced emotional analysis (e.g., stress, sarcasm).
- Implement speaker name assignment using manual or facial recognition hooks.
- Add real-time transcription translation for global teams.