Inspiration

In today’s fast-paced world of remote work, client calls, and team sync-ups, it’s easy to miss key information during meetings. Whether it's action items, decisions, or just remembering who said what — manually taking notes is tedious and inefficient. We wanted to create a smart assistant that not only transcribes conversations but also helps teams summarize, understand sentiments, and extract actionable tasks — all from a single audio recording. That’s how Vaani was born — a multilingual AI meeting companion for the modern team.

What it does

Vaani is an AI-powered meeting assistant that:

  • Transcribes live audio or uploaded audio/video files.
  • Performs speaker identification to differentiate participants.
  • Analyzes the sentiment of each speaker’s contributions.
  • Generates a TL;DR summary, bullet-point insights, and action items from the conversation.
  • Supports multiple languages using Whisper's multilingual capabilities.
  • Allows users to ask questions about the meeting and get context-aware answers.
  • Displays everything in a simple Gradio web interface.
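The summary and action-item generation boils down to prompting an LLM with the full transcript. A minimal sketch of what that prompt construction looks like (the function name and exact wording here are illustrative, not the prompt Vaani ships with):

```python
def build_summary_prompt(transcript: str) -> str:
    """Build an LLM prompt asking for a TL;DR, insights, and action items.

    Illustrative only -- Vaani's actual prompt wording may differ.
    """
    return (
        "You are a meeting assistant. Given the transcript below, produce:\n"
        "1. A one-paragraph TL;DR\n"
        "2. Bullet-point key insights\n"
        "3. A list of action items, with owners where identifiable\n\n"
        f"Transcript:\n{transcript}"
    )
```

The same transcript-in-context pattern powers the Q&A feature: the user's question is appended after the transcript instead of the fixed instruction list.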

How we built it

We combined the power of open-source AI models and simple web technologies:

  • 🎙️ Whisper (medium) from OpenAI for high-accuracy transcription.
  • 👥 PyAnnote for real-time speaker diarization.
  • 💬 VADER for sentiment analysis of each speaker’s statements.
  • 📄 LLM-based text summarization & Q&A, using prompt engineering to extract summaries and action items.
  • 🧠 MFCC (Mel-Frequency Cepstral Coefficients) for speaker feature extraction.
  • 🖥️ Gradio for building the interactive frontend with live and upload-based workflows.
  • 🚀 Deployed to Heroku for quick and easy access across devices.
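As an example of the glue between these components: VADER emits a compound score in [-1, 1] per utterance, and once utterances carry speaker labels, per-speaker sentiment is a simple aggregation. A minimal sketch with a hypothetical data shape (the real pipeline feeds `SentimentIntensityAnalyzer` output in here):

```python
from collections import defaultdict

def sentiment_by_speaker(utterances):
    """Average VADER-style compound scores (-1..1) per speaker.

    Each utterance is assumed to be a dict like
    {"speaker": "SPEAKER_00", "compound": 0.42}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for u in utterances:
        totals[u["speaker"]] += u["compound"]
        counts[u["speaker"]] += 1
    # Mean compound score per speaker
    return {spk: totals[spk] / counts[spk] for spk in totals}
```

Averaging the compound score is a deliberately simple aggregate; it gives each speaker a single positive/negative tendency for the whole meeting.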

Challenges we ran into

  • Real-time processing: Balancing Whisper's accuracy with performance during live transcription.
  • Speaker identification: Combining diarization results with transcribed chunks required careful timestamp alignment.
  • Audio preprocessing: Chunking long audio files without losing context or speaker identity.
  • Deployment: Managing dependencies like ffmpeg, PyTorch, and large model files on Heroku with limited memory and time.
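The timestamp-alignment challenge above comes down to matching two independent segmentations of the same audio: Whisper's transcript segments and PyAnnote's speaker turns. One workable approach is to label each transcript segment with the speaker whose turn overlaps it the most. A minimal sketch with hypothetical dict shapes (the real objects are library-specific):

```python
def assign_speakers(transcript_segments, diarization_turns):
    """Label each transcript segment with the most-overlapping speaker.

    Assumes segments like {"start": 0.0, "end": 2.0, "text": "..."} and
    turns like {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00"}.
    """
    labeled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in diarization_turns:
            # Length of the intersection of the two time intervals
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Maximal-overlap assignment is robust to small timestamp drift between the two models, which is exactly what makes naive boundary matching fail.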

Accomplishments that we're proud of

  • Built a fully functional end-to-end pipeline for live and file-based meeting analysis.
  • Integrated transcription, speaker ID, sentiment analysis, and summarization seamlessly.
  • Developed a clean, intuitive UI that makes the tool usable for non-technical users.
  • Deployed the entire app to Heroku, making it portable and accessible.
  • Created something that’s actually useful and scalable beyond a hackathon!

What we learned

  • How to work with audio streams and chunked processing.
  • Real-world applications of Whisper, PyAnnote, and VADER.
  • Designing and deploying LLM-powered NLP pipelines.
  • The importance of user experience when building technical tools.
  • How to collaborate under time pressure and build something from scratch in a limited timeframe.

What's next for Vaani AI

  • Add support for video meetings (e.g., extract audio from Zoom/Meet recordings).
  • Integrate calendar + task manager APIs (e.g., Notion, Google Tasks) to sync action items.
  • Build a browser extension for in-meeting transcription.
  • Train a custom sentiment model for more nuanced emotional analysis (e.g., stress, sarcasm).
  • Implement speaker name assignment using manual or facial recognition hooks.
  • Add real-time transcription translation for global teams.

Built With

  • Python · Whisper · PyAnnote · VADER · Gradio · Heroku
