Inspiration
In today’s fast-paced world of remote work, client calls, and team sync-ups, it’s easy to miss key information during meetings. Whether it's action items, decisions, or just remembering who said what — manually taking notes is tedious and inefficient. We wanted to create a smart assistant that not only transcribes conversations but also helps teams summarize, understand sentiments, and extract actionable tasks — all from a single audio recording. That’s how Vaani was born — a multilingual AI meeting companion for the modern team.
What it does
Vaani is an AI-powered meeting assistant that:
- Transcribes live audio or uploaded audio/video files.
- Performs speaker identification to differentiate participants.
- Analyzes the sentiment of each speaker’s contributions.
- Generates a TL;DR summary, bullet-point insights, and action items from the conversation.
- Supports multiple languages using Whisper's multilingual capabilities.
- Lets users ask questions about the meeting and get context-aware answers.
- Displays everything in a simple Gradio web interface.
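The summary and action-item generation boils down to assembling a structured prompt from the speaker-attributed transcript. A minimal sketch of that step (the function name and prompt wording are illustrative, not our exact prompt):

```python
def build_meeting_prompt(segments):
    """Assemble an LLM prompt asking for a TL;DR, key insights,
    and action items from speaker-attributed transcript segments.

    `segments` is a list of (speaker, text) tuples.
    """
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in segments)
    return (
        "You are a meeting assistant. Given the transcript below, produce:\n"
        "1. A one-paragraph TL;DR.\n"
        "2. Bullet-point key insights.\n"
        "3. A list of action items with owners, if stated.\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_meeting_prompt([
    ("Speaker 1", "Let's ship the beta on Friday."),
    ("Speaker 2", "I'll prepare the release notes."),
])
print(prompt)
```

The same transcript string is reused for the Q&A feature, with the user's question appended in place of the summary instructions.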
How we built it
We combined the power of open-source AI models and simple web technologies:
- 🎙️ Whisper (medium) from OpenAI for high-accuracy transcription.
- 👥 pyannote.audio for real-time speaker diarization.
- 💬 VADER for sentiment analysis of each speaker’s statements.
- 📄 LLM-based text summarization & Q&A, using prompt engineering to extract summaries and action items.
- 🧠 MFCC (Mel-Frequency Cepstral Coefficients) for speaker feature extraction.
- 🖥️ Gradio for building the interactive frontend with live and upload-based workflows.
- 🚀 Deployed to Heroku for quick and easy access across devices.
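Once each transcript segment carries a VADER compound score and a speaker label, per-speaker sentiment is just an average over that speaker's segments. A simplified sketch, assuming scores already computed by VADER's `polarity_scores()["compound"]`:

```python
from collections import defaultdict


def sentiment_per_speaker(scored_segments):
    """Average VADER-style compound scores (range [-1, 1]) per speaker.

    `scored_segments` is a list of (speaker, compound_score) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for speaker, score in scored_segments:
        totals[speaker][0] += score
        totals[speaker][1] += 1
    return {speaker: total / count for speaker, (total, count) in totals.items()}


scores = sentiment_per_speaker([
    ("Speaker 1", 0.6),
    ("Speaker 1", 0.2),
    ("Speaker 2", -0.4),
])
print(scores)  # Speaker 1 averages 0.4, Speaker 2 averages -0.4
```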
Challenges we ran into
- Real-time processing: balancing Whisper's accuracy with performance during live transcription.
- Speaker identification: combining diarization results with transcribed chunks required careful timestamp alignment.
- Audio preprocessing: chunking long audio files without losing context or speaker identity.
- Deployment: managing dependencies like ffmpeg, PyTorch, and large model files on Heroku within limited memory and build time.
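The timestamp-alignment problem above reduces to assigning each transcribed segment to the diarization turn it overlaps the most. A simplified illustration of that idea (not our exact code; Whisper segments and pyannote turns are modeled as plain `(start, end)` tuples):

```python
def overlap(a, b):
    """Length of the intersection of two (start, end) intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def assign_speakers(transcript_segments, diarization_turns):
    """Label each transcript segment with the speaker whose diarization
    turn overlaps it the most.

    transcript_segments: list of (start, end, text)
    diarization_turns:   list of (start, end, speaker)
    """
    labeled = []
    for start, end, text in transcript_segments:
        best = max(
            diarization_turns,
            key=lambda turn: overlap((start, end), (turn[0], turn[1])),
        )
        labeled.append((best[2], text))
    return labeled


labeled = assign_speakers(
    [(0.0, 4.0, "Hello everyone."), (4.2, 9.0, "Thanks, let's begin.")],
    [(0.0, 4.1, "SPEAKER_00"), (4.1, 9.5, "SPEAKER_01")],
)
print(labeled)
# [('SPEAKER_00', 'Hello everyone.'), ('SPEAKER_01', "Thanks, let's begin.")]
```

Maximal overlap rather than exact boundary matching is what makes this tolerant of the small timestamp drift between the two models.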
Accomplishments that we're proud of
- Built a fully functional end-to-end pipeline for live and file-based meeting analysis.
- Integrated transcription, speaker ID, sentiment analysis, and summarization seamlessly.
- Developed a clean, intuitive UI that makes the tool usable for non-technical users.
- Deployed the entire app to Heroku, making it portable and accessible.
- Created something that’s actually useful and scalable beyond a hackathon!
What we learned
- How to work with audio streams and chunked processing.
- Real-world applications of Whisper, PyAnnote, and VADER.
- Designing and deploying LLM-powered NLP pipelines.
- The importance of user experience when building technical tools.
- How to collaborate under time pressure and build something from scratch in a limited timeframe.
What's next for Vaani AI
- Add support for video meetings (e.g., extract audio from Zoom/Meet recordings).
- Integrate calendar + task manager APIs (e.g., Notion, Google Tasks) to sync action items.
- Build a browser extension for in-meeting transcription.
- Train a custom sentiment model for more nuanced emotional analysis (e.g., stress, sarcasm).
- Implement speaker name assignment using manual or facial recognition hooks.
- Add real-time transcription translation for global teams.