Inspiration
Every meeting ends the same way — people walk out unsure of who owns what. Studies show professionals spend nearly 5 hours per week just re-processing meeting notes and manually assigning tasks. We built MeetingMind because we were tired of that problem, and believed a single hand gesture could solve it.
The vision came from watching surgeons operate — they cannot touch a keyboard mid-procedure, yet they coordinate entire teams through voice and gesture. Why should meeting participants be any different?
What it does
MeetingMind is an AI-powered meeting platform: a replacement for Google Meet that adds an intelligent layer on top of video conferencing.
During a meeting, users can raise an open hand to start recording an important clip, and close their fist to stop. The system automatically:
- Transcribes the clip using JigsawStack STT with speaker diarization
- Sends the transcript to Gemini 2.5 Flash for analysis
- Extracts action items and assigns them to the correct person based on who said what
- Reads the summary aloud using Web Speech API
- Displays tasks instantly on a live Kanban board visible to all participants
No keyboard. No copy-pasting. No forgetting who owns what.
How we built it
Frontend: React + Tailwind CSS with a split-panel layout — video call on the left, live transcript and Kanban board on the right.
Gesture engine: MediaPipe Hands running entirely in the browser via WebAssembly, detecting 21 hand landmarks at 30fps with no backend dependency. Open hand = start recording, fist = stop and process, peace sign = replay summary.
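The classification step is simple once MediaPipe hands over the 21 landmarks. Here is a minimal TypeScript sketch of the idea; the landmark indices match MediaPipe Hands, but the upright-hand, thumb-free heuristic is a simplification, not our exact production checks:

```typescript
// Minimal gesture classifier over MediaPipe-style hand landmarks.
// Assumes normalized image coordinates (y grows downward) and an upright hand.
type Landmark = { x: number; y: number };
type Gesture = "open" | "fist" | "peace" | "none";

// Tip/PIP joint indices for index, middle, ring, pinky in MediaPipe Hands.
const FINGERS: Array<[tip: number, pip: number]> = [
  [8, 6],
  [12, 10],
  [16, 14],
  [20, 18],
];

function classifyGesture(landmarks: Landmark[]): Gesture {
  if (landmarks.length < 21) return "none";
  // A finger counts as extended when its tip sits above its PIP joint.
  const extended = FINGERS.map(([tip, pip]) => landmarks[tip].y < landmarks[pip].y);
  const count = extended.filter(Boolean).length;
  if (count === 4) return "open";  // open hand -> start recording
  if (count === 0) return "fist";  // fist -> stop and process
  if (count === 2 && extended[0] && extended[1]) return "peace"; // replay summary
  return "none";
}
```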
Video layer: LiveKit for multi-user real-time video — self-hostable, no credit card required, handles WebRTC complexity.
AI pipeline: Audio chunks stream to the ASP.NET Core backend every 2 seconds via the MediaRecorder API. JigsawStack transcribes each chunk, then Gemini 2.5 Flash analyzes the full transcript and returns structured JSON with the summary, action items, assignees, deadlines, and priorities.
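The structured output is what makes assignment reliable, because the client can parse the model's reply straight into typed objects. A sketch of the contract we validate before touching the board; the field names here are illustrative assumptions, not our exact schema:

```typescript
// Shape we prompt Gemini 2.5 Flash to return (illustrative field names).
interface ActionItem {
  task: string;
  assignee: string;
  deadline: string | null;
  priority: "high" | "medium" | "low";
}
interface MeetingAnalysis {
  summary: string;
  actionItems: ActionItem[];
}

// With responseMimeType "application/json", the reply body is bare JSON:
// parse and lightly validate before pushing tasks to the Kanban board.
function parseAnalysis(raw: string): MeetingAnalysis {
  const data = JSON.parse(raw);
  if (typeof data.summary !== "string" || !Array.isArray(data.actionItems)) {
    throw new Error("unexpected analysis shape");
  }
  return data as MeetingAnalysis;
}
```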
Real-time sync: SignalR pushes Kanban board updates to all participants the moment AI processing completes — everyone sees the same tasks appear live.
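On each client the board is just local state plus a merge step when the hub broadcasts. A sketch of that merge; the hub and method names are assumptions for illustration, not our exact code:

```typescript
// Applied when the SignalR hub broadcasts newly extracted tasks, e.g.
//   connection.on("BoardUpdated", tasks => setBoard(b => applyBoardUpdate(b, tasks)));
type Column = "todo" | "inProgress" | "done";
interface Task { id: string; title: string; assignee: string; column: Column }

// Replace any task with a matching id so re-processing a clip
// updates existing cards instead of duplicating them.
function applyBoardUpdate(board: Task[], incoming: Task[]): Task[] {
  const ids = new Set(incoming.map(t => t.id));
  return [...board.filter(t => !ids.has(t.id)), ...incoming];
}
```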
Storage: Zilliz vector database stores meeting embeddings, enabling semantic search across meeting history.
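Zilliz performs the nearest-neighbor search server-side; conceptually it ranks stored meeting embeddings by cosine similarity against the query embedding, something like:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored meeting embeddings against a query embedding.
function topK(query: number[], docs: Array<{ id: string; vec: number[] }>, k: number) {
  return [...docs]
    .map(d => ({ id: d.id, score: cosineSimilarity(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```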
Challenges we ran into
Gesture stability: MediaPipe classifies a gesture on every frame, so transient hand poses caused false triggers. We solved this with a debounce buffer that requires a gesture to be held for 800ms before firing.
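The debounce buffer boils down to a tiny state machine: a gesture fires only after it has been the stable detection for 800ms, and only once per hold. A simplified sketch:

```typescript
type Gesture = "open" | "fist" | "peace" | "none";

// Hold-to-confirm debouncer: call once per detection frame with the raw
// classification and a timestamp; returns a gesture only when confirmed.
function createGestureDebouncer(holdMs = 800) {
  let current: Gesture = "none";
  let since = 0;
  let fired = false;
  return (gesture: Gesture, now: number): Gesture | null => {
    if (gesture !== current) {
      // Detection changed: restart the hold timer.
      current = gesture;
      since = now;
      fired = false;
      return null;
    }
    if (!fired && gesture !== "none" && now - since >= holdMs) {
      fired = true; // fire once per hold, not on every subsequent frame
      return gesture;
    }
    return null;
  };
}
```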
API compatibility: Qwen's API responses deviated significantly from the OpenAI-compatible format we expected. After multiple failed attempts, we switched to Gemini 2.5 Flash, which worked immediately with no format issues.
ElevenLabs block: Our free-tier account was flagged for unusual activity. We replaced it with the Web Speech API, which runs natively in Chrome with no API dependency.
CORS and CSP in extension context: Loading MediaPipe WASM from CDN is blocked by Chrome extension CSP. Solution was to bundle all WASM files and model assets locally inside the extension package.
Accomplishments that we're proud of
- Gesture-to-Kanban pipeline working end-to-end in under 36 hours
- MediaPipe running at full speed with zero backend calls for gesture detection
- Gemini returning clean structured JSON from messy spoken transcript consistently
- The moment we said "xin chào" ("hello" in Vietnamese) and a full meeting summary with 3 action items appeared on screen: that was the wow moment
What we learned
- Browser-native APIs (Web Speech API, MediaRecorder, WebAssembly) are surprisingly powerful and often more reliable than third-party services for a hackathon context
- Gesture UX requires significant debouncing — humans are not precise machines
- Structured JSON prompting with `responseMimeType: "application/json"` in Gemini eliminates markdown stripping issues entirely
- Building a meeting platform from scratch taught us to deeply appreciate what Agora and Daily.co abstract away
What's next for MeetingMind
- Chrome Extension: Inject MeetingMind as an overlay directly into Google Meet and Zoom — no separate tab needed
- Notion sync: Auto-push action items into team Notion workspace
- Multi-language: Full Vietnamese transcript support with diarization
- Mobile: Gesture detection via phone camera placed beside laptop during meetings
- Calendar integration: Auto-create follow-up meeting when tasks are overdue
Built With
- asp.net-core
- gemini-2.5-flash
- google-oauth
- jigsawstack-stt
- livekit
- mediapipe-hands
- react
- signalr
- tailwind-css
- web-speech-api
- zilliz