Inspiration
Every meeting ends the same way — people walk out unsure of who owns what. Studies show professionals spend nearly 5 hours per week just re-processing meeting notes and manually assigning tasks. We built MeetingMind because we were tired of that problem, and believed a single hand gesture could solve it.
The vision came from watching surgeons operate — they cannot touch a keyboard mid-procedure, yet they coordinate entire teams through voice and gesture. Why should meeting participants be any different?
What it does
MeetingMind is an AI-powered meeting platform: a replacement for Google Meet that adds an intelligent layer on top of video conferencing.
During a meeting, users can raise an open hand to start recording an important clip, and close their fist to stop. The system automatically:
- Transcribes the clip using JigsawStack STT with speaker diarization
- Sends the transcript to Gemini 2.5 Flash for analysis
- Extracts action items and assigns them to the correct person based on who said what
- Reads the summary aloud using Web Speech API
- Displays tasks instantly on a live Kanban board visible to all participants
No keyboard. No copy-pasting. No forgetting who owns what.
How we built it
Frontend: React + Tailwind CSS with a split-panel layout — video call on the left, live transcript and Kanban board on the right.
Gesture engine: MediaPipe Hands running entirely in the browser via WebAssembly, detecting 21 hand landmarks at 30fps with no backend dependency. Open hand = start recording, fist = stop and process, peace sign = replay summary.
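The classification step is simple once MediaPipe hands over the 21 landmarks. Here is a minimal TypeScript sketch of the idea; the landmark indices match MediaPipe Hands, but the upright-hand, thumb-free heuristic is a simplification, not our exact production checks:

```typescript
// Minimal gesture classifier over MediaPipe-style hand landmarks.
// Assumes normalized image coordinates (y grows downward) and an upright hand.
type Landmark = { x: number; y: number };
type Gesture = "open" | "fist" | "peace" | "none";

// Tip/PIP joint indices for index, middle, ring, pinky in MediaPipe Hands.
const FINGERS: Array<[tip: number, pip: number]> = [
  [8, 6],
  [12, 10],
  [16, 14],
  [20, 18],
];

function classifyGesture(landmarks: Landmark[]): Gesture {
  if (landmarks.length < 21) return "none";
  // A finger counts as extended when its tip sits above its PIP joint.
  const extended = FINGERS.map(([tip, pip]) => landmarks[tip].y < landmarks[pip].y);
  const count = extended.filter(Boolean).length;
  if (count === 4) return "open";  // open hand -> start recording
  if (count === 0) return "fist";  // fist -> stop and process
  if (count === 2 && extended[0] && extended[1]) return "peace"; // replay summary
  return "none";
}
```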
Video layer: LiveKit for multi-user real-time video — self-hostable, no credit card required, handles WebRTC complexity.
AI pipeline: Audio chunks stream to the ASP.NET Core backend every 2 seconds via the MediaRecorder API. JigsawStack transcribes each chunk, then Gemini 2.5 Flash analyzes the full transcript and returns structured JSON with the summary, action items, assignees, deadlines, and priorities.
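The structured output is what makes assignment reliable, because the client can parse the model's reply straight into typed objects. A sketch of the contract we validate before touching the board; the field names here are illustrative assumptions, not our exact schema:

```typescript
// Shape we prompt Gemini 2.5 Flash to return (illustrative field names).
interface ActionItem {
  task: string;
  assignee: string;
  deadline: string | null;
  priority: "high" | "medium" | "low";
}
interface MeetingAnalysis {
  summary: string;
  actionItems: ActionItem[];
}

// With responseMimeType "application/json", the reply body is bare JSON:
// parse and lightly validate before pushing tasks to the Kanban board.
function parseAnalysis(raw: string): MeetingAnalysis {
  const data = JSON.parse(raw);
  if (typeof data.summary !== "string" || !Array.isArray(data.actionItems)) {
    throw new Error("unexpected analysis shape");
  }
  return data as MeetingAnalysis;
}
```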
Real-time sync: SignalR pushes Kanban board updates to all participants the moment AI processing completes — everyone sees the same tasks appear live.
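On each client the board is just local state plus a merge step when the hub broadcasts. A sketch of that merge; the hub and method names are assumptions for illustration, not our exact code:

```typescript
// Applied when the SignalR hub broadcasts newly extracted tasks, e.g.
//   connection.on("BoardUpdated", tasks => setBoard(b => applyBoardUpdate(b, tasks)));
type Column = "todo" | "inProgress" | "done";
interface Task { id: string; title: string; assignee: string; column: Column }

// Replace any task with a matching id so re-processing a clip
// updates existing cards instead of duplicating them.
function applyBoardUpdate(board: Task[], incoming: Task[]): Task[] {
  const ids = new Set(incoming.map(t => t.id));
  return [...board.filter(t => !ids.has(t.id)), ...incoming];
}
```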
Storage: Zilliz vector database stores meeting embeddings, enabling semantic search across meeting history.
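Zilliz performs the nearest-neighbor search server-side; conceptually it ranks stored meeting embeddings by cosine similarity against the query embedding, something like:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored meeting embeddings against a query embedding.
function topK(query: number[], docs: Array<{ id: string; vec: number[] }>, k: number) {
  return [...docs]
    .map(d => ({ id: d.id, score: cosineSimilarity(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```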
Challenges we ran into
Gesture stability: MediaPipe classifies a gesture on every frame, so transient hand poses caused false triggers. We solved this with a debounce buffer that requires a gesture to be held for 800ms before firing.
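The debounce buffer boils down to a tiny state machine: a gesture fires only after it has been the stable detection for 800ms, and only once per hold. A simplified sketch:

```typescript
type Gesture = "open" | "fist" | "peace" | "none";

// Hold-to-confirm debouncer: call once per detection frame with the raw
// classification and a timestamp; returns a gesture only when confirmed.
function createGestureDebouncer(holdMs = 800) {
  let current: Gesture = "none";
  let since = 0;
  let fired = false;
  return (gesture: Gesture, now: number): Gesture | null => {
    if (gesture !== current) {
      // Detection changed: restart the hold timer.
      current = gesture;
      since = now;
      fired = false;
      return null;
    }
    if (!fired && gesture !== "none" && now - since >= holdMs) {
      fired = true; // fire once per hold, not on every subsequent frame
      return gesture;
    }
    return null;
  };
}
```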
API compatibility: Qwen's API responses deviated significantly from the OpenAI-compatible format we expected. After multiple failed attempts, we switched to Gemini 2.5 Flash, which worked immediately with no format issues.
ElevenLabs block: Our free-tier account was flagged for unusual activity. We replaced it with the Web Speech API, which runs natively in Chrome with no API dependency.
CORS and CSP in extension context: Loading MediaPipe WASM from CDN is blocked by Chrome extension CSP. Solution was to bundle all WASM files and model assets locally inside the extension package.
Accomplishments that we're proud of
- Gesture-to-Kanban pipeline working end-to-end in under 36 hours
- MediaPipe running at full speed with zero backend calls for gesture detection
- Gemini returning clean structured JSON from messy spoken transcript consistently
- The moment we said "xin chào" ("hello" in Vietnamese) and a full meeting summary with 3 action items appeared on screen: that was the wow moment
What we learned
- Browser-native APIs (Web Speech API, MediaRecorder, WebAssembly) are surprisingly powerful and often more reliable than third-party services for a hackathon context
- Gesture UX requires significant debouncing — humans are not precise machines
- Structured JSON prompting with `responseMimeType: "application/json"` in Gemini eliminates markdown stripping issues entirely
- Building a meeting platform from scratch taught us to deeply appreciate what Agora and Daily.co abstract away
What's next for MeetingMind
- Chrome Extension: Inject MeetingMind as an overlay directly into Google Meet and Zoom — no separate tab needed
- Notion sync: Auto-push action items into team Notion workspace
- Multi-language: Full Vietnamese transcript support with diarization
- Mobile: Gesture detection via phone camera placed beside laptop during meetings
- Calendar integration: Auto-create follow-up meeting when tasks are overdue
Built With
- asp.net-core
- gemini-2.5-flash
- google-oauth
- jigsawstack-stt
- livekit
- mediapipe-hands
- react
- signalr
- tailwind-css
- web-speech-api
- zilliz