Inspiration
Every team has that one person who takes meeting notes. They're half-listening, half-typing, and fully stressed. And even when notes exist, they live in someone's Notion, never shared, never read.
But the real problem isn't notes. It's organizational memory. The backend team made a decision on Thursday. The frontend team finds out on Monday — in a blocking bug.
We wanted to fix that without adding another calendar invite.
What We Built
MeetWise is a browser-based meeting room where each participant's microphone is recorded individually by their own browser. When the meeting ends, Gemini 1.5 Flash transcribes and summarizes everything into a structured report — decisions, action items with owners, cross-team dependencies, and unresolved questions — posted to a shared workspace dashboard any team can read.
The Insight That Changed Everything
Every existing meeting AI tool wrestles with speaker diarization — figuring out who said what from a single mixed audio stream. It's a hard ML problem.
We realized: we don't have to solve it.
Since each person's browser records only their own microphone, and we know who they are from login, the attribution is free. No diarization model. No post-processing. The architecture itself solves the problem.
If $N$ speakers join a room, traditional tools process 1 mixed stream and then have to run diarization, where the number of speaker pairs that can be confused with each other grows roughly as $O(N^2)$. We process $N$ clean, labeled streams: $O(N)$ work, trivially parallelizable, and with attribution that is correct by construction.
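A minimal sketch of the "one labeled stream per speaker" idea, assuming each upload is already keyed by the logged-in participant. The names `transcribe_all` and `fake_transcribe` are hypothetical, and the transcriber is a stand-in for whatever speech-to-text call is actually used:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def transcribe_all(
    streams: Dict[str, bytes],
    transcribe: Callable[[bytes], str],
) -> Dict[str, str]:
    """Transcribe every participant's stream independently.

    The speaker label comes from the dictionary key (the login identity),
    so no diarization step is needed to attribute the text.
    """
    speakers = list(streams)
    # One task per stream; the streams do not depend on each other.
    with ThreadPoolExecutor(max_workers=max(1, len(speakers))) as pool:
        texts = list(pool.map(transcribe, (streams[s] for s in speakers)))
    return dict(zip(speakers, texts))


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real speech-to-text call (assumption for this sketch).
    return audio.decode("utf-8")
```

Because the work is one independent call per participant, adding a speaker adds one more task rather than making the attribution problem harder.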
How We Built It
Backend — FastAPI manages room state, accepts audio uploads, and
orchestrates Gemini API calls. Rooms are identified by 6-digit codes. Audio
files hit /upload-audio, and summaries are written to disk as JSON.
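As an illustration of the room bookkeeping described here, the sketch below keeps rooms in memory and hands out 6-digit codes. `Room`, `RoomRegistry`, and their methods are hypothetical names chosen for this example, not the project's actual code, and the FastAPI routing layer is left out:

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Room:
    code: str
    participants: List[str] = field(default_factory=list)
    # Maps participant name -> path of their uploaded audio file.
    audio_files: Dict[str, str] = field(default_factory=dict)


class RoomRegistry:
    """In-memory room state; everything is lost when the process restarts."""

    def __init__(self) -> None:
        self._rooms: Dict[str, Room] = {}

    def create_room(self) -> Room:
        # Retry until we draw a 6-digit code that is not already in use.
        while True:
            code = f"{secrets.randbelow(10**6):06d}"
            if code not in self._rooms:
                room = Room(code=code)
                self._rooms[code] = room
                return room

    def join(self, code: str, participant: str) -> Room:
        room = self._rooms[code]  # raises KeyError for an unknown code
        if participant not in room.participants:
            room.participants.append(participant)
        return room

    def attach_audio(self, code: str, participant: str, path: str) -> None:
        room = self._rooms[code]
        if participant not in room.participants:
            raise ValueError(f"{participant} never joined room {code}")
        room.audio_files[participant] = path
```

An upload handler could call `attach_audio` with the room code and the logged-in user, which is what ties each file to a speaker.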
Frontend — Vanilla HTML/CSS/JS. No framework overhead. The MediaRecorder
API captures each participant's mic as a webm/opus blob and POSTs it on
meeting end.
AI Pipeline — Two-stage Gemini usage:
- Each audio file → transcription with speaker label injected via prompt
- Merged transcript → structured JSON summary (decisions, action items, dependencies, open questions)
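The two stages above can be sketched roughly as follows. The model calls are passed in as plain callables, so nothing here depends on a specific Gemini client; the prompt wording, the function names, and the `[speaker]` section format are all assumptions made for illustration:

```python
import json
from typing import Callable, Dict


def build_transcription_prompt(speaker: str) -> str:
    # Stage 1 prompt: the speaker label is injected as text, not inferred.
    return (
        f"Transcribe this audio. Every utterance was spoken by {speaker}. "
        f"Prefix each line with '{speaker}:'."
    )


def merge_transcripts(transcripts: Dict[str, str]) -> str:
    """Join per-speaker transcripts into one document, one section each.

    Without timestamps the sections cannot be interleaved, so this sketch
    simply orders them by speaker name.
    """
    sections = [
        f"[{speaker}]\n{text.strip()}"
        for speaker, text in sorted(transcripts.items())
    ]
    return "\n\n".join(sections)


def run_pipeline(
    audio_by_speaker: Dict[str, bytes],
    transcribe: Callable[[bytes, str], str],
    summarize: Callable[[str], str],
) -> dict:
    # Stage 1: one transcription call per participant.
    transcripts = {
        speaker: transcribe(audio, build_transcription_prompt(speaker))
        for speaker, audio in audio_by_speaker.items()
    }
    # Stage 2: one summarization call over the merged transcript,
    # expected to return a JSON string.
    return json.loads(summarize(merge_transcripts(transcripts)))
```

Keeping the model calls behind callables also makes the pipeline testable with stubs, without network access or API keys.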
Storage — Supabase for workspace/team/room metadata. Audio files are ephemeral and gitignored.
Challenges
HTTPS wall — Browsers only expose microphone capture (getUserMedia, which
feeds MediaRecorder) in a secure context. localhost works; plain-HTTP LAN
IPs don't. We hit this at hour one and had to restructure how teammates
tested together during the hackathon.
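To make the localhost versus LAN IP distinction concrete, here is a simplified approximation of how browsers decide whether an origin is trustworthy enough for microphone access. It covers only the common cases (HTTPS, localhost names, loopback addresses) and is not a full implementation of the browser rules; `is_potentially_trustworthy` is a name chosen for this sketch:

```python
import ipaddress
from urllib.parse import urlsplit


def is_potentially_trustworthy(url: str) -> bool:
    """Rough check for whether a page URL would count as a secure context."""
    parts = urlsplit(url)
    # Encrypted transports are trusted regardless of host.
    if parts.scheme in ("https", "wss"):
        return True
    host = parts.hostname or ""
    # Plain HTTP is still accepted for localhost names...
    if host == "localhost" or host.endswith(".localhost"):
        return True
    # ...and for loopback addresses (127.0.0.0/8 and ::1).
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a regular hostname served over http.
        return False
```

Under this check `http://localhost:8000` passes while `http://192.168.1.20:8000` does not, which matches the failure teammates would see when opening the page over a LAN address.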
Gemini audio limits — Flash has per-file size constraints. We had to implement client-side chunking for longer recordings and handle the merge on the server before the summarization stage.
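One way the server-side merge could work, assuming each chunk is transcribed separately and tagged with the speaker and a zero-based chunk index. The tuple layout and `merge_chunks` are assumptions for this sketch; the writeup does not say whether the merge happens on audio or on text:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (speaker, chunk_index, transcript_text); layout is an assumption.
Chunk = Tuple[str, int, str]


def merge_chunks(chunks: Iterable[Chunk]) -> Dict[str, str]:
    """Rebuild one transcript per speaker from chunks that may arrive out of order."""
    by_speaker: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for speaker, index, text in chunks:
        by_speaker[speaker].append((index, text))

    merged: Dict[str, str] = {}
    for speaker, parts in by_speaker.items():
        parts.sort(key=lambda part: part[0])
        indices = [index for index, _ in parts]
        # Refuse to summarize a recording with missing or duplicated chunks.
        if indices != list(range(len(parts))):
            raise ValueError(f"incomplete chunks for {speaker}: {indices}")
        merged[speaker] = " ".join(text.strip() for _, text in parts)
    return merged
```

Failing loudly on a gap is a deliberate choice here: a silently truncated transcript would produce a confident but incomplete summary.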
State without a database mid-hack — Room state (who joined, which audio files belong to which room) lived in-memory on the FastAPI server initially. Integrating Supabase mid-hackathon without breaking existing endpoints was the messiest hour of the build.
Prompt engineering for structure — Getting Gemini to return consistent JSON across varied meeting lengths and topics took more iteration than expected. We ended up providing a strict schema in the system prompt and validating the response before writing to disk.
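A sketch of the "validate before writing to disk" step, using only the standard library. The field names (`decisions`, `action_items`, `dependencies`, `open_questions`, and the `owner`/`task` keys) are guesses based on the report sections listed earlier, not the project's actual schema:

```python
import json
from typing import Any, Dict

# Assumed top-level fields, each expected to hold a list.
REQUIRED_LIST_FIELDS = ("decisions", "action_items", "dependencies", "open_questions")


def parse_summary(raw: str) -> Dict[str, Any]:
    """Parse a model response and reject anything that breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")

    for name in REQUIRED_LIST_FIELDS:
        if not isinstance(data.get(name), list):
            raise ValueError(f"field '{name}' is missing or not a list")

    # Every action item needs an owner and a task description.
    for item in data["action_items"]:
        if not isinstance(item, dict) or not item.get("owner") or not item.get("task"):
            raise ValueError(f"malformed action item: {item!r}")

    return data
```

A caller could catch `ValueError` and retry the summarization request instead of persisting a malformed report.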
What We Learned
- Architecture can solve AI problems that brute-force modeling can't — the diarization insight was the most valuable thing we took away.
- FastAPI + Supabase is a genuinely fast stack for hackathon backends.
- Browser APIs are powerful and underused. MediaRecorder handled audio capture with ~15 lines of JS.
- Prompt constraints matter as much as the model you pick.
What's Next
- WebSocket-based live transcription during the meeting (not just post-meeting)
- Action item assignment integrated with tools like Linear or Jira
- Cross-meeting trend analysis: which decisions keep getting revisited?
Built With
- css3
- fastapi
- gemini
- html
- javascript
- mediarecorderapi
- python
- supabase