MeetWise-AI

Screenshots: home page on load; meeting room view; audio recording from the speaker's device; post-meeting response.

Inspiration

Every team has that one person who takes meeting notes. They're half-listening, half-typing, and fully stressed. And even when notes exist, they live in someone's Notion: never shared, never read.

But the real problem isn't notes. It's organizational memory. The backend team made a decision on Thursday. The frontend team finds out on Monday — in a blocking bug.

We wanted to fix that without adding another calendar invite.

What We Built

MeetWise is a browser-based meeting room where each participant's microphone is recorded individually by their own browser. When the meeting ends, Gemini 1.5 Flash transcribes and summarizes everything into a structured report — decisions, action items with owners, cross-team dependencies, and unresolved questions — posted to a shared workspace dashboard any team can read.

The Insight That Changed Everything

Every existing meeting AI tool wrestles with speaker diarization — figuring out who said what from a single mixed audio stream. It's a hard ML problem.

We realized: we don't have to solve it.

Since each person's browser records only their own microphone, and we know who they are from login, the attribution is free. No diarization model. No post-processing. The architecture itself solves the problem.

If $N$ speakers join a room, traditional tools process one mixed stream and must tell every speaker apart from every other. The number of speaker pairs that can be confused grows as $N(N-1)/2$, so the attribution problem scales roughly as $O(N^2)$. We process $N$ clean, labeled streams: $O(N)$ work, trivially parallelizable, and correctly attributed by construction.
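The per-stream idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `transcribe_stream` is a hypothetical stand-in for the real transcription call, and the user names and payloads are made up. The only point is that each stream arrives already tagged with its speaker, so the streams can be handled independently and in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_stream(speaker: str, audio: bytes) -> dict:
    """Hypothetical stand-in for a real transcription call.

    It only reports the payload size; a real version would send `audio`
    to a speech model. The speaker label comes from login, not the audio.
    """
    return {"speaker": speaker, "text": f"<{len(audio)} bytes of speech>"}


def transcribe_room(streams: dict[str, bytes]) -> list[dict]:
    """Transcribe N labeled streams independently, one task per speaker."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(transcribe_stream, speaker, audio)
            for speaker, audio in streams.items()
        ]
        # Results are collected in submission order, so labels stay aligned.
        return [future.result() for future in futures]


results = transcribe_room({"alice": b"\x00" * 10, "bob": b"\x00" * 20})
```

No step in this sketch inspects the audio to decide who is speaking; the label is carried alongside the stream from the start.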

How We Built It

Backend — FastAPI handles room state and audio uploads, and orchestrates the Gemini API calls. Rooms are identified by 6-digit codes. Audio files are POSTed to /upload-audio, and summaries are written to disk as JSON.
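As a rough sketch of the room bookkeeping described above, the snippet below keeps room state in a plain in-memory dictionary keyed by a 6-digit code. Everything here is an assumption for illustration: the function names, the shape of the state, the use of `secrets`, and the choice to allow leading zeros are not taken from the project.

```python
import secrets

# In-memory room registry: room code -> room state (hypothetical shape).
rooms: dict[str, dict] = {}


def create_room() -> str:
    """Create a room under a fresh, unused 6-digit code and return the code."""
    while True:
        code = f"{secrets.randbelow(10**6):06d}"  # zero-padded to six digits
        if code not in rooms:
            rooms[code] = {"participants": set(), "audio_files": []}
            return code


def join_room(code: str, user: str) -> None:
    """Add a logged-in user to an existing room."""
    rooms[code]["participants"].add(user)


def register_audio(code: str, user: str, filename: str) -> None:
    """Record which uploaded file belongs to which speaker in the room."""
    if user not in rooms[code]["participants"]:
        raise ValueError(f"{user} is not in room {code}")
    rooms[code]["audio_files"].append({"speaker": user, "file": filename})
```

Tying each upload to a known participant at registration time is what lets the later stages treat the speaker label as given.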

Frontend — Vanilla HTML/CSS/JS. No framework overhead. The MediaRecorder API captures each participant's mic as a webm/opus blob and POSTs it on meeting end.

AI Pipeline — Two-stage Gemini usage:

1. Each audio file → transcription, with the speaker label injected via the prompt
2. Merged transcript → structured JSON summary (decisions, action items, dependencies, open questions)
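The hand-off between the two stages might look something like the sketch below. It assumes each transcribed segment carries a start time in seconds, which the write-up does not state, and the JSON key names and prompt wording are hypothetical. The model call itself is deliberately left out.

```python
import json

# Hypothetical key names for the four report sections.
SUMMARY_KEYS = ["decisions", "action_items", "dependencies", "open_questions"]


def merge_transcripts(per_speaker: dict[str, list[tuple[float, str]]]) -> str:
    """Interleave per-speaker (start_seconds, text) segments by start time."""
    segments = [
        (start, speaker, text)
        for speaker, items in per_speaker.items()
        for start, text in items
    ]
    segments.sort(key=lambda seg: seg[0])
    return "\n".join(
        f"[{start:.1f}s] {speaker}: {text}" for start, speaker, text in segments
    )


def build_summary_prompt(transcript: str) -> str:
    """Build the stage-two prompt, embedding an empty JSON skeleton as the schema."""
    skeleton = json.dumps({key: [] for key in SUMMARY_KEYS})
    return (
        "Summarize the meeting transcript below.\n"
        f"Respond with JSON only, using exactly these keys: {skeleton}\n\n"
        f"{transcript}"
    )


merged = merge_transcripts({
    "alice": [(0.0, "Can we ship Friday?"), (9.5, "OK, Monday then.")],
    "bob": [(4.2, "The API is not ready.")],
})
prompt = build_summary_prompt(merged)
```

Because every line of the merged transcript already names its speaker, the summarization stage can assign owners to action items without guessing.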

Storage — Supabase for workspace/team/room metadata. Audio files are ephemeral and gitignored.

Challenges

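HTTPS wall — Microphone capture via getUserMedia, which supplies the stream MediaRecorder records, requires a secure context. localhost counts as one; a LAN IP served over plain HTTP does not. We hit this at hour one and had to restructure how teammates tested together during the hackathon.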

Gemini audio limits — Flash has per-file size constraints. We had to implement client-side chunking for longer recordings and handle the merge on the server before the summarization stage.
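The chunk-and-merge step can be sketched as follows. Python stands in for the browser-side JavaScript here, and `max_bytes` is a hypothetical parameter rather than an actual Gemini limit. Byte-level slices of a webm recording are generally not playable on their own, which is one reason the sketch reassembles the original blob on the server before anything is transcribed.

```python
def split_into_chunks(blob: bytes, max_bytes: int) -> list[tuple[int, bytes]]:
    """Split a recording into (index, data) chunks of at most max_bytes each."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [
        (index, blob[offset:offset + max_bytes])
        for index, offset in enumerate(range(0, len(blob), max_bytes))
    ]


def merge_chunks(chunks: list[tuple[int, bytes]]) -> bytes:
    """Reassemble chunks that may have arrived out of order."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    indices = [index for index, _ in ordered]
    if indices != list(range(len(ordered))):
        raise ValueError("missing or duplicate chunk index")
    return b"".join(data for _, data in ordered)
```

Sending the index with each chunk lets the server detect a dropped or repeated upload instead of silently producing a corrupted file.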

State without a database mid-hack — Room state (who joined, which audio files belong to which room) lived in-memory on the FastAPI server initially. Integrating Supabase mid-hackathon without breaking existing endpoints was the messiest hour of the build.

Prompt engineering for structure — Getting Gemini to return consistent JSON across varied meeting lengths and topics took more iteration than expected. We ended up providing a strict schema in the system prompt and validating the response before writing to disk.
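A validation step of this kind could be as small as the sketch below. The key names and the `owner`/`task` fields on action items are assumptions made for illustration; the actual schema is not given in this write-up.

```python
import json

# Hypothetical schema: four top-level lists, action items with owner and task.
REQUIRED_LIST_KEYS = ("decisions", "action_items", "dependencies", "open_questions")


def parse_summary(raw: str) -> dict:
    """Parse a model response and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for key in REQUIRED_LIST_KEYS:
        if not isinstance(data.get(key), list):
            raise ValueError(f"'{key}' must be a list")
    for item in data["action_items"]:
        if not (
            isinstance(item, dict)
            and isinstance(item.get("owner"), str)
            and isinstance(item.get("task"), str)
        ):
            raise ValueError("each action item needs string 'owner' and 'task' fields")
    return data
```

A caller would write the summary to disk only when `parse_summary` returns, and could retry the model call or surface an error when it raises `ValueError`.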

What We Learned

Architecture can solve AI problems that brute-force modeling can't — the diarization insight was the most valuable thing we took away.
FastAPI + Supabase is a genuinely fast stack for hackathon backends.
Browser APIs are powerful and underused. MediaRecorder handled audio capture with ~15 lines of JS.
Prompt constraints matter as much as the model you pick.