💡 Inspiration

Meetings are the heartbeat of collaboration, yet they are often bogged down by manual note-taking and the struggle to recall key details later. We wanted to build something more than just a recorder—we wanted a participant. Our inspiration was to create a "Living Meeting" where AI doesn't just watch from the sidelines but interacts in real-time and handles the entire post-meeting lifecycle automatically.

🚀 What it does

Meet AI is an intelligent video conferencing platform:

  1. Interactive AI Agents: Talk directly to AI mentors or assistants during your call using the OpenAI Realtime API (a minimal connection sketch follows this list).
  2. Pro Audio Mixing: Every nuance of the conversation (both yours and the AI's) is mixed locally and captured in high fidelity on the server.
  3. Smart Post-Processing: Once the call ends, our automated pipeline (via Inngest) kicks in to transcribe the audio, intelligently label who said what (identifying "User" vs "AI"), and generate an executive summary with action items.
  4. Contextual Chat: You can chat with your meeting afterward, asking questions about specific decisions or data mentioned during the call.
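
For context, here is a hedged sketch of what talking to the Realtime API looks like from a Node.js backend over WebSockets. It is illustrative only: the model name and session fields follow the beta API and may differ from our production setup, and routing the returned audio into the call is left as a comment.

```ts
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  },
);

ws.on("open", () => {
  // Configure the agent's persona and enable spoken + text output.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        modalities: ["audio", "text"],
        instructions: "You are a helpful meeting assistant.",
      },
    }),
  );
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  if (event.type === "response.audio.delta") {
    // event.delta is a base64-encoded PCM16 chunk of the AI's voice;
    // decode it and feed it into the call's audio pipeline.
  }
});
```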

🛠 How we built it

We chose a high-performance stack to handle the complexity of real-time audio/video:

  • Streaming: Stream Video SDK for the robust video infrastructure.
  • AI Brain: OpenAI's Realtime API (WebSockets) for the voice interaction, Whisper for transcription, and GPT-4o for editing/labeling.
  • Backend Orchestration: Inngest handles our complex background workflows, ensuring processing never blocks the user experience (see the pipeline sketch after this list).
  • Framework & DB: Next.js 14, Neon PostgreSQL, and Drizzle ORM for a lightning-fast full-stack foundation.
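
To make the orchestration concrete, here is a minimal sketch of how a post-meeting pipeline can be expressed as an Inngest function. The event name, payload shape, and helper functions are hypothetical placeholders rather than our exact code; the point is that each step.run() is checkpointed and retried independently, entirely in the background.

```ts
import { Inngest } from "inngest";

// Hypothetical types and helpers standing in for the real pipeline steps.
type Utterance = { speaker: "User" | "AI"; text: string };
declare function transcribeWithWhisper(url: string): Promise<Utterance[]>;
declare function labelSpeakers(t: Utterance[]): Promise<Utterance[]>;
declare function generateSummary(t: Utterance[]): Promise<string>;
declare function saveResults(meetingId: string, data: object): Promise<void>;

const inngest = new Inngest({ id: "meet-ai" });

export const processMeeting = inngest.createFunction(
  { id: "process-meeting-recording" },
  { event: "meetings/recording.ready" }, // hypothetical event name
  async ({ event, step }) => {
    // A flaky transcription call only retries this step, not the whole run.
    const transcript = await step.run("transcribe", () =>
      transcribeWithWhisper(event.data.recordingUrl),
    );
    // Tag each utterance as "User" or "AI".
    const labeled = await step.run("label-speakers", () =>
      labelSpeakers(transcript),
    );
    // Executive summary plus action items.
    const summary = await step.run("summarize", () =>
      generateSummary(labeled),
    );
    await step.run("save", () =>
      saveResults(event.data.meetingId, { labeled, summary }),
    );
  },
);
```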

🧠 Challenges we faced

The biggest hurdle was Audio Synchronization. A call recording normally captures only the user's microphone. Getting the AI's voice (arriving via WebSockets) to "mix" with the user's voice, so that the server-side recording captured both, was a significant technical challenge. We solved it by building a custom audio mixer on top of the Web Audio API that combines the local and remote streams before publishing them to the call.
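
The core of that mixer is small. Here is a simplified sketch, assuming `aiStream` is the MediaStream carrying the agent's voice (error handling and AudioContext cleanup are omitted):

```ts
// Combine the local microphone and the AI's remote audio into a single
// MediaStream, so the track published to the call (and the server-side
// recording) contains both sides of the conversation.
async function createMixedStream(aiStream: MediaStream): Promise<MediaStream> {
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();

  // A destination node exposes everything routed into it as a MediaStream.
  const destination = ctx.createMediaStreamDestination();

  ctx.createMediaStreamSource(micStream).connect(destination);
  ctx.createMediaStreamSource(aiStream).connect(destination);

  // Publish destination.stream instead of the raw microphone track.
  return destination.stream;
}
```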

📖 What we learned

We learned that the jump from "Text AI" to "Voice AI" is massive. Handling latency, echo cancellation, and state management in a real-time environment requires a deep understanding of browser audio internals. We also discovered how powerful asynchronous workflows (like Inngest) are for creating a seamless user experience.

🔮 What's next for Meet AI

We plan to introduce Multi-Agent Meetings, where different AI specialists can join the same call to collaborate. We also want to implement real-time sentiment analysis to provide feedback on the "vibe" of the meeting as it happens.
