I'm writing this from Bahrain, where the ongoing conflict in the Middle East has turned everything upside down. School went fully online, everything got canceled, and life became unpredictable. I joined this hackathon late, and when I finally had time to sit down and build, the war escalated. I had less than 48 hours. That's it.

What It Does

Perks Live is a real-time multimodal AI agent that watches your screen and listens to your voice simultaneously. You share your screen showing a messy workflow - spreadsheets, manual processes, disconnected tools - and have a natural voice conversation with an AI consultant that can see everything you're looking at.
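
A rough sketch of the screen-sharing leg, to make the flow concrete: grab a getDisplayMedia track, snapshot it onto a canvas about once a second, and stream each frame to the Live session as a JPEG. The frame rate and session shape here are illustrative, not the exact production values.

```ts
// Minimal sketch of screen-frame streaming (frame rate and session shape are illustrative).
async function startScreenShare(session: { sendRealtimeInput: (input: object) => void }) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d')!;

  // Snapshot the shared screen roughly once per second and send it as a JPEG frame.
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    const jpeg = canvas.toDataURL('image/jpeg', 0.7).split(',')[1]; // strip the data: prefix
    session.sendRealtimeInput({ video: { data: jpeg, mimeType: 'image/jpeg' } });
  }, 1000);
}
```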

When you ask for a blueprint, the AI calls a tool that fires a webhook to an n8n automation pipeline, where an AI Agent node generates a complete, valid n8n workflow JSON file. The blueprint and importable JSON arrive in your inbox in under 60 seconds.
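
Under the hood, the "ask for a blueprint" part is just a function declaration the model is allowed to call. Roughly, with field names that are illustrative rather than the exact production schema:

```ts
import { Type, type FunctionDeclaration } from '@google/genai';

// Illustrative declaration of the blueprint tool exposed to the Live session.
const sendBlueprintTool: FunctionDeclaration = {
  name: 'send_blueprint',
  description:
    'Generate an automation blueprint for the workflow visible on screen and email it to the user.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      workflow_summary: {
        type: Type.STRING,
        description: 'Plain-English description of the messy workflow on screen.',
      },
      email: { type: Type.STRING, description: 'Address the blueprint should be sent to.' },
    },
    required: ['workflow_summary', 'email'],
  },
};
```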

How I Built It

  • Gemini Live API - gemini-2.5-flash-native-audio-preview-12-2025 on v1alpha for real-time audio + vision streaming
  • Google GenAI SDK - @google/genai for Live API session management and tool declarations
  • Firebase Hosting - Google Cloud deployment
  • React 19 + Vite - frontend with WebGL shader background, Spline 3D scene, Framer Motion
  • n8n - self-hosted webhook receiver, AI Agent node, SMTP email dispatch

The Hardest Problems I Hit

Wrong model + wrong API version. Every connection attempt returned a 1008 disconnect. The fix was switching to gemini-2.5-flash-native-audio-preview-12-2025 on the v1alpha API version.
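
With the @google/genai SDK, the working setup looked roughly like this (a sketch with trimmed callbacks; the env var name and the message handler are placeholders):

```ts
import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({
  apiKey: import.meta.env.VITE_GEMINI_API_KEY, // Vite-style env var, illustrative
  httpOptions: { apiVersion: 'v1alpha' },      // the native-audio Live model needs v1alpha
});

const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: {
    responseModalities: [Modality.AUDIO],
    tools: [{ functionDeclarations: [sendBlueprintTool] }], // the declaration sketched earlier
  },
  callbacks: {
    onopen: () => console.log('live session open'),
    onmessage: (msg) => handleServerMessage(msg), // audio chunks and tool calls land here (handler not shown)
    onerror: (e) => console.error('live session error', e),
    onclose: (e) => console.warn('live session closed', e.reason),
  },
});
```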

clientContent vs realtime_input. I was sending audio with turnComplete: true on every 256ms chunk - telling Gemini the conversation was over constantly. The fix was streaming mic audio through realtime input instead, which never signals a turn boundary.
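
The working pattern, roughly: mic chunks go out through realtime input with no turn signaling at all, and turnComplete is reserved for actual text turns. A minimal sketch:

```ts
// Wrong (what I was doing): every 256 ms chunk ended the turn.
// session.sendClientContent({
//   turns: [{ role: 'user', parts: [{ inlineData: { data: base64Pcm, mimeType: 'audio/pcm;rate=16000' } }] }],
//   turnComplete: true, // <- tells Gemini the user is done talking, four times a second
// });

// Right: stream mic audio as realtime input and let the server detect turn boundaries.
function onMicChunk(session: { sendRealtimeInput: (input: object) => void }, base64Pcm: string) {
  session.sendRealtimeInput({
    audio: { data: base64Pcm, mimeType: 'audio/pcm;rate=16000' },
  });
}
```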

Browser Audio Autoplay Policy. The AudioContext must be created synchronously inside the user-gesture handler, before any await - the moment the call stack yields, the browser no longer treats the code as running in response to a user gesture, and the context stays locked.
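
The pattern that works, sketched below: construct the AudioContext as the first statement of the click handler, then do the async setup.

```ts
let audioCtx: AudioContext | null = null;

// Illustrative start-call handler: the AudioContext is created synchronously, while the
// browser still considers this code to be running inside the user gesture.
async function onStartCallClick() {
  audioCtx = new AudioContext({ sampleRate: 24000 }); // 24 kHz matches Gemini's output audio

  // Anything async can happen after this without losing the unlocked context.
  await navigator.mediaDevices.getUserMedia({ audio: true });
  // ...connect the Live session, start streaming, etc.
}
```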

AI-generated JSON was always broken. Gemini kept producing malformed n8n workflow JSON. I solved this by removing JSON generation from Gemini entirely and delegating to a dedicated n8n AI Agent node with a schema-aware system prompt.
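
So the browser-side tool handler only forwards a plain-English brief; every byte of workflow JSON is produced inside n8n. A sketch, with an illustrative webhook URL and payload:

```ts
// Illustrative tool-call handler: no JSON generation in the browser or in Gemini,
// just a webhook POST that the n8n AI Agent node turns into an importable workflow.
async function handleBlueprintToolCall(
  session: { sendToolResponse: (p: object) => void },
  call: { id: string; name: string; args: Record<string, unknown> },
) {
  if (call.name !== 'send_blueprint') return;

  await fetch('https://n8n.example.com/webhook/perks-blueprint', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      summary: call.args.workflow_summary, // plain English, not JSON
      email: call.args.email,
    }),
  });

  // Tell Gemini the tool finished so it can confirm out loud.
  session.sendToolResponse({
    functionResponses: [{ id: call.id, name: call.name, response: { status: 'sent' } }],
  });
}
```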

Audio glitch on tool calls. The AI would say "blueprint has been fent" instead of "sent" because the previous audio stream was still playing when the new response started. Fix: flush the queued output audio the moment a tool call fires.
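
Sketched, the fix keeps a reference to every scheduled output chunk and stops them all when a tool call arrives (the bookkeeping names are illustrative):

```ts
// Every queued playback chunk keeps a reference to its source node.
const scheduledSources: AudioBufferSourceNode[] = [];

function flushPlayback() {
  // Stop anything still playing or queued so the next response starts clean.
  for (const src of scheduledSources) {
    try { src.stop(); } catch { /* already stopped */ }
  }
  scheduledSources.length = 0;
}

// In the Live message handler: a tool call means a fresh spoken response is coming,
// so cut the stale audio off first.
function onServerMessage(msg: { toolCall?: unknown }) {
  if (msg.toolCall) {
    flushPlayback();
    // ...then handle the tool call and play the audio that follows.
  }
}
```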

What I Learned

The Gemini Live API is genuinely impressive. Talking to an AI that can see your screen and respond intelligently in real time still feels like the future, even after 48 hours of debugging it. Building something real under pressure - less than 48 hours, online school, a war outside - taught me more about shipping fast than any tutorial ever could.

What's Next

Multiple specialized agents working in parallel - a vision analyst, a workflow architect, a code generator - each callable by the others based on context. Persistent session memory, full conversation transcription, and deeper n8n integration where the AI queries your existing workflows and suggests consolidations.
