I'm writing this from Bahrain, where the ongoing conflict in the Middle East has turned everything upside down. School went fully online, everything got canceled, and life became unpredictable. I joined this hackathon late, and when I finally had time to sit down and build, the war escalated. I had less than 48 hours. That's it.

What It Does

Perks Live is a real-time multimodal AI agent that watches your screen and listens to your voice simultaneously. You share your screen showing a messy workflow - spreadsheets, manual processes, disconnected tools - and have a natural voice conversation with an AI consultant that can see everything you're looking at.
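
A rough sketch of the screen-sharing leg, to make the flow concrete: grab a getDisplayMedia track, snapshot it onto a canvas about once a second, and stream each frame to the Live session as a JPEG. The frame rate and session shape here are illustrative, not the exact production values.

```ts
// Minimal sketch of screen-frame streaming (frame rate and session shape are illustrative).
async function startScreenShare(session: { sendRealtimeInput: (input: object) => void }) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const video = document.createElement('video');
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d')!;

  // Snapshot the shared screen roughly once per second and send it as a JPEG frame.
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    const jpeg = canvas.toDataURL('image/jpeg', 0.7).split(',')[1]; // strip the data: prefix
    session.sendRealtimeInput({ video: { data: jpeg, mimeType: 'image/jpeg' } });
  }, 1000);
}
```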

When you ask for a blueprint, the AI calls a tool that fires a webhook to an n8n automation pipeline, where an AI Agent node generates a complete, valid n8n workflow JSON file. The blueprint and importable JSON arrive in your inbox in under 60 seconds.
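
Under the hood, the "ask for a blueprint" part is just a function declaration the model is allowed to call. Roughly, with field names that are illustrative rather than the exact production schema:

```ts
import { Type, type FunctionDeclaration } from '@google/genai';

// Illustrative declaration of the blueprint tool exposed to the Live session.
const sendBlueprintTool: FunctionDeclaration = {
  name: 'send_blueprint',
  description:
    'Generate an automation blueprint for the workflow visible on screen and email it to the user.',
  parameters: {
    type: Type.OBJECT,
    properties: {
      workflow_summary: {
        type: Type.STRING,
        description: 'Plain-English description of the messy workflow on screen.',
      },
      email: { type: Type.STRING, description: 'Address the blueprint should be sent to.' },
    },
    required: ['workflow_summary', 'email'],
  },
};
```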

How I Built It

  • Gemini Live API - gemini-2.5-flash-native-audio-preview-12-2025 on v1alpha for real-time audio + vision streaming
  • Google GenAI SDK - @google/genai for Live API session management and tool declarations
  • Firebase Hosting - Google Cloud deployment
  • React 19 + Vite - frontend with WebGL shader background, Spline 3D scene, Framer Motion
  • n8n - self-hosted webhook receiver, AI Agent node, SMTP email dispatch

The Hardest Problems I Hit

Wrong model + wrong API version. Every connection attempt returned a 1008 disconnect. The fix was switching to gemini-2.5-flash-native-audio-preview-12-2025 on the v1alpha API version.
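
With the @google/genai SDK, the working setup looked roughly like this (a sketch with trimmed callbacks; the env var name and the message handler are placeholders):

```ts
import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({
  apiKey: import.meta.env.VITE_GEMINI_API_KEY, // Vite-style env var, illustrative
  httpOptions: { apiVersion: 'v1alpha' },      // the native-audio Live model needs v1alpha
});

const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: {
    responseModalities: [Modality.AUDIO],
    tools: [{ functionDeclarations: [sendBlueprintTool] }], // the declaration sketched earlier
  },
  callbacks: {
    onopen: () => console.log('live session open'),
    onmessage: (msg) => handleServerMessage(msg), // audio chunks and tool calls land here (handler not shown)
    onerror: (e) => console.error('live session error', e),
    onclose: (e) => console.warn('live session closed', e.reason),
  },
});
```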

clientContent vs realtime_input. I was sending audio with turnComplete: true on every 256ms chunk - telling Gemini the conversation was over constantly. The fix was streaming mic audio through realtime input instead, which never signals a turn boundary.
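
The working pattern, roughly: mic chunks go out through realtime input with no turn signaling at all, and turnComplete is reserved for actual text turns. A minimal sketch:

```ts
// Wrong (what I was doing): every 256 ms chunk ended the turn.
// session.sendClientContent({
//   turns: [{ role: 'user', parts: [{ inlineData: { data: base64Pcm, mimeType: 'audio/pcm;rate=16000' } }] }],
//   turnComplete: true, // <- tells Gemini the user is done talking, four times a second
// });

// Right: stream mic audio as realtime input and let the server detect turn boundaries.
function onMicChunk(session: { sendRealtimeInput: (input: object) => void }, base64Pcm: string) {
  session.sendRealtimeInput({
    audio: { data: base64Pcm, mimeType: 'audio/pcm;rate=16000' },
  });
}
```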

Browser Audio Autoplay Policy. The AudioContext must be created synchronously inside the user-gesture handler, before any await - the moment the call stack yields, the browser no longer treats the code as running in response to a user gesture, and the context stays locked.
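
The pattern that works, sketched below: construct the AudioContext as the first statement of the click handler, then do the async setup.

```ts
let audioCtx: AudioContext | null = null;

// Illustrative start-call handler: the AudioContext is created synchronously, while the
// browser still considers this code to be running inside the user gesture.
async function onStartCallClick() {
  audioCtx = new AudioContext({ sampleRate: 24000 }); // 24 kHz matches Gemini's output audio

  // Anything async can happen after this without losing the unlocked context.
  await navigator.mediaDevices.getUserMedia({ audio: true });
  // ...connect the Live session, start streaming, etc.
}
```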

AI-generated JSON was always broken. Gemini kept producing malformed n8n workflow JSON. I solved this by removing JSON generation from Gemini entirely and delegating to a dedicated n8n AI Agent node with a schema-aware system prompt.
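
So the browser-side tool handler only forwards a plain-English brief; every byte of workflow JSON is produced inside n8n. A sketch, with an illustrative webhook URL and payload:

```ts
// Illustrative tool-call handler: no JSON generation in the browser or in Gemini,
// just a webhook POST that the n8n AI Agent node turns into an importable workflow.
async function handleBlueprintToolCall(
  session: { sendToolResponse: (p: object) => void },
  call: { id: string; name: string; args: Record<string, unknown> },
) {
  if (call.name !== 'send_blueprint') return;

  await fetch('https://n8n.example.com/webhook/perks-blueprint', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      summary: call.args.workflow_summary, // plain English, not JSON
      email: call.args.email,
    }),
  });

  // Tell Gemini the tool finished so it can confirm out loud.
  session.sendToolResponse({
    functionResponses: [{ id: call.id, name: call.name, response: { status: 'sent' } }],
  });
}
```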

Audio glitch on tool calls. The AI would say "blueprint has been fent" instead of "sent" because the previous audio stream was still playing when the new response started. Fix: flush the queued output audio the moment a tool call fires.
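
Sketched, the fix keeps a reference to every scheduled output chunk and stops them all when a tool call arrives (the bookkeeping names are illustrative):

```ts
// Every queued playback chunk keeps a reference to its source node.
const scheduledSources: AudioBufferSourceNode[] = [];

function flushPlayback() {
  // Stop anything still playing or queued so the next response starts clean.
  for (const src of scheduledSources) {
    try { src.stop(); } catch { /* already stopped */ }
  }
  scheduledSources.length = 0;
}

// In the Live message handler: a tool call means a fresh spoken response is coming,
// so cut the stale audio off first.
function onServerMessage(msg: { toolCall?: unknown }) {
  if (msg.toolCall) {
    flushPlayback();
    // ...then handle the tool call and play the audio that follows.
  }
}
```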

What I Learned

The Gemini Live API is genuinely impressive. Talking to an AI that can see your screen and respond intelligently in real time still feels like the future, even after 48 hours of debugging it. Building something real under pressure - less than 48 hours, online school, a war outside - taught me more about shipping fast than any tutorial ever could.

What's Next

Multiple specialized agents working in parallel - a vision analyst, a workflow architect, a code generator - each callable by the others based on context. Persistent session memory, full conversation transcription, and deeper n8n integration where the AI queries your existing workflows and suggests consolidations.
