Inspiration

The foundation for this project was laid during a previous hackathon, where I built a GenAI-powered game show platform. I had developed unique, modality-specific rounds like "Songquiz," where players raced to guess AI-generated songs (lyrics from Gemini, audio from Suno/ElevenLabs) that started vague and became clearer over time, and "Image-Guess," a progressive visual puzzle where clues appeared sequentially.

While the mechanics were solid, the interaction still felt rigid—players were simply reacting to pre-generated content and typing answers. I wanted to break that barrier and make the experience feel natural, spontaneous, and truly "live."

The release of the Gemini Multimodal Live API provided the missing piece. It allowed me to evolve the concept from a standard quiz show into a real-time collaboration. I decided to pivot the game show framework into Gemini Pictionary, replacing static inputs with a continuous voice stream where the AI isn't just a backend validator, but an active Host that listens, paints, and judges in the moment.

What it does

Gemini Pictionary LIVE transforms the classic drawing game into a dynamic single-player experience against an AI.

  • The Challenge: You are given a secret target word and a list of forbidden "Taboo" words.
  • The Live Host: You speak freely into your microphone to describe the object. A Gemini Live instance (The Host) listens in real-time. It adopts a game show persona, encouraging you but also strictly enforcing the rules.
  • The Twist: If you slip up and say a forbidden word, the Host doesn't just reject a text input. It verbally interrupts you mid-sentence, shouting "BUZZ" and deducting a life.
  • The Loop: As you describe, the Host uses tool-calling to generate images on the fly. Simultaneously, a separate visual AI (The Guesser) watches the canvas and attempts to guess the secret word based only on the generated images. If the AI guesses correctly before you run out of lives, you win the round!
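At its core, the win condition is a comparison between the Guesser's latest guess and the secret word. A minimal sketch of that check (the helper names here are illustrative, not the actual implementation):

```typescript
// Hypothetical sketch: compare the Guesser's guess against the secret word.
// Names (normalize, isCorrectGuess) are illustrative, not from the real app.
function normalize(word: string): string {
  // Lowercase and strip punctuation/whitespace so "Fire Truck!" matches "firetruck".
  return word.trim().toLowerCase().replace(/[^a-z0-9]/g, "");
}

function isCorrectGuess(guess: string, target: string): boolean {
  return normalize(guess) === normalize(target);
}
```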

How I built it

This project is a testament to the power of "Vibe Coding" within Google AI Studio. I wrote almost zero manual lines of code for this application.

Instead, I acted as the "Director." I took the distinct Neo-Brutalist styling and component architecture from my previous hackathon project and fed it into the model as context. I then prompted Gemini to architect the entire React and TypeScript application around those stylistic constraints.

The AI did the heavy lifting:

  • Full Stack Orchestration: It autonomously wrote the complex WebSocket logic to connect the browser's AudioContext to the Gemini Multimodal Live API.
  • State Management: It figured out how to coordinate the three different AI roles (Host, Painter, Guesser) and manage the game state (lives, rounds, confetti triggers) without me needing to debug race conditions.
  • UI Synthesis: It applied my "old styling" rules to new components perfectly, generating a polished, high-contrast interface on the first try.
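The audio plumbing the model generated ultimately boils down to capturing Float32 samples from the browser's AudioContext and converting them to the 16-bit PCM format the Live API expects. A minimal sketch of that conversion step (the function name is mine, not the generated code's):

```typescript
// Hypothetical sketch: AudioContext delivers Float32 samples in [-1, 1];
// the Gemini Live API expects 16-bit little-endian PCM audio input.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```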

Essentially, I provided the "Vibe" and the logic requirements, and Gemini built the actual software. It turns out the best way to build a Gemini-powered app is to let Gemini build it for you.

Challenges I ran into

  • Buzzing: When the Guesser's answer was fed back to the Host, the Host often buzzed, thinking the user had said a forbidden word.
  • Host Hallucinations & Tool Grounding: The Live model happily continues the conversation. I found that if the Host did not receive a detailed, descriptive return value after a tool call (e.g., "Image generated successfully. Wait for the Guesser's input..."), it would simply hallucinate the Guesser's result immediately.
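One way to mitigate the buzzing issue is to label the Guesser's text before injecting it into the Host's context, so the rule "buzz when the player says a forbidden word" does not fire on machine-generated guesses. This sketch is an assumption about the shape of such a fix, not the project's actual code:

```typescript
// Hypothetical sketch: tag injected Guesser output so the Host can
// distinguish it from live player speech and skip taboo enforcement.
function wrapGuesserMessage(guess: string): string {
  return `[GUESSER OUTPUT, not player speech. Do not buzz on this]: ${guess}`;
}
```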

Accomplishments that we're proud of

  • True Agent Autonomy: The agent genuinely runs the show. The Host manages the pacing, the banter, the tool usage, and the rules entirely on its own. It feels less like a chatbot and more like a collaborative partner.
  • High-Fidelity "Vibe Coding": While the app was generated almost entirely via prompts in AI Studio, it completely breaks the mold of the typical "AI-generated" look. I steered clear of the generic "pinkish gradient" aesthetic and achieved a distinct, Neo-Brutalist design that looks like it was hand-crafted.

What I learned

  • Tool Responses Control Behavior: I learned that the output of a function call is just as important as the system prompt. Explicitly telling the agent what to do inside the tool response is mandatory (e.g., "Success: Image is generated, wait for the Guesser!").
  • Latency is King: The Gemini Flash model is a powerhouse for this specific use case. It balances the complex multimodal understanding required for audio and vision with the ultra-low latency necessary for a real-time game, making it genuinely viable for live consumer applications.
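In practice, that means the tool result sent back to the Live session carries an explicit behavioral directive, not just a status flag. A minimal sketch of such a response payload (the exact field shape is an assumption, loosely following the common function-response pattern):

```typescript
// Hypothetical sketch: a descriptive tool response that tells the Host
// exactly what to do next, instead of returning a bare success flag.
function buildToolResponse(toolName: string): { name: string; response: { result: string } } {
  return {
    name: toolName,
    response: {
      result:
        "Success: Image is generated. Do NOT announce a guess yourself. " +
        "Wait silently for the Guesser's input before continuing.",
    },
  };
}
```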

What's next for Gemini Pictionary

  • From Demo to Product: I plan to integrate this standalone prototype into a larger, production-ready gaming application.
  • Expanding the "Live" Suite: I want to take the previous concepts (like "Songquiz" and "Image-Guess") and upgrade them with this Live API architecture. I see a future where an entire suite of party games is powered not by buttons and text fields, but by natural, real-time voice interaction.
