Cook Master is a native Unity Mixed Reality application powered by Gemini 3’s multimodal intelligence. It recognizes real-world ingredients, generates personalized recipes for the user to choose from, provides visual step-by-step cooking guidance, and offers a multimodal assistant that supports the user throughout the cooking process. The system is built around three core modules: Perception (Eyes), Intelligence (Brain), and Interaction (Hands).

Eyes: Gemini-Powered Passthrough Vision

Cook Master captures real-time video from the headset camera, enabling Gemini 3 to understand the physical environment and recognize ingredients directly on the countertop, even in messy, unstructured cooking scenarios. Based on this perception, Gemini dynamically matches available ingredients with suitable, personalized recipes.

Brain: Gemini Multimodal Reasoning

At the core of Cook Master, Gemini 3 serves as the reasoning and generation engine, performing:

  • Structured recipe generation (name, introduction, steps)
  • Multimodal content generation (textual instructions plus a generated image for each step)

Gemini accepts both text and image inputs, so users can ask for help at any moment and receive contextual, real-time guidance.
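
As a rough illustration of the structured recipe output described above (name, introduction, steps), the sketch below parses a JSON response into typed objects. The JSON schema, the `image_ref` field, and the names `Recipe`, `RecipeStep`, and `parse_recipe` are assumptions made for this example; the project description does not specify the actual format, and the app itself is built in Unity rather than Python.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecipeStep:
    instruction: str                 # textual instruction for this step
    image_ref: Optional[str] = None  # hypothetical reference to a generated step image


@dataclass
class Recipe:
    name: str
    introduction: str
    steps: List[RecipeStep] = field(default_factory=list)


def parse_recipe(raw: str) -> Recipe:
    """Parse a JSON string in the assumed schema, rejecting incomplete output."""
    data = json.loads(raw)
    missing = [key for key in ("name", "introduction", "steps") if key not in data]
    if missing:
        raise ValueError(f"recipe is missing fields: {missing}")
    steps = [RecipeStep(s["instruction"], s.get("image_ref")) for s in data["steps"]]
    if not steps:
        raise ValueError("recipe has no steps")
    return Recipe(data["name"], data["introduction"], steps)
```

Validating the response before display means an incomplete generation can be retried rather than shown to the user as a half-finished recipe.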

Hands: Controller-Free Natural Interaction

We designed a fully controller-free interaction system, mapping core functions to intuitive semantic gestures:

  • Open Hand: Start a voice conversation with Gemini
  • Thumbs Up: Capture a photo and engage Gemini with multimodal (vision + voice) assistance

This keeps interaction controller-free, so users can continue cooking naturally without picking up a device or interrupting their workflow.
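
The gesture mapping above can be pictured as a small dispatch table from recognized gestures to actions. This is only a sketch: the gesture labels, the handler functions, and their return values are hypothetical placeholders, and the real app would receive gestures from the headset's hand-tracking layer in Unity.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class Gesture(Enum):
    OPEN_HAND = "open_hand"
    THUMBS_UP = "thumbs_up"


def start_voice_conversation() -> str:
    # Placeholder: would open a voice session with the assistant.
    return "voice_conversation_started"


def capture_photo_and_ask() -> str:
    # Placeholder: would capture a photo and send it along with voice input.
    return "photo_captured_with_voice"


# One entry per semantic gesture; a new gesture only needs a new entry here.
GESTURE_ACTIONS: Dict[Gesture, Callable[[], str]] = {
    Gesture.OPEN_HAND: start_voice_conversation,
    Gesture.THUMBS_UP: capture_photo_and_ask,
}


def handle_gesture(gesture_name: str) -> Optional[str]:
    """Run the action mapped to a recognized gesture; ignore anything else."""
    try:
        gesture = Gesture(gesture_name)
    except ValueError:
        return None  # unrecognized gestures trigger nothing
    return GESTURE_ACTIONS[gesture]()
```

Ignoring unrecognized input matters in a kitchen, where ordinary hand movements while chopping or stirring should not trigger the assistant.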

Cook Master is more than just a recipe app; it represents a future where Gemini-powered systems understand the physical world, reason in context, and provide intelligent, real-time assistance, transforming everyday activities into seamless spatial experiences.
