Inspiration: The Messy Reality of Cooking

Cook Master is inspired by a common kitchen challenge: cooking is inherently messy and hands-on. We’ve all tried to unlock a phone with flour-covered hands, only to find that the next recipe step calls for ingredients we don’t actually have.

We asked ourselves: What if your kitchen could see what you see? What if recipes adapted to you, instead of the other way around?

How We Built It

Cook Master is a native Unity MR application built for the Meta Horizon OS ecosystem and driven by multimodal AI. The system is organized into three core modules: Perception (Eyes), Intelligence (Brain), and Interaction (Hands and Voice).

Eyes: AI-Powered Passthrough Vision

Cook Master captures real-time video from the headset’s passthrough camera, so the AI can recognize ingredients on a cluttered countertop and suggest matching recipes for the user to choose from.

Brain: Multimodal AI & Structured Output

The brain of Cook Master, powered by advanced multimodal AI, performs:

  • Ingredient recognition
  • Structured recipe generation (name, introduction, steps)
  • Multimodal content creation (textual step descriptions + visualizations)
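As a rough illustration of what a structured recipe with these fields could look like, here is a minimal Python sketch. The project itself is a Unity application, so this is not the team's code; the class names, the `description` key, and the `visualization_url` field are hypothetical and chosen only to mirror the fields listed above (name, introduction, steps, with each step carrying text plus an optional visualization).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecipeStep:
    # Textual description of the step.
    description: str
    # Hypothetical pointer to a generated visualization; None if absent.
    visualization_url: Optional[str] = None


@dataclass
class Recipe:
    name: str
    introduction: str
    steps: List[RecipeStep] = field(default_factory=list)


def recipe_from_dict(data: dict) -> Recipe:
    """Build a Recipe from an already-parsed JSON object.

    Raises KeyError if a required field is missing.
    """
    steps = [
        RecipeStep(
            description=step["description"],
            visualization_url=step.get("visualization_url"),
        )
        for step in data["steps"]
    ]
    return Recipe(
        name=data["name"],
        introduction=data["introduction"],
        steps=steps,
    )
```

Keeping the recipe in a typed structure like this means the UI layer can rely on every step having a description, while the visualization stays optional.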

The system also accepts combined text and image input, so users can ask the AI assistant for help at any point during cooking and receive immediate guidance.

Hands: Controller-Free Natural Interaction

We designed a fully controller-free interaction system, mapping core functions to intuitive semantic gestures using the Meta Interaction SDK:

  • Open Hand: Wake up the AI assistant
  • Thumbs Up: Capture a photo and wake the assistant

This allows users to operate the system without putting down ingredients or cleaning their hands.

Voice: Immersive Speech Assistant

Once the assistant is activated, users do not need any physical input:

  • Wit.ai handles voice recognition for natural, seamless interaction
  • TTSSpeaker provides audio feedback, making the assistant more lively, immersive, and companionable

Challenges We Faced

Inconsistent Structured JSON Output

The AI model sometimes inserted Markdown, explanatory text, or other non-structured content into the JSON output, causing deserialization failures. To address this, we implemented a string-cleaning pipeline to filter, trim, and validate the output before deserialization, ensuring reliable structured data.
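The filter, trim, and validate steps can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the project's Unity code: the function names are hypothetical, and the required keys are taken from the recipe fields mentioned earlier (name, introduction, steps).

```python
import json
import re
from typing import Any, Dict

# Assumed required fields, based on the recipe structure described above.
REQUIRED_KEYS = ("name", "introduction", "steps")


def extract_json_object(raw: str) -> str:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(raw)):
        ch = raw[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]
    raise ValueError("unbalanced JSON object in model output")


def clean_model_output(raw: str) -> Dict[str, Any]:
    # Filter: drop lines that consist only of a Markdown code-fence marker.
    text = re.sub(r"^\s*```[\w-]*\s*$", "", raw, flags=re.MULTILINE)
    # Trim: discard explanatory text before and after the JSON object.
    candidate = extract_json_object(text)
    # Validate: parse, then check the expected fields before handing off.
    data = json.loads(candidate)  # raises a ValueError subclass on bad JSON
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    if not isinstance(data["steps"], list):
        raise ValueError("'steps' must be a list")
    return data
```

One limitation of this sketch: if the explanatory text before the JSON itself contains a `{`, extraction starts there and will likely fail validation, so a caller would probably want to catch `ValueError` and re-prompt the model.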

Lessons Learned & Future Plans

Building Cook Master revealed the potential for Quest headsets to seamlessly integrate into real-world tasks and demonstrated the enormous possibilities of combining spatial computing with multimodal AI.

Future plans include:

  • Multi-turn Context Memory: Allowing the AI to remember users’ kitchen tools, ingredients, and dietary preferences, so recipes become more personalized and adaptive over time.
  • Ghost Guidance: Showing context-aware, AI-driven visual cues at key steps or moments of user uncertainty to clarify the next action.
  • Social Cooking Mode: Enabling remote partners to join a shared MR space for collaborative cooking or remote guidance.

Cook Master is more than just a recipe app; it points to a future where devices empower people by understanding and engaging with the real world.
