Inspiration: The Messy Reality of Cooking
Cook Master is inspired by a common kitchen challenge: cooking is inherently messy and hands-on. We’ve all experienced the frustration of trying to unlock a phone with flour-covered hands, only to discover that the ingredients the next recipe step calls for are not the ones we actually have.
We asked ourselves: What if your kitchen could see what you see? What if recipes adapted to you, instead of the other way around?
How We Built It
Cook Master is a native Unity mixed reality (MR) application that integrates with the Meta Horizon OS ecosystem and multimodal AI. The system is built around three core modules: Perception (Eyes), Intelligence (Brain), and Interaction (Hands and Voice).
Eyes: AI-Powered Passthrough Vision
Cook Master captures real-time video from the headset’s passthrough camera, so the AI can recognize the ingredients scattered across the countertop and suggest matching recipes for the user to choose from.
Brain: Multimodal AI & Structured Output
The brain of Cook Master, powered by advanced multimodal AI, performs:
- Ingredient recognition
- Structured recipe generation (name, introduction, steps)
- Multimodal content creation (textual step descriptions + visualizations)
The system also supports text + image multimodal input, enabling users to interact with the AI assistant at any point during cooking and receive instant guidance.
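As an illustration of the structured recipe output listed above, here is a minimal Python sketch of the data shape. The top-level fields (name, introduction, steps) come from the list above; the `RecipeStep` class, its `visualization` field, and the `from_dict` helper are hypothetical names chosen for this example, and the actual Unity project may model the data differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecipeStep:
    # Textual description of one cooking step.
    description: str
    # Hypothetical: a reference to the generated visualization, if any.
    visualization: Optional[str] = None


@dataclass
class Recipe:
    name: str
    introduction: str
    steps: List[RecipeStep] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "Recipe":
        """Build a Recipe from an already-parsed JSON object."""
        steps = []
        for raw_step in data.get("steps", []):
            if isinstance(raw_step, str):
                # Accept plain strings as steps without a visualization.
                steps.append(RecipeStep(description=raw_step))
            else:
                steps.append(
                    RecipeStep(
                        description=raw_step["description"],
                        visualization=raw_step.get("visualization"),
                    )
                )
        return cls(
            name=data["name"],
            introduction=data["introduction"],
            steps=steps,
        )
```

Keeping the schema this small makes it easier to describe the expected output to the model and to check each response against it.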
Hands: Controller-Free Natural Interaction
We designed a fully controller-free interaction system, mapping core functions to intuitive semantic gestures using the Meta Interaction SDK:
- Open Hand: Wake up the AI assistant
- Thumbs Up: Capture a photo and wake the assistant
This allows users to operate the system without putting down ingredients or cleaning their hands.
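The gesture-to-function mapping described above can be pictured as a small dispatch table. The sketch below is a language-neutral illustration in Python, not the Meta Interaction SDK API: `Gesture`, `Assistant`, `build_gesture_map`, and `on_gesture` are all hypothetical names, and in the real app the gesture events would come from the SDK inside Unity.

```python
from enum import Enum
from typing import Callable, Dict, List


class Gesture(Enum):
    OPEN_HAND = "open_hand"
    THUMBS_UP = "thumbs_up"


class Assistant:
    """Stand-in for the AI assistant; only tracks state for this sketch."""

    def __init__(self) -> None:
        self.awake = False
        self.photos: List[str] = []

    def wake(self) -> None:
        self.awake = True

    def capture_photo(self) -> None:
        # Placeholder for grabbing a camera frame.
        self.photos.append(f"photo_{len(self.photos) + 1}")


def build_gesture_map(assistant: Assistant) -> Dict[Gesture, Callable[[], None]]:
    """Map each semantic gesture to the action it triggers."""

    def capture_and_wake() -> None:
        assistant.capture_photo()
        assistant.wake()

    return {
        Gesture.OPEN_HAND: assistant.wake,
        Gesture.THUMBS_UP: capture_and_wake,
    }


def on_gesture(gesture: Gesture, handlers: Dict[Gesture, Callable[[], None]]) -> None:
    """Run the handler for a recognized gesture; ignore unmapped ones."""
    handler = handlers.get(gesture)
    if handler is not None:
        handler()
```

A table like this keeps the gesture vocabulary in one place, so adding a new gesture means adding one entry rather than touching the recognition code.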
Voice: Immersive Speech Assistant
Once the assistant is activated, users do not need any physical input:
- Wit.ai handles voice recognition for natural, seamless interaction
- TTSSpeaker provides audio feedback, making the assistant more lively, immersive, and companionable
Challenges We Faced
Inconsistent Structured JSON Output
The AI model sometimes inserted Markdown, explanatory text, or other non-structured content into the JSON output, causing deserialization failures. To address this, we implemented a string-cleaning pipeline to filter, trim, and validate the output before deserialization, ensuring reliable structured data.
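A cleaning pipeline of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's actual code: it assumes the model returns a single JSON object with the `name`, `introduction`, and `steps` fields mentioned earlier, and the function names `extract_json_object` and `parse_recipe` are hypothetical.

```python
import json
import re

# Assumed required fields, taken from the recipe structure described earlier.
REQUIRED_KEYS = ("name", "introduction", "steps")

# Build the Markdown fence marker programmatically to keep this listing clean.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_object(raw: str) -> str:
    """Filter and trim: return the first balanced {...} object found in raw."""
    # Prefer the contents of a Markdown code fence when one is present.
    match = FENCE_RE.search(raw)
    text = match.group(1) if match else raw

    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")

    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside string literals must not affect nesting depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    raise ValueError("unbalanced JSON object in model output")


def parse_recipe(raw: str) -> dict:
    """Validate: deserialize the cleaned text and check the expected fields."""
    data = json.loads(extract_json_object(raw))
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["steps"], list) or not data["steps"]:
        raise ValueError("'steps' must be a non-empty list")
    return data
```

Raising an error on invalid output, rather than guessing at a repair, lets the caller decide whether to retry the request or show a fallback.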
Lessons Learned & Future Plans
Building Cook Master showed us how naturally Quest headsets can fit into real-world tasks, and how much becomes possible when spatial computing is combined with multimodal AI.
Future plans include:
- Multi-turn Context Memory: Allowing the AI to remember users’ kitchen tools, ingredients, and dietary preferences, so recipes become more personalized and adaptive over time.
- Ghost Guidance: Enhancing the system with context-aware ghost guidance at key steps or moments of user uncertainty, providing AI-driven visual cues that clarify the next action.
- Social Cooking Mode: Enabling remote partners to join a shared MR space for collaborative cooking or remote guidance.
Cook Master is more than just a recipe app; it points to a future in which devices empower people by understanding and engaging with the real world.

