Remy — Your Wearable AI Cooking Companion
Remy is a wearable AI cooking assistant that gives you real-time feedback and instructions on anything you're learning to cook.
Inspiration
At this stage in our lives, some of us have grown up cooking, while others have never even microwaved a meal. Cooking is both a survival skill and a rewarding pastime (one that even the culinary illiterate aspire to master). While it’s unrealistic to have a master chef guiding you in the kitchen, we set out to create an AI sous-chef that does more than suggest personalized recipes. Remy walks you through each step, actively observes your progress, and provides real-time feedback as you cook.
The name Remy comes from the Pixar rat who proved that anyone can cook. And with the right AI companion, we believe that’s more true than ever!
What it does
Remy is a multi-device system with three core capabilities:
- Recipe Discovery: A Next.js web dashboard where you describe what you're in the mood for ("something high-protein with chicken") and Claude AI generates three personalized recipe recommendations filtered by your dietary preferences (nut-free, keto, no spicy, etc.).
- Real-Time Kitchen Vision: A Raspberry Pi camera streams live video to an NVIDIA Jetson Orin Nano, which runs YOLOv8 object detection to identify ingredients and cooking activity, plus a Vision Language Model (Ollama qwen3-vl:2b) that analyzes the scene every few seconds and describes what it sees in natural language.
- Smart Cooking Control: When you select a recipe, the dashboard sends a structured task queue over TCP to an ESP32 microcontroller that can coordinate cooking hardware, while a DS18B20 temperature sensor monitors your cooking surface in real time.
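As a rough illustration of the temperature monitoring, here is a minimal sketch of parsing a DS18B20 reading. It assumes the sensor is read on a Linux host through the kernel's 1-Wire driver, which exposes a two-line `w1_slave` text file; the writeup does not say which device the sensor is wired to, so this may not match how Remy actually reads it. The function name `parse_w1_slave` is hypothetical.

```python
from typing import Optional


def parse_w1_slave(text: str) -> Optional[float]:
    """Return the temperature in degrees Celsius, or None if the reading is invalid.

    Assumes the Linux w1_therm text format: the first line ends with
    "YES" when the CRC check passed, and the second line ends with
    "t=<millidegrees Celsius>".
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or not lines[0].strip().endswith("YES"):
        return None  # missing data or failed CRC
    _, sep, millidegrees = lines[1].rpartition("t=")
    if not sep:
        return None  # no temperature field found
    return int(millidegrees) / 1000.0


# Example reading in the w1_therm format (sample bytes are illustrative).
sample = (
    "72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
    "72 01 4b 46 7f ff 0e 10 57 t=23125\n"
)
temperature_c = parse_w1_slave(sample)
```

Returning `None` on a failed CRC lets the caller skip a bad sample instead of acting on a garbage temperature.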
How We Built It
Architecture
With four devices, this project relies heavily on effective communication protocols. A head-mounted Raspberry Pi carries the camera, alongside an ESP32 paired with an amplifier, speaker, and microphone. This flexible, compact package gives the system a well-rounded sense of its surroundings. The heavy hitter of our hardware, the Jetson Orin Nano, handles all the edge compute, keeping latency relatively low given how much processing is being done. The Nano is connected over serial to a second, base-station Raspberry Pi; together they coordinate the overall state of the cooking and parse all incoming communications from the peripheral devices.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| AI/ML | Claude AI (recipe generation), YOLOv8n (object detection), Ollama qwen3-vl:2b (scene analysis) |
| Backend | Python 3, OpenCV, picamera2, UDP/TCP sockets |
| Hardware | Raspberry Pi (camera + base station), NVIDIA Jetson Orin Nano, ESP32, DS18B20 temp sensor, PCM5102A DAC, INMP441 Microphone, PAM8302 Amplifier |
| Audio | ElevenLabs STT, Silero VAD, PyAudio, ESP32+I2S |
Recipe AI Pipeline
When a user asks what they should cook, the system:
- Sends the query + dietary preferences to Claude (claude-sonnet-4-20250514)
- Receives 3 structured recommendations from Claude, each with a name, description, and an ordered recipeTaskQueue of cooking steps
- Fetches food images from the Pixabay API for each dish
- On selection, serializes the task queue as JSON and sends it over TCP with a 4-byte length header to the ESP32
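The last step, length-prefixed JSON over TCP, can be sketched as follows. This is a minimal illustration, not the project's actual code: the big-endian byte order, the helper names (`encode_frame`, `recv_exact`, `recv_frame`), and the fields inside each task are assumptions, and the real sender would live in the Next.js dashboard, not in Python.

```python
import json
import socket
import struct
from typing import Any, Dict, List


def encode_frame(task_queue: List[Dict[str, Any]]) -> bytes:
    """Serialize the task queue as JSON and prepend a 4-byte length header."""
    payload = json.dumps(task_queue).encode("utf-8")
    # ">I" is a big-endian unsigned 32-bit integer; the byte order is an assumption.
    return struct.pack(">I", len(payload)) + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> List[Dict[str, Any]]:
    """Read the 4-byte header, then the JSON payload it announces."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# Hypothetical task queue; the real recipeTaskQueue schema is not shown in the writeup.
example_queue = [
    {"step": 1, "instruction": "Preheat the pan"},
    {"step": 2, "instruction": "Sear the chicken"},
]
```

The header matters because TCP is a byte stream with no message boundaries: without a length, the receiver cannot tell where one JSON document ends and the next begins.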
Scene Analysis Pipeline
Every 7 seconds, the Jetson Orin Nano runs the video feed through the vision language model to generate a text description of what the user is seeing. We chose to do this on-device because streaming the actual video feed elsewhere would add significant latency, and because we wanted to test the edge capabilities of NVIDIA's devices. Each description is appended to a large history buffer, which gives us a record of the scene over time. This also lets us identify when users have completed tasks in their recipes without them having to mark them manually, creating a more streamlined and enjoyable experience.
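A history buffer like the one described could look roughly like this. It is a sketch under assumptions: the bounded size, the `SceneNote` and `SceneHistory` names, and the prompt layout are all hypothetical, and the writeup does not specify how task completion is actually decided from the history.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class SceneNote:
    timestamp: float  # seconds since the cooking session started
    text: str         # description produced by the vision language model


class SceneHistory:
    """Rolling log of scene descriptions; the oldest notes drop off when full."""

    def __init__(self, max_notes: int = 50) -> None:
        self._notes: Deque[SceneNote] = deque(maxlen=max_notes)

    def add(self, timestamp: float, text: str) -> None:
        self._notes.append(SceneNote(timestamp, text))

    def recent(self, now: float, window_s: float) -> List[SceneNote]:
        """Return notes no older than window_s seconds, oldest first."""
        return [n for n in self._notes if now - n.timestamp <= window_s]

    def as_prompt(self, now: float, window_s: float = 60.0) -> str:
        """Format recent notes as timestamped lines for a language model prompt."""
        return "\n".join(
            f"[{n.timestamp:.0f}s] {n.text}" for n in self.recent(now, window_s)
        )


history = SceneHistory(max_notes=3)
for i in range(5):
    history.add(i * 7.0, f"note {i}")  # one description every 7 seconds
```

Keeping timestamps alongside the text lets a downstream model reason about order and duration, for example whether the chicken has been in the pan across several consecutive descriptions.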
Challenges We Ran Into
Audio Quality. Streaming text-to-speech over a DAC into a small 8-ohm speaker brought a lot of unexpected challenges. Since the line-level DAC produces an output meant for higher-impedance devices, we scrambled to find an amplifier that could drive the speaker. Furthermore, since live audio is always difficult over an unreliable internet connection, we implemented ring buffers and anti-jitter logic on both ends of our audio pipeline to keep the sound relatively smooth.
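To illustrate the idea, here is a minimal jitter buffer sketch in Python. It is not the project's implementation (the ESP32 side in particular would not be written in Python); the class name, capacity, prefill threshold, and chunk size are assumptions. The buffer waits until a few chunks have arrived before starting playback, plays silence on underrun, and drops the oldest chunk when it overflows.

```python
from collections import deque
from typing import Deque


class JitterBuffer:
    """Fixed-capacity ring of audio chunks that smooths uneven network arrival."""

    def __init__(self, capacity: int = 32, prefill: int = 4, chunk_bytes: int = 512) -> None:
        # deque(maxlen=...) behaves as a ring: appending when full drops the oldest chunk.
        self._chunks: Deque[bytes] = deque(maxlen=capacity)
        self._prefill = prefill
        # All-zero bytes are silence for signed PCM (assumed sample format).
        self._silence = bytes(chunk_bytes)
        self._playing = False

    def push(self, chunk: bytes) -> None:
        """Called by the network side whenever a chunk arrives."""
        self._chunks.append(chunk)
        if len(self._chunks) >= self._prefill:
            self._playing = True  # enough audio queued to start or resume playback

    def pop(self) -> bytes:
        """Called by the audio side at a steady rate; never blocks."""
        if self._playing and self._chunks:
            return self._chunks.popleft()
        # Underrun or still prefilling: emit silence and wait for the buffer to refill.
        self._playing = False
        return self._silence
```

The prefill threshold trades latency for smoothness: a larger value absorbs longer network stalls but delays the start of speech.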
Hardware Failure. At 4 a.m. on Sunday, just as we were beginning to integrate all the components of the project, our Arducam camera gave out, leaving us with no working video feed and five hours of hacking left. Luckily, we substituted a Logitech Brio camera that we had on hand, and it worked right out of the box!
Accomplishments that we're proud of
First, we're so happy to have built such a complex hardware hack. It wasn't easy interfacing between two Raspberry Pis, an ESP32, and a Jetson Orin Nano through both wireless and wired links, and we learned a lot along the way.
We're also proud to have learned so much about NVIDIA's edge compute devices, and we've seen firsthand just how useful they can be for low-latency use cases like live video analysis.
"Anyone can cook." — Auguste Gusteau
What's next for Remy
We want to incorporate more sensors to give the model more context. We would also like more powerful edge compute, since image models are large and slow to run. Finally, we'd love to test our devices with real people who are learning how to cook!
Built With
- next.js
- pixabay
- python
- typescript
- vercel