Inspiration
Everyone has a dish that takes them home: grandma's recipe, a family tradition, a meal that defined your childhood. Remy is your AI sous chef that helps you honor those memories. Using computer vision and voice guidance, it watches you cook hands-free, perfects every detail, and makes sure that when you take that first bite, it tastes exactly like home.
What it does
Remy is an AI-powered cooking assistant that uses computer vision to guide you through recipes hands-free. Using cameras mounted in your kitchen, Remy can:
- Monitor your cooking in real time - track which steps you've completed by visually recognizing ingredients and actions
- Verify ingredients - scan your workspace before you start cooking to ensure you have everything you need
- Provide voice-guided instructions - read out recipe steps at the right moment, so you never need to touch your phone with messy hands
- Catch mistakes - alert you if you've skipped a step or missed adding an ingredient
Think of it as a sous chef that actually watches what you're doing and keeps you on track, making cooking more accessible and less stressful for everyone, from beginners to experienced home cooks.
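The ingredient-verification step boils down to a set comparison between the recipe's list and what the camera sees. A minimal sketch in Python, with the Gemini image-recognition call stubbed out as an injected function (all names here are illustrative, not from the actual codebase):

```python
def verify_ingredients(required, detect):
    """Compare the recipe's ingredient list against what the camera sees.

    `detect` stands in for the vision-model call and is expected to
    return the set of ingredient names found in the current frame.
    """
    found = {name.lower() for name in detect()}
    missing = sorted(i for i in required if i.lower() not in found)
    return (not missing, missing)

# Example: a carbonara mise en place with one item missing.
ok, missing = verify_ingredients(
    ["spaghetti", "eggs", "guanciale", "pecorino"],
    lambda: {"Spaghetti", "Eggs", "Pecorino"},
)
# ok is False; missing == ["guanciale"]
```

Returning the missing items (rather than just a boolean) is what lets the voice layer tell the cook exactly what to grab before starting.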
How we built it
- Finite state machines to manage the CV processing pipeline and recipe step progression
- Gemini API for real-time image recognition of ingredients and cooking actions
- Amazon Alexa + VoiceMonkey for customized hands-free voice guidance and alerts
- Flask server as our backend API layer to orchestrate communication between components
- Firebase for database storage of recipes, user progress, and session data
- Jetpack Compose & Kotlin for building an intuitive Android mobile app
- Raspberry Pi cameras to capture live cooking feed and monitor the workspace
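The finite state machine that drives step progression can be sketched roughly as below. This is a simplified illustration, not the production code: the Gemini call is injected as a `classify_frame` function so the machine can run without a camera, and the state and label names are hypothetical.

```python
from enum import Enum, auto

class CookState(Enum):
    """Illustrative states for tracking a recipe in progress."""
    WAITING_FOR_INGREDIENTS = auto()
    STEP_IN_PROGRESS = auto()
    STEP_COMPLETE = auto()
    RECIPE_DONE = auto()

class RecipeFSM:
    """Minimal FSM for recipe step progression.

    `classify_frame(frame, step)` stands in for the Gemini API call
    and returns a label such as "ingredients_ready" or "step_done".
    """
    def __init__(self, steps, classify_frame):
        self.steps = steps              # ordered list of step descriptions
        self.classify_frame = classify_frame
        self.index = 0                  # current step
        self.state = CookState.WAITING_FOR_INGREDIENTS

    def tick(self, frame):
        """Advance the machine by one camera frame."""
        if self.state is CookState.RECIPE_DONE:
            return self.state
        label = self.classify_frame(frame, self.steps[self.index])
        if self.state is CookState.WAITING_FOR_INGREDIENTS:
            if label == "ingredients_ready":
                self.state = CookState.STEP_IN_PROGRESS
        elif self.state is CookState.STEP_IN_PROGRESS:
            if label == "step_done":
                self.index += 1
                self.state = (CookState.RECIPE_DONE
                              if self.index == len(self.steps)
                              else CookState.STEP_COMPLETE)
        elif self.state is CookState.STEP_COMPLETE:
            # Announce the next step via voice, then resume watching.
            self.state = CookState.STEP_IN_PROGRESS
        return self.state
```

Each `tick` consumes one classified frame, so the voice layer only needs to react to state transitions (e.g. speak the next instruction on `STEP_COMPLETE`).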
Challenges we ran into
- Constant poops due to carbonara
- Inconsistencies between different tools and frameworks when made to interact with each other
- Privacy challenges with the Amazon Echo and handling voice activation
- Connecting all components (including mobile app, APIs, CV) together smoothly
- Balancing the tradeoff between camera resolution and classification accuracy
- We attempted multithreading in the backend, but running the camera stream in its own thread proved unreliable, so we migrated to a single-threaded finite state machine (the Flask server still runs in a thread, though)
- Integrating all of the different APIs, systems, and libraries into one control loop that manages concurrency with global state and time counters
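The architecture that came out of those last two challenges looks roughly like this: the server runs in a daemon thread serving shared state, while the camera loop keeps the main thread. A hedged sketch, with the Flask `app.run()` replaced by a stub and all names hypothetical:

```python
import threading
import time

# Global state shared between the server thread and the control loop.
state = {"step": 0, "status": "idle", "seconds_on_step": 0.0}
state_lock = threading.Lock()

def server_thread():
    """Stand-in for the Flask server; in the real build this is
    app.run(), serving `state` to the mobile app over HTTP."""
    while True:
        time.sleep(0.1)

def control_loop(get_frame, fsm_tick, max_ticks):
    """Single-threaded loop: grab a frame, advance the FSM, update
    shared state under a lock. Keeping the camera on the main thread
    (not in its own thread) is the key design choice here."""
    started = time.monotonic()
    for _ in range(max_ticks):
        frame = get_frame()
        step, status = fsm_tick(frame)
        with state_lock:
            if step != state["step"]:
                started = time.monotonic()   # reset the per-step timer
            state["step"] = step
            state["status"] = status
            state["seconds_on_step"] = time.monotonic() - started

threading.Thread(target=server_thread, daemon=True).start()
```

Because only the control loop writes `state` and the server only reads it, a single lock around the dictionary is enough to keep the two threads consistent.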
What we learned
- Keeping all components on the same server improves performance by eliminating the need for tunnels or extra network hops, especially for images and other non-text data
- Chatbots can produce unpredictable errors and behaviour, so clear prompting is important to prevent unexpected output
- Large projects involving any kind of physical product or process demand a high level of detail, because both logical and environmental errors creep in
- How to integrate a backend that runs multiple processes at once while also juggling external APIs