Inspiration
Everyone has a dish that takes them home: grandma's recipe, a family tradition, a meal that defined your childhood. Remy is your AI sous chef that helps you honor those memories. Using computer vision and voice guidance, it watches you cook hands-free, perfects every detail, and makes sure that when you take that first bite, it tastes exactly like home.
What it does
Remy is an AI-powered cooking assistant that uses computer vision to guide you through recipes hands-free. Using cameras mounted in your kitchen, Remy can:
- Monitor your cooking in real time - track which steps you've completed by visually recognizing ingredients and actions
- Verify ingredients - scan your workspace before you start cooking to ensure you have everything you need
- Provide voice-guided instructions - read out recipe steps at the right moment, so you never need to touch your phone with messy hands
- Catch mistakes - alert you if you've skipped a step or missed adding an ingredient
Think of it as a sous chef that actually watches what you're doing and keeps you on track, making cooking more accessible and less stressful for everyone, from beginners to experienced home cooks.
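The ingredient-verification step boils down to a set comparison between the recipe's list and what the camera sees. A minimal sketch in Python, with the Gemini image-recognition call stubbed out as an injected function (all names here are illustrative, not from the actual codebase):

```python
def verify_ingredients(required, detect):
    """Compare the recipe's ingredient list against what the camera sees.

    `detect` stands in for the vision-model call and is expected to
    return the set of ingredient names found in the current frame.
    """
    found = {name.lower() for name in detect()}
    missing = sorted(i for i in required if i.lower() not in found)
    return (not missing, missing)

# Example: a carbonara mise en place with one item missing.
ok, missing = verify_ingredients(
    ["spaghetti", "eggs", "guanciale", "pecorino"],
    lambda: {"Spaghetti", "Eggs", "Pecorino"},
)
# ok is False; missing == ["guanciale"]
```

Returning the missing items (rather than just a boolean) is what lets the voice layer tell the cook exactly what to grab before starting.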
How we built it
- Finite state machines to manage the CV processing pipeline and recipe step progression
- Gemini API for real-time image recognition of ingredients and cooking actions
- Amazon Alexa + VoiceMonkey for customized hands-free voice guidance and alerts
- Flask server as our backend API layer to orchestrate communication between components
- Firebase for database storage of recipes, user progress, and session data
- Jetpack Compose & Kotlin for building an intuitive Android mobile app
- Raspberry Pi cameras to capture live cooking feed and monitor the workspace
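The finite state machine that drives step progression can be sketched roughly as below. This is a simplified illustration, not the production code: the Gemini call is injected as a `classify_frame` function so the machine can run without a camera, and the state and label names are hypothetical.

```python
from enum import Enum, auto

class CookState(Enum):
    """Illustrative states for tracking a recipe in progress."""
    WAITING_FOR_INGREDIENTS = auto()
    STEP_IN_PROGRESS = auto()
    STEP_COMPLETE = auto()
    RECIPE_DONE = auto()

class RecipeFSM:
    """Minimal FSM for recipe step progression.

    `classify_frame(frame, step)` stands in for the Gemini API call
    and returns a label such as "ingredients_ready" or "step_done".
    """
    def __init__(self, steps, classify_frame):
        self.steps = steps              # ordered list of step descriptions
        self.classify_frame = classify_frame
        self.index = 0                  # current step
        self.state = CookState.WAITING_FOR_INGREDIENTS

    def tick(self, frame):
        """Advance the machine by one camera frame."""
        if self.state is CookState.RECIPE_DONE:
            return self.state
        label = self.classify_frame(frame, self.steps[self.index])
        if self.state is CookState.WAITING_FOR_INGREDIENTS:
            if label == "ingredients_ready":
                self.state = CookState.STEP_IN_PROGRESS
        elif self.state is CookState.STEP_IN_PROGRESS:
            if label == "step_done":
                self.index += 1
                self.state = (CookState.RECIPE_DONE
                              if self.index == len(self.steps)
                              else CookState.STEP_COMPLETE)
        elif self.state is CookState.STEP_COMPLETE:
            # Announce the next step via voice, then resume watching.
            self.state = CookState.STEP_IN_PROGRESS
        return self.state
```

Each `tick` consumes one classified frame, so the voice layer only needs to react to state transitions (e.g. speak the next instruction on `STEP_COMPLETE`).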
Challenges we ran into
- Constant poops due to carbonara
- Inconsistencies between different tools and frameworks when made to interact with each other
- Privacy challenges with the Amazon Echo and handling voice activation
- Connecting all components (including mobile app, APIs, CV) together smoothly
- Balancing the tradeoff between camera resolution and classification accuracy
- We attempted multithreading in the backend, but running the camera stream in its own thread proved unreliable, so we migrated to a single-threaded finite state machine (the Flask server still runs in a thread, though)
- Integrating all of the different APIs, systems, and libraries into one control loop that manages concurrency with global state and time counters
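The architecture that came out of those last two challenges looks roughly like this: the server runs in a daemon thread serving shared state, while the camera loop keeps the main thread. A hedged sketch, with the Flask `app.run()` replaced by a stub and all names hypothetical:

```python
import threading
import time

# Global state shared between the server thread and the control loop.
state = {"step": 0, "status": "idle", "seconds_on_step": 0.0}
state_lock = threading.Lock()

def server_thread():
    """Stand-in for the Flask server; in the real build this is
    app.run(), serving `state` to the mobile app over HTTP."""
    while True:
        time.sleep(0.1)

def control_loop(get_frame, fsm_tick, max_ticks):
    """Single-threaded loop: grab a frame, advance the FSM, update
    shared state under a lock. Keeping the camera on the main thread
    (not in its own thread) is the key design choice here."""
    started = time.monotonic()
    for _ in range(max_ticks):
        frame = get_frame()
        step, status = fsm_tick(frame)
        with state_lock:
            if step != state["step"]:
                started = time.monotonic()   # reset the per-step timer
            state["step"] = step
            state["status"] = status
            state["seconds_on_step"] = time.monotonic() - started

threading.Thread(target=server_thread, daemon=True).start()
```

Because only the control loop writes `state` and the server only reads it, a single lock around the dictionary is enough to keep the two threads consistent.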
What we learned
- Keeping all components on the same server improves performance by eliminating the need for tunnels or extra network hops, especially for images and other non-text data
- Chatbots can produce unpredictable errors and behaviour, so clear prompting is important to prevent unexpected output
- Large projects involving any kind of physical product or process demand a high level of detail, because both logical and environmental errors creep in
- How to integrate a backend that runs multiple processes at once while also juggling external APIs