Inspiration
We kept hitting the same moment. You're deep in cooking something, your hands have garlic all over them, and you suddenly need to know how many grams you just put in, or how long to cook the pasta, or whether you already added salt. Touching your phone means washing your hands, unlocking it, finding the tab you closed. The moment passes. Every cooking app we'd seen treated the kitchen like a dashboard. We wanted something that listened instead.
What it does
Sous AI is a voice-first cooking app. Tap the mic and talk to it like you would to a friend standing next to you at the stove. "A splash of olive oil." "Three cloves of garlic, chopped." "How long should I cook the pasta?" The app parses what you said, looks up real macros for every ingredient, and reads back a short confirmation. When you finish, the whole recipe lands in your cookbook with calories, protein, fat, carbs, and how long you were at the stove.
How we built it
The mobile app is Expo SDK 54 on React Native with a strict TypeScript state machine driving the voice loop. There are four states: Armed, Listening, Processing, Speaking. Each one owns exactly one audio consumer at a time, because the operating system will fight you if you try to share them.
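The four-state loop can be pictured as a small transition table. This is a minimal sketch in Python for brevity, although the app's own machine is TypeScript. The event names (`mic_tapped`, `utterance_ended`, `reply_ready`, `playback_finished`) and the exact set of legal transitions are assumptions made for illustration.

```python
from enum import Enum


class VoiceState(Enum):
    ARMED = "armed"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Hypothetical event names. Each (state, event) pair maps to exactly one
# next state, so two transitions can never be valid at the same moment.
TRANSITIONS = {
    (VoiceState.ARMED, "mic_tapped"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "utterance_ended"): VoiceState.PROCESSING,
    (VoiceState.PROCESSING, "reply_ready"): VoiceState.SPEAKING,
    (VoiceState.SPEAKING, "playback_finished"): VoiceState.ARMED,
}


def next_state(state: VoiceState, event: str) -> VoiceState:
    """Return the next state, or raise if the event is illegal in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} while {state.name}") from None
```

Rejecting unknown pairs loudly, rather than ignoring them, is what tends to surface bugs such as a mic tap arriving while a reply is still playing.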
The backend is FastAPI on Railway. When the phone posts an audio clip to /utterance, the server hands it to Gemini for parsing, looks up macros through Edamam, streams a short acknowledgment back through ElevenLabs, and writes the result to Supabase. Totals are just a sum over ingredients,
$$\text{total\_cal} = \sum_{i \in \text{ingredients}} \text{edamam}(i)$$
with a fallback to an LLM estimate when Edamam can't resolve a name, flagged as estimated so we never claim precision we don't have.
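As a rough sketch of that sum, assume each ingredient has already been resolved to a calorie value plus a flag saying whether the value is an estimate. The `Resolved` type and `total_calories` function are hypothetical names, not the project's code, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resolved:
    name: str
    calories: float
    estimated: bool  # True when the value came from the LLM fallback


def total_calories(items: List[Resolved]) -> Tuple[float, bool]:
    """Sum calories and report whether any contribution was an estimate."""
    total = sum(item.calories for item in items)
    any_estimated = any(item.estimated for item in items)
    return total, any_estimated


# Made-up values, for illustration only.
recipe = [
    Resolved("garlic", 13.0, estimated=False),
    Resolved("olive oil", 120.0, estimated=True),
]
total, has_estimate = total_calories(recipe)
```

Carrying the flag up to the total means a recipe that contains even one estimated ingredient can be labeled as approximate.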
The UI is its own thing. We called it Warm Editorial. Cream backgrounds, deep green ink, metallic gold used exactly twice per session, hairline rule-offs between ingredients instead of card borders. All animations are React Native's built-in Animated API with the native driver, so the concentric rings around the listening mic stay at 60fps on a mid-tier phone.
What we learned
A state machine at the center of an interactive system pays for itself within the first few hours. Ours caught three bugs that would have been invisible in a plain useState app.
Voice also changes how you design everything. Copy has to be shorter because users hear it instead of reading it. Latency matters more because there's no spinner to hide behind. Errors have to resolve themselves, because a user with wet hands can't tap a dismiss button.
And real-world food data is much messier than it looks. "Three cloves of garlic" works. "A drizzle of olive oil" does not, until you teach the parser what "drizzle" means in grams.
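One simple way to teach a parser about vague quantities is a lookup table from informal units to grams. The sketch below assumes that approach. The gram values are placeholders chosen for illustration, not the numbers the app uses, and `grams_for` is a hypothetical helper.

```python
from typing import Optional

# Illustrative gram values only; not the app's actual table.
VAGUE_UNIT_GRAMS = {
    "drizzle": 5.0,
    "splash": 15.0,
    "pinch": 0.5,
}


def grams_for(quantity: float, unit: str) -> Optional[float]:
    """Convert a vague unit to grams, or return None if the unit is unknown."""
    per_unit = VAGUE_UNIT_GRAMS.get(unit.lower().strip())
    if per_unit is None:
        return None  # caller can ask the user a clarifying question instead
    return quantity * per_unit
```

Returning `None` for an unknown unit, rather than guessing, leaves room for the assistant to ask "how much?" before logging anything.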
Challenges we ran into
The biggest one was the single audio consumer rule. The recorder and the text-to-speech player both want exclusive access to the mic and speaker. We spent hours tracking down why the end of playback would occasionally leave the mic in a half-armed state. The fix was a 300-millisecond buffer after playback before we re-enabled the mic, plus a finite state machine with no overlapping transitions.
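The shape of that fix can be sketched with `asyncio`, shown in Python for brevity even though the app itself is TypeScript. The `MicGate` class, its method name, and the string states are hypothetical. Only the 300 ms delay and the no-overlap rule come from the description above.

```python
import asyncio


class MicGate:
    """Re-arms the mic only after a quiet period following playback."""

    def __init__(self, rearm_delay_s: float = 0.3):
        self.rearm_delay_s = rearm_delay_s
        self.state = "speaking"
        self._transitioning = False

    async def on_playback_finished(self) -> bool:
        # Ignore the event if a transition is already in flight or if we
        # are not actually speaking; this is the "no overlap" guard.
        if self._transitioning or self.state != "speaking":
            return False
        self._transitioning = True
        try:
            # Give the audio session time to settle before re-arming.
            await asyncio.sleep(self.rearm_delay_s)
            self.state = "armed"
            return True
        finally:
            self._transitioning = False
```

If two "playback finished" events arrive back to back, only the first one re-arms the mic and the second is dropped.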
Macros were the second. Edamam tanks a whole response if one ingredient fails. We ended up resolving ingredients one at a time so individual failures drop out of the list, and we tag any LLM fallback as estimated so the UI can surface that honestly.
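A minimal sketch of the one-at-a-time approach follows. The `lookup` and `estimate` callables stand in for the Edamam request and the LLM fallback and are passed in as parameters, so nothing here reflects either service's real API. Dropping an ingredient that neither source can resolve is also an assumption.

```python
from typing import Callable, Dict, List, Optional


def resolve_all(
    names: List[str],
    lookup: Callable[[str], float],
    estimate: Callable[[str], Optional[float]],
) -> List[Dict]:
    """Resolve each ingredient independently so one failure cannot sink the rest."""
    resolved = []
    for name in names:
        try:
            calories = lookup(name)      # primary source (stands in for Edamam)
            estimated = False
        except LookupError:
            calories = estimate(name)    # fallback (stands in for the LLM)
            estimated = True
            if calories is None:
                continue                 # neither source knew it: drop the item
        resolved.append({"name": name, "calories": calories, "estimated": estimated})
    return resolved
```

The cost is one request per ingredient instead of one per recipe, which trades some latency for isolation between failures.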
The third was Gemini's clarification loop. When the model asks a question back, like "how much olive oil?", we need to route the next utterance as an answer to that question instead of treating it as a new ingredient. We added a pending_clarification field to the session state and threaded it through every turn.
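The routing decision can be sketched as follows. Only the `pending_clarification` field name comes from the project. The `Session` class, the `route` function, and the shape of the returned dictionaries are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    # The question the assistant is waiting on, or None if nothing is pending.
    pending_clarification: Optional[str] = None


def route(session: Session, utterance: str) -> dict:
    """Decide whether an utterance answers a pending question or starts a new turn."""
    if session.pending_clarification is not None:
        question = session.pending_clarification
        session.pending_clarification = None  # consumed by this answer
        return {"kind": "clarification_answer", "question": question, "text": utterance}
    return {"kind": "new_turn", "text": utterance}
```

Clearing the field as soon as an answer arrives keeps a stale question from capturing the utterance after it.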
Built With
- edamam
- elevenlabs
- expo-av
- expo-router
- expo.io
- fastapi
- google-gemini
- jest
- lucide
- postgresql
- pydantic
- python
- railway
- react-native
- sql
- supabase
- typescript
- uv
- uvicorn