It was Christmas Eve and our team was cooking with our fellow intern friends. We found a nice recipe, and each settled down in a corner of the kitchen to begin prepping. But it took only ten minutes for the kitchen to descend into chaos. A: "Hey, what's the next step?" B: "Wait, no, what's the current instruction? I don't want to overcook it." C: "I got you guys, just let me unlock the computer screen!" D: "No touching the computer without washing your hands!" E: "Sorry, but I'm using the sink, give me a minute…" While we somehow managed to cook a nice meal in the end, a question stuck with us: how can we make checking a recipe easier?

What it does

Our website, built with Flask and Jinja2, uses the spoonacular Food API to parse the recipe link provided by the user, analyzes the recipe's ingredients and instructions, arranges them in an easy-to-view flashcard-style webpage, and reads them out to the user with the Bing Speech API. The user controls the flow by saying "yes" or "no" when prompted, which tells the app to go to the next step, repeat the current step, or re-list the ingredients.
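The parse-and-display flow can be sketched roughly like this. This is a minimal illustration, not our actual code: it assumes a spoonacular API key, uses spoonacular's public recipe-extract endpoint, and the `split_into_steps` helper and `flashcards.html` template names are hypothetical.

```python
# Hypothetical sketch of the recipe-parsing Flask route (not our exact code).
import requests
from flask import Flask, render_template, request

app = Flask(__name__)
SPOONACULAR_KEY = "YOUR_API_KEY"  # placeholder; a real key is required

def split_into_steps(instructions: str) -> list[str]:
    """Break raw instruction text into one flashcard per non-empty paragraph."""
    return [p.strip() for p in instructions.split("\n") if p.strip()]

@app.route("/recipe")
def recipe():
    url = request.args.get("url", "")
    # spoonacular's extract endpoint parses a recipe page into structured data
    resp = requests.get(
        "https://api.spoonacular.com/recipes/extract",
        params={"url": url, "apiKey": SPOONACULAR_KEY},
        timeout=10,
    )
    data = resp.json()
    ingredients = [i["original"] for i in data.get("extendedIngredients", [])]
    steps = split_into_steps(data.get("instructions") or "")
    return render_template("flashcards.html", ingredients=ingredients, steps=steps)
```

Each flashcard then corresponds to one entry of `steps`, which the voice loop reads out in order.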

How we built it

We pumped our main dev, Sooham, full of Japanese synthesized motivational quotes. After converting the input recipe link from HTML to markdown, we use the spoonacular Food API to get useful data about the recipe, such as the list of ingredients and their units; the steps of the recipe are then parsed and broken down by paragraph. We take the steps and ingredients acquired from the input link and batch them out for speech synthesis with the Microsoft Azure Speech API, globally caching frequently requested links. Finally, the voice user interface is built as a state machine driven by user voice input.
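The yes/no state machine mentioned above might look something like the following. The state names and transitions here are illustrative, not our exact implementation; the assumption is simply that "yes" advances and anything else repeats the current prompt.

```python
# Illustrative sketch of the voice-driven state machine (not our exact code).
class RecipeVUI:
    """Walk a user through ingredients, then steps, one 'yes'/'no' at a time."""

    def __init__(self, ingredients, steps):
        self.ingredients = ingredients
        self.steps = steps
        self.state = "INGREDIENTS"  # INGREDIENTS -> STEP -> DONE
        self.index = 0              # current step when in STEP

    def prompt(self):
        """Text to synthesize and read aloud in the current state."""
        if self.state == "INGREDIENTS":
            return "Ingredients: " + ", ".join(self.ingredients) + ". Ready to start?"
        if self.state == "STEP":
            return self.steps[self.index] + " Move on to the next step?"
        return "Recipe complete. Bon appetit!"

    def handle(self, answer):
        """'yes' advances; anything else stays put, so the prompt repeats."""
        if answer == "yes":
            if self.state == "INGREDIENTS":
                self.state = "STEP"
            elif self.state == "STEP":
                self.index += 1
                if self.index >= len(self.steps):
                    self.state = "DONE"
        return self.prompt()
```

Treating unrecognized input the same as "no" is what lets the app repeat a step instead of misfiring on background kitchen chatter.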

Challenges we ran into

1) Integrating with cloud services -- we simply ran out of time. 2) Interfacing with the Microsoft Azure speech synthesis API on the server side while minimizing processing time for frequently searched recipes.
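The "minimizing processing time" part comes down to not re-synthesizing the same text twice. A minimal sketch of that cache, assuming a hypothetical `synthesize(text) -> bytes` wrapper around the Azure Speech API (the cache itself is just an in-process dict keyed by a hash of the text):

```python
# Sketch of caching synthesized audio for frequently requested recipes.
# `synthesize` stands in for a (hypothetical) Azure Speech API wrapper.
import hashlib

_audio_cache: dict[str, bytes] = {}

def synthesize_cached(text: str, synthesize) -> bytes:
    """Return cached audio for `text`, calling `synthesize` only on a miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _audio_cache:
        _audio_cache[key] = synthesize(text)
    return _audio_cache[key]
```

Because popular recipes share the same links (and therefore the same step text), a hit in this cache skips the round trip to the speech service entirely.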

Accomplishments that we're proud of

1) We successfully built the app with technologies we learned the day of the hackathon. 2) We worked together. 3) We had fun.

What we learned

1) How to use Microsoft's APIs for speech synthesis and speech recognition. 2) JavaScript, HTML, CSS, and Flask, which we used to write the web app. 3) Teamwork is difficult, and meshing code together is hard. 4) Designing an architecture that waits for user input while ignoring background conversation and quips is hard.

What's next for Robotouille

1) Allow users to query for recipes by voice instead of providing a URL. 2) Implement better semantic analysis of recipes so that instructions can be split intelligently and redundant ingredient information is removed. 3) Bring it to Google Home Mini / Amazon Echo / Alexa. 4) Intelligent task splitting: some steps could be done in parallel (hard), with the possibility of splitting tasks among a group. 5) Build user sessions into the app, allowing users to save their most frequent recipes. 6) A Gordon Ramsay voice plugin. 7) Greater VUI functionality.
