Inspiration
After messing up a dish or two in a romantic homemade Valentine's Day meal, I decided that enough was enough! The separation of ingredients and instructions is useful when you are shopping, but it is a major hindrance when you are actually cooking! I decided to scrape recipes from my favorite cooking website and insert measurements into the instructions. I also performed word2vec embedding to identify clusters of recipes, which could be used to make wine and menu design recommendations in a future project.
What it does
- Inserts measurements into recipe instructions to give the home chef a better cooking experience.
- Clusters recipes using the average word embedding of each document.
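To make "document-wise average word embedding" concrete, here is a minimal sketch of how one recipe could be turned into a single vector. The `toy_embeddings` dictionary, its 3-dimensional values, and the `document_vector` helper are all made up for illustration; in the project the word vectors come from a trained word2vec model, and the resulting recipe vectors are what k-means would cluster.

```python
def document_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding.

    Returns None when no token in the document is in the vocabulary.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical toy embeddings (3 dimensions to keep the example readable).
toy_embeddings = {
    "sugar": [1.0, 0.0, 2.0],
    "butter": [3.0, 2.0, 0.0],
}

# "whisk" has no embedding here, so it is skipped in the average.
recipe_tokens = ["whisk", "sugar", "butter"]
recipe_vector = document_vector(recipe_tokens, toy_embeddings)
print(recipe_vector)
```

Skipping out-of-vocabulary words is one possible choice; with a small corpus it keeps rare words from breaking the average.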
How we built it
- Bag of words: term-frequency transformations, used in a Bayesian approach to find the "root" ingredient in a phrase like "1 tablespoon of cumin." For each word in the ingredient phrase, I look up its term frequency (tf) in the instructions and take the word with the highest value as the root. After separating "1 tablespoon of" from "cumin", I insert the measurement back into the instructions.
- Word2Vec embeddings to transform words into 150-dimensional vectors for clustering.
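The term-frequency step can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: the function names, the small stop-word list, and the choice to place the measurement directly before the first mention of the ingredient are all assumptions.

```python
import re
from collections import Counter

# Assumed stop-word list so that words like "of" never win as the root.
STOP_WORDS = {"of", "the", "a", "an", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def find_root(ingredient, instructions):
    """Return the ingredient word with the highest tf in the instructions."""
    tf = Counter(tokenize(instructions))
    candidates = [w for w in tokenize(ingredient) if w not in STOP_WORDS]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: tf[w])
    return best if tf[best] > 0 else None


def insert_measurement(ingredient, instructions):
    """Put the measurement in front of the first mention of the root."""
    root = find_root(ingredient, instructions)
    if root is None:
        return instructions
    pattern = rf"\b{re.escape(root)}\b"
    match = re.search(pattern, ingredient, flags=re.IGNORECASE)
    if match is None:
        return instructions
    # Everything before the root in the ingredient phrase is the measurement.
    measurement = ingredient[:match.start()].strip()
    if not measurement:
        return instructions
    return re.sub(
        pattern,
        lambda m: f"{measurement} {m.group(0)}",
        instructions,
        count=1,
        flags=re.IGNORECASE,
    )


steps = "Toast the cumin in a dry pan, then add the onions."
print(insert_measurement("1 tablespoon of cumin", steps))
```

Real recipes need more care than this, for example plural forms ("onion" vs. "onions") and ingredients that are mentioned several times.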
Challenges we ran into
LOTS of data cleaning! Clusters from word embeddings were not fantastic due to only having ~2000 documents. Scraping more recipes would have improved the clustering, but scraping and cleaning from a different website would have taken too much time to complete.
Accomplishments that we're proud of
- I grappled with various ways of inserting measurements into the instructions. I understood that the Bayes approach was likely to be the most challenging, but also the most promising. I moved forward with it and actually got it to perform well!
- Even with only about 2000 recipes, I was still able to get a cluster that had very similar recipes! The three most representative recipes in the cluster were all desserts!
What we learned
I learned a lot about web scraping and word embedding! I got the chance to play around with different embedding parameters and saw how they affect the clustering performance. As mentioned above, the biggest factor for cluster performance was data size - more documents/recipes translate to better performance of clustering.
What's next for Recipes for the Overwhelmed Home Chef
Several things! Direct next steps for this project include wine pairing recommendations and menu design recommendations. Eventually, I want to work on cross-modal embedding to map images of the food to the recipe!
Built With
- beautiful-soup
- k-means
- python
- word2vec