Categorising Hello Fresh's ingredient list to enable more detailed grouping.
What it does
Our hackathon program runs through the list of ingredients provided by Hello Fresh and looks up a category for each one in a database that we put together using ndb.nal.usda.gov. To match an ingredient with a category, we translated all non-English ingredients into English and then used an API to search for related words. The related words gave us a broader term for the ingredient, which could then be matched against our database.
See https://github.com/rafael-hantoush/hackathon-challenge/blob/master/task.ipynb for our code and https://github.com/rafael-hantoush/hackathon-challenge/blob/master/results.csv for ingredients.csv with the categories appended.
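The matching idea described above can be sketched roughly as follows. This is only an illustration, not the notebook's code: `CATEGORY_DB`, `RELATED_WORDS`, `related_words` and `categorise` are hypothetical names, and the related-words API is replaced by a small hard-coded dictionary so the example runs offline.

```python
# Hypothetical stand-in for the category database (broader term -> category label).
CATEGORY_DB = {
    "vegetable": "Vegetables and Vegetable Products",
    "cheese": "Dairy and Egg Products",
    "fish": "Finfish and Shellfish Products",
}

# Hypothetical stand-in for what a related-words API might return.
RELATED_WORDS = {
    "zucchini": ["squash", "vegetable", "courgette"],
    "mozzarella": ["cheese", "italian", "pizza"],
    "salmon": ["fish", "trout", "pink"],
}


def related_words(word):
    """Return related words for `word` (assumption: a real API call would go here)."""
    return RELATED_WORDS.get(word, [])


def categorise(ingredient):
    """Return a category for the ingredient, or None if nothing matches."""
    key = ingredient.lower().strip()
    # Try a direct match against the database first.
    if key in CATEGORY_DB:
        return CATEGORY_DB[key]
    # Otherwise look for the first related word that is a known broader term.
    for candidate in related_words(key):
        if candidate in CATEGORY_DB:
            return CATEGORY_DB[candidate]
    return None


if __name__ == "__main__":
    for name in ["Zucchini", "mozzarella", "dragonfruit"]:
        print(name, "->", categorise(name))
```

Taking the first related word that appears in the database is the simplest possible rule; a scoring scheme over all related words would be one way to make it less brittle.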
How we built it
We used Python for the data pre-processing, for calling the APIs, and for combining the results with the ingredients list.
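As a rough sketch of the last step, combining categories with the ingredients list could look like the snippet below. The column names (`name`, `category`) and the function `append_categories` are assumptions made for illustration; the real ingredients.csv may be laid out differently.

```python
import csv
import io


def append_categories(csv_text, categorise, name_column="name"):
    """Return CSV text with an extra 'category' column.

    `categorise` is any callable mapping an ingredient name to a category
    (or None); `name_column` is an assumed column name.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["category"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave the cell empty when no category was found.
        row["category"] = categorise(row[name_column]) or ""
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,name\n1,salmon\n2,mystery\n"
    lookup = {"salmon": "fish"}.get  # toy categoriser for the demo
    print(append_categories(sample, lookup))
```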
Challenges we ran into
- The Google Translate API and other translation APIs required payment, and the free APIs we tried were less suitable for our project because they gave unsatisfactory results
- Multi-word ingredients returned no results from the related words API and had to be split
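One way to handle the multi-word problem is to query the full phrase first and fall back to the individual words. The sketch below assumes a generic `lookup` callable in place of the related words API, and trying the last word first is our own guess (in English the head noun usually comes last), not necessarily what the notebook does.

```python
def lookup_with_split(ingredient, lookup):
    """Query `lookup` with the full phrase, then fall back to single words."""
    results = lookup(ingredient)
    if results:
        return results
    # Assumption: the last word is usually the most informative one,
    # e.g. "tomato" in "cherry tomato", so try the words in reverse order.
    for word in reversed(ingredient.split()):
        results = lookup(word)
        if results:
            return results
    return []


def fake_lookup(term):
    """Toy replacement for the related words API, used only for this demo."""
    table = {"tomato": ["vegetable", "fruit"], "cherry": ["fruit"]}
    return table.get(term, [])


if __name__ == "__main__":
    print(lookup_with_split("cherry tomato", fake_lookup))
```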
Accomplishments that we're proud of
- Using a related words API
- Text pre-processing (e.g. stemming plural ingredients, using regex to handle words in parentheses)
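A minimal sketch of this kind of pre-processing is shown below. It uses a deliberately naive plural rule instead of a real stemmer, and the helper names (`strip_parentheses`, `naive_singular`, `preprocess`) are hypothetical; the notebook may do this differently.

```python
import re


def strip_parentheses(text):
    """Remove parenthesised remarks such as '(canned)' from an ingredient name."""
    return re.sub(r"\s*\([^)]*\)", "", text).strip()


def naive_singular(word):
    """Very rough plural stripping; a proper stemmer would handle more cases."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"   # cherries -> cherry
    if word.endswith("oes"):
        return word[:-2]         # tomatoes -> tomato
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]         # onions -> onion
    return word


def preprocess(ingredient):
    """Lowercase, drop parenthesised text, and singularise each word."""
    cleaned = strip_parentheses(ingredient.lower())
    return " ".join(naive_singular(w) for w in cleaned.split())


if __name__ == "__main__":
    for raw in ["Tomatoes (canned)", "Cherries", "Red Onions (diced)"]:
        print(raw, "->", preprocess(raw))
```

The plural rule will mangle words that merely end in "s" (for example "hummus"), so an exception list or a library stemmer would be needed for real data.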
What we learned
- Everything takes so much longer than we thought
- Sometimes seemingly stupid approaches can work, too
What's next for Hello Fresh Data Challenge by Rafael&Linda
- Using a smarter algorithm to match the ingredients
- Linked data analysis
- Adding more category information such as nutrients and cuisine type, which would allow a more in-depth recipe search (e.g. vegetarian dishes rich in iron, desserts high in vitamin C)