Recipe Classification for Diabetes Risk Factors

Panggih Kusuma Ningrum, Elise Yang, Ryan Young, Diana

Inspiration

Our goal is to analyze online recipes and, based on their ingredients, classify them as low, medium, or high risk for developing Type 2 Diabetes, or by how dangerous they may be to adult men who already have Type 1 Diabetes.

What it does

A linear regression model takes in ingredients and shows how ‘risky’ each ingredient is.

How we built it

The USDA-provided dataset was used to build a linear regression model that classifies ingredients by how much risk they pose to people at risk for Type 2 Diabetes or who already have Type 1. The predictor variables are carbohydrates, saturated fat, total fat, and sugar (all in grams), and the dependent variable is a risk level of 0 (low), 1 (mid), or 2 (high). The model reached 77.18 percent test accuracy, meaning it can reasonably classify ingredients as low, medium, or high risk. With more data on high-risk foods, this accuracy could likely be improved.
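As a rough illustration of the approach described above, the sketch below fits an ordinary least-squares model on the four nutrient columns and turns its continuous output into a risk level by rounding and clipping to 0, 1, or 2. The toy rows, the function names, and the rounding rule are assumptions made for this example; the write-up does not say how the regression output was mapped to classes, and none of the numbers come from the USDA dataset.

```python
import numpy as np

# Hypothetical toy rows: carbohydrates, saturated fat, total fat, sugar (grams).
# These values are invented for illustration and are not USDA data.
X = np.array([
    [5.0, 0.5, 1.0, 2.0],
    [10.0, 1.0, 3.0, 4.0],
    [30.0, 4.0, 10.0, 15.0],
    [35.0, 5.0, 12.0, 18.0],
    [60.0, 9.0, 25.0, 40.0],
    [70.0, 10.0, 30.0, 45.0],
])
# Risk levels as defined in the text: 0 (low), 1 (mid), 2 (high).
y = np.array([0, 0, 1, 1, 2, 2])


def fit_linear(features, targets):
    """Fit intercept + coefficients by ordinary least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict_risk(coef, features):
    """Round the continuous score to the nearest level and clip to 0..2.

    Rounding is an assumed way to get classes out of a regression model.
    """
    design = np.column_stack([np.ones(len(features)), features])
    scores = design @ coef
    return np.clip(np.rint(scores), 0, 2).astype(int)


coef = fit_linear(X, y)
pred = predict_risk(coef, X)
accuracy = float(np.mean(pred == y))  # fraction of rows labeled correctly
print(pred, accuracy)
```

Accuracy here is measured on the same rows used for fitting, so it says nothing about test performance; a real evaluation would hold out a test split, as the reported 77.18 percent figure implies.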

The next step was to feed in the ingredient information and build a machine learning model to predict the potential risk of cooking recipes from their lists of ingredients. We used natural language processing to analyze text from a recipe website and performed keyword mapping with regular expressions to extract information from the text, based on the top 10 most harmful ingredients. The recipe dataset contained over 19,000 recipes, and we found 594 recipes that contained the 50 most harmful ingredients.
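The keyword-mapping step could look roughly like the sketch below, which compiles a list of ingredient names into one regular expression and records which of them appear in each recipe's text. The ingredient list, recipe texts, and function names are hypothetical placeholders; the actual list of harmful ingredients and the recipe website are not given in this write-up.

```python
import re

# Hypothetical keyword list; the project's real list is not reproduced here.
HARMFUL_INGREDIENTS = ["sugar", "butter", "corn syrup", "white rice"]


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    # Longest keywords first so multi-word names win over shorter overlaps.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def find_harmful(text, pattern):
    """Return the sorted, de-duplicated keywords found in a recipe's text."""
    return sorted({m.group(1).lower() for m in pattern.finditer(text)})


# Invented recipe texts for illustration.
recipes = {
    "pancakes": "2 cups flour, 3 tbsp Sugar, 2 tbsp butter, 1 egg",
    "salad": "lettuce, tomato, olive oil, lemon juice",
    "biscuits": "2 cups flour, 1 cup buttermilk, baking powder",
}

pattern = build_pattern(HARMFUL_INGREDIENTS)
flagged = {}
for name, text in recipes.items():
    hits = find_harmful(text, pattern)
    if hits:
        flagged[name] = hits

print(flagged)
```

The word-boundary anchors keep "butter" from matching inside "buttermilk"; whether that is the desired behavior depends on how the ingredient list is defined.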

Challenges we ran into

The ingredients dataset was 74% low-risk foods, so it was difficult to achieve a high model accuracy score: there was not enough data to train the model on high-risk foods, which made up only 9% of the data. Additionally, we ran out of time after losing files to complete the
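To make the class imbalance described above concrete, the sketch below builds a label list with the stated 74% low-risk and 9% high-risk shares and scores a baseline that always predicts the most common class. The 17% medium-risk share is an assumption (the remainder, given three classes), and the 100-item list is a stand-in for the real dataset.

```python
from collections import Counter

# 74% low (0) and 9% high (2) come from the text; 17% mid (1) is assumed
# as the remainder. One list entry stands in for one percent of the data.
labels = [0] * 74 + [1] * 17 + [2] * 9

# Baseline: always predict the most frequent class.
majority_class, _ = Counter(labels).most_common(1)[0]
baseline_pred = [majority_class] * len(labels)

baseline_accuracy = sum(p == t for p, t in zip(baseline_pred, labels)) / len(labels)


def recall(pred, true, cls):
    """Fraction of items whose true class is `cls` that were predicted as `cls`."""
    relevant = [p for p, t in zip(pred, true) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)


high_risk_recall = recall(baseline_pred, labels, 2)
print(baseline_accuracy, high_risk_recall)
```

Under these assumed shares, the always-low baseline already scores 74% accuracy while never identifying a high-risk food, which is why per-class recall may be a more informative measure than overall accuracy on data like this.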

Accomplishments that we're proud of

We developed a machine learning model that can help diabetic patients with their day-to-day lives and that can be reproduced to aid others with varying health problems. This project can help others enjoy food and, ultimately, life more.

What we learned

We learned a great deal about how food affects others’ daily lives, how we can use machine learning to make it a bit easier for them, and that we can create a meaningful project in such a short time. We also learned that problems will arise continuously throughout a project, that our code will crash at the last minute, and that we will lose our files.

What's next for Recipe Classification for Diabetes Risk Factors

Risk levels will vary for women and children, and with body size and activity level, so the project can be adjusted as needed. Future work will revolve around expanding the model to predict risk levels for high cholesterol, blood pressure, and other diet-based health issues.
