Inspiration
This project has two different scopes. The inspiration for YumCam came from our daily struggle as college students: our food often goes to waste because we don't use all of it before it spoils. On a larger scale, the app can help organizations, such as food banks, decide which meals to make. If certain ingredients need to be used up before they spoil, YumCam will help them find popular recipes that use those ingredients.
What it does
YumCam is an Android app that creates a list of ingredients, and then uses APIs to find nutritional information about each ingredient, as well as recipes that incorporate all the selected ingredients. A user can enter ingredients in two ways: by taking a photo of an item, or by saying the name of the item. If the user takes a photo of the food item, a computer vision algorithm attempts to determine which type of food is shown in the picture and adds it to the list. If the user speaks the name of the ingredient, the speech-to-text service interprets the phrase and adds it to the list. As ingredients are added to the list, their corresponding nutritional information is also stored. Long-clicking an item displays a graph showing the Daily Value percentage for key nutrients. Tapping an item in the list toggles whether it is included in the search for recipes. Once the user has selected the ingredients they want included in the search, another API searches for recipes that include all of those ingredients. Up to 30 recipes are shown in the app as previews, each with a picture and recipe title, and the user can tap any of them to load the full recipe.
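The tap-to-toggle behavior described above can be sketched in plain Java. This is only an illustration, not YumCam's actual code: the class `IngredientList`, its method names, the choice to have new items start out selected, and the comma-separated search term are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of the ingredient list; not taken from the YumCam source. */
final class IngredientList {
    // Insertion-ordered map: ingredient name -> whether it is part of the recipe search.
    private final Map<String, Boolean> included = new LinkedHashMap<>();

    /** Adds an ingredient from the camera or speech input. Assumption: new items start selected. */
    void add(String name) {
        included.putIfAbsent(normalize(name), true);
    }

    /** Mirrors a tap on a list row: flips whether the item is included in the search. */
    void toggle(String name) {
        included.computeIfPresent(normalize(name), (key, isIncluded) -> !isIncluded);
    }

    /** Returns only the ingredients currently selected, in the order they were added. */
    List<String> selected() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : included.entrySet()) {
            if (entry.getValue()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Joins the selected ingredients into one search term (assumed comma-separated). */
    String searchTerm() {
        return String.join(",", selected());
    }

    // Trims and lower-cases names so "Tomato" and "tomato " refer to the same entry.
    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

Keeping the selection flag next to each ingredient means a deselected item stays in the list, along with its stored nutritional information, and can be re-included with another tap.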
How we built it
We decided to create an Android app, which played to the team's strengths in Java programming. There are 3 key parts to this project:
1) CustomVision Training: We originally planned to train a TensorFlow model to detect the food items, but found that Microsoft had released a general-purpose computer vision service, CustomVision, that can be trained through a graphical interface. We downloaded images from an online image database and tagged them for training with the CustomVision Service (CVS). We tuned the model by adjusting the precision threshold, balancing precision and recall as best we could.
2) API Calls: In order to identify ingredients, determine nutritional information, and find recipes, we used pre-existing APIs by Microsoft and other companies.
- Microsoft Azure Cognitive Services API: Once the CustomVision training was complete, we set up a service in the Android app that sends image data to the trained model, which returns its prediction of the ingredient pictured along with a confidence level. The prediction is then added to the ingredient list once the user approves it.
- Nutritionix API: When an ingredient is added to the list, the app calls the Nutritionix API to get the nutritional information for an average single serving of that ingredient. Using the FDA Daily Value standards, we convert this nutritional information into the corresponding Daily Value percentages. These percentages are shown in a bar graph for each ingredient listed.
- Food2Fork API: Once the user has created a list of ingredients, they can choose which ingredients to include in (or exclude from) the recipe search. Only the selected ingredients are sent in the query to the Food2Fork API, which then finds recipes that include all of the selected ingredients.
3) Google Speech-To-Text: For any items that our CustomVision model was not trained on, the user can still enter those items using the Google Speech-To-Text service for Android. Note that the Nutritionix and Food2Fork APIs have a much larger collection of searchable ingredients than the CustomVision service, so data from those APIs can still be gathered for the manually entered ingredients.
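The Daily Value conversion described under the Nutritionix API is a simple ratio: the amount of a nutrient in one serving divided by its daily reference amount, times 100. The sketch below shows one way to do this in Java. It is illustrative only: the class name, method names, and nutrient keys are hypothetical (they are not Nutritionix response fields), and the reference amounts are those on the current FDA Nutrition Facts label, which may differ from the table the app actually used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for converting per-serving amounts into Daily Value percentages. */
final class DailyValueCalculator {
    // Assumed reference amounts per day (current FDA Nutrition Facts label, adults).
    // The keys are made-up names for this example, not fields from any API.
    private static final Map<String, Double> REFERENCE = new LinkedHashMap<>();
    static {
        REFERENCE.put("total_fat_g", 78.0);
        REFERENCE.put("sodium_mg", 2300.0);
        REFERENCE.put("total_carbohydrate_g", 275.0);
        REFERENCE.put("dietary_fiber_g", 28.0);
        REFERENCE.put("protein_g", 50.0);
    }

    /** Percent of the Daily Value that the given amount of one nutrient represents. */
    static double percent(String nutrient, double amount) {
        Double reference = REFERENCE.get(nutrient);
        if (reference == null) {
            throw new IllegalArgumentException("No Daily Value known for " + nutrient);
        }
        return amount / reference * 100.0;
    }

    /** Converts a whole serving; nutrients without a known reference amount are skipped. */
    static Map<String, Double> percentages(Map<String, Double> serving) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : serving.entrySet()) {
            if (REFERENCE.containsKey(entry.getKey())) {
                result.put(entry.getKey(), percent(entry.getKey(), entry.getValue()));
            }
        }
        return result;
    }
}
```

For example, `DailyValueCalculator.percent("sodium_mg", 1150.0)` evaluates to 50, since 1,150 mg is half of the assumed 2,300 mg reference amount. The map returned by `percentages` could then feed the per-ingredient bar graph.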
Challenges we ran into
- We had intended to use Microsoft's speech-to-text for adding ingredients to the list manually; however, we ran into complications with sending packets of audio files to the API and ended up using Google's built-in speech-to-text service instead.
- To train the CustomVision model well, we had to upload a lot of images of each item; however, we also wanted to cover a variety of ingredients. Since we were limited to 1,000 training images in total, each ingredient got fewer images and the computer vision was less precise as a result.
- The computer vision algorithm we are using is limited to detecting one ingredient at a time, which slows down the user experience.
Accomplishments that we're proud of
- We successfully trained a computer vision model to detect 20 different ingredients
- We successfully connected three APIs
- We created an app with a cohesive design and a wide range of functionality
What we learned
- How to train a computer vision model using image data sets
- How to manipulate
- How to create better UI graphics
- How to deal with dependencies using Gradle
What's next for YumCam
- Adding more ingredients that can be recognized
- Making the computer vision more accurate by adding more image data sets
- Enabling object detection so that multiple items can be added at once
- Allowing the app to detect ingredients in real time