Food Mixer

Developers

Janeth Meraz (jmeraz) and Kalen Frieberg (kfrieber)

Final Writeup / Reflection / Presentation

https://docs.google.com/document/d/1IDXhQ20QNW6tslVMaCD2G9YHSf5TNfumaQsHO_slATM/edit?usp=sharing

Video link

https://drive.google.com/file/d/1__WbD1myT8nvfo5DfeGAqjBII30lctUF/view?usp=sharing

Motivation

When making a new recipe, it can be challenging to imagine beforehand how your dish will turn out without wasting expensive ingredients and valuable time. In addition, when visiting a new restaurant with unfamiliar dishes, a person may not know what a certain dish is. By generating images of these dishes, a person can see what they are ordering before it arrives, which can make the decision easier and encourage culinary exploration.

Our project seeks to resolve this issue by pre-generating images of your proposed meals, so you can know in advance whether or not your dish will look appetizing.

Our problem relates to both Classification and Generation, as it must both recognize the input labels and create new image outputs.

Description

Our program is a generative model that uses images of classified foods to learn the labels and traits of those foods. In use, the program takes labels as input and outputs a composite image of those labels, to be used prospectively as a preview of the dish. To test the accuracy of the output images, we run them through our classifier to see how accurately it can recognize each component of the generated image.

Related Work

Image generation is a widely researched topic that has become very popular in the mainstream media thanks to DALL-E. This model is a transformer language model that takes both text and images as input during training. At test time, it takes in a phrase and generates an image to go with it, similar to how we generate images of food. It is able to predict contextual details not mentioned in the given phrase, such as the word "sunrise" implying the possible presence of a shadow if there is an object in the image. The authors used human evaluation to assess the model and reported >90% accuracy. The model's generalization capabilities allow it to achieve impressive results in zero-shot image generation.

Data

We used the Keras dataset food101. The homepage can be found here

The data includes 101 classes of food, with 750 training and 250 testing images per class. The images must be resized to standardize their resolution.
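The resizing step mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual pipeline: it assumes NumPy, a hypothetical target resolution of 64x64 (the writeup does not state one), and simple nearest-neighbor sampling in place of whatever library resize function was really used. The names TARGET_SIZE, resize_nearest, and preprocess are illustrative.

```python
import numpy as np

TARGET_SIZE = (64, 64)  # assumed (height, width); the writeup does not specify one


def resize_nearest(image, size):
    """Resize an H x W x C image to (height, width) by nearest-neighbor sampling."""
    new_h, new_w = size
    old_h, old_w = image.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_h) * old_h // new_h
    cols = np.arange(new_w) * old_w // new_w
    return image[rows[:, None], cols]


def preprocess(image, size=TARGET_SIZE):
    """Bring an image to the common resolution and scale uint8 pixels to [0, 1]."""
    resized = resize_nearest(image, size)
    return resized.astype(np.float32) / 255.0


# Usage: a random stand-in for a food photo of arbitrary size.
photo = np.random.randint(0, 256, size=(512, 384, 3), dtype=np.uint8)
ready = preprocess(photo)
print(ready.shape, ready.dtype)  # (64, 64, 3) float32
```

Nearest-neighbor sampling is used here only because it fits in a few lines; a bilinear or area-based resize from an image library would usually give smoother results.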

Methodology

The model will include two major components. The first is a CNN that takes in images from the dataset and outputs the most likely labels. The second is a generative model that takes in label inputs and creates composite images of those foods.

We believe the design will work because its simplicity makes it easy to implement and to analyze for errors. If the model proves insufficient, we can fall back on more complicated designs: for the classifier we could use an RNN or Bag of Words, and for the generative model we could use graph neural networks.
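The writeup does not say how the label inputs are presented to the generative model. One common option, shown here purely as an assumption, is a multi-hot vector over the class names, concatenated with a random noise vector in the style of a conditional generator. CLASS_NAMES below is a small illustrative subset of the 101 classes, and encode_labels, generator_input, and noise_dim are hypothetical names.

```python
import numpy as np

# Illustrative subset only; the real dataset has 101 class names.
CLASS_NAMES = ["french_fries", "hamburger", "ice_cream", "pizza", "sushi"]
CLASS_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def encode_labels(labels):
    """Turn a list of food labels into a multi-hot vector over CLASS_NAMES."""
    vector = np.zeros(len(CLASS_NAMES), dtype=np.float32)
    for label in labels:
        if label not in CLASS_INDEX:
            raise KeyError(f"unknown food label: {label!r}")
        vector[CLASS_INDEX[label]] = 1.0
    return vector


def generator_input(labels, noise_dim=16, rng=None):
    """Concatenate random noise with the label encoding (assumed conditioning scheme)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(noise_dim).astype(np.float32)
    return np.concatenate([noise, encode_labels(labels)])


# Usage: request a composite of two foods.
print(encode_labels(["pizza", "hamburger"]))  # [0. 1. 0. 1. 0.]
print(generator_input(["pizza", "hamburger"]).shape)  # (21,)
```

A multi-hot vector lets several foods be requested at once, which matches the idea of a composite image; a single integer label could only name one class.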

Metrics

Our classification model will be used to measure success. It will first be fine-tuned to reach high accuracy on the dataset (~80%). Then it will be run on the newly generated images from the generative model, and the accuracy will be calculated from its predictions.

If the classification model's accuracy on the generated images is low, then we will know that our generative model is not performing well.
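The writeup does not define exactly how accuracy on a composite image is computed. One possible scoring rule, given here as an assumption, is the fraction of requested labels that appear among the classifier's top-k predictions for the generated image, averaged over all generated images. The function names and the dictionary format for classifier scores are hypothetical.

```python
def label_recall(requested, class_scores, top_k=None):
    """Fraction of requested labels found in the classifier's top-k classes.

    requested: labels that were fed to the generator for one image.
    class_scores: dict mapping class name -> classifier score for that image.
    top_k: how many top predictions to keep; defaults to len(requested).
    """
    requested = set(requested)
    if not requested:
        raise ValueError("requested must contain at least one label")
    k = top_k if top_k is not None else len(requested)
    ranked = sorted(class_scores, key=class_scores.get, reverse=True)
    predicted = set(ranked[:k])
    return len(requested & predicted) / len(requested)


def mean_label_recall(samples, top_k=None):
    """Average label_recall over (requested_labels, class_scores) pairs."""
    scores = [label_recall(req, cs, top_k) for req, cs in samples]
    return sum(scores) / len(scores)


# Usage: two generated images with made-up classifier scores.
samples = [
    (["pizza", "sushi"], {"pizza": 0.6, "sushi": 0.3, "hamburger": 0.1}),
    (["hamburger"], {"pizza": 0.7, "sushi": 0.1, "hamburger": 0.2}),
]
print(mean_label_recall(samples))  # 0.5
```

The first image recovers both requested foods and the second recovers none, so the mean is 0.5. A low value points to the generator only if the classifier itself is already accurate on real images, which is why it is fine-tuned first.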

We plan to run two experiments. First, we will use the training dataset to determine the accuracy. Then we will create our own dish names that the model has not seen before.

Our accuracy goals are base: 60%, target: 80%, stretch: 90%.

Ethics

The dataset is likely biased towards common foods and dishes found in the US. While it does include dishes from different cultures, many of those are also well known within the US. Less well-known dishes are unlikely to be found in the dataset, as it is limited to 101 classes.

Deep learning is useful here because, for unknown dishes, there may not be a useful image of the food on the internet, so generated images will be the only way to predict how your food will turn out.