Team name: Picky Eats

Team members: Brandon Lee (blee50), Erica Li (eli32), Daniel Chey (dchey), Seo Hyun (Lina) Lim (slim20)

Link to final writeup:

Checkin 1:


The problem we address is that, in a highly competitive and undifferentiated marketplace with millions of customers and restaurants, it is unclear how a restaurant’s images online affect its rating on apps like Yelp. We want to build a classifier to study this relationship, so that we can help restaurants build their image and online presence using current image-advertising techniques.

The paper’s objective is to relate the quality of a restaurant’s images to its Yelp rating. The first part of the paper focuses on creating a classifier that assesses the quality of the restaurant’s photographic representation online, taking a restaurant’s image as input and producing a Yelp rating of 1 to 5 stars as output. The second part then focuses on determining which features of a restaurant’s images can lead to a higher rating.

We chose this paper because in such a highly competitive marketplace, even slight changes in a restaurant’s images can give it a marginally higher rating and lead to growth in revenue. Our first goal is to determine how well an image can predict a restaurant’s rating, and our second goal is to determine which features of an image affect the rating and how. This lets us see how businesses might become more effective online, and how consumers (picky eaters like us) may need to become more wary as businesses learn to present themselves online in ways that may not be representative of the restaurant as a whole.

This is a classification problem: we are building a classifier that assigns images (input) to Yelp restaurant ratings (output).

Related Work:

This article claims that high-quality images, photos, and illustrations can be used to build credibility of a company. Imagery helps attract people and convey the company’s story quickly. Encouraging customers to post images of the service on social media pages also shares success stories, and helps build a positive image.

This paper uses a deep convolutional neural network to investigate online advertising and construct prediction models in order to predict which image ads are likely to be successful. It is relevant to our project in predicting the measurement of an image’s effectiveness, but is different in that its measure of success would be “click rate” (due to its dataset being online ads) while ours is specifically “Yelp ratings”.

In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”--if you stumble across a new implementation later down the line, add it to this list.

link (based on comments and other information of the restaurant)
link


We plan on following the paper by utilizing the Yelp Academic Dataset.

The dataset is very large, with a folder of over 200,000 images. While a lack of training data should not be an issue, the dataset will need significant preprocessing, as each business’s images need to be associated with its star rating. The images are also raw and may need to be scaled down and/or cropped for efficient training. We may also need to reduce the data by using fewer images than the paper and possibly removing some star ratings, such as those in between whole stars (e.g. 1.5 stars).
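The association step described above could look roughly like the following sketch. It assumes the photo and business metadata have already been loaded as lists of dicts carrying `photo_id`, `business_id`, and `stars` fields; the function name `label_photos` and the `whole_stars_only` flag are hypothetical and only for illustration.

```python
def label_photos(photos, businesses, whole_stars_only=True):
    """Pair each photo with the star rating of the business it belongs to.

    `photos` and `businesses` are lists of dicts (assumed field names:
    photo_id, business_id, stars). Returns a list of (photo_id, stars).
    """
    stars_by_business = {b["business_id"]: float(b["stars"]) for b in businesses}

    labeled = []
    for photo in photos:
        stars = stars_by_business.get(photo["business_id"])
        if stars is None:
            continue  # no rating record for this photo's business
        if whole_stars_only and not stars.is_integer():
            continue  # drop in-between ratings such as 1.5 or 3.5
        labeled.append((photo["photo_id"], stars))
    return labeled
```

Building the lookup dictionary once keeps the join linear in the number of photos, which matters with a few hundred thousand images.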


The paper outlines two parts to this project: classification and GAN.

For classification we will also base our model on ResNet-18 and modify its parameters and hyperparameters for our project. Training would be similar to the classification problems we had so far in the course. We will train on batches to optimize for lower loss and achieve higher accuracy.

For the GAN, we will look into an architecture that pairs a generative model with a discriminator, such as StyleGAN2. Training would involve minimizing the generator’s loss against the discriminator.
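As a rough illustration of what training a generator against a discriminator means, the sketch below computes the standard binary cross-entropy GAN losses from discriminator probabilities. This is a simplified, assumed formulation for intuition only; it is not StyleGAN2's full objective, which adds regularization terms, and the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Cross-entropy loss for the discriminator.

    d_real: discriminator probabilities on real images (should approach 1).
    d_fake: discriminator probabilities on generated images (should approach 0).
    """
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-7):
    """Non-saturating generator loss: low when generated images are scored as real."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)
```

The two losses pull in opposite directions on `d_fake`, which is why the two networks are usually updated in alternating steps.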

The GAN would be a bit of a challenge, as we don’t have experience with it yet. We plan on approaching this after meeting our base goal with just classification. If we run into issues with the dataset or preprocessing, we could see whether a dataset of Amazon reviews with images, or replacing the images with Yelp reviews, would be more feasible. If our model does not learn, we could try basing it on a model other than ResNet-18, such as DenseNet.


We will experiment with different datasets, for example by splitting our dataset into categories such as images from business owners and images from customers.

For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? Yes, we will use accuracy to test our model for determining if our prediction ratings match the actual restaurant review ratings.
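A minimal sketch of the accuracy metric as we intend to use it: the fraction of images whose predicted star rating exactly matches the actual rating. The function name and the plain-list inputs are assumptions for illustration.

```python
def accuracy(predicted_stars, actual_stars):
    """Fraction of predictions that exactly match the actual star rating."""
    if len(predicted_stars) != len(actual_stars):
        raise ValueError("predictions and labels must have the same length")
    if not actual_stars:
        return 0.0  # avoid dividing by zero on an empty evaluation set
    correct = sum(p == a for p, a in zip(predicted_stars, actual_stars))
    return correct / len(actual_stars)
```

Note that exact-match accuracy treats a prediction of 4 stars for a 5-star restaurant the same as a prediction of 1 star.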

The authors were hoping to find high accuracies in classifying simplified ratings for various images. The authors compared their modified accuracies and losses with original models with different hyperparameters, and the modified model had accuracies in the 90s.


Base goal: good classification on two datasets (customer dataset, owner dataset). Target goal: implementing GAN. Stretch goal: good classification on multiple datasets.


What broader societal issues are relevant to your chosen problem space?

We believe that a broader societal issue relevant to our chosen problem space is how, in a highly digital world, people’s perceptions can easily be influenced and changed by a selectively chosen image. There may be more dangerous implications than simply classifying a restaurant’s customer rating based on its posted images. We can see similar problems arising in, for example, online dating or online shopping, where an image may be unrepresentative of a person and greatly affect that person’s chance of finding a partner, or where a shop may misrepresent a piece of clothing to a consumer. The way that we view certain images, and how that affects our perception of restaurants, consumer goods, and other people, may have greater implications in a world that is becoming more and more digital. As people in an increasingly online world, we may have to become more wary of others as businesses and individuals begin to understand which features in an image make a business or person more desirable, since issues such as scams and catfishing incidents may arise.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?

Our dataset will be Yelp photos (most likely the ones collected from restaurant information or reviews, since those are the ones potential customers look at in order to choose among restaurants). The photos should have some correlation with the Yelp ratings. One concern, however, is the possibility of bias in the photos. For instance, the owner of a restaurant may intend to attract people with photos that do not accurately represent the food (food may be staged or decorated to look better than in real life). Another concern is that customers’ photos may not always accurately reflect the quality of the restaurant. Since most customer photos are taken by phone, the quality of the photos may not be the best (e.g. angle, colors, etc.), which could bring down the predicted rating even when the actual rating is much higher. Such cases may create many outliers that depend solely on who took the photos.

Division of labor:

We will work on accuracy and the oral presentation together. Erica and Brandon will work on preprocessing the owner dataset, train/test, and the poster. Lina and Daniel will work on preprocessing the customer dataset, the models, and the final writeup. For hyperparameters and the different datasets, each person will try different hyperparameters and keep track of our experimentation in Google Docs.

Checkin 2:

Challenges: One of the challenges we encountered was combining our preprocessing and model codes. Since we are all working separately in different time zones (which is also one of our challenges), one group was working on preprocessing, while the other was working on the model. Thus, we had to make sure that the two were compatible and worked well together, which was definitely a difficult step in our project.

Another big challenge we faced was that our data file was too large, which made preprocessing take too long. We resolved this by adding a brief initial pass before the main preprocessing that separates our data into multiple datasets (“drink.json”, “food.json”, “inside.json”, “menu.json”, “outside.json”), which we can test on individually.
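The initial splitting pass could be sketched as follows. It assumes each photo record is a dict with a `label` field whose value (such as "food" or "drink") becomes the output file name; the function name and the in-memory list input are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_photos_by_label(photos, out_dir):
    """Write one JSON file per photo label (e.g. food.json, drink.json).

    Returns a dict mapping each label to the number of photos written.
    """
    groups = defaultdict(list)
    for photo in photos:
        groups[photo["label"]].append(photo)  # assumed field name: label

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for label, group in groups.items():
        with open(out_dir / f"{label}.json", "w") as f:
            json.dump(group, f)

    return {label: len(group) for label, group in groups.items()}
```

Because each output file holds only one category, later preprocessing and training runs can load a single, much smaller file at a time.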

Another challenge was deciding on the number of layers to use and on the overall structure of our model. We are still trying out different hyperparameter values and changing the structure within the call() function of our model.

Finally, one of the biggest challenges we faced was implementing the ResNet model. Although we expected this process to be relatively simple, it was very difficult to get working, and we are still addressing the issues. Therefore, we started out by writing our own model (using what we learned from our previous assignments), which ended up running and producing pretty good results (accuracy and loss). This ultimately made us change our base, target, and stretch goals, which are outlined in the last response (regarding the things changed in our project).

Insights: Our model currently has 3 convolution layers and 3 dense layers. We used a batch size of 100 and defined 5 classes, one for each star rating from 1 to 5. So far, we have trained and tested our model for just one epoch on each of the different datasets.

The images with a star rating in between whole stars (e.g. 1.5) were omitted from training and testing.
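For reference, a model with the shape described above (3 convolution layers followed by 3 dense layers, 5 output classes) might be sketched in TensorFlow as below. The filter counts, kernel sizes, strides, dense widths, and input size are all assumptions chosen for illustration, not our actual hyperparameters.

```python
import tensorflow as tf


class StarClassifier(tf.keras.Model):
    """Sketch: 3 conv layers + 3 dense layers, one logit per star rating."""

    def __init__(self, num_classes=5):
        super().__init__()
        # Layer sizes below are illustrative assumptions.
        self.conv1 = tf.keras.layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")
        self.conv2 = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")
        self.conv3 = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation="relu")
        self.dense2 = tf.keras.layers.Dense(64, activation="relu")
        self.dense3 = tf.keras.layers.Dense(num_classes)  # raw logits

    def call(self, images):
        x = self.conv1(images)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)
```

With this layout, star ratings 1 to 5 would map to class indices 0 to 4, and the logits would pair with a sparse categorical cross-entropy loss configured for logits.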

The model’s accuracy is close to 70%, which is better than we had anticipated given that we wrote our own model, but still not as high as what we would expect from a ResNet model, for example. Now that we have one model working, we plan to implement different models like ResNet and GAN, and we expect the accuracies across these models to be a little higher.

Plan: We need to dedicate more time to researching how to implement ResNet and GAN so that we can train and test with different models. We also need to spend more time adjusting our hyperparameters and the overall structure of our model (tweaking the number of convolution layers and linear layers) for the best outcome.

After more research, we found that ResNet-18 may not be the best model for our image classification project. The link shows that image classification training on ResNet-18 usually yields the highest validation error. Thus, we’re thinking about using ResNet-101 instead.

Because we experienced difficulty with the ResNet model, our base, target, and stretch goals have changed as well. Our base goal is now implementing our own model and seeing decent results. Our target goal is now successfully implementing the TensorFlow ResNet model. Finally, our stretch goal is to reach a good accuracy (close to ResNet) on our own model, as well as on a new model such as GAN.
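One possible route to the target goal is the ResNet-101 architecture that ships with `tf.keras.applications`, topped with a 5-way output layer. The sketch below is an assumption about how that could be wired up, not our working code; the function name, input size, and untrained (`weights=None`) backbone are illustrative choices.

```python
import tensorflow as tf


def build_resnet101_classifier(input_shape=(224, 224, 3), num_classes=5):
    """Sketch: Keras ResNet-101 backbone with a star-rating classification head."""
    backbone = tf.keras.applications.ResNet101(
        include_top=False,   # drop the 1000-class ImageNet head
        weights=None,        # train from scratch (assumption)
        input_shape=input_shape,
        pooling="avg",       # global average pooling yields one feature vector per image
    )
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(backbone.output)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)
```

Passing `weights="imagenet"` instead would start from pretrained features, which may help given how little training we have run so far.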
