Inspiration

As avid coffee enjoyers, our team wanted to use the Yelp Open Dataset to understand which factors contribute most to a cafe’s Yelp rating. Having identified these features, we aim to help aspiring business owners predict the Yelp rating, and with it the potential success, of a proposed new cafe. Drawing on insights from our ML model, we have also developed actionable tips that businesses can consider if they want to improve their store’s rating.

What it does and how we built it

Using the Yelp dataset, we determined and extracted the cafe features that are most significant in predicting a cafe's Yelp score. The two most important features we identified were the quality of the food and the level of customer service. To quantify these, we trained a DistilBERT model to parse Yelp reviews and assign a numerical score to the sentiment each reviewer expressed toward the food and toward the service. Finally, we built a linear regression model over all of the extracted features to determine each feature's weight and its individual impact on predicting a cafe's overall Yelp score. With these weights, we can take a business owner's proposed features for a new cafe and output its predicted Yelp score.
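To make the regression step concrete, here is a minimal sketch that fits a linear regression with NumPy's least-squares solver and uses it to predict a rating. The two feature columns, the sample values, and the helper names fit_linear_regression and predict_rating are hypothetical illustrations, not our actual pipeline. In practice, the sentiment columns would come from the DistilBERT scores and could sit alongside any other extracted features. Clipping predictions to the 1 to 5 star range is also an assumption of this sketch.

```python
import numpy as np

# Hypothetical training data: one row per cafe.
# Assumed columns: food_sentiment, service_sentiment (e.g. scaled to [0, 1]).
X_train = np.array([
    [0.9, 0.8],
    [0.4, 0.6],
    [0.7, 0.3],
    [0.2, 0.2],
    [0.8, 0.9],
    [0.5, 0.7],
])
# Hypothetical observed Yelp star ratings for the same cafes.
y_train = np.array([4.5, 3.5, 3.5, 2.0, 4.5, 3.5])


def fit_linear_regression(X, y):
    """Ordinary least squares; returns (intercept, per-feature weights)."""
    # Prepend a column of ones so the solver also estimates an intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict_rating(intercept, weights, features):
    """Predict a star rating, clipped to Yelp's 1-5 range (an assumption)."""
    raw = intercept + np.dot(weights, features)
    return float(np.clip(raw, 1.0, 5.0))


intercept, weights = fit_linear_regression(X_train, y_train)
for name, w in zip(["food_sentiment", "service_sentiment"], weights):
    print(f"{name}: {w:+.3f}")

# A business owner's proposed cafe (hypothetical values).
proposed = np.array([0.85, 0.75])
print("Predicted rating:", round(predict_rating(intercept, weights, proposed), 2))
```

The size of each weight indicates how strongly that feature moves the predicted score, which is how a linear model exposes each feature's individual impact.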

Challenges we ran into

The dataset contains very large files, which posed an initial roadblock. Reading and storing the data incrementally slowed us down but did not stop us. We also experienced some delays when running our trained model over the remaining cafe reviews.
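One common way to handle files this large is to stream them line by line instead of loading them whole. The sketch below assumes the dataset uses a JSON-lines layout (one JSON object per line) and that each business record has a comma-separated categories string, which may be null. The category names in CAFE_CATEGORIES and the helpers iter_cafes and iter_batches are illustrative assumptions, not our exact code. The batching helper shows one way reviews might be grouped before being sent through a model, which can reduce per-call overhead.

```python
import json
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumed category labels that mark a business as a cafe.
CAFE_CATEGORIES = {"Cafes", "Coffee & Tea"}


def iter_cafes(lines: Iterable[str]) -> Iterator[dict]:
    """Stream newline-delimited JSON business records, yielding only cafes.

    Only one line is parsed at a time, so the whole file never has to be
    held in memory.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "categories" is assumed to be a comma-separated string or null.
        categories = record.get("categories") or ""
        tags = {c.strip() for c in categories.split(",")}
        if tags & CAFE_CATEGORIES:
            yield record


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group any iterable into lists of at most batch_size items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

For example, passing an open file handle for the business file (assumed here to be named yelp_academic_dataset_business.json) to iter_cafes yields cafe records one at a time. Their business_id values could then be used to filter the much larger review file in the same streaming fashion.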

What's next

Looking ahead, we hope to analyze the Yelp reviews further to identify trending or popular drink items. That way, if a business owner is looking to open a cafe, we can make even more targeted suggestions, such as which menu items to include for a more successful opening.
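As a rough illustration of how popular drinks might be surfaced, the sketch below counts how many reviews mention each term in a small drink vocabulary. The vocabulary DRINK_TERMS and the helper count_drink_mentions are hypothetical. Spotting trends, rather than overall popularity, would additionally mean comparing these counts across time periods, for example per review year.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical drink vocabulary; a real analysis would need a curated list.
DRINK_TERMS = ["matcha latte", "cold brew", "latte", "cappuccino", "chai", "americano"]


def count_drink_mentions(reviews: Iterable[str], terms: List[str] = DRINK_TERMS) -> Counter:
    """Count how many reviews mention each drink term at least once.

    Matching is case-insensitive and on word boundaries. Overlapping terms
    are counted independently, so "matcha latte" also counts as "latte".
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts: Counter = Counter()
    for text in reviews:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts
```

Calling most_common() on the returned Counter ranks the drinks by how many reviews mention them.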
