Many of us have had this experience: you try a restaurant with a 5-star rating on Yelp, and it disappoints you. It may well be a good restaurant, just not to your taste. One reason is that people's tastes vary, and reviews differ in quality, yet all of them affect the rating equally. An American may judge Korean food very differently from an Indian. We all want to know how people with similar tastes or eating habits feel about a restaurant, so we want to give more value to reviews from people whose tastes are similar to yours.
What it does
Ye!Pro is a web application that helps you find the restaurants that best fit your taste. Based on Yelp’s data, we created an improved rating system that shows you reviews and scores from people whose tastes are similar to yours, as well as from the reviewers who are most knowledgeable about a cuisine.
How we built it
We redefine review ratings based on each reviewer’s influence within a restaurant’s category. Each reviewer’s rating is multiplied by an influence score (a weight), which we designed an algorithm to calculate. As a result, the list of highly rated restaurants differs for different categories of users.
Our data comes from the Yelp Dataset, which includes information about local businesses in 10 metropolitan areas across 2 countries. It contains more than 8.75 million rows of data across three tables: business, review, and user.
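The idea of weighting each rating by an influence score can be illustrated with a minimal sketch. This is not our production code: the function name `weighted_rating` and the fallback to a plain mean when every weight is zero are assumptions made for the example.

```python
def weighted_rating(stars, weights):
    """Return the weighted mean of star ratings.

    stars   -- list of star ratings, one per review
    weights -- list of non-negative influence scores, one per review
    Falls back to the plain mean if every weight is zero (an assumption
    made for this sketch).
    """
    if not stars:
        raise ValueError("at least one rating is required")
    if len(stars) != len(weights):
        raise ValueError("stars and weights must have the same length")
    total = sum(weights)
    if total == 0:
        return sum(stars) / len(stars)
    return sum(s * w for s, w in zip(stars, weights)) / total


# A 5-star review from an influential reviewer (weight 3) outweighs
# a 1-star review from a less influential one (weight 1).
score = weighted_rating([5, 1], [3, 1])  # 4.0, versus a plain mean of 3.0
```

With equal weights the function reduces to the ordinary average, which is how the original Yelp score is described later in this writeup.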
EDA and Data Preprocessing
Due to the size of the dataset, we preprocessed our data on GCP BigQuery to reduce job running time. There are 150 food-related categories in total. We selected only three categories (or tastes) because of limited compute power. After connecting users with categories using SQL, we saved the query results into new tables to speed up later steps.
How do we determine the relationship between users and their tastes?
Goal: change the weight of each individual score based on three factors.
- The review’s influence, which depends on the number of useful, funny, and cool votes it received.
- The writer’s (user’s) influence, which depends on the number of total views, fans, friends, compliments received, and elite status.
- The writer’s match with the target user, computed as a coefficient from a covariance matrix.
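We did this step in SQL on BigQuery. As a rough in-memory equivalent, the sketch below shows the same join in Python: each review links a user to a business, and each business carries a list of categories. The function name, the tuple layout, and the sample IDs are all hypothetical.

```python
from collections import Counter, defaultdict


def user_category_counts(reviews, business_categories, selected):
    """Count how many reviews each user wrote per selected category.

    reviews             -- iterable of (user_id, business_id) pairs
    business_categories -- dict mapping business_id to a list of categories
    selected            -- set of categories to keep
    """
    counts = defaultdict(Counter)
    for user_id, business_id in reviews:
        for category in business_categories.get(business_id, ()):
            if category in selected:
                counts[user_id][category] += 1
    return counts


# Hypothetical sample data, not taken from the Yelp Dataset.
reviews = [("u1", "b1"), ("u1", "b2"), ("u2", "b1")]
business_categories = {"b1": ["Korean", "Bars"], "b2": ["Korean"]}
counts = user_category_counts(reviews, business_categories, {"Korean"})
```

Storing the result once, as we did with new BigQuery tables, avoids repeating the join for every request.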
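One way to combine these three factors into a single weight is sketched below. It is an illustration under stated assumptions, not our exact algorithm: the log scaling, the multiplicative combination, clipping negative matches to zero, and every function and parameter name are choices made for this example. The match uses a Pearson correlation, which is covariance normalized by the two standard deviations.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def review_influence(useful, funny, cool):
    """Log-scaled vote count, so a few viral reviews do not dominate."""
    return math.log1p(useful + funny + cool)


def user_influence(views, fans, friends, compliments, elite_years):
    """Log-scaled sum of the writer's activity counts."""
    return math.log1p(views + fans + friends + compliments + elite_years)


def review_weight(votes, user_stats, writer_ratings, target_ratings):
    """Combine review influence, user influence, and taste match.

    writer_ratings and target_ratings are the two users' ratings of the
    same restaurants; a negative correlation is clipped to zero.
    """
    match = max(0.0, pearson(writer_ratings, target_ratings))
    return (1 + review_influence(*votes)) * (1 + user_influence(*user_stats)) * match
```

A writer whose ratings move in the opposite direction from the target user's gets a weight of zero here, so that review drops out of the personalized score.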
Challenges we ran into
The dataset we processed is very large. To reduce the latency of our application, we had to scale down our data because of limited compute power. It was also challenging to build an algorithm that improves the rating system in the time available. To our knowledge, we are the first to connect users with categories and find their relationships, so we could not test our model against historical data. Instead, we developed a method to label some users with categories in order to train and test the model. At the same time, we had to work on backend design, data storage, and model training and testing.
Accomplishments that we're proud of
We created an algorithmic model that recalculates Yelp ratings. Yelp computes a restaurant's rating from equally weighted individual ratings, which cannot accurately reflect how different groups of people with different tastes would rate it. Our model addresses this by weighting each rating according to people’s tastes, with the aim of making each restaurant's rating more accurate for the user viewing it.
We also restructured the Yelp data. Starting from the raw tables, we refactored the attributes into a form that lets us train on and use the data more efficiently.
What we learned
- We learned the whole process of a project, from initiation to successful completion and closing. The experience gave us an idea of what a real-life project looks like and the challenges and obstacles that come with it.
- Teamwork. The success of a real-life project depends not only on how skillful the team is, but also on interpersonal skills, team building, cooperation, and the relationships between teammates.
What's next for Ye!Pro
So far, we have only run a small trial with a narrow range of categories and a few restaurants, due to limited time, budget, and equipment. We see a lot of potential for Ye!Pro in the future. We can include more samples and more variables in our model to improve its efficiency and accuracy. Ye!Pro’s webpage will show more information, such as the details of each review and recommendations for similar restaurants. We can also add an external link that directs users to Google Maps.
Meanwhile, if you want to see the original score, that is fine too. Ye!Pro will show a comparison with the original scores from other apps, and you can view it simply by switching modes.
In addition, the range of service will grow. Ye!Pro can be expanded to provide a rating system that covers restaurants worldwide, as well as other facilities and activities. Ye!Pro can draw on data not only from Yelp but also from other rating apps such as Google Maps or Trip Advisor. By tracking an individual’s browsing history and visits, we can capture shifts in their preferred tastes and customize a more accurate rating system for each person.
There is also great potential for Ye!Pro to cooperate with other rating websites and help them build more accurate systems for showing recommendations and promotions, and for detecting abnormal ratings and flaws.