Inspiration

When I use Yelp, I often sort by review count because the more reviews a business has, the more reliably its overall rating reflects customer opinion. I'd like to give Yelp users more insight into businesses by providing statistical information that is easy to summarize. This app is my first attempt at doing just that.

What it does

The app is fairly simple:

  1. It asks the user to choose a city
  2. It takes the chosen city and finds the top 10 most reviewed restaurants in that city
     For each of those 10 restaurants:
  3. It gets the restaurant name
  4. It uses the Yelp API to get the total number of reviews
  5. It gets the number of 5, 4, 3, 2, and 1-star reviews from the restaurant's Yelp page
  6. From (4) and (5) it calculates the mean and standard deviation of all reviews
  7. It assumes a normal distribution if the restaurant has more than 100 total ratings; otherwise it assumes a Student's t-distribution
  8. From (6) and (7) it calculates a 99% confidence interval for the mean
  9. It displays the restaurant name, mean, and confidence interval to the user
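
Steps 6 through 8 can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `rating_confidence_interval`, its parameters, and the sample star counts are all hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the write-up does not say which form is used.

```python
import math

from scipy import stats


def rating_confidence_interval(star_counts, confidence=0.99, normal_threshold=100):
    """Return (mean, std, (low, high)) for a dict mapping star value -> review count.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    n = sum(star_counts.values())
    if n < 2:
        raise ValueError("need at least two reviews to estimate a spread")

    # Step 6: mean and standard deviation from the per-star counts.
    mean = sum(star * count for star, count in star_counts.items()) / n
    # Assumption: sample variance with an (n - 1) denominator.
    variance = sum(
        count * (star - mean) ** 2 for star, count in star_counts.items()
    ) / (n - 1)
    std = math.sqrt(variance)

    # Step 7: normal distribution above the threshold, Student's t otherwise.
    upper_tail = 1 - (1 - confidence) / 2
    if n > normal_threshold:
        critical = stats.norm.ppf(upper_tail)
    else:
        critical = stats.t.ppf(upper_tail, df=n - 1)

    # Step 8: two-sided confidence interval for the mean.
    margin = critical * std / math.sqrt(n)
    return mean, std, (mean - margin, mean + margin)


if __name__ == "__main__":
    # Made-up counts, not real Yelp data.
    example = {5: 60, 4: 25, 3: 10, 2: 3, 1: 2}
    mean, std, (low, high) = rating_confidence_interval(example)
    print(f"mean={mean:.2f}, 99% CI=({low:.2f}, {high:.2f})")
```

With few reviews the t-distribution gives a wider interval than the normal distribution would, which matches the intuition that ratings backed by fewer reviews are less certain.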

How I built it

I built the app in Python using the following libraries: yelp, requests, lxml, scipy.

Challenges I ran into

I would have loved to use the Yelp API more extensively, but it is difficult to get broader statistics on restaurants with it. It was also a little difficult to find a way to scrape the Yelp web pages for the information I needed to perform basic statistical analysis.

Accomplishments that I'm proud of

I'm proud of learning the basics of how to use the scipy library. It is a very powerful tool and I'm happy I know the basics because I'm sure it will be very useful to me in the future.

What I learned

Besides scipy, I was able to brush up on some statistics and learned how to think about applying that information in a business setting. What information will customers find most useful? What is the best way to present that information?

What's next for Yelp Statistical Analysis

I would like to implement much more statistical analysis and present it in a user-friendly way. To this end, I'm learning Django so that I can turn the project into a web-based app.
