Yelp Data Script

Inspiration

We like food! However, we found it hard to find good places to eat, and Yelp isn’t always the most helpful guide. Plus, we noticed that running a restaurant in Berkeley is difficult. We hoped to find a way to address these problems using what we learned in class!

What It Does

Our project is one component of the larger project described above. We partnered with another team in the class to make it feasible and manageable within the time constraints. The other team handled the data analytics and machine learning. Our team was in charge of interacting with APIs to grab the data and filter it into a format the other team could use.

How It Works

Our project consists of a Bash script that grabs the analytics script from GitHub (where the other team publishes it) and runs it. We also have a Python file that downloads the data and converts it to a CSV file so the analytics can work properly. Finally, the Bash script deletes all files that are no longer needed. The final user experience is simple: pull our project from GitHub, run the script, and, after a bit of time, a results file appears in the project folder.
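The clean-up step at the end of this workflow (keep the results file, delete everything else that was fetched) can be sketched as follows. The real project does this in Bash; this is only a minimal Python illustration, and the function name, folder name, and results file name are hypothetical.

```python
import shutil
from pathlib import Path


def keep_results_and_clean(work_dir, results_name, dest_dir):
    """Move the results file out of work_dir, then delete work_dir.

    All names here are illustrative assumptions, not the project's own.
    """
    work_dir = Path(work_dir)
    dest_dir = Path(dest_dir)
    results = work_dir / results_name

    # Fail early instead of deleting the folder when there is nothing to keep.
    if not results.is_file():
        raise FileNotFoundError(f"expected results file at {results}")

    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / results_name

    # Preserve the results first, then remove the temporary folder.
    shutil.move(str(results), str(target))
    shutil.rmtree(work_dir)
    return target
```

For example, `keep_results_and_clean("analytics_repo", "results.csv", ".")` would leave only `results.csv` in the current folder, assuming the analytics were cloned into a folder called `analytics_repo`.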

How We Built It

Luckily, the Yelp data was published to Kaggle, which has a very convenient API for Python, so we decided to download it from there. We wrote a script that downloads the Yelp JSON files and reads each file line by line. Each line is a JSON object that Python parses into a dictionary, so it is very convenient to work with. We processed the data, keeping only the fields we were interested in, and then used Pandas to put the filtered records into a dataframe and write it out as a CSV file. This file is then referenced and used by the analytics script.

The Bash script that automates everything is relatively simple. We pull the analytics script from GitHub, run everything inside that folder to keep things manageable, move the files we want to preserve, like the results file, and delete the folder at the end. It is structured so that only what the user intentionally downloaded stays on their computer at the end, plus the results file of course.
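The JSON filtering step described above might look roughly like the sketch below. It assumes the input is a file with one JSON object per line. The field names in `KEEP_FIELDS` and both function names are hypothetical and stand in for whatever the project actually keeps.

```python
import json

# Hypothetical field names, chosen only for illustration.
KEEP_FIELDS = ["business_id", "name", "stars"]


def filter_json_lines(json_path, keep_fields=KEEP_FIELDS):
    """Read a file one line at a time and keep only the selected keys."""
    rows = []
    with open(json_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # each line parses to a dict
            # Missing keys become None rather than raising an error.
            rows.append({field: record.get(field) for field in keep_fields})
    return rows


def json_lines_to_csv(json_path, csv_path, keep_fields=KEEP_FIELDS):
    """Filter first, then use Pandas only to write the CSV file."""
    # Imported here so the filtering step itself does not depend on Pandas.
    import pandas as pd

    rows = filter_json_lines(json_path, keep_fields)
    frame = pd.DataFrame(rows, columns=list(keep_fields))
    frame.to_csv(csv_path, index=False)
    return frame
```

Because each record is trimmed to a few fields before it is stored, the list that reaches Pandas is much smaller than the raw file, which is the point of filtering before building the dataframe.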

Challenges We Ran Into

It was a little difficult to work with the JSON. Since the data was so large, we couldn’t load it straight into a dataframe and filter it afterwards. Instead, we had to use Python’s json module to filter the data first and only then turn it into a dataframe, essentially using Pandas just to write the CSV file. Likewise, the Bash script depended on how well we communicated with the other team, and given how busy everyone was, that could be difficult at times. However, thanks to everyone’s passion and commitment, we kept a pace that allowed all of our projects to be completed on time!

Accomplishments We Are Proud Of

We are very proud of our Python file. When we tried to make things easy and load the data directly with Pandas, it crashed our computer (too much memory usage). So, we used the json module to build the filtering process ourselves. This was something we had never worked with before, so it took our collective efforts, but we are very happy that we got it to work :)

What We Learned

We learned a lot about Bash scripting and how to process data using Python. It was a nice introduction to data engineering (or something similar to it) and to how all these impressive programs work behind the scenes.

What’s Next

We hope to take what we learned and apply it to other projects. Learning not only the specifics of the project, but also the process of working on it, will be valuable when we work on other teams. We may revisit the project later to improve it, perhaps by adding a way to look up the analytics for a particular business rather than receiving a report on all businesses.

Installing and Running

The project can be grabbed from GitHub using the link: https://github.com/alibnaqvi/yelp_api_project

You also need the Kaggle API set up on your computer. Kaggle’s API documentation explains how to install the client and create an API token. You also need nbconvert and Jupyter Notebook, both of which can be installed with pip. After that, all you need to do is run the Bash script from the terminal!