Bruinwalk.com scraper

Inspiration

I am a college student at UCLA, and I've found that how well I do in a class depends heavily on how well the professor's teaching style suits my learning style. Unfortunately, the official university website where students review UCLA professors is hard to navigate, which makes it difficult to choose classes and to find the information I want about them.

What it does

The goal of this project is a program that, given a UCLA department, scrapes bruinwalk.com for every course the department has offered in the past and every instance of each course taught by a specific professor, and counts the occurrences of keywords specified by the user.
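The department → course → professor traversal described above can be sketched as three nested loops. This is a minimal illustration, not the project's actual code: `scrape_department` and the three hook callables (`list_courses`, `list_professors`, `fetch_reviews`) are hypothetical stand-ins for whatever Selenium and Beautiful Soup calls the real scraper makes against bruinwalk.com, and the keyword count here is a simple case-insensitive substring count.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical scraper hooks. In a real scraper these would wrap Selenium and
# Beautiful Soup calls against bruinwalk.com; here they are plain callables so
# the traversal logic can be shown on its own.
ListCourses = Callable[[str], Iterable[str]]       # department -> course codes
ListProfessors = Callable[[str], Iterable[str]]    # course -> professor names
FetchReviews = Callable[[str, str], List[str]]     # (course, professor) -> review texts


def scrape_department(
    department: str,
    keywords: List[str],
    list_courses: ListCourses,
    list_professors: ListProfessors,
    fetch_reviews: FetchReviews,
) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return {course: {professor: {keyword: count}}} for one department."""
    results: Dict[str, Dict[str, Dict[str, int]]] = {}
    for course in list_courses(department):
        per_professor: Dict[str, Dict[str, int]] = {}
        for professor in list_professors(course):
            reviews = fetch_reviews(course, professor)
            # Case-insensitive substring count, summed review by review so a
            # match can never span two different reviews.
            per_professor[professor] = {
                kw: sum(review.lower().count(kw.lower()) for review in reviews)
                for kw in keywords
            }
        results[course] = per_professor
    return results
```

Passing the hooks in as arguments keeps the traversal testable with in-memory data (for example, dictionaries of made-up course codes and reviews) before any browser automation is wired up.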

I worked on this project two weeks ago at the Agent:Hacker Hackathon. By the end of that event, the program could take all the reviews of a course taught by a specific professor, scrape through them, and return the number of occurrences of each keyword specified in the Python program. Any number of keywords can be specified.
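Counting keyword occurrences across a set of reviews might look like the sketch below. It is an assumption about how the counting could work, not the author's implementation: `count_keywords` is a hypothetical name, and it matches whole words case-insensitively, so "exam" does not match inside "exams". That is a deliberate choice to avoid false hits from longer words; a substring count would behave differently.

```python
import re
from collections import Counter
from typing import Iterable


def count_keywords(reviews: Iterable[str], keywords: Iterable[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each keyword across reviews."""
    keywords = list(keywords)
    # Precompile one pattern per keyword; re.escape keeps punctuation literal.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    # Seed every keyword with 0 so absent keywords still appear in the result.
    counts = Counter({kw: 0 for kw in keywords})
    for review in reviews:
        for kw, pattern in patterns.items():
            counts[kw] += len(pattern.findall(review))
    return counts
```

Because the keyword list is just an iterable, any number of keywords can be passed in, matching the behavior described above.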

During this hackathon, I extended the program to scrape a course's reviews for every professor who has ever taught it. I also designed the program so that the user can easily store the scraped data for any course in a MongoDB database.
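One way to store a scraped course is one MongoDB document per course, written with PyMongo's `Collection.replace_one(filter, replacement, upsert=True)`. The sketch below is hypothetical: the document layout and the helper names `build_course_document` and `save_course` are assumptions, and in practice `collection` would be something like `MongoClient(uri)["bruinwalk"]["courses"]`, where the database and collection names are also illustrative and `uri` is the Atlas connection string.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_course_document(
    course: str, department: str, per_professor: Dict[str, Dict[str, int]]
) -> Dict[str, Any]:
    """Shape one course's keyword counts into a single MongoDB document."""
    return {
        "course": course,
        "department": department,
        # One sub-document per professor, sorted for stable output.
        "professors": [
            {"name": name, "keyword_counts": dict(counts)}
            for name, counts in sorted(per_professor.items())
        ],
        # BSON stores Python datetimes natively.
        "scraped_at": datetime.now(timezone.utc),
    }


def save_course(collection: Any, document: Dict[str, Any]) -> Any:
    """Upsert the document into `collection` (expected: a pymongo Collection).

    Keying on the course code means re-scraping a course overwrites its old
    document instead of inserting a duplicate.
    """
    return collection.replace_one(
        {"course": document["course"]}, document, upsert=True
    )
```

The upsert keeps the collection at one document per course no matter how many times a course is scraped.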

How we built it

Python, Selenium WebDriver, and Beautiful Soup for scraping; MongoDB Atlas, PyMongo, and the Mongo Shell for storage.

I upgraded the project from my previous code using roughly the same technologies and refactored it. I also converted the data scraped from the website into JSON format so it could be loaded into the MongoDB database.
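Converting the scraped results to JSON can be done with the standard `json` module. The sketch below assumes the counts arrive as a mapping of professor name to `collections.Counter`; the function name `to_json` and the payload layout are hypothetical. Parsing the output back with `json.loads` yields a plain `dict`, which PyMongo's `insert_one` accepts directly.

```python
import json
from collections import Counter
from typing import Dict


def to_json(course: str, per_professor: Dict[str, Counter]) -> str:
    """Serialize one course's scraped keyword counts as a JSON string."""
    payload = {
        "course": course,
        # Counter is a dict subclass; converting explicitly keeps the payload plain.
        "professors": {name: dict(counts) for name, counts in per_professor.items()},
    }
    # sort_keys gives deterministic output, which makes diffs between scrapes readable.
    return json.dumps(payload, indent=2, sort_keys=True)
```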

Challenges we ran into

Interacting with the search bar: bruinwalk.com uses permanent overlays that hide the underlying web elements, so I had to execute JavaScript from the Python script to reach the search bar element.
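When an overlay covers an element, Selenium's native `click()` can fail with `ElementClickInterceptedException` (or `ElementNotInteractableException` for elements that are not visible), because the browser's hit-test lands on the overlay. Running JavaScript through `driver.execute_script(script, *args)` sidesteps that, since Selenium passes the extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The sketch below is an assumption about how this could be done, not the project's code: the helper name is hypothetical, the search input is assumed to have been located already (for example with `driver.find_element(By.XPATH, ...)`), and whether bruinwalk.com's search reacts to a synthetic `input` event is unverified.

```python
from typing import Any

# JavaScript executed inside the page. arguments[0] is the input WebElement,
# arguments[1] is the query string, both supplied by execute_script.
SET_SEARCH_TEXT_JS = """
const input = arguments[0];
input.focus();
input.value = arguments[1];
// Fire an input event so page scripts listening for typing can react.
input.dispatchEvent(new Event('input', { bubbles: true }));
"""


def type_into_hidden_search(driver: Any, search_input: Any, query: str) -> None:
    """Fill a search box that an overlay blocks from normal clicks and typing.

    `driver` is expected to be a Selenium WebDriver and `search_input` a
    WebElement located beforehand.
    """
    driver.execute_script(SET_SEARCH_TEXT_JS, search_input, query)
```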

Installing PyMongo through pip failed because the wheel package was not installed, and I later found out that a .whl file was missing. I fixed this by installing wheel, manually downloading the .whl file from the internet, and learning to point pip at a specific file path so that the file was installed inside the virtual environment.
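pip accepts a path to a local `.whl` file wherever it accepts a package name. Invoking pip as `python -m pip` through the interpreter that is currently running (`sys.executable`, which is the virtual environment's Python when the environment is active) makes sure the wheel lands in that environment rather than in a system-wide Python. The helper names and the wheel path below are hypothetical.

```python
import subprocess
import sys
from typing import List


def pip_install_command(wheel_path: str) -> List[str]:
    """Build a pip command bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", wheel_path]


def install_local_wheel(wheel_path: str) -> None:
    """Install a downloaded .whl file; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(wheel_path))
```

For example, `install_local_wheel("downloads/package.whl")` would run pip against the active environment's interpreter with that file path.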

Accomplishments that we're proud of

Resolving many setup issues independently, which greatly improved my Googling, my use of Stack Overflow, and my troubleshooting skills when setting up projects.

What we learned

Good ways to organize a Python project so that sys.path does not need to be modified for imports, and how to set up and interact with a MongoDB database.

What's next for Bruinwalk.com scraper

Building a web application that gives UCLA students access to the data stored in the MongoDB database, which will involve using frontend and backend frameworks.

Built With

json
python
xpath

Updates

Yan Hauw started this project — Nov 21, 2021 12:23 AM EST
