Inspiration

We were inspired by Disney's challenge to create a hack related to film and entertainment. Some of our group were interested in analyzing data, while others were set on designing a web app. We realized we could combine the challenge and these interests by using data about movies to find relationships between them, and by making the data and results available through a web application. We decided to use data from Wikipedia to determine these relationships, since Wikipedia is an easy-to-use source of large amounts of free information.

What it does

MovieMapper generates a web of movies related to a central node, determining the relationships between movies and the strengths of these relationships from the links between their Wikipedia pages. Two movies are connected in the web app's display if either movie's Wikipedia page links to the other's. The more their pages link to each other, the stronger (more heavily weighted) their connection is.
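
As a minimal sketch of this weighting idea (not the team's actual code), the function below assumes each movie's outgoing links are already available as a dictionary of page titles; the name `weighted_edges` and the sample titles are hypothetical. Every link in either direction adds one to the pair's weight, so two pages that link to each other score higher than a one-way link.

```python
from collections import Counter


def weighted_edges(links: dict[str, list[str]]) -> dict[frozenset, int]:
    """Weight each pair of movies by the number of links between their pages.

    `links` maps a movie title to the titles its Wikipedia page links to.
    Links to pages that are not keys of `links` (non-movies) are ignored.
    """
    weights = Counter()
    for source, targets in links.items():
        for target in targets:
            if target in links and target != source:
                # A frozenset key makes A->B and B->A add to the same edge.
                weights[frozenset((source, target))] += 1
    return dict(weights)


# Hypothetical sample data standing in for scraped Wikipedia links.
links = {
    "Toy Story": ["Toy Story 2", "Pixar"],
    "Toy Story 2": ["Toy Story", "Pixar"],
    "Cars": ["Pixar", "Toy Story"],
}
print(weighted_edges(links))
```

Here "Toy Story" and "Toy Story 2" link to each other (weight 2), while "Cars" links one way to "Toy Story" (weight 1); "Pixar" is not a movie key, so links to it are ignored.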

To generate this output, a user enters the title of a movie into the search box on our web app. We use the Google Custom Search API to find the Wikipedia article for that movie. The URL of the article is then sent to our Python backend, which uses a Python wrapper for the MediaWiki API to find pages related to the requested page with a modified breadth-first search algorithm. Information about these pages is formatted as JSON and returned to the frontend, where it is used to build a visualization for the user. When the user hovers over a movie, we use the OMDb API to look up that movie's IMDb information and display it.
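
The search step can be sketched roughly as follows. This is a hypothetical illustration, not the project's implementation: `get_links` stands in for whatever fetched a page's links (in the project, the Python wikipedia module), `is_movie` stands in for the movie check, and the JSON field names `nodes`, `links`, `source`, and `target` are assumptions. The `seen` set keeps any page from being queued twice, and `max_movies` caps how far the search spreads.

```python
import json
from collections import deque
from typing import Callable, Iterable


def movie_bfs(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    is_movie: Callable[[str], bool],
    max_movies: int = 10,
) -> str:
    """Breadth-first search from `start` that only follows links to movies.

    Stops once `max_movies` movies (including `start`) have been found and
    returns a JSON string with a list of nodes and a list of directed links.
    """
    seen = {start}          # titles already queued, so no movie appears twice
    queue = deque([start])  # FIFO queue: nearer movies are expanded first
    nodes = [start]
    edges = []
    while queue and len(nodes) < max_movies:
        current = queue.popleft()
        for title in get_links(current):
            if title == current or not is_movie(title):
                continue
            edges.append({"source": current, "target": title})
            if title in seen:
                continue
            seen.add(title)
            nodes.append(title)
            queue.append(title)
            if len(nodes) >= max_movies:
                break
    return json.dumps({"nodes": nodes, "links": edges})


# Tiny in-memory link map standing in for Wikipedia (hypothetical data).
graph = {
    "Alien": ["Aliens", "Ridley Scott", "Blade Runner"],
    "Aliens": ["Alien", "James Cameron", "The Terminator"],
    "Blade Runner": ["Ridley Scott", "Alien"],
    "The Terminator": ["Aliens"],
}
print(movie_bfs("Alien", lambda t: graph.get(t, []), graph.__contains__, max_movies=3))
```

Using a queue rather than recursion is what makes the search breadth-first: every movie one link away is found before any movie two links away, so a small `max_movies` keeps the web centered on the searched film.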

How we built it

We divided our team into two pairs. The two members more interested in data analysis took charge of the Python backend and the analysis of the Wikipedia connections, and the other two, who were interested in web design, took charge of the JavaScript frontend. The most important external libraries on the backend were Flask, which let us run a backend server in Python, and the Python wikipedia module, which wraps the MediaWiki API. On the frontend, the most important pieces were React, axios, styled-components, and the OMDb API.

Challenges we ran into

As we found out shortly after beginning our project, Wikipedia pages have a lot of hyperlinks, many of which lead to places such as the homepage or to account-related functions like editing pages. Our first challenge was separating hyperlinks to other articles from hyperlinks to pages that weren't articles or were lists of articles. Our next challenge was determining which articles were about movies. We also had some difficulty designing a breadth-first-search-like algorithm that could find a certain number of nearby movies efficiently and without duplicates. Finally, connecting our frontend and backend was tricky because of the asynchronous timing of GET request responses in JavaScript.
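
A simplified version of the two filtering steps might look like this. The namespace prefixes are real Wikipedia namespaces, but the exact rules the team used are not described here, so treat `is_article` and `movie_links` as hypothetical helpers; the set `movie_titles` stands in for the JSON file of scraped movie titles the backend used for the movie check.

```python
import re

# Prefixes of Wikipedia namespaces that hold talk, user, help, and other
# non-article pages. A colon after the prefix marks the namespace.
NON_ARTICLE = re.compile(
    r"^(Talk|User|User talk|Wikipedia|Wikipedia talk|Help|Portal|Special|"
    r"Template|Template talk|Category|File|Draft|Module|MediaWiki):"
)


def is_article(title: str) -> bool:
    """True if `title` looks like an ordinary article rather than a meta page."""
    if title == "Main Page" or NON_ARTICLE.match(title):
        return False
    # List pages link to many films but are not films themselves.
    return not title.startswith(("List of", "Lists of"))


def movie_links(titles: list[str], movie_titles: set[str]) -> list[str]:
    """Keep links that are articles and appear in a known set of movie titles."""
    return [t for t in titles if is_article(t) and t in movie_titles]


# Hypothetical link titles scraped from one page.
links = [
    "Main Page",
    "Talk:Up (2009 film)",
    "Pixar",
    "Toy Story",
    "List of Pixar films",
    "Category:2009 films",
]
print(movie_links(links, {"Toy Story", "Up (2009 film)"}))  # ['Toy Story']
```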

Accomplishments that we're proud of

The consensus among our team is that we're proud of the project as a whole. Looking back, it's amazing how much code we wrote and how much functionality we created in roughly 24 hours, after losing some time on the first night to brainstorming and sleeping. We're also all proud of how quickly we were able to work through bugs and errors, never getting stuck on any particular problem for very long, and of how effectively we were able to focus on our task while working.

What we learned

This was a first foray into web scraping for us, and we learned a lot about how to deal with the imperfect data that web scraping returns from websites. We also learned a lot about integrating APIs into a project, and about coordinating between a frontend and backend development team. Something we noted was the importance of agreed-upon standards for communicating data between programs.
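
One lightweight way to enforce such a standard is a check that both sides can run against the shared payload. The sketch below assumes a hypothetical format (`nodes` as a list of titles, `links` as objects with `source`, `target`, and `weight`); the format MovieMapper actually used is not documented here, and `validate_graph_payload` is an illustrative name.

```python
import json


def validate_graph_payload(raw: str) -> list[str]:
    """Check a JSON graph payload against a hypothetical agreed-upon format.

    Expected shape (illustrative only):
      {"nodes": [str, ...],
       "links": [{"source": str, "target": str, "weight": int}, ...]}
    Returns a list of problems; an empty list means the payload conforms.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    problems = []
    nodes = data.get("nodes")
    if not isinstance(nodes, list) or not all(isinstance(n, str) for n in nodes):
        problems.append('"nodes" must be a list of strings')
        nodes = []
    node_set = set(nodes)
    links = data.get("links")
    if not isinstance(links, list):
        problems.append('"links" must be a list')
        links = []
    for i, link in enumerate(links):
        if not isinstance(link, dict):
            problems.append(f"links[{i}] must be an object")
            continue
        # Both endpoints must name a node that the payload actually declares.
        for end in ("source", "target"):
            value = link.get(end)
            if not isinstance(value, str) or value not in node_set:
                problems.append(f"links[{i}].{end} is not a known node")
        if not isinstance(link.get("weight"), int) or link["weight"] < 1:
            problems.append(f"links[{i}].weight must be a positive integer")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a backend developer see every mismatch with the agreed format at once.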

What's next for Movie Mapper

We'd like to continue developing Movie Mapper and eventually deploy it. To do so, we would need to rework some of our code for improved stability and efficiency. We would also like to revise the algorithm we use for finding related movies through links. At the moment, we only consider movie pages and links between them, but we'd like to be able to connect movies that are "indirectly linked" by another page, such as a page about an actor, a director, or a genre.
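
A rough sketch of the "indirectly linked" idea, assuming each movie's outgoing links are already known: invert the link map so each intermediate page (an actor, director, or genre) lists the movies that link to it, then give every pair of those movies one unit of weight. `indirect_weights` and the sample data are hypothetical, and this is one possible approach rather than a planned design.

```python
from collections import Counter
from itertools import combinations


def indirect_weights(
    links: dict[str, list[str]], movie_titles: set[str]
) -> dict[tuple[str, str], int]:
    """Weight movie pairs by how many non-movie pages both of them link to."""
    # Invert the link map: intermediate page -> movies whose pages link to it.
    linked_from: dict[str, set[str]] = {}
    for movie, targets in links.items():
        for target in set(targets):  # count each intermediate page once per movie
            if target not in movie_titles:
                linked_from.setdefault(target, set()).add(movie)
    weights = Counter()
    for movies in linked_from.values():
        # Each pair of movies sharing this page gains one unit of weight;
        # sorting makes the pair key order-independent.
        for a, b in combinations(sorted(movies), 2):
            weights[(a, b)] += 1
    return dict(weights)


links = {
    "Alien": ["Ridley Scott", "Sigourney Weaver", "Science fiction film"],
    "Aliens": ["James Cameron", "Sigourney Weaver", "Science fiction film"],
    "Blade Runner": ["Ridley Scott", "Science fiction film"],
}
print(indirect_weights(links, set(links)))
```

A broad page such as a genre can link to thousands of films, so a real version would likely need to down-weight very common intermediate pages so they do not drown out shared actors or directors.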

Built With

axios
beautiful-soup
flask
google-custom-search
javascript
json
matlab
mediawiki
omdb
python
react
regex
web-scraping

Submitted to

HackGT 6 Into the Rabbit Hole

Created by

I worked on the back-end of the project. I helped write the code that gets a graph of connected nodes from a website. I was also responsible for converting a large amount of scraped web data into the JSON file we use to check whether a Wikipedia page represents a movie.

Kian Vilhauer
I worked on the front-end, integrating the web app (written in React) with the backend. I also helped assemble the backend with Flask.

Jeffrey Luo
This was my first time working with React to create a front end. I used styled-components to create a clean layout, then added functionality with Google's Custom Search API and another third-party API to populate fields with movie information when a movie's name is entered in the search bar.

William Chen
CS major at Georgia Tech, 2022.
I worked on the back-end. It was my first time using Python, and I had to do some high-end stuff (in my opinion) involving webpage scraping and the implementation of a breadth-first search algorithm. Ultimately, I ended up contributing most of the code that takes in a website and returns a graph of connected nodes.

Andrew Li

Updates

Kian Vilhauer started this project — Oct 27, 2019 06:14 AM EDT
