We were inspired by Disney's challenge to create a hack related to film and entertainment. Some of our group were interested in analyzing data, while others were set on designing a web app. We realized we could combine the challenge with both interests by using data about movies to find relationships between them, and by making the data and results available through a web application. We decided to use Wikipedia to determine these relationships, since it is an easy-to-use source of large amounts of free information.

What it does

Movie Mapper generates a web of movies related to a central node, using links between Wikipedia pages to determine both the relationships between movies and the strengths of those relationships. Two movies are connected in the web app's display if one of their Wikipedia pages links to the other. The more their pages link to each other, the stronger (more heavily weighted) their connection is.
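The weighting idea above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the weight of a connection is simply the number of links between the two pages in either direction, and the function name `connection_weights` and the sample data are hypothetical.

```python
from collections import Counter


def connection_weights(outgoing_links):
    """Count links between each pair of movies, in either direction.

    outgoing_links maps a movie title to the list of titles its page links to.
    Returns {frozenset({a, b}): weight}. The weighting rule is an assumption.
    """
    weights = Counter()
    for source, targets in outgoing_links.items():
        for target in targets:
            # Skip self-links and links to pages outside the movie set.
            if target == source or target not in outgoing_links:
                continue
            weights[frozenset((source, target))] += 1
    return dict(weights)


# Hypothetical sample data, not real Wikipedia link counts.
sample_links = {
    "Toy Story": ["Toy Story 2", "Toy Story 2", "A Bug's Life"],
    "Toy Story 2": ["Toy Story"],
    "A Bug's Life": [],
}
weights = connection_weights(sample_links)
```

Using an unordered pair as the key means a link from A to B and a link from B to A both strengthen the same connection.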

To generate this output, a user enters the title of a movie into the search box on our web app. We use the Google Custom Search API to find the Wikipedia article for that movie. The URL of the article is then sent to our Python backend, which uses a Python wrapper for the MediaWiki API to find pages related to the requested page with a modified breadth-first search algorithm. Information about these pages is formatted as JSON and returned to the frontend, where it is used to create a visual for the user. When the user mouses over a movie, we use the OMDb API to fetch IMDb information about that movie and display it.
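A bounded breadth-first search of the kind described above might look like the sketch below. It is an illustration under stated assumptions rather than the project's actual algorithm: the names `find_related`, `get_links`, `is_movie`, and `limit`, and the shape of the returned dictionary, are all hypothetical. The link lookup is passed in as a function so the example runs without network access.

```python
from collections import deque


def find_related(start, get_links, is_movie, limit):
    """Breadth-first search outward from `start`, keeping at most `limit` movies.

    get_links(title) returns the titles a page links to; is_movie(title)
    decides whether a title counts as a movie. Both are supplied by the caller.
    With the `wikipedia` package, get_links could plausibly be
    `lambda t: wikipedia.page(t).links` (an assumption, not shown here).
    """
    visited = {start}          # prevents duplicate nodes
    nodes = [start]
    edges = []
    seen_edges = set()         # prevents duplicate edges
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for linked in get_links(current):
            if linked == current or not is_movie(linked):
                continue
            if linked not in visited:
                if len(nodes) >= limit:
                    continue   # node budget used up; ignore new movies
                visited.add(linked)
                nodes.append(linked)
                queue.append(linked)
            if (current, linked) not in seen_edges:
                seen_edges.add((current, linked))
                edges.append((current, linked))
    return {"nodes": nodes, "edges": edges}


# Hypothetical in-memory link graph standing in for Wikipedia.
fake_graph = {
    "A": ["B", "C", "List of films"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["E"],
    "E": [],
}
result = find_related(
    "A",
    get_links=lambda t: fake_graph.get(t, []),
    is_movie=lambda t: not t.startswith("List of"),
    limit=4,
)
```

The queue keeps draining after the node limit is reached so that links between already-collected movies are still recorded; a real implementation might cut this short to save API calls.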

How we built it

We divided our team into two pairs. The two members more interested in data analysis took charge of the Python backend and the analysis of the Wikipedia connections, and the other two, who were interested in web design, took charge of the JavaScript frontend. The most important external pieces of code used for the backend were Flask, which allowed us to run a backend server in Python, and the Python wikipedia module, which wraps the MediaWiki API. The most important tools used for the frontend were React, axios, styled-components, and the OMDb API.

Challenges we ran into

As we found out shortly after beginning our project, Wikipedia pages have a lot of hyperlinks, many of which lead to places such as the homepage or to account-related functions like editing pages. Our first challenge was separating hyperlinks that led to other articles from hyperlinks that led to pages that weren't articles or were lists of articles. Our next challenge was determining which articles were about movies. We also had some difficulty designing a breadth-first-search-like algorithm that could find a certain number of nearby movies efficiently and without duplicates, and we had trouble connecting our frontend and backend because of the asynchronous timing of GET request responses in JavaScript.
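To illustrate the link-filtering problem, here is a minimal sketch of two heuristics. Both are assumptions made for illustration, not the rules the project actually used: the prefix list is a partial set of Wikipedia namespace prefixes, and treating any page with a category containing "films" as a movie is a guess that will misclassify some pages. The names `is_article_title` and `looks_like_movie` are hypothetical.

```python
# Partial list of namespace prefixes that mark non-article pages (assumption).
NON_ARTICLE_PREFIXES = (
    "Special:", "Wikipedia:", "Help:", "Talk:", "User:",
    "File:", "Category:", "Template:", "Portal:",
)


def is_article_title(title):
    """Return True if the title looks like an ordinary article."""
    if title == "Main Page" or title.startswith("List of "):
        return False
    # str.startswith accepts a tuple and matches any of its entries.
    return not title.startswith(NON_ARTICLE_PREFIXES)


def looks_like_movie(categories):
    """Guess whether a page is about a movie from its category names."""
    return any("films" in category.lower() for category in categories)


candidates = ["Toy Story", "Special:Search", "List of Pixar films", "Main Page"]
articles = [title for title in candidates if is_article_title(title)]
```

Heuristics like these are cheap to run on every link, which matters when each page contributes hundreds of candidates.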

Accomplishments that we're proud of

The consensus among our team is that we're proud of the project as a whole. Looking back, it's amazing how much code we wrote and how much functionality we created in roughly 24 hours, after losing some time on the first night to brainstorming and sleeping. We're also all proud of how quickly we were able to work through bugs and errors, never getting stuck on any particular problem for very long, and of how effectively we were able to focus on our task while working.

What we learned

This was a first foray into web scraping for us, and we learned a lot about how to deal with the imperfect data that web scraping returns from websites. We also learned a lot about integrating APIs into a project, and about coordinating between a frontend and backend development team. Something we noted was the importance of agreed-upon standards for communicating data between programs.
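One way to pin down such a standard is to define the payload shape in one place on the backend. The sketch below is only an example of the idea: the field names (`title`, `url`, `source`, `target`, `weight`) and the `build_payload` helper are hypothetical and are not taken from the project.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Node:
    title: str
    url: str


@dataclass
class Edge:
    source: str
    target: str
    weight: int


def build_payload(nodes, edges):
    """Serialize the graph into the JSON shape both teams agreed on."""
    return json.dumps({
        "nodes": [asdict(node) for node in nodes],
        "edges": [asdict(edge) for edge in edges],
    })


payload = build_payload(
    [Node("Toy Story", "https://en.wikipedia.org/wiki/Toy_Story"),
     Node("Toy Story 2", "https://en.wikipedia.org/wiki/Toy_Story_2")],
    [Edge("Toy Story", "Toy Story 2", 2)],
)
```

With the shape written down as types, a change to a field name shows up in one obvious place instead of surfacing as a frontend bug.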

What's next for Movie Mapper

We'd like to continue developing and eventually deploy Movie Mapper. To do so, we would need to rework some of our code for improved stability and efficiency. In addition, we would like to rework the algorithm we use for finding related movies through links. At the moment, we only consider movie pages and links to movie pages, but we'd like to be able to draw links between movies that are "indirectly linked" by another page, such as a page about an actor, a director, or a genre.
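As a rough sketch of what "indirectly linked" could mean, the example below connects two movies whenever their pages link to a common non-movie page. This is one possible interpretation, not a settled design; the function name `indirect_links` and the sample data are hypothetical.

```python
from itertools import combinations


def indirect_links(movie_links):
    """Find pairs of movies that share at least one intermediary page.

    movie_links maps a movie title to the set of non-movie pages it links to
    (actors, directors, genres). Returns {(movie_a, movie_b): shared_pages}.
    """
    shared = {}
    # Sorting makes the pair order deterministic.
    for a, b in combinations(sorted(movie_links), 2):
        common = movie_links[a] & movie_links[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical sample data.
sample = {
    "Toy Story": {"Tom Hanks", "Pixar"},
    "Cast Away": {"Tom Hanks"},
    "Alien": {"Ridley Scott"},
}
pairs = indirect_links(sample)
```

Comparing every pair is quadratic in the number of movies, so a larger graph would likely need an inverted index from intermediary page to movies instead.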
