Think of a time when you could describe something but couldn't remember its name. For example, you could recite the plot of a movie but couldn't come up with the title. Other people are great at answering these kinds of questions, but existing search tools like IMDb aren't: you can only search by title, actors, and similar fields. We wanted to create an application that could answer these kinds of questions through an intuitive interface, one that a user can interact with as if they were describing the plot to another person.

What it does

Qasis allows a user to search for movies based on a natural language description of the plot.

Sample queries:

  • family of superheroes fights a redheaded villain
  • farm boy battles the empire
  • kid from coal mining town builds a rocket

Should return:

  • The Incredibles
  • Star Wars
  • October Sky

How we built it

The top 1000 movies were scraped from IMDb using Kimono Labs' scraping tool and OMDb. These were then loaded into a Postgres database and preprocessed (stopwords removed, words stemmed, etc.). An information-retrieval algorithm written in Python then parsed keywords from the query, calculated term frequency-inverse document frequency (TF-IDF) scores for each of the movie synopses, and ranked the movies by relevance. A jQuery interface then rendered the results for the end user.
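As a rough illustration of the ranking step, here is a minimal TF-IDF sketch in plain Python. It is not the project's actual code: the stopword list, the `tokenize` and `rank` helpers, the smoothed IDF formula, and the three toy synopses are all assumptions made for this example, and stemming is left out for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "from", "in", "and", "to", "his", "is", "by", "with"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords (no stemming in this sketch)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def rank(query, synopses):
    """Return titles sorted by the summed TF-IDF score of the query terms, best first."""
    docs = {title: Counter(tokenize(text)) for title, text in synopses.items()}
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = {}
    for title, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in query_terms:
            # Document frequency: number of synopses containing the term.
            df = sum(1 for c in docs.values() if term in c)
            if df == 0 or total == 0:
                continue
            tf = counts[term] / total            # term frequency within this synopsis
            idf = math.log(n_docs / df) + 1.0    # smoothed inverse document frequency (one common variant)
            score += tf * idf
        scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy synopses written for this example, not the scraped data.
SYNOPSES = {
    "The Incredibles": "A family of undercover superheroes is forced into action to save the world from a villain.",
    "Star Wars": "A farm boy joins rebels to battle the evil empire and rescue a princess.",
    "October Sky": "A kid from a coal mining town builds a rocket against his father's wishes.",
}

best = rank("kid from coal mining town builds a rocket", SYNOPSES)[0]
```

Note that without stemming, "battles" in a query does not match "battle" in a synopsis, which is one reason a preprocessing step like the one described above matters.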

Challenges we ran into

We originally tried to build a React.js front end, but the learning curve was too steep for the time we had. Additionally, IMDb does not publicly share movie posters, so a workaround had to be built with The Movie Database. Finally, computation speed was a bottleneck. For example, we were able to make marginal improvements to the scoring algorithm's ranking quality by including keyword synonyms/synset lemmas (related words based on grammar hierarchies), but they came at a much greater speed cost, so in the interest of user experience we did not include that feature.
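To make the synonym trade-off concrete, here is a small sketch of query expansion. The `SYNONYMS` table and `expand_query` helper are hypothetical stand-ins for a real thesaurus lookup; the write-up does not say which library or data source the team used.

```python
# Hypothetical hand-written synonym table (assumption), standing in for a real thesaurus lookup.
SYNONYMS = {
    "kid": ["child", "boy"],
    "battles": ["fights", "combats"],
    "rocket": ["missile"],
}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the original terms followed by their synonyms, in order and without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

expanded = expand_query(["kid", "builds", "rocket"])
```

Here three query terms grow to six, and every extra term must be scored against every synopsis, which gives a sense of how expansion can slow down ranking.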
