Problem:

Movie ratings and reviews are scattered and confusing. Ratings for the same movie vary widely from site to site, and most sites are driven by critics’ ratings rather than those of the average person. Critics often rate movies very differently than average moviegoers do. So who knows best how enjoyable a movie will be? Your fellow moviegoers.

Solution:

To solve this problem, we used a web parser to collect tweets about movies from Twitter, then applied sentiment analysis to estimate how each author felt about the movie. We combined the results into an easy-to-view, interactive website that displays ratings for popular movies and lets users search for other movies.

How we built our app:

For the sentiment analysis, we looked at negation words, booster words (intensifiers), a sentiment lexicon, and punctuation to assign each tweet a numerical sentiment value from -1 to 1. We also accounted for popular slang whose meaning is the opposite of its literal meaning (e.g., “shit” versus “the shit,” “ass” versus “bad ass”).
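A minimal sketch of how such a scorer might look, not our actual implementation. The word lists, weights, and the function name `score_tweet` are hypothetical and chosen only for illustration; the real lexicon is much larger.

```python
import re

# Hypothetical mini-lexicon and weights, for illustration only.
LEXICON = {"good": 0.5, "great": 0.8, "love": 0.8,
           "bad": -0.5, "boring": -0.6, "shit": -0.7}
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't"}
BOOSTERS = {"very": 1.5, "really": 1.5, "so": 1.3, "extremely": 1.8}
# Slang phrases whose sentiment is the opposite of the literal words.
SLANG = {"the shit": 0.8, "bad ass": 0.7, "badass": 0.7}


def score_tweet(text):
    """Return a sentiment score clamped to the range [-1, 1]."""
    lowered = text.lower()
    total = 0.0

    # Score slang phrases first and blank them out so their
    # individual words are not scored again by the lexicon.
    for phrase, value in SLANG.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        lowered, count = re.subn(pattern, " ", lowered)
        total += value * count

    tokens = re.findall(r"[a-z']+", lowered)
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        prev = tokens[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # A booster directly before the word strengthens it.
            value *= BOOSTERS[prev]
            prev = tokens[i - 2] if i > 1 else ""
        if prev in NEGATIONS:
            # A negation before the word (or before its booster) flips it.
            value = -value
        total += value

    # Exclamation marks amplify the sentiment (capped at three).
    total *= 1.0 + 0.1 * min(text.count("!"), 3)

    return max(-1.0, min(1.0, total))
```

With these toy weights, "that movie was shit" scores negative while "that movie was the shit" scores positive, which is the slang behavior described above.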

Our web scraper for Twitter collected a lot of spam and irrelevant tweets that contained the keyword only by coincidence, so we took several steps to filter them out. Twitter’s API offers several pieces of information about a tweet, such as location, whether the author’s account is verified, whether the author’s account has a default profile image, and whether the author’s tweets are protected. We attempted to filter spam and irrelevant tweets using this information, but there seemed to be little to no correlation between these fields and the relevance of the tweet. Eventually, we filtered tweets by looking for words related to movies (TV, movie, see, watch, story, film, soundtrack, and animate).
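The keyword filter can be sketched roughly as follows. This is an illustration under assumptions: the function names are hypothetical, and matching each keyword as a word prefix (so that "watching" or "animated" also counts) is our guess at a sensible rule, not a documented detail.

```python
import re

# Keywords from the list above.
MOVIE_KEYWORDS = ("tv", "movie", "see", "watch", "story",
                  "film", "soundtrack", "animate")

# Assumption: match each keyword at the start of a word, so inflected
# forms such as "watched", "films", or "animated" also pass the filter.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(MOVIE_KEYWORDS) + r")\w*", re.IGNORECASE
)


def is_movie_related(text):
    """Return True if the tweet text contains any movie keyword."""
    return KEYWORD_PATTERN.search(text) is not None


def filter_tweets(tweets):
    """Keep only the tweet texts that look movie related."""
    return [text for text in tweets if is_movie_related(text)]
```

Prefix matching trades precision for recall: it keeps "watching" but also lets through unrelated words such as "seems", so a stricter rule may be preferable when spam is heavy.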

While our algorithm is effective, the way our website presents the results matters just as much. The site is a Heroku app with floating particles and a logo at the top. We present each movie as a poster accompanied by a visual star rating. We also embedded a search bar that writes user searches to Firebase, so that our web parser and sentiment analysis can process them in real time.
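One plausible way to turn per-tweet sentiment scores into a star rating is sketched below. The linear mapping from the mean score onto a five-star scale, the rounding to half stars, and the name `stars_from_scores` are all assumptions for illustration; the site's actual formula may differ.

```python
def stars_from_scores(scores, max_stars=5):
    """Map sentiment scores in [-1, 1] to a star rating in half-star steps.

    Returns None when there are no scores to average.
    """
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    # Linearly rescale [-1, 1] to [0, max_stars].
    stars = (mean + 1) / 2 * max_stars
    # Round to the nearest half star for display.
    return round(stars * 2) / 2
```

Under this mapping, a movie whose tweets average a neutral score of 0 lands in the middle of the scale.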

  1. Web parser
  2. Filter
  3. Sentiment Analysis
  4. Display on website

##### Tools used:

Heroku, Firebase, Python, JavaScript, Flask, Twitter API, IMDb API

##### Technologies/processes used:

Twitter web parser, sentiment analysis, natural language processing, website development, web design, statistical analysis, search technologies

##### Problems we encountered:

We were greatly limited because we did not have access to Twitter Firehose. Twitter Firehose is Twitter’s premier API; it gives you access to nearly all of Twitter’s non-confidential data. However, it also requires an application months in advance and/or $400,000 annually. Because we did not have Twitter Firehose, we could only access tweets from the last seven days, and we had access to only about 500 tweets every fifteen minutes. This slowed down our app and limited our accuracy, two problems that would disappear with the use of Twitter Firehose. With the additional information that Twitter Firehose offers, we would also be able to better filter out spam and irrelevant tweets.

##### Future Plans:

We decided to start with movies and with Twitter data. We plan to apply our web parser and sentiment analysis to Reddit, Instagram, Facebook, and possibly even Google Trends. Next, we plan to parse social media for books using the same approach as with movies. After that, we will expand to parsing social media for product reviews using sentiment analysis.