Twitter has an abundance of data, so much that it can be used for countless purposes, many of them hard to imagine in advance. We wanted to use this data to collect reviews and suggestions from people about a product.
What it does
Tweet Review takes a product, company, or business and returns separate lists of positive and negative tweets (comments, stories, and so on) that mention it. We also run keyword analysis on both lists to find where the product falls short and where it does well. This is especially helpful for growing businesses and newly launched products that need quick suggestions and feedback. It lets us learn how people feel without making them fill out surveys.
Businesses have always needed feedback on their products, which usually means asking for it directly, running surveys (which draw few or narrowly targeted responses), or checking review websites. With Tweet Review we can get real-time feedback and reviews from one of the biggest and most active data sets in the world.
How we built it
We hooked Apache Spark up to the Twitter feed to extract tweets about the requested product and run sentiment analysis on them. After processing the feed, Spark writes the results to a database, which a Flask app queries to display them in the web browser. The user sees the tweets in Positive and Negative lists and can click a specific tweet to open it on Twitter and reply to it. The Twitter stream is sent to Spark over a port in a specified format, so other services such as Facebook or Reddit can be added easily by feeding their data to the same port. To sort tweets into positive and negative, we analyze sentiment using NLTK with a Naive Bayes classifier.
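To make the "port in a specified format" idea concrete, here is a minimal sketch of a feeder that writes one JSON record per line to a TCP socket. It is not our actual script: the field names (source, text, author) and the helpers encode_record, serve_records, and read_lines are hypothetical, and read_lines only stands in for the Spark side so the example runs on its own.

```python
import json
import socket
import threading


def encode_record(source, text, author):
    """Serialize one post as a single newline-terminated JSON line."""
    record = {"source": source, "text": text, "author": author}
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return (json.dumps(record) + "\n").encode("utf-8")


def serve_records(records, host="127.0.0.1", port=0):
    """Send every record to the first client that connects, then close.

    Returns the bound port (port=0 lets the OS pick a free one).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def handle():
        conn, _ = server.accept()
        with conn:
            for rec in records:
                conn.sendall(encode_record(**rec))
        server.close()

    threading.Thread(target=handle, daemon=True).start()
    return bound_port


def read_lines(host, port):
    """Stand-in consumer: read JSON lines until the feeder closes the socket."""
    with socket.create_connection((host, port)) as client:
        with client.makefile("r", encoding="utf-8") as stream:
            return [json.loads(line) for line in stream]


if __name__ == "__main__":
    sample = [
        {"source": "twitter", "text": "Love the new phone", "author": "a"},
        {"source": "reddit", "text": "Battery is bad", "author": "b"},
    ]
    port = serve_records(sample)
    for rec in read_lines("127.0.0.1", port):
        print(rec["source"], rec["text"])
```

On the Spark side, PySpark's StreamingContext.socketTextStream(hostname, port) reads newline-delimited UTF-8 text from such a socket, which is why one record per line is a convenient format. Because every record carries its own source field, a feeder for another service can write to the same port without changes to the consumer.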
Portability and accessibility
We used PySpark along with some Bash and Python scripts, all of which can be easily installed and configured on any cloud system running Linux.
Challenges we ran into
This was the first time we worked with Apache Spark since taking the big data course in college (which was taught quite differently), so learning a new system in a short time was a challenge (yes, we started late). Streaming the Twitter API to a port for Spark was also a fun challenge that we enjoyed.
Accomplishments that we are proud of
We analyzed tweets about Google and saw some interesting results, with people both praising and complaining about various Google products.
What we learned
Data science and the Twitter Streaming API.
What's next for Tweet Review
Doing keyword analysis on the data to surface the most important points in a short summary. Adding functionality so that people can change the topic of interest and fetch results for it. Getting a domain name, polishing the UI, and publishing it as a product (and maybe analyzing it).
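As a rough idea of where the keyword analysis could start, the sketch below counts how many tweets in a list mention each word, after dropping URLs, @mentions, and common filler words. The function name top_keywords and the tiny stop-word list are illustrative assumptions, not part of the current code base.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {
    "the", "and", "but", "for", "with", "this", "that", "are", "was",
    "you", "not", "has", "have", "its", "very",
}


def top_keywords(tweets, n=5):
    """Return the n words that appear in the most tweets, with their counts."""
    counts = Counter()
    for tweet in tweets:
        # Strip URLs and @mentions before tokenizing.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        tokens = re.findall(r"[a-z']+", cleaned)
        # Count each word once per tweet so one repetitive tweet cannot dominate.
        counts.update({t for t in tokens if len(t) > 2 and t not in STOP_WORDS})
    return counts.most_common(n)


if __name__ == "__main__":
    negative = [
        "Battery drains so fast @acme",
        "battery life is terrible http://t.co/x",
        "Screen is fine but the battery dies",
    ]
    print(top_keywords(negative, 3))
```

Running it separately on the positive and negative lists would give a short summary of what people praise and what they complain about. For the sample above, "battery" ranks first because it appears in all three tweets.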