Evaluating customer feedback on a product is important for improving it. But there are millions of products and billions of reviews, so a single conventional computer cannot practically store or process the data. Hence, in real-world use cases, projects like this are built on a distributed framework such as Hadoop/MapReduce.
What it does
The project calculates the overall positive, negative, and neutral sentiment of a product by tokenizing its reviews.
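A token-based sentiment count of this kind can be sketched as follows. This is a minimal illustration, not the project's actual code: the word lists and the function names `tokenize`, `classify_review`, and `summarize` are assumptions made for the example, and a real lexicon would be far larger.

```python
import re
from collections import Counter

# Hypothetical word lists; the project's actual lexicon is not given here.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "perfect"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "broken"}


def tokenize(text):
    """Lowercase the review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_review(text):
    """Label one review by counting positive and negative tokens."""
    score = 0
    for token in tokenize(text):
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(reviews):
    """Count how many reviews fall under each sentiment label."""
    return Counter(classify_review(review) for review in reviews)
```

For example, `summarize(["Great product, I love it", "Terrible, arrived broken", "It is a phone"])` yields one review under each of the three labels.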
How I built it
We configured Hadoop in pseudo-distributed mode, set up Eclipse as the development environment, and used MapReduce as the data processing framework.
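To show how the MapReduce model fits this job, here is a small local simulation of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop and is not the project's implementation. The `(product_id, review_text)` record layout, the word lists, and the names `mapper`, `reducer`, and `run_job` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical word lists, assumed for this sketch only.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "perfect"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "broken"}


def mapper(record):
    """Map step: emit ((product_id, label), 1) for one review record."""
    product_id, text = record  # assumed record layout
    tokens = [word.strip(".,!?") for word in text.lower().split()]
    score = sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    yield (product_id, label), 1


def reducer(key, values):
    """Reduce step: sum the counts emitted for one (product_id, label) key."""
    return key, sum(values)


def run_job(records):
    """Simulate the shuffle by grouping mapper output by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))
```

On a real cluster the framework performs the grouping step between map and reduce; `run_job` only imitates it in memory so the data flow is visible on a single machine.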
Challenges I ran into
The main challenges were configuring Hadoop and setting up the development environment.
Accomplishments that I'm proud of
Successfully running the sentiment analysis logic.
What I learned
Hadoop configuration, Hadoop daemons, distributed processing and storage, MapReduce, and distributed hash tables.
What's next for SentmentAnalysis-AmazonReviews
We currently use an open-source dataset of reviews. As a next step, we can build a web crawler that scrapes reviews of real products from sites such as Amazon and Craigslist, then run the sentiment analysis batch job on the scraped data.