We wanted to analyze a product in a systematic, product-driven way. We looked through several candidate datasets to find the one that best fit this goal, and that is how the project started.
What it does
We took the Amazon.com Grocery and Gourmet data as our case study. We are trying to provide a brand-wise analytics service for a product.
How I built it
We used IBM Bluemix for our application. We used Object Storage to hold the files in a container and uploaded the Grocery and Gourmet JSON files there. We started with an IPython notebook and loaded the data source. We used two data sets: one is the review data and the other is the metadata. We then set a Hadoop config and used PySpark to read the data. We ran the analysis on tea products and merged the two data sets together. Finally, we aggregated the data to give a complete overview.
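As a rough illustration of the merge-and-aggregate step described above, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster or Object Storage credentials. The field names (`asin`, `overall`, `title`, `brand`), the sample records, and the helper names `parse_json_lines` and `brand_summary` are assumptions made for illustration and are not taken from our notebook.

```python
import json
from collections import defaultdict

# Hypothetical sample records standing in for the review and metadata files.
# The field names are assumptions about the data layout.
REVIEWS_JSONL = """\
{"asin": "A1", "overall": 5.0}
{"asin": "A1", "overall": 3.0}
{"asin": "A2", "overall": 4.0}
{"asin": "A3", "overall": 2.0}
"""

META_JSONL = """\
{"asin": "A1", "title": "Green Tea Bags", "brand": "BrandX"}
{"asin": "A2", "title": "Black Tea Loose Leaf", "brand": "BrandY"}
{"asin": "A3", "title": "Dark Roast Coffee", "brand": "BrandX"}
"""


def parse_json_lines(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def brand_summary(reviews, metadata, keyword="tea"):
    """Join reviews to metadata on asin and aggregate ratings per brand.

    Only products whose title contains `keyword` as a whole word are kept.
    """
    products = {
        m["asin"]: m
        for m in metadata
        if keyword in m.get("title", "").lower().split()
    }

    # brand -> [review count, sum of ratings]
    totals = defaultdict(lambda: [0, 0.0])
    for review in reviews:
        product = products.get(review["asin"])
        if product is None:
            continue  # review belongs to a product outside the keyword filter
        entry = totals[product.get("brand", "unknown")]
        entry[0] += 1
        entry[1] += review["overall"]

    return {
        brand: {"reviews": count, "avg_rating": rating_sum / count}
        for brand, (count, rating_sum) in totals.items()
    }


summary = brand_summary(
    parse_json_lines(REVIEWS_JSONL),
    parse_json_lines(META_JSONL),
)
for brand, stats in sorted(summary.items()):
    print(brand, stats)
```

In PySpark the same idea would be expressed as a filter on the metadata, a join on the product ID, and a group-by on brand. The dictionary-based join here is only meant to show the shape of the computation on a small sample.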
Challenges I ran into
Getting started was tough because we had little to no knowledge of Apache Spark. The learning curve was steep, but we eventually got it working.
Accomplishments that I'm proud of
We integrated large data sets for analysis.
What I learned
How to use Apache Spark for a retail scenario.
What's next for Brand Analysis for product Search
We will extend it to support more detailed analysis.