We wanted to analyze products in a systematic, product-driven way. We looked at several candidate datasets, aiming to pick the one best suited to the problem, and that is how the project started.

What it does

We took the Grocery and Gourmet data as our case study. We are trying to provide a brand-wise analytics service for a product.

How I built it

We used IBM Bluemix for our application and Object Storage to hold the files in a container, where we uploaded the Grocery and Gourmet JSON data. We start in an IPython notebook by loading the data source. We used two datasets: one is the review data and the other is the metadata. We then set a Hadoop configuration and work with the data using PySpark. We analyze tea products, merge the two datasets together, aggregate the result, and give a complete overview.
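
To make the merge-and-aggregate step more concrete, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. It is not our notebook code. The field names (`asin`, `overall`, `title`, `brand`), the sample records, and the helper names `load_json_lines` and `brand_overview` are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample records standing in for the review data and the metadata.
# The field names (asin, overall, title, brand) are assumptions for this sketch.
REVIEWS_JSONL = """\
{"asin": "A1", "overall": 5.0}
{"asin": "A1", "overall": 3.0}
{"asin": "A2", "overall": 4.0}
{"asin": "A3", "overall": 2.0}
"""

META_JSONL = """\
{"asin": "A1", "title": "Green Tea Bags", "brand": "BrandX"}
{"asin": "A2", "title": "Black Tea Loose Leaf", "brand": "BrandY"}
{"asin": "A3", "title": "Dark Roast Coffee", "brand": "BrandX"}
"""


def load_json_lines(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def brand_overview(reviews, meta, keyword="tea"):
    """Join reviews to metadata on asin and aggregate ratings per brand."""
    # Keep only products whose title mentions the keyword.
    # This is a naive substring match, so it is only a rough filter.
    products = {
        m["asin"]: m for m in meta if keyword in m.get("title", "").lower()
    }

    # Inner join: a review is kept only if its product passed the filter.
    ratings_by_brand = defaultdict(list)
    for review in reviews:
        product = products.get(review["asin"])
        if product is not None:
            ratings_by_brand[product.get("brand", "unknown")].append(
                review["overall"]
            )

    # Aggregate to a per-brand review count and average rating.
    return {
        brand: {
            "review_count": len(ratings),
            "avg_rating": sum(ratings) / len(ratings),
        }
        for brand, ratings in ratings_by_brand.items()
    }


overview = brand_overview(
    load_json_lines(REVIEWS_JSONL), load_json_lines(META_JSONL)
)
for brand, stats in sorted(overview.items()):
    print(brand, stats)
```

In a Spark notebook the same idea would roughly map to a filter on the metadata, a join on the product key, and a group-by on brand. The dictionary lookup here plays the role of the join.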

Challenges I ran into

Getting started was tough, as we had little to no knowledge of Apache Spark. The learning curve was steep, but we eventually got there.

Accomplishments that I'm proud of

Integrating a large chunk of the datasets for analysis.

What I learned

How to use Apache Spark for a retail scenario.

What's next for Brand Analysis for product Search

We will extend it to support more detailed analysis.
