Enterprises generate large volumes of transactional data every day. This data is rich in insights and, if mined correctly, can help businesses grow and expand. These insights lead to data-driven decisions, which can take a business to new heights.

We believe that Apache Spark is a robust framework for data analytics and can support business decisions effectively. We wanted to showcase what Spark can do for businesses using machine learning algorithms and analytics. We also wanted the analytics to be presented in a user-friendly manner, so that non-technical business owners can understand them easily and take appropriate action.

Along with showcasing the strength of Spark, we wanted to propose a technical architecture that can be deployed in a practical scenario and used by a business in its day-to-day operations.

To realise this vision, we present Business 360°, a web-based application powered by Apache Spark.

What it does

Every organisation needs to focus on four core aspects: Products/Services, Customers, Team and Competitors. Business 360° analyses your business data across these four aspects by running machine learning algorithms, and presents decision-support insights through a smart user interface in a web application.

Through Business 360°, you can get insights that help increase sales and profit and improve your products and services. For customers, you can run targeted offers and campaigns based on segmentation, and find ways to grow your customer base and retain existing customers. For the team members of your organisation, Business 360° helps analyse team performance and supports resource planning and allocation. Lastly, competitors can be analysed to understand their product features and improve your own product accordingly.


  • Sales Analytics - Provides details of the top-selling product categories based on number of transactions and Gross Merchandise Value. It also lists the product categories whose sales are on an increasing or decreasing trend, so that appropriate action can be taken. Categories on an increasing trend can be used to up-sell or cross-sell. For categories on a decreasing trend, the reason can be investigated and sales can be increased through promotions.

  • Product Analytics - Provides daily sales trends for each product, which can be used to infer trends in customer spending. It also provides sales predictions based on weather and day of the week, which can be used to plan inventory or promotions. It can detect anomalies in sales, for which the business can do a causal analysis and act appropriately. It also identifies which customer segments prefer a particular product category, based on their age and income.

  • Marketing Analytics - Provides product recommendations by identifying which products customers buy together. This can help in cross-selling or up-selling. Using Cluster Analysis, it provides details of the customer segments that are more likely to buy a product, based on previously used offers for that product category. Using RFM Modelling, it provides a list of customers who are more likely to buy the product, which can be used for targeted offers and campaigns.

  • Feedback Analytics - Provides textual analysis of customer reviews, which can help identify why products sell more or less. It also highlights the key features of your competitors' products that customers prefer.

  • Complaint Analytics - Provides monthly trends of customer complaints, which can help in resource planning. It also segments complaints by demographics, product type, medium and resolution. Team performance can be judged by how promptly complaints are responded to.

How we built it


The IBM Bluemix platform was used to build and host the Business 360° application. The development cycle consisted of the following steps:

  • Input data (retail transaction and master data, historical weather data and Amazon customer reviews) was imported into IBM Object Storage
  • Using IBM Analytics for Apache Spark, the input data was analysed and machine learning algorithms were run in a Python Jupyter Notebook. The results of the analysis were stored in Cloudant DB. Document-based storage was chosen for easy retrieval of data
  • The Open Weather Map API was used to get weather predictions for the next 10 days, which feed the sales forecasting models
  • A Node.js-based web application was built and hosted on IBM Bluemix to interface with users, with the UI built using HTML5, CSS3, the Bootstrap framework, JavaScript and jQuery

Machine Learning Algorithms:

The following machine learning algorithms were used to derive the analytics:


RFM Modelling

RFM (recency, frequency, monetary) analysis is a technique for determining quantitatively which customers are the best ones, so that they can be targeted with customised offers. It examines how recently a customer has purchased (recency), how often they purchase (frequency), and how much they spend (monetary).

Technical Implementation: Use of DataFrame analytics with pyspark SQL
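As an illustration of the RFM computation itself, here is a minimal sketch in plain Python. The project ran this kind of aggregation with pyspark SQL; the function name `rfm_table`, the tuple layout and the sample transactions below are assumptions made for the example, not the project's actual code or data.

```python
from datetime import date

def rfm_table(transactions, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    table = {}
    for customer_id, purchase_date, amount in transactions:
        last, freq, total = table.get(customer_id, (purchase_date, 0, 0.0))
        # Track the latest purchase date, the purchase count and the total spend.
        table[customer_id] = (max(last, purchase_date), freq + 1, total + amount)
    return {
        cid: ((today - last).days, freq, total)
        for cid, (last, freq, total) in table.items()
    }

# Hypothetical sample data: (customer_id, purchase_date, amount)
sample = [
    ("C1", date(2016, 1, 5), 40.0),
    ("C1", date(2016, 1, 20), 60.0),
    ("C2", date(2015, 11, 2), 15.0),
]
rfm = rfm_table(sample, today=date(2016, 2, 1))

# Best customers first: most recent, then most frequent, then highest spend.
best_first = sorted(rfm, key=lambda c: (rfm[c][0], -rfm[c][1], -rfm[c][2]))
```

Sorting on the three values is only one possible ranking. Scoring each dimension into quantiles and combining the scores is a common alternative.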

Likelihood to Purchase Modelling

Propensity models, also called likelihood-to-purchase or response models, help predict the likelihood of a certain type of customer behaviour, such as a purchase. This helps marketers optimise marketing strategies such as the frequency of promotional emails and app notifications, discounts, offers and campaigns.

Technical Implementation: Use of KMeans Clustering
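To show what the clustering step does, here is a small self-contained sketch of k-means (Lloyd's algorithm) in plain Python. It is not the project's code: the features (customer age and number of offers redeemed), the sample points and the choice of starting centroids are all assumptions for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on tuples of numbers; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (age, offers_redeemed)
customers = [(22, 1), (25, 2), (24, 1), (51, 8), (55, 9), (49, 7)]
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
```

In practice the features would usually be scaled first, so that one attribute with a large range does not dominate the distance.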

Natural Language Processing - Keyword Extraction

Keyword extraction is the automatic identification of the terms that best describe the subject of a text-based input.

Technical Implementation: Use of frequency distribution and keyword determination with the NLTK library. The Rapid Automatic Keyword Extraction (RAKE) method was implemented using frequency distribution to get the key terms that represent the reviews.
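The RAKE idea can be sketched in a few lines of plain Python. This is an illustrative version, not the project's NLTK-based implementation: the tiny stopword list, the function names and the sample review text are assumptions. Candidate phrases are the runs of words between stopwords and punctuation. Each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Deliberately small, hypothetical stopword list for the example.
STOPWORDS = {"the", "is", "and", "a", "but", "it", "of", "to",
             "this", "very", "was", "i", "for", "with", "in"}

def candidate_phrases(text):
    """Split text into runs of non-stopwords, broken at punctuation."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text):
    """Return {phrase: score} where score = sum of degree/frequency per word."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    return {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }

review = ("The battery life is great. Battery life was poor with "
          "the old charger, but the screen is great.")
scores = rake_scores(review)
```

Multi-word phrases such as "battery life" score higher than single words, which is why RAKE tends to surface product features rather than generic adjectives.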

Linear Regression

Linear Regression is an approach used to model the relationship between a scalar dependent variable and one or more explanatory (or independent) variables.

Technical Implementation: Use of the linear regression modelling methods of the scikit-learn machine learning library
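For intuition, the single-feature case can be worked out by hand with ordinary least squares. The sketch below is plain Python rather than scikit-learn, and the temperature and sales figures are made up for the example. A model that also uses the day of the week would need several explanatory variables, which is where a library such as scikit-learn comes in.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: daily temperature (°C) and units sold.
temperatures = [10, 15, 20, 25, 30]
units_sold = [25, 35, 45, 55, 65]

slope, intercept = fit_line(temperatures, units_sold)
forecast = slope * 22 + intercept  # predicted units for a 22 °C day
```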

Anomaly Detection

Through Anomaly detection, also known as outlier detection, one can identify the items, events or observations which do not conform to an expected pattern or other items in a dataset.

Technical Implementation: Use of the luminol package to detect anomalies
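As a simple stand-in that shows the idea, the sketch below flags values that lie far from the mean of the series, measured in standard deviations (a z-score test). This is not luminol's algorithm and not the project's code. The threshold and the sales figures are assumptions for illustration.

```python
from statistics import mean, pstdev

def zscore_anomalies(series, threshold=2.0):
    """Return the indices whose value is more than `threshold` std devs from the mean."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, value in enumerate(series) if abs(value - mu) / sigma > threshold]

# Hypothetical daily sales with one unusual spike.
daily_sales = [100, 102, 98, 101, 99, 180, 100, 97]
anomalies = zscore_anomalies(daily_sales)
```

A global z-score ignores trend and seasonality, so it is only suitable as a first look. Dedicated time-series detectors handle those cases better.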

Recommendation Engine - Association Rules

Association rule mining is a technique for discovering interesting relations between variables in large databases. Here it is used to recommend products based on what customers buy together.

Technical Implementation: Market Basket Analysis using association rules to determine which products are frequently bought together
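A minimal sketch of market basket analysis restricted to pairs of products is shown below. It computes support (the share of baskets containing both items) and confidence (how often the second item appears given the first). The function name, the support threshold and the baskets are assumptions for the example, not the project's data.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Each frequent pair yields a rule in both directions.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical baskets, one list of products per transaction.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["butter", "jam"],
]
rules = pair_rules(baskets)
```

In this sample every basket with jam also contains butter, so the rule jam -> butter has confidence 1.0 and would make a natural cross-sell suggestion.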

Data Analytics

Data Analytics is the science of examining raw data to gather insights, based on data segmentation, aggregation, etc.

Technical Implementation: Insights were derived from the data using pyspark SQL and stored in Cloudant DB for easy retrieval
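The kind of aggregation involved can be illustrated with the top-selling categories report: group transactions by category, then count them and sum their value. The sketch below is plain Python rather than pyspark SQL, and the function name and sample rows are assumptions for illustration.

```python
from collections import defaultdict

def category_summary(transactions):
    """Return [(category, n_transactions, gmv)] sorted by GMV, highest first."""
    counts = defaultdict(int)
    gmv = defaultdict(float)
    for category, amount in transactions:
        counts[category] += 1
        gmv[category] += amount
    return sorted(
        ((c, counts[c], gmv[c]) for c in counts),
        key=lambda row: row[2],
        reverse=True,
    )

# Hypothetical transactions: (product_category, amount)
rows = [
    ("Grocery", 20.0),
    ("Electronics", 300.0),
    ("Grocery", 35.0),
    ("Apparel", 50.0),
]
summary = category_summary(rows)
```

Each summary row maps naturally onto one JSON document, which fits the document-based storage the application reads from.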

Data Sets Used:

Challenges we ran into

Finding the right open data set in order to showcase the power of Spark on business data.

What we learned

Apache Spark is a robust platform for business analytics. In our experience, machine learning algorithms that take a long time to run on other platforms execute much faster with Apache Spark.

What's next for Business360

We plan to add more types of modelling and machine learning algorithms, such as sentiment analysis, customer sentiment prediction and targeted offers based on sentiment on social media.
