We wanted to find an application of the PoC data that both assisted individual PoCs and advanced ABInBev's operations across the board. While exploring the data, we realized that we could measure the volume consumed from both draught and bottle sales. This led us on our quest to understand the vertical and horizontal beer markets through the lens of consumption data.
What it does
FtN predicts the sales of ABInBev products based on factors such as location, type of venue, demographics, weather conditions, day of week, and day of year. With FtN, ABInBev sales representatives can stay proactive when advertising to PoCs. Currently, FtN predicts consumption of Budweiser Draught, Stella Draught, Bud Light Draught, Labatt 50 Draught, Budweiser Bottle, and Bud Light Bottle.
How we built it
We created a Neural Network using unsupervised learning and semi-supervised feature selection. First, we cleaned and filtered the data in MySQL and reorganized it to identify potential predictors of consumption. Each potential factor was then analyzed for statistical significance. The predictive power of the type of establishment was tested with a Chi-Squared Goodness of Fit Test. The day of the week and the weather were tested using the Pearson Correlation Coefficient. Other variables, like the quality of the lines and the amount of non-beer items sold, were also tested but showed no relationship to consumption and no predictive strength.
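As an illustration of the correlation check described above, here is a minimal sketch of the Pearson Correlation Coefficient in plain Python. It is not our original analysis code: the function name `pearson_r` and the sample values are assumptions made up for the example, not figures from the PoC data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily values, for illustration only (not from the PoC data).
temperature_c = [12.0, 15.0, 18.0, 21.0, 24.0]
draught_litres = [30.0, 34.0, 41.0, 45.0, 52.0]
r = pearson_r(temperature_c, draught_litres)
```

A value of `r` near +1 or -1 suggests a strong linear relationship worth keeping as a factor, while a value near 0 suggests the variable adds little linear predictive strength.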
Once the factors were narrowed down, we took the four months of consumption data and divided them into three months of training data and one month of testing data. The resulting Neural Network was able to predict the consumption of the ABInBev products with over 70% accuracy.
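The three-month/one-month split can be sketched as follows. This is a simplified illustration rather than our actual pipeline: the helper `split_by_month`, the record fields `day` and `litres`, and the dates are all hypothetical, and the sketch assumes the held-out month is a single calendar month.

```python
from datetime import date

def split_by_month(records, test_months):
    """Rows whose (year, month) is in test_months go to the test set;
    everything else goes to the training set."""
    train, test = [], []
    for rec in records:
        key = (rec["day"].year, rec["day"].month)
        (test if key in test_months else train).append(rec)
    return train, test

# Hypothetical records; field names, dates, and values are assumptions.
records = [
    {"day": date(2017, 1, 14), "litres": 40.0},
    {"day": date(2017, 2, 11), "litres": 38.5},
    {"day": date(2017, 3, 18), "litres": 44.0},
    {"day": date(2017, 4, 15), "litres": 47.5},
]
train, test = split_by_month(records, test_months={(2017, 4)})
```

Splitting on whole months keeps every test record out of the period the model was trained on, which is closer to how the predictions would be used in practice.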
Challenges we ran into
The PoS records were split across four files, so we had to create a master PoS file with over 3 million records. This made analysis of consumption computationally expensive when relating data across the various data sources.
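A minimal sketch of building one master file from several parts is shown below. It assumes the parts are CSV exports that share the same header, which may not match the original files; the function `merge_csv_parts` and the column names are hypothetical. Rows are streamed one at a time, so the full set of records never has to sit in memory at once.

```python
import csv
import io

def merge_csv_parts(parts, out):
    """Concatenate CSV parts that share a header into one output stream.
    Returns the number of data rows written."""
    writer = None
    header = None
    rows_written = 0
    for part in parts:
        reader = csv.reader(part)
        part_header = next(reader, None)
        if part_header is None:
            continue  # skip empty parts
        if header is None:
            # First non-empty part defines the header of the master file.
            header = part_header
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)
        elif part_header != header:
            raise ValueError("CSV parts do not share the same header")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written

# Two tiny in-memory parts stand in for the real files (hypothetical columns).
part_a = io.StringIO("poc_id,item,quantity\n1,Budweiser Draught,3\n")
part_b = io.StringIO("poc_id,item,quantity\n2,Stella Draught,1\n")
master = io.StringIO()
total_rows = merge_csv_parts([part_a, part_b], master)
```

With real files, the same function can be called with handles opened via `open(path, newline="")` in place of the `io.StringIO` objects.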
When analyzing the PoC type data for statistical significance, one challenge was overcoming the different sample sizes of the PoC types. There were over 40 Bar/Pubs but only 1 Adult PoC. Creating expected values for the consumption of PoC types with small sample sizes required a bit of out-of-the-box thinking: we used a Chi-Square test, but with expected values weighted by each type's proportion of the sample instead of the product of averages.
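The weighting idea can be sketched like this: each PoC type's expected consumption is the grand total scaled by that type's share of venues. This is a reconstruction under assumptions, not our exact calculation; the function `weighted_chi_square`, the "Restaurant" category, and all of the numbers are hypothetical.

```python
def weighted_chi_square(observed, venue_counts):
    """Chi-square goodness-of-fit statistic where each category's expected
    value is the grand total scaled by its share of venues."""
    if observed.keys() != venue_counts.keys():
        raise ValueError("observed and venue_counts must cover the same types")
    total_observed = sum(observed.values())
    total_venues = sum(venue_counts.values())
    statistic = 0.0
    expected = {}
    for poc_type, obs in observed.items():
        # Expected value weighted by this type's proportion of the sample.
        expected[poc_type] = total_observed * venue_counts[poc_type] / total_venues
        statistic += (obs - expected[poc_type]) ** 2 / expected[poc_type]
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom, expected

# Hypothetical consumption totals and venue counts, for illustration only.
observed = {"Bar/Pub": 900.0, "Restaurant": 250.0, "Adult": 50.0}
venue_counts = {"Bar/Pub": 40, "Restaurant": 9, "Adult": 1}
stat, dof, expected = weighted_chi_square(observed, venue_counts)
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom. Categories with very small expected values, such as a single-venue type, still deserve caution when interpreting the result.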
Accomplishments that we're proud of
The sheer effort and technical work that went into analyzing variables and testing potential factors for statistical significance in a semi-supervised feature selection process paid off when we achieved over 70% accuracy with our Neural Network despite the limited number of data points.
What we learned
We learned how to distribute tasks among team members to emphasize our strengths and cover for our weaknesses. We came in as a team with a wide set of skills, spanning Database Administrators, a Statistician, and Computer Scientists. It was confusing at first, but we were able to set up a supply chain to accomplish tasks efficiently.
What's next for Flop the Nuts
FtN can grow in two primary ways. First, additional data points can be gathered, which should increase the accuracy of the Neural Network. Second, additional analysis of potential factors can be performed. Additional PoCs could be added to the sample to allow for better testing of statistical significance. The FtN predictive model could also be used to generate a suggested beer purchase from distributors for each PoC.