Inspiration
What it does
The attached model and EDA notebook provide deep insight into the data. Using visualizations such as scatterplots, pairplots, correlation matrices, heatmaps and subplots, the notebook gives a holistic view of the data. The EDA first replaces cost and revenue with net profit, which is a clearer and more robust metric for our results. We then plot several pairplots, which are an excellent way to compare two features on the basis of a third one (for example, comparing customer satisfaction and cost, broken down by cost factor).
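The replacement of cost and revenue with net profit can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the field names `order_id`, `revenue`, `cost` and `net_profit` and the sample values are assumptions, and net profit is taken to be revenue minus cost.

```python
# Hypothetical sample rows; the real dataset's column names may differ.
orders = [
    {"order_id": 1, "revenue": 12.50, "cost": 7.25},
    {"order_id": 2, "revenue": 9.00, "cost": 4.50},
]


def add_net_profit(rows):
    """Return new rows where revenue and cost are replaced by net_profit."""
    result = []
    for row in rows:
        # Keep every field except the two being collapsed into one metric.
        kept = {k: v for k, v in row.items() if k not in ("revenue", "cost")}
        kept["net_profit"] = round(row["revenue"] - row["cost"], 2)
        result.append(kept)
    return result


profit_rows = add_net_profit(orders)
for row in profit_rows:
    print(row)
```

Working with a single target column keeps the later plots and the feature importance model focused on one quantity instead of two correlated ones.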
Feature Importance
An important part of our analysis was feature importance. We used the CatBoost algorithm to find the most and least important features for predicting net profit. After 699 iterations, training reached its optimal point and we obtained an importance score for each feature in the dataset. According to the result, Weekday is the most influential feature for predicting net profit, followed by Customer Type, Event Time, Customer Location, Cost factor and so on. We took care to prevent overfitting and to choose a suitable learning rate when computing these metrics.
Excellent Visualization
Visualizations and comparisons are used to uncover hidden patterns and correlations between the features, such as comparing pizza type and size on the basis of customer satisfaction, comparing the effect of automation on our profit, identifying weak working hours that lead to low customer satisfaction, and comparing the impact of different customer types on revenue while taking their different pizza tastes into account (multi-valued pairplots).
Challenges we faced
The ACTIVITY_EN column was somewhat challenging because the actions repeat at uneven intervals, so simply slicing the dataset at every 7th row was not useful. This made it difficult to calculate the time difference between the placing of an order and its delivery.
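One possible way around fixed-size slicing is to group events by an order identifier and take the time between the first "placed" event and the last "delivered" event. The sketch below assumes such an identifier exists; the case ids, the activity labels `Order placed` and `Delivered`, and the timestamps are all hypothetical and do not come from the dataset.

```python
from datetime import datetime

# Hypothetical event log: (case id, ACTIVITY_EN value, timestamp).
events = [
    ("A1", "Order placed", "2023-01-02 12:00:00"),
    ("A1", "Pizza baked", "2023-01-02 12:15:00"),
    ("A1", "Pizza baked", "2023-01-02 12:20:00"),  # repeated activity
    ("A1", "Delivered", "2023-01-02 12:35:00"),
    ("B2", "Order placed", "2023-01-02 18:10:00"),
    ("B2", "Delivered", "2023-01-02 18:30:00"),
    ("C3", "Order placed", "2023-01-02 19:00:00"),  # never delivered
]


def delivery_minutes(log, start="Order placed", end="Delivered"):
    """Minutes from first `start` to last `end` event, per case id."""
    first_start = {}
    last_end = {}
    for case_id, activity, stamp in log:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if activity == start:
            first_start[case_id] = min(t, first_start.get(case_id, t))
        elif activity == end:
            last_end[case_id] = max(t, last_end.get(case_id, t))
    # Only cases that have both a start and an end get a duration.
    return {
        case_id: (last_end[case_id] - first_start[case_id]).total_seconds() / 60
        for case_id in first_start
        if case_id in last_end
    }


durations = delivery_minutes(events)
print(durations)
```

Because the grouping is by case id, it does not matter how many intermediate or repeated activities an order has.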
Insights from your analyses
=> Feature importance: Weekday, Customer Type, Event Time, Customer Location, Cost factor, ...
=> Wednesday is the weakest-performing day, and the highest profit is earned on Tuesday.
=> Medium Paparika is the most liked pizza; Large Paparika is the least liked.
=> People are least satisfied in the 17th hour, so improvement is needed there.
=> The 18th, 12th and 19th hours are the busiest periods of the day (sales of Funghi and Salami are the highest).
=> Teenagers make up the biggest portion of revenue (and they like Funghi the most), followed by Student (Calzone), Senior (Speciale) and Adult.
=> The performance of distribution channel companies varies by location, e.g. Orderly performs well in Munich district 3 but fails badly in district 2; the same goes for the others.
=> Automation is the least important feature of all, so no more money should be spent on it.
Business recommendations for process improvement
=> Spend less on further automation.
=> Focus more on the busy hours (18th, 12th and 19th).
=> During busy hours, prepare frequently ordered dishes such as Funghi and Salami in advance.
=> Distribution channels should be assigned to the areas where they are strongest and work only there (Orderly performs well in Munich district 3, Town Express in district 5, etc.).
=> Offers should be given on low-sale days like Wednesday and during low-traffic hours.
=> Teenagers provide the highest revenue, so more student discounts can be given to increase their numbers.
=> Pizzas with the lowest customer satisfaction, like Small Veg Pizza and Large Paparika, should be removed from the menu.
Analysis questions:
1) 207 (c)
2) 347 (b)
3) 533 (c)
4) 1998 (b)
5) 39 (c)
6) 42 (a)
7) 33% (b)
8) Adult (c)