Inspiration
Sales matter to every company. Our mission is to give entrepreneurs clear insight into the many interrelated factors that influence sales. We aim to provide scientific, reliable analysis that helps them make better decisions in the future.
What it does
It predicts sales from the data provided as input.
How we built it
We built it in three steps:
1. Preprocessing: handling missing data and encoding the data
2. EDA: data visualization, feature engineering, and feature selection
3. Modeling and model evaluation
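The preprocessing step can be sketched roughly as follows. This is a minimal illustration with pandas, not the project's actual code: the toy table, the column names (`region`, `ad_spend`, `sales`), and the `preprocess` helper are all hypothetical, and a simple median fill stands in for whatever imputation method is chosen.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
raw = pd.DataFrame({
    "region": ["north", "south", None, "north"],
    "ad_spend": [120.0, np.nan, 80.0, 100.0],
    "sales": [1000.0, 900.0, 700.0, 950.0],
})


def preprocess(df: pd.DataFrame, target: str) -> tuple[pd.DataFrame, pd.Series]:
    """Fill missing values and one-hot encode the categorical columns."""
    X = df.drop(columns=[target]).copy()
    y = df[target]

    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)

    # Stand-in imputation: column median for numbers, an explicit label for categories.
    X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].median())
    X[categorical_cols] = X[categorical_cols].fillna("unknown")

    # Data encoding: one indicator column per category level.
    X = pd.get_dummies(X, columns=list(categorical_cols), dtype=float)
    return X, y


X, y = preprocess(raw, target="sales")
```

The resulting `X` is fully numeric with no missing values, so it can be passed to the EDA and modeling steps.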
Challenges we ran into
- Deciding how to deal with missing data was difficult: whether to drop or impute values, and which imputation method to use, was not obvious. It takes domain expertise to understand the nature of the data, preserve its integrity, and avoid biased results.
- Feature selection significantly affects model performance, and the relationships between factors are complex and do not always match common sense. We spent a large share of our time investigating the possibilities.
- Identifying potential interdependencies within the large amount of information in the table also took us a long time.
Accomplishments that we're proud of
Instead of resorting to conventional practices such as dropping rows with missing values or replacing missing values with zero or the median, we implemented KNN imputers, which replace each NA value in the input with an estimate computed from the non-missing observations of that column in the K most similar rows. This makes the data more representative and the model more accurate in predicting the dependent variable. Our experiments with different values of K showed that the R-squared value was highest when K = 5, which suggests that five nearest neighbors align well with the underlying pattern of the dataset.
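A search over K of this kind might look like the sketch below. It is an assumption-laden illustration rather than our actual code: the data is synthetic, the candidate K values and the `score_k` helper are hypothetical, and linear regression with 5-fold cross-validation is just one plausible way to obtain an R-squared score. It uses scikit-learn's `KNNImputer`.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the sales table: 200 rows, 4 numeric features.
X_full = rng.normal(size=(200, 4))
y = X_full @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Knock out roughly 10% of the entries to simulate missing data.
X = X_full.copy()
mask = rng.random(X.shape) < 0.1
X[mask] = np.nan


def score_k(k: int) -> float:
    """Mean cross-validated R-squared when imputing with k nearest neighbors."""
    # The imputer sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(KNNImputer(n_neighbors=k), LinearRegression())
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()


# Hypothetical candidate values of K.
scores = {k: score_k(k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
```

Keeping the imputer inside the pipeline means the neighbors are chosen from the training fold only, which avoids leaking information from the held-out rows into the score. Which K wins depends on the dataset, so the result on this synthetic data says nothing about the real one.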
What we learned
We learned:
- There are multiple methods for dealing with missing data, and which one suits a given feature depends on the model to be used
- Ways to handle imbalanced data
What's next for Champion Group dataset project
The sales prediction model could be improved further with more data, such as historical sales over the past 5 years, the consumer base, or the company's financial condition. Where applicable, we could build a user-friendly interface or dashboard so that stakeholders can interact with the model's predictions and visualize the results more clearly. We could also implement continuous monitoring by periodically updating the model with new data.