In this track, we were given a dataset that is mainly a time series, and the task is to predict future purchases for the three given outlets.
After some research, we chose a random forest regression model to process the dataset and produce the predictions.
We also tried an exponential smoothing model, but because the data show neither strong seasonality nor a clear trend, its results were inconclusive. We therefore judged a random forest regression model to be more suitable.
What it does
The model predicts the number of items sold on a given day from that day's features: year, month, day, weekday and whether the day is a holiday. The random forest is trained on all of these features and outputs the predicted number of sales. We then convert the predicted sales into the number of purchases in Excel, using the specification provided.
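A minimal sketch of this kind of setup follows, using pandas and scikit-learn's RandomForestRegressor. The column names (date, sales), the HOLIDAYS list, the n_estimators value and the synthetic training data are all assumptions for illustration, not the track's dataset or our exact notebook. The sales-to-purchase conversion is left out because its specification is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical holiday calendar; the real list of holidays is an assumption.
HOLIDAYS = pd.to_datetime(["2023-01-01", "2023-12-25"])


def make_features(dates: pd.Series) -> pd.DataFrame:
    """Split a date column into year, month, day, weekday and holiday features."""
    dates = pd.to_datetime(dates)
    return pd.DataFrame({
        "year": dates.dt.year,
        "month": dates.dt.month,
        "day": dates.dt.day,
        "weekday": dates.dt.weekday,  # Monday=0 ... Sunday=6
        "holiday": dates.isin(HOLIDAYS).astype(int),  # 1 if the day is a holiday
    })


# Synthetic daily sales for illustration only: weekends sell about 20 more.
rng = np.random.default_rng(0)
train = pd.DataFrame({"date": pd.date_range("2023-01-01", "2023-12-31", freq="D")})
train["sales"] = (100 + 20 * (train["date"].dt.weekday >= 5)
                  + rng.normal(0, 5, len(train)))

# Fit the random forest on the calendar features.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(make_features(train["date"]), train["sales"])

# Predict daily sales for the first week of 2024 (2024-01-01 is a Monday).
future = pd.Series(pd.date_range("2024-01-01", periods=7, freq="D"))
predicted_sales = model.predict(make_features(future))
```

Because a random forest averages the training targets stored in its leaves, its predictions never leave the range of sales seen during training, which is one reason it behaves sensibly on data without a strong trend.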
How we built it
The model is built mainly with the scikit-learn library, in Python, using Jupyter Notebook.
Challenges we ran into
The given dataset contains few features, so we had to split a single feature (the date) into several features for model training. Cleaning the dataset was also rather difficult because some dates were missing.
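One common way to deal with missing dates is to aggregate the data per day and reindex it onto a complete daily range with pandas, as sketched below. The function name fill_missing_dates, the column names and the choice to treat a missing day as zero sales are assumptions for illustration; if missing days are unrecorded rather than genuinely empty, interpolation (for example Series.interpolate) may be more appropriate.

```python
import pandas as pd


def fill_missing_dates(df: pd.DataFrame, value_col: str = "sales",
                       fill_value: float = 0.0) -> pd.DataFrame:
    """Return one row per calendar day, filling days absent from the data."""
    dates = pd.to_datetime(df["date"])
    # Sum duplicate rows for the same day (e.g. several records per day).
    daily = df[value_col].groupby(dates).sum()
    # Build the full daily range between the first and last observed dates.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    # Assumption: a missing day means zero sales.
    daily = daily.reindex(full_range, fill_value=fill_value)
    return daily.rename_axis("date").reset_index()


raw = pd.DataFrame({"date": ["2023-01-01", "2023-01-01", "2023-01-03"],
                    "sales": [5, 3, 4]})
filled = fill_missing_dates(raw)
```

Here the two records on 2023-01-01 are summed to 8, and the absent 2023-01-02 is added with 0 sales, so every day has exactly one row before the date is split into features.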
Accomplishments that we're proud of
The model works and gives reasonable predictions.
What we learned
We learned how to work with numerical data and how to apply regression models.