Inspiration
We are passionate Machine Learning hackers! Throw any data at us and you will get information back.
What it does
Our project first reads the dataset into an easily manageable Pandas DataFrame. We then preprocess it to build the feature set that our two ML models work on. Both models predict prices: one with really good accuracy, the other with a good amount of explainability!
How we built it
We used Pandas to import the CSV. For feature engineering, we took the Ingredients column and mapped it to a matrix of 1s and 0s that tells the model which ingredients are present in each product. Country and Company were encoded as one-hot vectors. We normalized the total_pack_size_ml_g column and finally merged all these columns into one big feature dataset.
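The feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not our exact notebook code: the column names Ingredients, Country, Company and total_pack_size_ml_g come from the dataset, but the comma separator for ingredients, the min-max form of normalization, the `ing_` prefix, the `build_features` helper and the sample rows are all assumptions made for the example.

```python
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build one wide feature table from the raw product columns."""
    # Multi-hot matrix: one 0/1 column per ingredient.
    # Assumption: ingredients are stored as a comma-separated string.
    ingredients = (
        df["Ingredients"]
        .fillna("")
        .str.replace(r"\s*,\s*", ",", regex=True)  # drop spaces around commas
        .str.strip()
        .str.get_dummies(sep=",")
        .add_prefix("ing_")
    )

    # One-hot vectors for the two categorical columns.
    categorical = pd.get_dummies(df[["Country", "Company"]], dtype=int)

    # Assumption: "normalized" means min-max scaling to the range [0, 1].
    size = df["total_pack_size_ml_g"].astype(float)
    span = size.max() - size.min()
    if span == 0:
        size_norm = size * 0.0  # constant column: avoid division by zero
    else:
        size_norm = (size - size.min()) / span
    size_norm = size_norm.rename("pack_size_norm")

    # Merge everything into one big feature dataset.
    return pd.concat([ingredients, categorical, size_norm], axis=1)


# Hypothetical sample rows, only to show the shape of the output.
sample = pd.DataFrame(
    {
        "Ingredients": ["water, fluoride", "water, sorbitol", "fluoride"],
        "Country": ["US", "IN", "US"],
        "Company": ["A", "B", "A"],
        "total_pack_size_ml_g": [50, 100, 150],
    }
)
features = build_features(sample)
print(features.head())
```

With a real CSV, `sample` would be replaced by the DataFrame returned from `pd.read_csv`.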
For the model building phase, we worked on two models. One was Random Forest, chosen for the explainability of its feature importances, and the other was TensorFlow, chosen for its ability to handle a complex feature set with a large number of features. We tuned the hyperparameters of each model and finally got an MAE of 1.89 for TensorFlow and an MSE of 1.4 for Random Forest. TensorFlow was optimized with AdamOptimizer, with XavierInitializer for the weights. RandomForest was used with max_depth of 2, 200 estimators and max_features of 3000.
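As a rough sketch of the Random Forest side, the settings above map onto scikit-learn's `RandomForestRegressor` like this. The data here is synthetic and only stands in for the real feature matrix, the train/test split and random seeds are assumptions, and `max_features` is capped at the number of available columns because the toy matrix is far narrower than 3000 features. Reporting both MAE and MSE also makes it easier to compare models on the same metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature matrix (assumption, not our data):
# 200 products, 20 binary features, price driven mostly by features 0 and 1.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20)).astype(float)
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(
    n_estimators=200,                       # 200 estimators, as in the write-up
    max_depth=2,                            # shallow trees, as in the write-up
    max_features=min(3000, X.shape[1]),     # 3000 in the write-up, capped for toy data
    random_state=0,
)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
print(f"MAE={mae:.3f}  MSE={mse:.3f}")

# Feature importances are what make the forest explainable:
# indices of the two most influential features, most important first.
top_features = np.argsort(model.feature_importances_)[::-1][:2]
print("Top features:", top_features.tolist())
```

On the real dataset, the column names of the feature table can be zipped with `model.feature_importances_` to see which ingredients, countries or companies drive the predicted price.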
Challenges we ran into
We iterated over many algorithms, including Linear Regression, Support Vector Regression and XGBoost, tuning each one before concluding that other models work better for this use case and dataset, all while keeping the 24-hour deadline in mind.
Accomplishments that we're proud of
We were able to change and adapt to the issues we faced with every model we tried.
What we learned
TensorFlow, if you have a big feature set! RandomForest, you rock when it comes to making hoomans understand the working in that "black box" of yours.
What's next for Colgate Price Prediction
We have one explainable system with good accuracy and another with really good accuracy that is a black box. We need to merge them into one system that gives the best of both worlds.
Built With
- amazon-web-services
- jupyter-notebook
- pandas
- python
- scikit-learn
- tensorflow