Inspiration
We were inspired by the Colgate sponsor's presentation at the beginning of the Hackathon. In addition, machine learning is one of the most interesting topics to come out of Computer Science. The opportunity to work towards a goal centered around machine learning while also learning more about the relationship between Computer Science and Colgate drove us to create this project.
What it does
Taking in the variables "Country of Make", "Company Brand", "Total size", "Unit size", and "Ingredients", the program predicts the price of a toothpaste based on the thousands of previous toothpaste cases in a dataset.
How we built it
First, we searched through the multitude of machine learning algorithms that could be used with the dataset. We concluded that linear regression (alongside other regression models) would give the most accurate prediction of price when given all the other information. The more than 14,000 cases in the dataset served as the training data for our model. Because one of our members had experience with the programming language R, we decided that the quickest way to perform linear regression was to use this statistics-focused language. After fitting the model, we extracted the weight of each variable and used those weights to construct a formula in Python for the price of a brand of toothpaste.
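As a rough illustration of the last step, the sketch below shows how fitted weights could be turned into a price formula in Python. The intercept, the weight values, and the feature names (`total_size_oz`, `brand_colgate`, and so on) are hypothetical placeholders chosen for the example, not the coefficients from our actual R fit.

```python
# Hypothetical coefficients for illustration only; the real values
# would be copied from the output of the R linear regression.
INTERCEPT = 1.50
WEIGHTS = {
    "total_size_oz": 0.40,   # numeric feature
    "unit_size_oz": 0.25,    # numeric feature
    "brand_colgate": 0.75,   # 1 if the brand matches, else 0
    "made_in_usa": 0.30,     # 1 if the country matches, else 0
    "has_fluoride": 0.20,    # 1 if the ingredient is present, else 0
}


def predict_price(features):
    """Linear model: intercept + sum of weight * feature value.

    Features missing from the input are treated as 0.
    """
    return INTERCEPT + sum(
        weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    )


example = {
    "total_size_oz": 6.0,
    "unit_size_oz": 6.0,
    "brand_colgate": 1,
    "made_in_usa": 1,
    "has_fluoride": 1,
}
price = predict_price(example)
print(f"Predicted price: ${price:.2f}")
```

Keeping the weights in a dictionary means a new set of coefficients can be pasted in without touching the formula itself.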
Challenges we ran into
One of the biggest obstacles we ran into was time. Fitting the literally thousands of variables that make up our line of best fit took an incredible amount of time, even on the most powerful of our laptops. In addition, because the "ingredients" variable was neither numerical nor categorical, we spent plenty of time converting the string of ingredients in each "ingredients" value into boolean variables that could be used in our linear regression model. Our final notable challenge was the front-end design of our project, which involved many more steps than we originally expected.
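The ingredient conversion can be sketched roughly as follows. This assumes the ingredients arrive as a single comma-separated string, which may not match the dataset's real format; the function name `ingredient_flags` and the small vocabulary are made up for the example.

```python
def ingredient_flags(ingredients, vocabulary):
    """Turn a comma-separated ingredient string into boolean flags.

    Returns a dict mapping each vocabulary entry to True if it appears
    in the ingredient string (case-insensitive), otherwise False.
    """
    present = {
        item.strip().lower()
        for item in ingredients.split(",")
        if item.strip()
    }
    return {name: name in present for name in vocabulary}


# Hypothetical vocabulary; the real one would be built from the dataset.
vocabulary = ["sodium fluoride", "water", "xylitol"]
flags = ingredient_flags("Sodium Fluoride, Sorbitol , water", vocabulary)
print(flags)
```

Each vocabulary entry becomes its own true/false column, which is one reason a model like this can end up with thousands of variables.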
Accomplishments that we're proud of
Our linear regression model has an R-squared value of 0.88, meaning it explains about 88% of the variance in price. Since the maximum value is 1, and values between 0.85 and 1 are generally taken to indicate a highly accurate model, our line of best fit should predict prices well when given all the other variables. In addition, we were actually able to get some sleep while working on this project.
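For reference, R-squared compares the model's squared errors against those of simply predicting the mean price. The sketch below computes it by hand; the prices in it are invented for the example and are not taken from our dataset.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up prices, for illustration only.
actual_prices = [2.0, 4.0, 6.0, 8.0]
predicted_prices = [2.5, 3.5, 6.5, 7.5]
score = r_squared(actual_prices, predicted_prices)
print(f"R-squared: {score:.2f}")
```

A score of 1 would mean every prediction matches the actual price exactly, while a score of 0 would mean the model does no better than always guessing the average.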
What we learned
If you think you can finish code in a certain amount of time, it will almost always take longer. Knowledge is power: the more you know about a language, the less you will have to look up.
What's next for HackRU-Fall-2019
With the experience we have gained from this project, mainly in Python, we eagerly await another opportunity to test our computing skills and to use the knowledge we gain in future classes to create new and interesting projects.