What it does
This project completes the first steps of a machine learning and data analysis workflow: collecting the data, organizing it, cleaning it, and fitting the cleaned data to a regression model.
How I built it
I built this project using the learner track curriculum, which I had been working through for the past day. A significant portion of my time went into cleaning and "wrangling" the data to prepare it for a logistic regression algorithm.
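To illustrate what fitting a logistic regression involves, here is a minimal from-scratch sketch using batch gradient descent and only the Python standard library. It is not the code from this project: the function names, the learning rate, the epoch count, and the toy data are all assumptions made for the example. In practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss.

    X is a list of feature rows (lists of floats); y is a list of 0/1 labels.
    learning_rate and epochs are illustrative defaults, not tuned values.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: predicted probability minus label.
            z = sum(w * x for w, x in zip(weights, xi)) + bias
            error = sigmoid(z) - yi
            for j, x in enumerate(xi):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return a 0/1 class for each row, using the given probability threshold."""
    return [
        int(sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold)
        for xi in X
    ]


# Hypothetical toy data: one feature, with the class changing between 2 and 3.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
predictions = predict(X, weights, bias)
```

The cleaning step matters here because every cell has to be a number before it can enter the weighted sum, which is why missing values need to be handled first.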
Challenges I ran into
One of the main challenges I ran into early on was run time, since I was trying to clean the thousands of data fields provided in the data set. When I tried to fill empty fields with the mean value, the program took too long to run. To work around this, I set my na_value to match the missing-value marker used in the provided data set and replaced those values with 0 instead, although this still harms the overall data analysis.
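One common reason a mean fill runs slowly is that the mean gets recomputed for every empty cell. It is only an assumption that this was the cause here. The sketch below computes each column's mean once and then fills in a second pass, using only the standard library. The function name, the empty-string missing marker, and the sample data are hypothetical, and every column is assumed to be numeric. With pandas, the equivalent one-liner would be `df.fillna(df.mean(numeric_only=True))`.

```python
import csv
import io
from statistics import fmean


def fill_missing_with_column_means(rows, na_value=""):
    """Return new rows with missing cells replaced by their column's mean.

    rows is a list of dicts mapping column name to a string cell, as produced
    by csv.DictReader. na_value is the marker the data set uses for a missing
    cell (assumed to be the empty string here).
    """
    columns = list(rows[0].keys()) if rows else []

    # Pass 1: compute each column's mean exactly once, skipping missing cells.
    means = {}
    for col in columns:
        observed = [float(r[col]) for r in rows if r[col] != na_value]
        # Fall back to 0.0 when a column has no observed values at all.
        means[col] = fmean(observed) if observed else 0.0

    # Pass 2: convert cells to floats, substituting the precomputed mean.
    return [
        {col: means[col] if r[col] == na_value else float(r[col]) for col in columns}
        for r in rows
    ]


# Hypothetical sample: column "a" and column "b" each have one missing cell.
SAMPLE_CSV = "a,b\n1,10\n,20\n3,\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
filled = fill_missing_with_column_means(rows)
```

Filling with the column mean keeps each column's average unchanged, whereas filling with 0 pulls it toward zero, which is the distortion mentioned above.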
What I learned
I learned completely new topics and Python libraries by participating in this datathon, and I was able to start on a competitor challenge, as shown. For the first time, I was able to use data science tools to make predictions.
What's next for tamu-datathon-cp-challenge
With the new skills I've learned this weekend, I look forward to completing the data visualization and finishing this challenge with predictions, along with other challenges from the TAMU Datathon.