Inspiration
I had no idea what to do, but I knew I wanted to participate. I was in such a time crunch, so I spent a bit of time searching for a project, and ultimately decided that the smartest thing for me to do was follow a tutorial, specifically this tutorial: https://www.youtube.com/watch?v=Hr06nSA-qww
What it does
This project is a very simple data training model, a linear model, which reads Olympic games data: countries, their medals, ages, years, and more. It demonstrates spotting errors in data as well as: identifying strong vs. weak correlation, how to make calculated predictions on a set of data, and most importantly, how to clean data.
How we built it
I built this using Google Colaboratory and a CSV file, as well as following and noting important points from the tutorial I followed.
Challenges we ran into
I struggled a lot with understanding the data being shown to me, especially graphically, since reading data has always been a challenge. I've also never had to cleanse data before or use so many numpy and pandas tools, so it was fun learning what these specific libraries have capabilities for.
Accomplishments that we're proud of
I am very proud to have completed this tutorial--even though incredibly simple and short--during an incredibly hectic class and work schedule.
What we learned
I learned how to import CSV file data into Google Colab as well as how to read and parse through it. I also learned what "cleaning data" means, and what to look for in order to do that. On top of that, I was able to review (after a few semesters) what "linear regression" looks like, and had the pleasure of being surprised by the fact that it is a data training model, which I did not know before.
What's next for DockerAIML Project
I hope to, when I have time, keep following more tutorials and updating a repository or folder with small projects to help me learn practical applications for ML and data training models in general. Vik, the man leading the tutorial, gave an overview list of what else can be added to this specific project, which I hope to someday explore:
- add more predictors
- note: didn't use columns like events, age, or eight
- try different machine learning models
- i.e. random forest, neural network
- test to see whether they perform better
- see athlete_events.csv to look at individual athlete stats
- reshape some of the columns
- measure the error more predictably to build a system like a 'back-testing' system
- train a model for different types of countries
- low vs high medal countries
Built With
- google-colab
- python
Log in or sign up for Devpost to join the conversation.