My brother and my instructors were the greatest inspirations for this project. I have always been fascinated by data science, statistics, and telling a story through data. I competed in one datathon before but failed to submit a final product. This time I dedicated nearly my entire weekend to the competition and consumed just enough caffeine to stay motivated through to submission.
What it does
This product runs k-means clustering on counties across the US, comparing 40 factors related to health, weather, cost, and happiness. A map shows the resulting clusters by county. Users can filter the map by their preferences to find the cluster that fits them. The map was created in Tableau and the k-means clustering was done in R.
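To illustrate the idea of matching a user's preferences to a cluster, here is a minimal Python sketch. It is not how the project works: the actual matching happens through Tableau filters. The cluster names, factor scores, and the `best_cluster` helper are all hypothetical, and the sketch assumes each factor has already been scaled to a 0 to 1 range.

```python
import math

# Hypothetical cluster profiles: the average (scaled 0-1) score of each
# factor group within a cluster. These numbers are made up for illustration.
cluster_profiles = {
    "Cluster 1": {"health": 0.8, "weather": 0.3, "cost": 0.7, "happiness": 0.6},
    "Cluster 2": {"health": 0.5, "weather": 0.9, "cost": 0.2, "happiness": 0.7},
    "Cluster 3": {"health": 0.3, "weather": 0.5, "cost": 0.4, "happiness": 0.4},
}


def best_cluster(preferences, profiles):
    """Return the name of the cluster whose profile is closest to the
    user's preferences, comparing only the factors the user specified."""

    def distance(profile):
        # Euclidean distance over the factors present in `preferences`.
        return math.sqrt(
            sum((profile[factor] - wanted) ** 2
                for factor, wanted in preferences.items())
        )

    return min(profiles, key=lambda name: distance(profiles[name]))


# A user who cares mostly about warm weather and low cost.
match = best_cluster({"weather": 0.9, "cost": 0.2}, cluster_profiles)
print(match)
```

Comparing only the factors the user chose mirrors the filter behavior: anything left unset does not influence the match.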
How I built it
I built the model by running k-means clustering on a Kaggle dataset of health, weather, and economic data. I used principal component analysis to identify which factors carried the most importance and used those as inputs to the clustering. K-means is an unsupervised technique: it infers patterns in the data without labels, which allowed me to group counties in the United States by similarity. Dataset: https://www.kaggle.com/johnjdavisiv/us-counties-covid19-weather-sociohealth-data
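For readers new to the technique, the clustering step can be sketched roughly as follows. This is a simplified illustration in plain Python, not the R code used in the project. The rows are made-up numbers rather than values from the Kaggle dataset, the `standardize` and `kmeans` functions are names chosen for this example, and the principal component analysis step is skipped entirely.

```python
import math
import random


def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1 so that
    factors measured in different units contribute comparably."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k, iterations=100, seed=0):
    """Basic k-means: assign each row to its nearest centroid, then move
    each centroid to the mean of its members, until assignments stop changing."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Made-up rows standing in for counties: [factor A, factor B].
counties = [
    [1.0, 10.0], [1.2, 11.0], [0.8, 9.0],
    [8.0, 50.0], [8.5, 52.0], [7.9, 49.0],
]
labels, centroids = kmeans(standardize(counties), k=2)
print(labels)
```

Standardizing first matters because k-means relies on distances, so a factor with a large numeric range would otherwise dominate the grouping.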
Challenges I ran into
This was all new to me. I attempted to implement k-means clustering having only read about it a few weeks earlier. Learning Tableau after completing my clustering also proved to be a challenge. Lastly, I struggled with the time limit while balancing other school commitments.
Accomplishments that I'm proud of
I am proud to have a somewhat successful clustering model and a visually appealing Tableau dashboard with user-chosen filters to accompany it. I learned a ton in the process and built on the data science foundation I acquired in my systems engineering courses.
What I learned
I learned many things, the greatest being the importance of having coaches and mentors who are deeply invested in me. I would not have accomplished the work described above without my coach and teammate. I also improved my ability to code in R and learned more about machine learning. Overall, I am grateful for this experience because it broadened my horizons and brought me another step closer to becoming a proficient data scientist.
What's next for Counties Clustering Search
Next, I plan to add more factors and figure out why some clusters formed outside of the criteria I assigned. I would also like to upload the dashboard to a website and make it more personal for end users. I could then re-cluster within each cluster based on the user's inputs to further personalize the output and visualization.