There are so many interesting ways to visualize data, from services like TwitterEarth to TagGalaxy. We thought about the difficulty of visualizing high-dimensional data efficiently and decided to solve this well-known problem in the most uncommon way possible.
Through partitioning records into warring factions led by 2018's SPICIEST star: Ugandan Knuckles.
Together. We will find data way.
What it does
Performs principal component analysis (PCA) on a dataset, then applies K-means clustering to the result to partition the records into tribes, each represented by a Ugandan Knuckles variation.
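Here is a minimal sketch of that PCA-then-K-means pipeline using scikit-learn. It assumes a purely numeric feature matrix. The standardization step, the two-component projection, the default of three clusters, and the names in `TRIBES` are illustrative assumptions and are not taken from the project itself.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical tribe names; the project's actual Knuckles variations are not listed.
TRIBES = ["Classic Knuckles", "Queen Knuckles", "Spicy Knuckles"]


def assign_tribes(X, n_components=2, n_clusters=3, random_state=0):
    """Project X onto its leading principal components, then cluster with K-means.

    Returns the low-dimensional projection and one tribe name per record.
    """
    if n_clusters > len(TRIBES):
        raise ValueError("need at least as many tribe names as clusters")

    # Standardize so no single feature dominates the variance PCA captures
    # (an assumption; the write-up does not say whether features were scaled).
    X_scaled = StandardScaler().fit_transform(X)

    # Reduce to a few components so the clusters can be plotted in 2-D.
    projected = PCA(n_components=n_components).fit_transform(X_scaled)

    # Partition the projected records into n_clusters groups.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(projected)

    # Map each integer cluster label to a tribe name.
    return projected, [TRIBES[label] for label in labels]
```

If you call `assign_tribes(X)` on an `(n_samples, n_features)` array, you get back 2-D coordinates that suit a scatter plot, along with one tribe name for each record. Clustering in the reduced space instead of the raw features keeps the tribes consistent with what the plot shows.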
How I built it
All data mining was done in a Python script using pandas, NumPy, scikit-learn, and SciPy.
Graphs were produced with Plotly.
The frontend was a simple Flask app served from Amazon EC2.
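The write-up does not list its cleaning steps. What follows is therefore a hypothetical pandas preprocessing pass that converts a raw table into the all-numeric matrix that PCA and K-means need. It fills numeric gaps with column medians and one-hot encodes text columns with `pd.get_dummies`. The function name `prepare_features` and the specific steps are assumptions.

```python
import numpy as np
import pandas as pd


def prepare_features(df):
    """Turn a raw DataFrame into a numeric matrix ready for PCA / K-means.

    Hypothetical cleaning steps; not necessarily the ones the project used.
    """
    # Drop rows that are entirely empty.
    cleaned = df.dropna(how="all").copy()

    # Fill remaining gaps in numeric columns with each column's median.
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

    # One-hot encode non-numeric columns so every feature is a number.
    encoded = pd.get_dummies(cleaned, dtype=float)

    return encoded.to_numpy(dtype=np.float64)
```

Median imputation holds up better against outliers than mean imputation, which matters because PCA is sensitive to extreme values.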
Challenges I ran into
AWS did not support Python 3.6, which is honestly ridiculous and caused major setbacks.
I was hungry.
What I learned
Silhouette averaging for choosing the optimal K, as well as general insight into the effective use of Python data cleaning and transformation tools.
A deeper understanding of the services AWS offers.
How many variations a meme widely considered one-dimensional truly has.
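Here is a sketch of silhouette averaging for picking K with scikit-learn. For every candidate K it fits K-means, computes the mean silhouette coefficient over all records, and then keeps the K that scores highest. The candidate range and the function name `best_k_by_silhouette` are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 11), random_state=0):
    """Return the K with the highest mean silhouette coefficient, plus all scores.

    The silhouette is only defined for 2 <= K <= n_samples - 1, so the
    candidate range starts at 2.
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        # Mean over all records of (b - a) / max(a, b), where a is the mean
        # distance to the record's own cluster and b the mean distance to
        # the nearest other cluster.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

A silhouette close to 1 means records sit well inside their own cluster and far from the next one. That makes the average a reasonable single-number criterion when nothing else pins down how many tribes there should be.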
What's next for DoYouKnowDataWey?
We fully intend to address scalability concerns for massive datasets.
In addition, we would like to expand the data analysis to offer more meaningful statistics about the observations.
And of course, we want to bring Ugandan Knuckles to the even greater spicy meme heights he so clearly deserves.