Inspiration
We love data and we love Python, so this was a wonderful challenge!
What it does
We wrote a set of functions and organized, descriptive code notebooks that parse the medium-sized OppenheimerFund website clickthrough dataset, analyze aspects of its behavior, and provide some options for clustering the data to allow for better classification and prediction of user behavior.
How I built it
The data was provided as a Python dataframe, so the pandas library was a natural choice for manipulating and analyzing it. Visualizations were made with Matplotlib, clustering was performed using K-means from the scikit-learn library, and all organization decisions and analysis were done by us!
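To make the pandas plus K-means workflow concrete, here is a minimal sketch of how it could look. It is not our actual notebook code: the column names (`user_id`, `page`, `seconds_on_page`), the toy rows, the per-user features, and the choice of two clusters are all assumptions made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical click log: one row per page view.
# Column names and values are invented for this example.
clicks = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "page": ["home", "funds", "home", "home", "contact",
             "funds", "funds", "home", "funds", "home"],
    "seconds_on_page": [5, 40, 3, 8, 12, 60, 55, 4, 70, 2],
})

# Aggregate the raw events into one feature row per user.
features = clicks.groupby("user_id").agg(
    n_clicks=("page", "size"),
    n_unique_pages=("page", "nunique"),
    mean_seconds=("seconds_on_page", "mean"),
)

# Standardize so that no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an arbitrary choice for the toy data, not a tuned value.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = model.fit_predict(scaled)

print(features)
```

Aggregating to one row per user before clustering is only one possible design. Clustering sessions or individual pages would need a different `groupby` key and different features.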
Challenges I ran into
While the dataset was only medium-sized, it was still large enough that many operations caused very noticeable processing delays. Additionally, we had no prior experience working with this type of data, so understanding how to properly organize it was a fun challenge!
Accomplishments that I'm proud of
We succeeded in manipulating the data to reveal some of its behavior.
What I learned
We put significant effort into figuring out how to ask complex questions of the data, but probably spent too much time answering these questions rather than working on techniques for visualizing the data.
What's next for Naive clickthrough clusters
Given more time, we would have developed visualizations of the data clustering. It would also have been nice to finish our experiments with using Markov processes to more fully analyze the correlation between successive events.
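As a rough idea of what the Markov experiments could look like, the sketch below estimates first-order transition probabilities between successive events. It is an illustration only: the function name `transition_probabilities`, the session format (a list of event names per visit), and the sample sessions are all hypothetical and do not come from the dataset.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next event | current event) from ordered event lists."""
    counts = defaultdict(Counter)
    for events in sessions:
        # Pair each event with the one that immediately follows it.
        for current, following in zip(events, events[1:]):
            counts[current][following] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: n / sum(followers.values())
                for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Invented sessions, each an ordered list of visited pages.
sessions = [
    ["home", "funds", "contact"],
    ["home", "funds", "funds"],
    ["home", "contact"],
]

probs = transition_probabilities(sessions)
for state, row in probs.items():
    print(state, row)
```

A first-order model like this only looks one step back. Capturing longer-range dependence between events would require a higher-order chain or a different model.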