Inspiration

The visualization of time series data is of great research interest. This project was inspired by the ability of color to represent dense information in an intuitive and digestible way.

What it does

Given a list of elements with time stamps and categorical attributes (in this case, people with insurance policies at certain times), this program visualizes the number and composition of elements at a specific time using colors blended according to MCA (Multiple Correspondence Analysis) factors.

In other words, it first automatically detects important distinguishing properties of the data, and assigns each one a color. It then represents these differences across points in the data by painting them a mix of colors based on the properties. If the points are distributed across time, it's possible to see trends in the colors that represent high-level changes of different magnitudes. Analyzing the extracted properties can give insights about how exactly the changes occur.
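
A minimal sketch of the color-blending idea, assuming each point already has a score on each extracted factor. The base colors, the min-max rescaling, and the helper names are all hypothetical choices made for illustration, not the project's actual code.

```python
# Hypothetical base colors, one per factor (RGB channels in 0..1).
BASE_COLORS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def normalize_columns(scores):
    """Rescale each factor (column) to [0, 1] across all points."""
    scaled_cols = []
    for col in zip(*scores):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def blend(weights, base_colors=BASE_COLORS):
    """Mix the base colors in proportion to one point's factor weights."""
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0, 0.0)  # no factor active: paint the point black
    return tuple(
        sum(w * color[ch] for w, color in zip(weights, base_colors)) / total
        for ch in range(3)
    )


def to_hex(rgb):
    """Format an RGB triple in 0..1 as a hex string usable by plotting tools."""
    return "#" + "".join(f"{round(255 * c):02x}" for c in rgb)


# Made-up factor scores for three points on three factors.
scores = [[-1.0, 0.5, 0.0], [1.0, 0.5, 2.0], [0.0, -0.5, 1.0]]
colors = [to_hex(blend(w)) for w in normalize_columns(scores)]
```

Points that score high on a single factor come out close to that factor's base color, while points that score evenly across factors come out gray.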

This method can be generalized to a wide variety of applications, as long as categorical attributes are recorded over time. This includes any logs, such as those of business processes, online transactions, daily life routines, etc.

How I built it

The project was built in Python (Anaconda distribution) with the mca and plotly packages.

Challenges I ran into

Originally, I wanted to fit a simple Gaussian Mixture Model, but preliminary visualizations ruled it out: the data did not behave as I expected. I had expected large increases in activity to appear only after "events" occurred, and the data did not show that pattern.

Accomplishments that I'm proud of

You can actually see a trend in the composition of elements over time, which surprised me. I believe the trend was visible because the data was synthetic.

Still, since the method worked on this toy example, it has the potential to indicate high-dimensional changes over time when applied to real data.

What I learned

I knew about PCA but was not aware of its categorical analog, MCA. I also learned about k-modes clustering, which is often used to cluster categorical data.
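
To make the PCA analogy concrete, here is a rough sketch of plain MCA written directly with NumPy: one-hot encode the categories, then run correspondence analysis on the indicator matrix through an SVD. It does not use the mca package's API and omits the corrections such packages may apply. The function name and the toy records are hypothetical.

```python
import numpy as np


def mca_row_coordinates(records, n_factors=2):
    """Plain MCA: correspondence analysis of the one-hot indicator matrix."""
    n_vars = len(records[0])
    # One indicator column per (variable index, category) pair.
    levels = sorted({(j, rec[j]) for rec in records for j in range(n_vars)})
    index = {lvl: k for k, lvl in enumerate(levels)}
    Z = np.zeros((len(records), len(levels)))
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            Z[i, index[(j, value)]] = 1.0

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows, plus the inertia of each factor.
    F = (U * sing) / np.sqrt(r)[:, None]
    return F[:, :n_factors], sing[:n_factors] ** 2


# Made-up records: (policy type, billing cycle) per person.
records = [
    ("home", "monthly"),
    ("home", "monthly"),
    ("auto", "yearly"),
    ("auto", "monthly"),
    ("life", "yearly"),
]
coords, inertia = mca_row_coordinates(records)
```

Each row of coords places one record on the leading factors, much as PCA scores would for numeric data. Records with identical categories receive identical coordinates.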

What's next for categorical_time_series_visualization

Analyzing the factors extracted through MCA more deeply to fully understand the trends, and feeding in more data attributes to better model the underlying process. In addition, colors could be chosen and mixed better by switching to a color space more suitable for viewing, such as Lab.
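
As a sketch of what mixing in Lab could look like, the code below converts sRGB colors to Lab, averages them there, and converts back. It assumes "Lab" means CIELAB with a D65 white point. The function names are hypothetical, and out-of-gamut results are simply clamped.

```python
# D65 reference white for CIE XYZ (assumed white point).
WHITE = (0.95047, 1.0, 1.08883)
DELTA = 6 / 29


def srgb_to_lab(rgb):
    """Convert an sRGB triple (channels in 0..1) to CIELAB."""
    r, g, b = (
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb
    )
    xyz = (
        0.4124564 * r + 0.3575761 * g + 0.1804375 * b,
        0.2126729 * r + 0.7151522 * g + 0.0721750 * b,
        0.0193339 * r + 0.1191920 * g + 0.9503041 * b,
    )

    def f(t):
        return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_srgb(lab):
    """Convert CIELAB back to sRGB, clamping out-of-gamut values."""
    L, a, b = lab
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200

    def finv(t):
        return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)

    x, y, z = (w * finv(t) for t, w in zip((fx, fy, fz), WHITE))
    linear = (
        3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
        -0.9692660 * x + 1.8760108 * y + 0.0415560 * z,
        0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
    )

    def gamma(c):
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(gamma(c) for c in linear)


def mix_in_lab(colors, weights):
    """Weighted average of sRGB colors, computed in Lab coordinates."""
    total = sum(weights)
    labs = [srgb_to_lab(c) for c in colors]
    mixed = tuple(
        sum(w * lab[k] for w, lab in zip(weights, labs)) / total for k in range(3)
    )
    return lab_to_srgb(mixed)


# Equal mix of pure red and pure blue.
purple = mix_in_lab([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [1, 1])
```

Because Lab distances are intended to track perceived color differences more closely than RGB distances do, a blend computed this way may read as a more natural midpoint between the two colors.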
