The companies that provide our communication services store a wealth of data representing us, the users. The stores of digital exhaust in which our data resides are generally unseen and unknown to us.
However, they are not out of reach. Regulations like the GDPR have pushed companies toward sharing data back with their users. The problem is that this data is returned in unfriendly ways. It's as if you asked these companies for the $1,000 they unfairly took from you, and they delivered it as a truckload of pennies.
It's the same thing with your data. Messages from Google Voice are exported as a massive directory full of HTML files. Yes, you can open each one individually and read small snippets of conversation, but without a cohesive way of organizing this data, the export is close to useless for most people. Any real analysis requires significant computing effort.
This is where the Interpersonal Aggregator comes in. Imagine having a single database that houses all of your personal information from every source and structures the data relationally in a useful way. No matter what format a big tech service delivers your data in, the aggregator can parse it and integrate your personal information into your own private database.
This database serves as the core of a personal, digital twin. You can monitor exactly what kind of data these tech giants are storing about you, as well as make use of the data yourself to reflect and better understand your life.
The Quantified Self (QS) movement has a celebrated history of individuals hacking together their own reality mining rigs to gain self-knowledge and make positive changes in their lives. While the community benefits significantly from the hard work of these individuals, setting up these kinds of systems is challenging for even a well-educated and capable populace. This project is the start of enabling entirely new swaths of society to benefit from the data they continuously create for, and provide to, the large tech companies that increasingly compose our critical social structures.
What it does
During CU's hack4impact event, I started the high-level design of the project, including a data processing pipeline and an initial database design, and completed a proof-of-concept aggregation from two interpersonal informatics sources: Google Voice and Facebook Messenger.
The Python script has functions for parsing data exports from both Messenger and Voice, and uses the SQLAlchemy ORM to store the data in a SQLite database.
Once the data has been stored in the database, it is easy to query using the ORM in order to create dataframes for traditional analytics.
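As a rough illustration of this pipeline stage, here is a minimal sketch of a single-table schema and a query that pulls everything back out as a dataframe. The table name, columns, and sample rows are hypothetical simplifications; the project's actual schema models contacts and conversations relationally.

```python
from datetime import datetime

import pandas as pd
from sqlalchemy import Column, DateTime, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical, simplified message table; the real schema is more relational.
class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    platform = Column(String)   # e.g. "voice" or "messenger"
    sender = Column(String)
    sent_at = Column(DateTime)
    body = Column(Text)

# In-memory SQLite for the sketch; the project persists to a SQLite file.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Message(platform="voice", sender="Alice",
                sent_at=datetime(2021, 3, 1), body="See you soon!"),
        Message(platform="messenger", sender="Alice",
                sent_at=datetime(2021, 3, 2), body="Running late."),
    ])
    session.commit()

# Load the unified message store into a dataframe for traditional analytics.
df = pd.read_sql("SELECT platform, sender, sent_at, body FROM messages", engine)
```

Once messages from every platform share one table, cross-platform questions (message volume per contact, activity over time) become ordinary dataframe operations.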
How I built it
I started by examining the data output from both Google's and Facebook's export features. I reverse engineered a generic data model that could fit both outputs, with possibilities for further generalization. I used Beautiful Soup to parse the Google output, and Python's built-in json module for the Facebook output.
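The two parsers can be sketched roughly as below. The inline samples are hypothetical stand-ins: real Takeout and Messenger exports are much larger, and their exact markup and key names vary by export version, so treat the selectors and field names as assumptions.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical minimal samples standing in for real export files.
VOICE_HTML = """
<div class="message">
  <abbr class="dt" title="2021-03-01T12:00:00Z">Mar 1</abbr>
  <cite class="sender vcard"><span class="fn">Alice</span></cite>
  <q>See you soon!</q>
</div>
"""

MESSENGER_JSON = json.dumps({
    "messages": [
        {"sender_name": "Alice", "timestamp_ms": 1614600000000,
         "content": "See you soon!"},
    ]
})

def parse_voice(html):
    """Map a Google Voice HTML export into generic message records."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for div in soup.find_all("div", class_="message"):
        records.append({
            "sender": div.find(class_="fn").get_text(),
            "sent_at": div.find("abbr", class_="dt")["title"],
            "body": div.find("q").get_text(),
        })
    return records

def parse_messenger(raw):
    """Map a Facebook Messenger JSON export into the same record shape."""
    data = json.loads(raw)
    return [{"sender": m["sender_name"],
             "sent_at": m["timestamp_ms"],
             "body": m["content"]} for m in data["messages"]]
```

The key point is that both parsers emit the same generic record shape, which is what lets one database schema absorb arbitrarily formatted exports.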
What's next for Interpersonal Informatics Aggregator
- Adding more platforms
- Publishing a Python package
- Expanding the database to include additional message content, such as photos or shared files, and handling group messages better
- Interpersonal analytics tools
- Grass-roots community aggregation