A lot of attention has recently been paid to how much data large tech firms hold about most of their users. We see it in the numerous national hearings Mark Zuckerberg is summoned to attend. We see it in the news media discussing interference in the recent elections by foreign powers. We saw it in the utter disaster that was Equifax and its data breach. We see it in the falling approval ratings of tech companies: only 22% of Americans trust Facebook to safeguard their information.

If we were countries, we would have intelligence agencies working to catalog and understand all the data being collected about us, so that we could anticipate how it might be used to harm us. They would also assess that data for anything we could learn from it. This project is about building the technology to provide both of those services.

What it does

Counter Social Insight has two core goals:

1) Help people understand what data companies such as Facebook hold on them, and what it implies that those companies hold it.

2) Leverage that data to provide valuable insights to the user. Many people have had a Facebook account for 10 to 15 years and have used it daily throughout that time, so the record is long and detailed.

Essentially, we seek to be a private individual's NSA, with a specialty in signals intelligence.

How I built it

I chose not to host it on a public web server because of privacy concerns. I am using my real Facebook information download, and it contains many private conversations between me and close friends. The site is otherwise functional and runs on a local installation of the Apache web server.

The site was built in PHP and MySQL, although Python may be a better choice at some point. This stack was selected because SQL is a superb query language: it allows a lot of questions to be asked of the data in less time, which is essential in a hackathon. However, PHP may not be the best choice for supporting a robust plugin ecosystem down the road.
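As an illustration of the kind of question SQL makes quick to ask, here is a minimal sketch in Python. It is not the project's code: the `messages` table, its columns, and the sample rows are hypothetical, and SQLite is swapped in for MySQL only so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per message. SQLite stands in for MySQL here
# purely so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender TEXT, sent_at TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        # Made-up sample rows, not real data.
        ("Alice", "2016-03-01 10:00:00", "hi"),
        ("Alice", "2017-06-12 09:30:00", "lunch?"),
        ("Bob", "2017-06-12 09:31:00", "sure"),
        ("Alice", "2017-08-02 18:45:00", "see you"),
    ],
)


def messages_per_sender_per_year(conn):
    """Return (sender, year, count) rows, one per sender and calendar year."""
    query = """
        SELECT sender, strftime('%Y', sent_at) AS year, COUNT(*) AS n
        FROM messages
        GROUP BY sender, year
        ORDER BY sender, year
    """
    return conn.execute(query).fetchall()


rows = messages_per_sender_per_year(conn)
for sender, year, n in rows:
    print(sender, year, n)
```

A single grouped query like this answers "who did I talk to most, and when?" without any application-side loops, which is the time saving the paragraph above describes.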

Challenges I ran into

1.6 GB of data is exceptionally unwieldy, and a separate parser has to be written for each of the various formats in it. As a result, most of the data has yet to be touched.
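To give a sense of what one such parser involves, here is a minimal Python sketch for a single format. It assumes a JSON message file with a top-level `messages` list whose entries carry `sender_name`, `timestamp_ms`, and an optional `content` field. Those field names are an assumption for illustration, the sample input is made up, and the project's own parsers may look quite different.

```python
import json
from datetime import datetime, timezone


def parse_messages(raw_json):
    """Turn one JSON message file into (sender, sent_at, body) rows.

    Assumed input shape (hypothetical):
    {"messages": [{"sender_name": ..., "timestamp_ms": ..., "content": ...}]}
    """
    data = json.loads(raw_json)
    rows = []
    for msg in data.get("messages", []):
        # Entries without text (for example photo-only messages) are skipped.
        if "content" not in msg:
            continue
        sent_at = datetime.fromtimestamp(
            msg["timestamp_ms"] / 1000, tz=timezone.utc
        )
        rows.append(
            (msg["sender_name"], sent_at.strftime("%Y-%m-%d %H:%M:%S"), msg["content"])
        )
    return rows


# Made-up sample input, not real data.
sample = json.dumps(
    {
        "messages": [
            {"sender_name": "Alice", "timestamp_ms": 1500000000000, "content": "hi"},
            {"sender_name": "Bob", "timestamp_ms": 1500000060000, "photos": []},
        ]
    }
)

parsed = parse_messages(sample)
print(parsed)
```

Rows in this shape can be inserted directly into a database table, so each new format only needs its own small function that produces the same tuples.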

Deciding the scope of the project was also difficult. In the end I tried to do a bit too much and did not get to about half of the planned features. I also attempted this as a solo hack, and the project is too large for that to work effectively.
