EchoChamber is a content recommendation algorithm that encourages diverse thought by suggesting content that deviates from the user's preferences.
The inspiration for this project came from the recent election cycle. Many analysts attributed the divisiveness of the masses to the echo chamber effect. Essentially, social media platforms have honed their recommendation systems to the point where a user's experience is shaped entirely by their own preferences. Users end up seeing only what they want to see, which leads to less cross-communication between netizens and more vitriol.
My aim with this project was to create a recommendation system that encourages the user to explore topics that are measurably different from their browsing history. The hope is that users will gain perspective on the content they consume online, which will lead to better social and political discourse.
What it does
EchoChamber uses imgur's API to demonstrate the algorithm. First, it generates a space in which imgur posts (or any textual content) are represented so that they are as distinguishable from one another as possible. Then it follows a user's browsing history across multiple posts and suggests topics that deviate from their past preferences. It proceeds as follows:
- Parse textual content (imgur post comments) for training, using the 'political' tag.
- Preprocess the data (filter out common words etc.).
- Use Principal Component Analysis to reduce the dimensionality of the data points.
- Generate axes from the principal components that best distinguish posts.
- Download, parse, and preprocess random imgur content for testing.
- Project the testing posts onto the previously calculated axes to obtain their coordinates.
- Generate a Markov chain where the next recommendation maximizes a distance metric from previous choices (i.e. obtains contrasting content).
- For each choice, update the Markov chain and provide a recommendation.
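The preprocessing, PCA, and projection steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the imgurpca code: the stop-word list, the bag-of-words representation, the use of power iteration to find the principal axes, and every function name here (`tokenize`, `build_vocabulary`, `vectorize`, `principal_axes`, `project`) are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real run would use a larger one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "this", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop common words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Sorted list of every word seen in the training documents."""
    return sorted({w for doc in docs for w in tokenize(doc)})


def vectorize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_axes(rows, k, iterations=200):
    """Return (mean, axes): up to k unit-length principal axes of the rows,
    found by power iteration on X^T X with earlier axes projected out."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    columns = list(zip(*centered))
    axes = []
    for i in range(k):
        # Deterministic, non-uniform start vector.
        v = [1.0 / (j + i + 1) for j in range(len(mean))]
        for _ in range(iterations):
            # Remove the components along axes already found.
            for a in axes:
                p = dot(v, a)
                v = [x - p * y for x, y in zip(v, a)]
            # w = X^T (X v), computed without forming the covariance matrix.
            scores = [dot(row, v) for row in centered]
            w = [dot(scores, col) for col in columns]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return mean, axes  # no variance left to explain
            v = [x / norm for x in w]
        axes.append(v)
    return mean, axes


def project(vector, mean, axes):
    """Coordinates of a bag-of-words vector on the principal axes."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [dot(centered, a) for a in axes]


# Toy training texts standing in for imgur comments.
train = [
    "taxes and the senate vote",
    "senate vote on taxes",
    "cute cat photo",
    "cat and dog photo",
]
vocab = build_vocabulary(train)
mean, axes = principal_axes([vectorize(d, vocab) for d in train], k=2)
print(project(vectorize("cat photo", vocab), mean, axes))
```

Words in a test post that never appeared in the training data are simply ignored by `vectorize`, so test posts always land in the same space as the training posts.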
How I built it
EchoChamber was built using Python on top of imgurpca, a modular and extensible machine learning library for imgur.com that I am developing. For VandyHacks III, I forked the repository, made modifications to core libraries, and wrote
Challenges I ran into
I ran into multiple development and theoretical challenges:
- I had to upgrade the imgurpca library from Python 2 to Python 3, which required significant rewriting of code.
- Initially I approached this problem using a feed-forward neural network, which I wrote in its entirety (link below). However, given bandwidth restrictions from the imgur API, I did not have sufficient data to train the network, so I switched to unsupervised learning.
- I had several options for how the Markov chain that provides recommendations should operate, since EchoChamber works with data of arbitrary dimension. My first distance metric was the Euclidean distance between the hyperplane defined by a user's previous choices (their browsing history) and each of the remaining projected points. A hyperplane is a line in 2 dimensions and a plane in 3 dimensions, and both have well-known distance formulae. However, this approach becomes computationally expensive in higher dimensions. My second choice was to take the Euclidean distance of each point from the centroid of the user's past choices, with those choices weighted.
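The centroid-based metric can be sketched as follows. This is a hedged illustration rather than the code used in EchoChamber: the writeup does not say how past choices were weighted, so the recency decay in `recency_weights` is an assumption, as are the function names and the toy coordinates. Distances are ordinary Euclidean (straight-line) distances in the projected space.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def recency_weights(n, decay=0.5):
    """Assumed weighting: the latest choice gets weight 1 and each
    older choice is scaled down by `decay`."""
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_centroid(points, weights):
    """Weighted mean of the points, one coordinate at a time."""
    total = sum(weights)
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(len(points[0]))
    ]


def recommend(history, candidates, decay=0.5):
    """Return the id of the candidate farthest from the weighted
    centroid of the user's history (oldest choice first)."""
    centroid = weighted_centroid(history, recency_weights(len(history), decay))
    return max(candidates, key=lambda pid: euclidean(candidates[pid], centroid))


def browse(start, posts, steps, decay=0.5):
    """Simulate a session in which the user accepts every recommendation."""
    seen = [start]
    for _ in range(steps):
        unseen = {pid: xy for pid, xy in posts.items() if pid not in seen}
        if not unseen:
            break
        history = [posts[pid] for pid in seen]
        seen.append(recommend(history, unseen, decay))
    return seen


# Toy projected coordinates for four posts (hypothetical values).
posts = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [10.0, 0.0], "d": [-4.0, 0.0]}
print(browse("a", posts, steps=3))
```

The cost of each recommendation grows linearly with the number of candidate posts and dimensions, which is what makes this metric cheaper than fitting a hyperplane to the history at every step.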
Accomplishments that I'm proud of
Over the weekend I read up on neural networks and wrote my own network from scratch for the first time.
What I learned
I learned about the utility of various machine learning methods while approaching this problem. From a development standpoint I learned about the importance of keeping compatibility/performance in mind.
What's next for Echo Chamber
The algorithm could be further improved through natural language processing to get a better idea of the user's preferences. I am interested to see how it would perform if recommendations were based on classification methods (like logistic regression) for similar/different content. In addition, it could be ported to APIs other than imgur's.