Social media plays a very large role in all of our lives, but there are many issues with it, mainly in the way that we consume content. Infinite scrolling, for one, makes it frictionless to keep browsing, so we end up staying on a site longer even when we are no longer getting much benefit. Other sites, like Facebook, show you different content every time you refresh, which keeps you reading. These sites also become echo chambers: whether through automatic content recommendations or through the people you follow, you mostly hear opinions that you already agree with.
In addition, with misinformation at an all-time high, we need to encourage others to research topics that they're unfamiliar with or want to learn more about. Browsr serves as an alternative content delivery platform that seeks to solve these issues.
What it does
You can enter a topic that you're interested in, or choose from a list of trending topics. In this case, the source for posts is Twitter. As opposed to the infinite scrolling of other social media platforms, Browsr displays posts in pages, which discourages long periods of browsing. In addition, posts for each topic are cached, so refreshing gives you the same set of posts (though the website does randomize their order). The cached posts may then be updated periodically.
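The paging and caching behavior described above can be sketched roughly as follows. This is a minimal illustration, not Browsr's actual code: the in-memory dictionary, the `fetch` callback, and the `session_seed` parameter are all assumptions made for the example (the real site could just as well cache in Redis).

```python
import random

# Hypothetical in-memory cache: topic -> list of posts fetched for that topic.
_post_cache = {}


def get_posts(topic, fetch):
    """Return the cached posts for a topic, calling fetch(topic) only on a cache miss."""
    if topic not in _post_cache:
        _post_cache[topic] = list(fetch(topic))
    return _post_cache[topic]


def get_page(topic, page, fetch, session_seed, per_page=10):
    """Return one page (1-indexed) from a shuffled copy of the cached posts.

    The same session_seed always produces the same order, so the pages of one
    visit never overlap; a new seed on refresh reorders the same set of posts.
    """
    posts = list(get_posts(topic, fetch))
    random.Random(session_seed).shuffle(posts)
    start = (page - 1) * per_page
    return posts[start:start + per_page]
```

Because the shuffle works on a copy, the cached set itself never changes between refreshes; only the order the reader sees does.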
To encourage consumption of a wide range of opinions, each post is labeled with its polarity, and the website curates posts to cover a wide range of opinions and sentiments. The level of polarity is shown clearly to the user through a red glow, but which side the post is on is hidden. This allows the user to seek out more or less extreme opinions, but prevents the user from picking sides. Polarized comments aren't always valuable to the conversation, however, so Browsr automatically filters out toxic comments.
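One way to picture the curation and the red glow is the sketch below. It assumes each post already has a polarity score in [-1, 1] (for example, the kind of value TextBlob reports as `TextBlob(text).sentiment.polarity`). The bucketing scheme and the names `glow_intensity` and `curate_by_polarity` are hypothetical and are not taken from the project.

```python
def glow_intensity(polarity):
    """Map a polarity score in [-1, 1] to a glow strength in [0, 1].

    Taking the absolute value discards the sign, so the reader sees how
    extreme a post is without seeing which side it is on.
    """
    return min(1.0, abs(polarity))


def curate_by_polarity(scored_posts, per_bucket=2, buckets=4):
    """Pick up to per_bucket posts from each equal-width polarity bucket over [-1, 1]."""
    width = 2.0 / buckets
    grouped = [[] for _ in range(buckets)]
    for text, polarity in scored_posts:
        # Clamp the index so out-of-range scores and polarity == 1.0 stay in bounds.
        index = max(0, min(buckets - 1, int((polarity + 1.0) / width)))
        grouped[index].append((text, polarity))
    curated = []
    for group in grouped:
        curated.extend(group[:per_bucket])
    return curated
```

Capping each bucket keeps a topic dominated by one sentiment from crowding out the others, which is the effect the curation is after.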
In addition, the website makes research as easy as possible: the user can simply select a keyword that they'd like to learn more about and press Ctrl+Q to add it to their list. They can also press the "plus" button to have a machine learning algorithm automatically extract the interesting keywords from a post. They can then enter how much time they have and be presented with a curated list of articles that match their interests. The website extracts the most valuable information through both traditional heuristics, like article length and using only major news sources, and machine learning based algorithms, like subjective language detection and automatic text summarization.
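As a rough illustration of fitting articles to the time a reader has, the sketch below estimates reading time from word count and greedily fills the budget. The 200 words-per-minute figure, the greedy strategy, and the function names are assumptions made for this example; the writeup does not say how Browsr actually does it.

```python
WORDS_PER_MINUTE = 200  # assumed average reading speed, not a value from the project


def reading_minutes(text):
    """Estimate how many minutes a piece of text takes to read."""
    return len(text.split()) / WORDS_PER_MINUTE


def pick_articles(articles, minutes_available):
    """Greedily pick articles, in their given ranked order, that fit the time budget.

    articles is a list of (title, text) pairs, assumed to be sorted from most
    to least relevant by some earlier ranking step.
    """
    chosen = []
    remaining = minutes_available
    for title, text in articles:
        cost = reading_minutes(text)
        if cost <= remaining:
            chosen.append(title)
            remaining -= cost
    return chosen
```

An article that is too long for the remaining time is skipped rather than ending the list, so shorter pieces further down can still fill the gap.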
How I built it
The website is built with Python and Flask. Redis is used for task queuing. Tweepy is used to interface with the Twitter API, and much of the natural language processing is done through the TextBlob library. Toxic comment classification is done by a deep learning model called a bidirectional LSTM. An LSTM was used because it has been shown to be powerful at processing sequential data such as time series and text. A bidirectional LSTM processes the text both left to right and right to left, giving it a wider range of context. A major part of this deep learning model is the text embeddings. A method is needed to convert words, which are sparse categorical variables, into dense, lower-dimensional vectors. Embeddings learn a mapping from the categorical variables to dense vectors. Pretrained embeddings have already been trained on another task, so they already carry meaning. The embeddings used in this project are a concatenation of the fastText crawl vectors and Stanford's GloVe embeddings. The model was trained on the Jigsaw toxic comment dataset, which contains ~200,000 comments.
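To make the idea of concatenated embeddings concrete, here is a small sketch in plain Python. The lookup tables are toy stand-ins for the fastText and GloVe vectors. The zero-vector fallback for missing words and the function name `concat_embeddings` are assumptions for illustration, not details from the project.

```python
def concat_embeddings(vocab, first, second, dim_first, dim_second):
    """Build one row per word by joining its vector from each pretrained table.

    first and second map words to lists of floats. A word missing from a
    table falls back to a zero vector of that table's dimension (an assumed
    convention for this sketch).
    """
    matrix = []
    for word in vocab:
        left = first.get(word, [0.0] * dim_first)
        right = second.get(word, [0.0] * dim_second)
        matrix.append(list(left) + list(right))
    return matrix


# Toy tables standing in for the two pretrained embedding sets.
toy_fasttext = {"good": [0.1, 0.2], "bad": [-0.1, -0.2]}
toy_glove = {"good": [0.5, 0.6, 0.7]}

rows = concat_embeddings(["good", "bad"], toy_fasttext, toy_glove, 2, 3)
# Each row has 2 + 3 = 5 values; "bad" is padded with zeros for the table it is missing from.
```

The resulting matrix is what would typically be loaded into the model's embedding layer, so each word carries information from both sources.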
Challenges I ran into
Building a fluid interface that incorporated all of the backend processing was difficult, so I needed to spend a significant portion of my time developing it. The Twitter API also has significant limits, especially in terms of rate limiting.
Accomplishments that I'm proud of
The polarity curation worked much better than I expected. The interface that integrates it also works much more smoothly than I expected.
What I learned
I strengthened my skills in web development, especially in working with the backend processing.
What's next for Browsr
I want to fine-tune some of the natural language processing algorithms used in this website.