Determining what is and isn't fake news is inherently an unsupervised learning task: an algorithm can't decide what counts as fake news for a reader who is susceptible to bias. Revealing the bias of articles, sources, and their narratives can reduce this susceptibility.
What it does
The goal is to take a collection of news articles and visualize the similarity of their narratives through a hierarchical clustering.
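As a rough illustration of what hierarchically clustering articles by narrative similarity could look like, here is a minimal sketch in plain Python. It is not the project's implementation: the word-count features, the cosine distance, the average-linkage rule, and the names `bag_of_words`, `cosine_distance`, and `cluster` are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for real narrative features."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    norm = norm_a * norm_b
    return 1.0 - dot / norm if norm else 1.0


def cluster(texts):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of indices into `texts`.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = [(i,) for i in range(len(texts))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
                dist = sum(cosine_distance(vectors[x], vectors[y])
                           for x, y in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical headlines, made up for this example.
headlines = [
    "Senate passes tax bill after long debate",
    "Tax bill passes Senate following debate",
    "Local team wins championship game",
    "Championship game won by local team",
]
for left, right, dist in cluster(headlines):
    print(left, right, round(dist, 3))
```

The merge history is the tree a dendrogram would draw: articles that share a narrative merge early at small distances, and unrelated groups merge last. A real pipeline would likely swap the word counts for richer features.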
How we built it
We used Python, News API, Docker, Postgres, Kafka, SQLAlchemy, and ZooKeeper.
Challenges we ran into
We struggled to decide on a project, which left us less time to program. For some of us, this was our first hackathon, and we weren't familiar with many of the tools our teammates already knew how to use.
Unfortunately, one team member also didn't arrive until the second day.
Accomplishments that we're proud of
We successfully pushed news article stubs into the Kafka topic after some grueling DevOps.
What's next for Bias.io
We want to load data sources from the database model we created in our biasio-collect script. We'd then like to subscribe to the Kafka topic containing news article stubs and fetch each full article using the newspaper3k library. After that, we want to augment each article with NLP-based features using spaCy (spacy.io), including entity disambiguation, subjectivity analysis, and sentiment analysis. Finally, we want to create a visualization showing the naturally formed hierarchical clusters of news articles and sources, hopefully revealing a clear relationship between the assignment of articles to narrative clusters and the political stance or bias of the news source. Bias.io will not end at yalehack.
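The planned stub-to-enriched-article step might look roughly like the sketch below. None of this exists in the project yet, so treat every detail as an assumption: the stub is assumed to be a JSON message with a `url` field, and `decode_stub`, `fetch_full_text`, `entity_features`, and `enrich` are hypothetical names. Only named entities are extracted here; subjectivity and sentiment analysis would need additional components and are left out.

```python
import json


def decode_stub(raw):
    """Decode a message value into a stub dict (assumed: JSON with a "url" key)."""
    stub = json.loads(raw.decode("utf-8"))
    if "url" not in stub:
        raise ValueError("stub has no 'url' field")
    return stub


def fetch_full_text(url):
    """Download and parse one article with newspaper3k."""
    # Imported lazily so the rest of the sketch runs without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def entity_features(text, nlp):
    """Named entities from a loaded spaCy pipeline as (text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def enrich(raw, nlp, fetch=fetch_full_text):
    """Turn one raw stub message into a record ready for clustering."""
    stub = decode_stub(raw)
    text = fetch(stub["url"])
    return {**stub, "text": text, "entities": entity_features(text, nlp)}
```

In use, `raw` would be the value of each message read from the stubs topic, and `nlp` would be a pipeline returned by `spacy.load(...)`. The `fetch` parameter exists only so the download step can be replaced in tests.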