Many students from lower-income backgrounds have trouble discerning what is and what isn't fake news.
What it does
It uses web scraping to find articles related to the article in question. Once this is done, we summarize the article in question and the related articles and compare them using natural language processing. We then multiply the similarity score by a normalized credibility score and divide by the number of credibility scores that aren't zero. This gives us a relative credibility score for the article in question.
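The scoring step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the score is summed over all related articles before dividing, it assumes credibility scores are normalized to the range 0 to 1, and it uses a plain bag-of-words cosine similarity as a stand-in for the NLP comparison. The function names `cosine_similarity` and `relative_credibility` are hypothetical.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity in [0, 1] (an assumed stand-in for the NLP step)."""
    words_a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    words_b = Counter(re.findall(r"[a-z']+", text_b.lower()))
    # Only words present in both texts contribute to the dot product.
    dot = sum(words_a[w] * words_b[w] for w in words_a.keys() & words_b.keys())
    norm_a = math.sqrt(sum(c * c for c in words_a.values()))
    norm_b = math.sqrt(sum(c * c for c in words_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relative_credibility(scored_sources) -> float:
    """Combine (similarity, credibility) pairs into one relative score.

    Assumption: each similarity is weighted by its source's normalized
    credibility, the products are summed, and the sum is divided by the
    number of sources whose credibility is not zero.
    """
    pairs = list(scored_sources)
    nonzero = sum(1 for _, credibility in pairs if credibility != 0)
    if nonzero == 0:
        return 0.0  # no credible source to compare against
    return sum(sim * cred for sim, cred in pairs) / nonzero
```

For example, with three related articles scored `(0.8, 1.0)`, `(0.5, 0.5)`, and `(0.9, 0.0)`, the third source contributes nothing and is left out of the divisor, so the result is (0.8 + 0.25) / 2.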
How we built it
We used a number of different libraries, from nltk to aiohttp. We built our own web crawler to find news articles on Google News and visit each one to grab its contents. Along with this, we implemented natural language processing algorithms to summarize the text of these articles and measure their similarity.
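The general shape of an asynchronous crawler like this can be sketched with `asyncio` alone. This is an illustration under assumptions, not our actual crawler: the `crawl` function, the `max_concurrency` limit, and the `fake_fetch` stand-in are all hypothetical, and the fetcher is passed in as a parameter so the sketch runs without network access. In a real crawler the fetcher would issue an HTTP GET, for instance through an aiohttp client session.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# A fetcher takes a URL and eventually returns the page body as text.
Fetcher = Callable[[str], Awaitable[str]]


async def crawl(urls: Iterable[str], fetch: Fetcher, max_concurrency: int = 10) -> Dict[str, str]:
    """Fetch every URL concurrently, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: Dict[str, str] = {}

    async def fetch_one(url: str) -> None:
        async with semaphore:
            try:
                results[url] = await fetch(url)
            except Exception:
                # Skip pages that fail to load instead of aborting the whole crawl.
                pass

    await asyncio.gather(*(fetch_one(url) for url in urls))
    return results


async def fake_fetch(url: str) -> str:
    """Stand-in fetcher for demonstration only; it never touches the network."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise ConnectionError(url)
    return f"<html>contents of {url}</html>"


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/broken"], fake_fetch))
    print(sorted(pages))
```

Bounding the number of in-flight requests with a semaphore is one common way to stay fast without hammering a single host; the right limit depends on the site being crawled.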
Challenges we ran into
Our main challenge was learning and implementing new techniques in such a short period of time. We were especially pressed for time with natural language processing, where we had to learn the fundamentals before we could even start. However, by the end of the hackathon, we had successfully built a scoring mechanism for articles with underlying NLP.
Accomplishments that we're proud of
- Building a stunningly fast web crawler, especially one that works against the ever-vigilant Google search engine, which is known for blocking bots from using its service
- Learning the key points of natural language processing in such short order
- Creating a method for summarizing an arbitrary article
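One simple way to summarize an arbitrary article is frequency-based extractive summarization: score each sentence by how common its words are in the whole text and keep the top few. The sketch below illustrates that general technique and is not necessarily the method we implemented. The `summarize` function, the small `STOPWORDS` set, and the regex-based sentence splitting are all assumptions made to keep the example dependency-free; a library such as nltk provides more robust tokenizers.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "were", "with",
}


def _content_words(text: str) -> list:
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(_content_words(text))

    def score(sentence: str) -> float:
        # Average word frequency, so long sentences are not favored automatically.
        words = _content_words(sentence)
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Summarizing both the article in question and each related article before comparing them keeps the similarity step focused on the main claims rather than on boilerplate text.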
What we learned
We learned new NLP techniques and gained experience building an asynchronous web crawler.
What's next for fake-news-detector
We would like to continue refining our algorithm and make our code more modular. In particular, we would like to add support for other search engines and learn more about NLP techniques to improve the accuracy of our credibility score.