SAGE | Devpost

On the web store (not available to public)

Inspiration

Everyone seems to acknowledge the existence of fake news, but not everyone agrees on what qualifies as fake news. According to Monmouth University, about 52 percent of Americans "felt that online news websites regularly report fake news stories in the United States." We wanted to help Americans alleviate their fears of fake news and provide them with easy access to more reliable sources.

What it does

Sage is a companion that provides a second pair of eyes for verifying the validity of a webpage and can deliver further reading on a topic as appropriate.

How we built it

Back-end: We found datasets from Kaggle and the University of Victoria and combined them into a CSV file with over 30,000 training points. For example, news articles labeled as "fake" often came from outlets such as The Onion or outlets known to cite junk science (such as the claim that vaccines cause autism). These data were the input to a Long Short-Term Memory (LSTM) network, which we trained on 4 Tesla P100 GPUs on Google Cloud to make training feasible. Our train/val/test split was 87%/3%/10%, and we experimented with different configurations (including an additional convolutional layer, as suggested in this paper) and sizes to optimize performance and accuracy. We achieved 99% accuracy on the train, validation, and test datasets.
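
As an illustration of the 87%/3%/10% split described above, here is a minimal sketch using only the Python standard library. The function name split_dataset, the fixed seed, and the placeholder rows are assumptions made for the example; our actual data lived in a CSV file and this is not the exact code we ran.

```python
import random


def split_dataset(rows, train_frac=0.87, val_frac=0.03, seed=0):
    """Shuffle rows and cut them into train/val/test lists.

    The test set receives whatever remains after train and val,
    which is roughly 10% with the default fractions.
    """
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # round() avoids losing a row to floating-point truncation.
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder (text, label) pairs standing in for the real articles.
rows = [(f"article {i}", i % 2) for i in range(30000)]
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the combined file came from more than one source, so rows from the same dataset would otherwise cluster in one split.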

When trying to find further reading on the topic of the current webpage, we noticed that Google's search engine could do most of the heavy lifting for us. Thus, we ran a Google search of the current webpage's URL, scraped the title of the webpage provided by Google, and ran the title through Google's natural language processing to identify the most important words in the title. Then, we did a Google search of those words and scraped the top results that came from another news outlet (that is, a different domain).
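
The last step above, keeping only results from a different domain, can be sketched as follows. This is a hedged illustration rather than our original code: the helper names domain_of and other_outlet_results, the limit of 3, and the example URLs are all hypothetical, and the Google search and natural language processing calls are left out entirely.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def other_outlet_results(current_url, result_urls, limit=3):
    """Keep the first `limit` result URLs whose domain differs from the current page.

    Note: hosts are compared as-is, so a subdomain such as
    'news.example.com' counts as different from 'example.com'.
    """
    current = domain_of(current_url)
    kept = [u for u in result_urls if domain_of(u) not in ("", current)]
    return kept[:limit]


# Hypothetical search results for a page on example-news.com.
results = [
    "https://www.example-news.com/a",
    "https://other.org/b",
    "https://example-news.com/c",
    "https://third.net/d",
]
related = other_outlet_results("https://example-news.com/story", results)
print(related)
```

Stripping the "www." prefix before comparing avoids treating the same outlet as a separate source just because one link includes the prefix.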

Front-end: We used JavaScript and the React library to create the GUI. We used Chrome developer tools to render a React app dynamically onto web pages that our back-end determined to be erroneous.

Challenges we ran into

We found out that training the model on a GPU and then saving it to a .h5 file made it incompatible with running on a CPU, so we had to find a workaround. We also ran into memory issues when training, even on the cloud GPUs, and it took a decent amount of work to resolve them. Additionally, the model seemed very prone to overfitting; this is something we hope to keep improving on. We also had to spend a lot of time connecting the front-end to the back-end, particularly because we kept running into version compatibility issues between the environment in which we trained our models and the environment in which we deployed them.

Accomplishments that we're proud of

For the intended use of this project, we believe that accessibility and ease of use are among the most important aspects, as people often do not have the time to verify news articles by themselves. Thus, the UI is quite important, and we believe that we have made it friendly and easy to use. We are also really happy with how much we learned by tackling challenging problems to work towards an interesting idea.

What we learned

We learned how to use Google Cloud and Google's natural language processing library. As three of our members were first-time hackers, we spent time learning Python and commonly used libraries such as pandas and NumPy. As a whole, we also learned a great deal about natural language processing (and its limitations), deploying deep learning models, front-end development, managing training data, optimizing LSTM networks, dealing with versions, deploying with Google Cloud Shell, and memory management.

What's next for Sage

We want to continue improving our model before optimizing it for Twitter and Facebook and adding detection of possible political bias. We also want to use computer vision to determine whether people who are speaking are being truthful based on their facial movements (while also verifying the content of their speech), but this is a stretch goal and possibly a bit Orwellian.

Built With

chrome
flask
google-cloud
google-natural-language-processing
keras
pandas
python
react.js
tensorflow

Submitted to

HackPrinceton Fall 2019

Created by

Contributed to NLP / LSTM construction. Created the Chrome extension with React and the back-end with Flask. I handled deployment using Google App Engine.

Simon Mahns
I worked on the back-end by collecting data and implementing NLP by training an LSTM in Google Cloud.

Ryan Zhang
Used Google's Natural Language Processing API to generate relevant searches for related topics. I learned how to use pandas in Python and gained experience in data science and machine learning.

Alex Valtchanov
Computer Science Major at Princeton University, with certificates in Applied and Computational Mathematics & Statistics and Machine Learning
I worked on sanitizing the data, analyzing headlines for important words, and finding related sources on a topic.

Brandon Huynh

Updates

Simon Mahns started this project — Nov 10, 2019 02:27 AM EST
