It is hard for the average person to know which news is fake and which is real. While consuming information on the web, we often forget to question or verify the legitimacy of the news we read. This is the inspiration behind our project.
What it does
Our Chrome extension runs on any news article and tells the user whether it is probably fake or legitimate. It also lets the user flag an article green or red to express whether they think it is true or false. A sign-in feature records the user's email and stores their search queries and opinions. The program uses machine learning trained on the LIAR dataset, a labeled benchmark for fake news detection introduced in a research paper by UCSB Prof. William Wang.
How we built it
We built the extension using the following:
- AutoML Tables (Google Cloud Platform) - Imported the LIAR dataset, trained our machine learning model and implemented this model.
- Firebase - Used Firebase's tools for user authentication, email integration, etc.
- Firestore - For storing the flag data and constraining it to one flag per article per user.
- Web scraping - Our model only works correctly on complete sentences, so we needed to scrape just the full sentences from each article. For this, we used an API called Diffbot, which handled the seemingly insurmountable task of picking complete sentences out of a webpage.
- HTML parsing & APIs - We worked with several APIs and parsed HTML files to get all the details working.
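To make the "complete sentences only" requirement concrete, here is a minimal sketch of a sentence filter. It is an illustrative heuristic, not how Diffbot works and not the code from our extension; the function name `extractCompleteSentences` and the `MIN_WORDS` threshold are assumptions made for this example.

```typescript
// Hypothetical threshold: shorter fragments are treated as page residue
// such as buttons or captions.
const MIN_WORDS = 4;

// Returns only the pieces of page text that look like complete sentences.
// Illustrative heuristic only; it will mis-split abbreviations such as "U.S.".
function extractCompleteSentences(pageText: string): string[] {
  const sentences: string[] = [];
  // Work line by line so navigation bars and link lists stay separate
  // from the article body.
  for (const line of pageText.split(/\r?\n/)) {
    const trimmed = line.replace(/\s+/g, " ").trim();
    if (trimmed === "") continue;
    // Cut the line into chunks that end in ., ! or ?, plus any trailing remainder.
    const chunks = trimmed.match(/[^.!?]+[.!?]+["']?|[^.!?]+$/g) || [];
    for (const chunk of chunks) {
      const candidate = chunk.trim();
      const startsLikeSentence = /^["'A-Z0-9]/.test(candidate);
      const endsLikeSentence = /[.!?]["']?$/.test(candidate);
      const wordCount = candidate.split(" ").length;
      if (startsLikeSentence && endsLikeSentence && wordCount >= MIN_WORDS) {
        sentences.push(candidate);
      }
    }
  }
  return sentences;
}

// Usage: menu text and short fragments are dropped, the full sentence is kept.
const samplePage =
  "Home | About | Contact\nThe senator said the bill would pass. Read more";
console.log(extractCompleteSentences(samplePage));
```

A rule-based filter like this breaks easily on real pages, which is part of why we relied on an extraction API instead of writing our own.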
We divided the work equally among ourselves and collaborated whenever challenges came up. We also sought guidance from mentors at the event.
Challenges we ran into
1. Using Google Cloud Platform and AutoML (12+ hours) - (i) Documentation was limited, and there were too many options and features to understand. (ii) When an issue came up, there were few resources to turn to, so we spent a lot of time scratching our heads and trying different fixes. (iii) Training the ML model was time-consuming. (iv) Figuring out which of GCP's several ML tools to use was also a challenge. We spent hours setting up the software, trying a model, etc., only to abandon it all in the end for AutoML Tables. (v) Integration with other tools (more on this later).
2. Firebase authentication (6+ hours) - Firebase has many IDs and keys to figure out and use, and we ran into several errors. We consulted the Firebase team for several hours. Ultimately, we found that the issue was related to the Chrome content security policy.
3. Firestore real-time database (5+ hours) - CRUD operations (Create, Read, Update, Delete) required us to understand many varied objects within Firestore and their usage. It took time to study how to query the database and implement the objects accurately with several constraints (e.g. a user can only flag an article once).
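The "one flag per article per user" constraint can be sketched with a deterministic key built from the user and the article, so that a repeat flag overwrites the earlier one instead of adding a second record. This is a minimal in-memory sketch under stated assumptions, not our actual Firestore code: `canonicalArticleUrl`, `flagKey`, and `InMemoryFlagStore` are hypothetical names, and dropping the query string and fragment is an assumed way to treat links to the same article as equal.

```typescript
type Flag = "green" | "red";

// Assumption: the query string, fragment, and trailing slashes do not
// identify a different article, so they are removed before building the key.
function canonicalArticleUrl(articleUrl: string): string {
  return articleUrl.replace(/[?#].*$/, "").replace(/\/+$/, "");
}

// One key per (user, article) pair. encodeURIComponent escapes "|" and "/",
// so the separator cannot appear inside either part.
function flagKey(userId: string, articleUrl: string): string {
  return (
    encodeURIComponent(userId) +
    "|" +
    encodeURIComponent(canonicalArticleUrl(articleUrl))
  );
}

// Hypothetical in-memory stand-in for the database collection.
class InMemoryFlagStore {
  private flags: Record<string, Flag> = {};

  // Writing to the same key twice replaces the earlier flag.
  setFlag(userId: string, articleUrl: string, flag: Flag): void {
    this.flags[flagKey(userId, articleUrl)] = flag;
  }

  // Counts the flags from all users for one article.
  countFlags(articleUrl: string): { green: number; red: number } {
    const suffix = "|" + encodeURIComponent(canonicalArticleUrl(articleUrl));
    const counts = { green: 0, red: 0 };
    for (const key of Object.keys(this.flags)) {
      if (key.slice(-suffix.length) === suffix) {
        counts[this.flags[key]] += 1;
      }
    }
    return counts;
  }
}

// Usage: the second call from the same user replaces the first.
const demoStore = new InMemoryFlagStore();
demoStore.setFlag("user-1", "https://example.com/story?utm=a", "green");
demoStore.setFlag("user-1", "https://example.com/story", "red");
console.log(demoStore.countFlags("https://example.com/story"));
```

In a document database, the same idea would likely map to using the key as the document ID, so that writing a flag again updates the existing document rather than creating a new one.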
Accomplishments that we're proud of
For three out of four of the team members, this was our first hackathon. We are proud that we completed this application and implemented several advanced features in it in such a short time. We ran into several challenges but did not give up on implementing even complex features or tools such as training machine learning models using GCP (AutoML). We are also glad that we were able to integrate things across so many platforms successfully.
What we learned
We learned how to ask the right questions, communicate effectively and work as a team. We got a taste of what it's like to be a software engineer through facing challenges and consulting mentors. On the technical side, we learned a lot of new technologies, concepts and programming languages.
What's next for fake-news-detector-plugin
- We can improve our dataset and machine learning model by training it more extensively.
- We can utilize Firebase's integration with Mailchimp to send periodic emails to users to give them a statistical analysis of their searches.
- We can make use of the data collected through "agree/disagree" flags to invest in research on individual perception and detect bias.
- We can potentially create a space for users that are browsing the same article to have a live chat.