Inspiration

This project relates to Goal 16 of the 17 UN SDGs. The recent war crises in different parts of the world are the main inspiration for this project. Most notable is the war between Hamas and Israel, where a great deal of false information was fed to the masses by different parties and stakeholders in the war to push particular narratives. On Twitter (now X), for example, conflicting pieces of information clouded the true narrative of the war until Twitter provided means for users to add context to the news being circulated, and a large proportion turned out to be fake news. That is what inspired this project: to create an avenue that advances Goal 16, which aims to promote peaceful, inclusive societies for sustainable development, provide access to justice for all, and build effective, accountable, and inclusive institutions at all levels.

How this project contributes to Goal 16

  1. Promoting peaceful societies: False information can intensify tensions and fuel social upheaval, and fake news accelerates its spread. Identifying and lessening the effects of fake news reduces conflict risk and promotes accurate information.

  2. Inclusive societies: False information can marginalize individuals or groups. Ensuring that accurate news is disseminated promotes inclusion by giving everyone access to trustworthy and equitable information.

  3. Access to justice: Misinformation can hurt people and communities by leading to unfair accusations. Identifying and combating fake news provides accurate information and promotes a just and fair society.

  4. Effective, accountable, and inclusive institutions: Developing solutions for fast fake news identification requires building reliable and responsible information-verification systems. Using technology to thwart false information helps create institutions that are more resilient and inclusive.

What it does

The AI model classifies the content of a news article as real or fake, using knowledge learned from previous examples of both real and fake news.

How we built it

Using Python and TensorFlow, we developed a recurrent neural network (RNN) comprising a single LSTM layer along with vector embeddings to convert text into a numerical format for computational purposes. The dataset, sourced from Kaggle, underwent initial preprocessing to eliminate insignificant words, commonly referred to as stopwords. Subsequently, the text data was transformed into numerical representations using a text vectorizer. This vectorized text was then fed into the RNN module, which underwent training for a few epochs before the model was saved.
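The architecture described above can be sketched in Keras roughly as follows. The vocabulary size, sequence length, embedding dimension, and LSTM width here are illustrative placeholders, not the exact values used in the project:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative hyperparameters -- not the exact values used in the project
MAX_TOKENS = 10_000   # vocabulary size for the text vectorizer
SEQ_LEN = 200         # pad/truncate each article to this many tokens
EMBED_DIM = 64

# TextVectorization turns raw strings into integer token sequences
vectorizer = layers.TextVectorization(
    max_tokens=MAX_TOKENS, output_sequence_length=SEQ_LEN)

def build_model():
    model = tf.keras.Sequential([
        vectorizer,                                # raw text -> token ids
        layers.Embedding(MAX_TOKENS, EMBED_DIM),   # token ids -> dense vectors
        layers.LSTM(64),                           # single LSTM layer
        layers.Dense(1, activation="sigmoid"),     # 1 = real, 0 = fake
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would then look like:
# vectorizer.adapt(train_texts)     # learn the vocabulary from the corpus
# model = build_model()
# model.fit(train_texts, train_labels, epochs=5)
# model.save("fake_news_rnn")
```

Putting the vectorizer inside the model means the saved model accepts raw strings directly at inference time, which simplifies deployment.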

For deployment, we utilized Streamlit to create a user interface (UI) that prompts users to input the content of the news and specify the news category. The input data undergoes preprocessing, and the model performs inference on the new input, providing predictions based on the trained RNN.
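The preprocessing applied to user input before inference can be sketched as a plain-Python function. The stopword set below is a small illustrative subset, not the full list the project used:

```python
import string

# Tiny illustrative stopword subset; the project used a full standard list
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "to", "of",
             "and", "in", "on", "for", "with", "that", "this", "it"}

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords before inference."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

# preprocess("This is a Breaking Story!") -> "breaking story"
```

The cleaned string is what gets passed to the saved model for prediction.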

Challenges we ran into

Two major challenges were the dataset and the model itself. Every machine learning model is shaped by the data it was trained on, and generally the more data, the better the model. It was therefore a challenge to gather as much data as possible, which the Kaggle dataset solved. Cleaning the data is also a critical step in building the model, and we had to ensure it was as thorough as possible. The challenge with the model was performance. At first, traditional models like DecisionTreeClassifier and ensemble-based models with techniques like bag-of-words or TF-IDF were used, but their performance was not great. The final decision was to switch to neural networks, since the dataset was large enough, and after the learning phase we eventually produced a model with a laudable accuracy score of above 80%, with confident predictions based on the confusion matrix.
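For context, the traditional baseline we started with can be sketched in scikit-learn. The training texts below are toy placeholders, not the Kaggle data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy placeholder data -- the real project trained on a Kaggle news dataset
texts = [
    "official report confirms economic growth figures",
    "government announces new public health policy",
    "shocking secret cure doctors do not want you to know",
    "celebrity spotted with alien in leaked photo",
]
labels = [1, 1, 0, 0]  # 1 = real, 0 = fake

# TF-IDF features feeding a decision tree classifier
baseline = make_pipeline(TfidfVectorizer(),
                         DecisionTreeClassifier(random_state=0))
baseline.fit(texts, labels)

preds = baseline.predict(["new policy report on economic growth"])
```

Baselines like this fit the training set easily but generalized poorly on held-out news, which is what motivated the switch to the LSTM model.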

Accomplishments that we're proud of

  1. The accuracy and performance of the model on the dataset
  2. The generalization of the model to news outlets similar to the source of the data (US news outlets)

What we learned

There are many intricacies involved in using AI models in natural language processing, and any solution must be fully vetted before it is released for public use. A model intended for scenarios like this, where the input is not tightly controlled, requires utmost precision in development. While testing the model on articles from news sites, the ethical dilemma that arises when the model makes a wrong prediction also surfaced, showing that AI solutions need thorough testing and evaluation before deployment.

What's next for Fake-or-real-news-classification

There are a plethora of things that still need to be done:

  1. Curating/Sourcing more data for more generalization and increased performance
  2. Using techniques for explaining AI inferences to give tangible reasons why the news was labelled the way it was
  3. Deploying a more user-friendly solution like a chrome extension

Code Repo

https://github.com/oadeniran/news-classifier

Built With

Python, TensorFlow, Streamlit
