Inspiration

The rise of misinformation and fake news has become a global challenge, impacting societies, democracies, and individual lives. Inspired by the need for trustworthy information and the power of open-source AI, we set out to build TruthCheck—a transparent, reproducible, and community-driven solution for fake news detection.

What We Learned

  • Natural Language Processing (NLP): We deepened our understanding of transformer models, especially BERT, and how transfer learning can be leveraged for text classification tasks.
  • Open Source Best Practices: We learned the importance of clear documentation, reproducibility, and proper crediting of datasets and code.
  • Deployment: We gained hands-on experience deploying machine learning models as interactive web apps using Streamlit and Hugging Face Spaces.
  • Collaboration: Working as a team, we improved our skills in version control, code review, and collaborative problem-solving.

How We Built It

  • Data: We combined two open datasets: the Kaggle Fake and Real News Dataset and the LIAR dataset, preprocessing and merging them for robust training.
  • Model: We used the pre-trained BERT-base-uncased model as our foundation, adding a BiLSTM and attention layer for enhanced context and interpretability. All layers were fine-tuned on our dataset.
  • Training: The model was trained using PyTorch, with careful attention to validation, early stopping, and metric tracking (accuracy, precision, recall, F1).
  • Web App: We built an interactive Streamlit app, allowing users to paste news articles and instantly receive a fake/real prediction, with a clean and user-friendly UI.
  • Open Source: All code, training scripts, and deployment instructions are open and available in our GitHub repository.
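The dataset-merging step can be sketched as follows. The column names (`text`, `label`, `statement`) and the six-to-two label mapping for LIAR are illustrative assumptions, not the repository's actual schema:

```python
import pandas as pd

# Hypothetical frames standing in for the two datasets (column names are assumptions).
kaggle_df = pd.DataFrame({
    "text": ["Article A ...", "Article B ..."],
    "label": ["FAKE", "REAL"],  # the Kaggle set is already binary
})
liar_df = pd.DataFrame({
    "statement": ["Claim C ...", "Claim D ...", "Claim E ..."],
    "label": ["pants-fire", "true", "half-true"],  # LIAR uses six fine-grained labels
})

# Collapse LIAR's six-way labels onto a binary scheme (0 = real, 1 = fake).
liar_to_binary = {
    "true": 0, "mostly-true": 0, "half-true": 0,
    "barely-true": 1, "false": 1, "pants-fire": 1,
}
liar_df = pd.DataFrame({
    "text": liar_df["statement"],
    "label": liar_df["label"].map(liar_to_binary),
})
kaggle_df["label"] = kaggle_df["label"].map({"REAL": 0, "FAKE": 1})

# Concatenate, drop empties and duplicate articles, and shuffle for training.
merged = (
    pd.concat([kaggle_df, liar_df], ignore_index=True)
      .dropna(subset=["text", "label"])
      .drop_duplicates(subset="text")
      .sample(frac=1.0, random_state=42)
      .reset_index(drop=True)
)
print(merged["label"].value_counts().to_dict())
```

Unifying the label scheme before concatenation is what makes the merged corpus trainable as a single binary task.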
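The metrics we tracked during training can be computed directly from the confusion counts; this is a plain-Python sketch (the helper name is ours, and the real training scripts could equally rely on `sklearn.metrics`):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = fake).
    Illustrative helper, not the project's actual evaluation code."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Tracking precision and recall separately matters here because the merged corpus is not perfectly balanced, so accuracy alone can be misleading.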

Challenges We Faced

  • Data Quality & Balance: Merging datasets with different formats and label distributions required careful preprocessing and validation.
  • Model Size & Deployment: Handling large model files and ensuring smooth deployment on resource-limited platforms like Streamlit Cloud and Hugging Face Spaces.
  • Dependency Management: Ensuring all dependencies were compatible and could be installed in cloud environments without compilation errors.
  • Reproducibility: Making sure every step—from data download to model inference—could be reproduced by anyone, anywhere.
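The reproducibility point starts with pinning every source of randomness. A minimal sketch, assuming a `set_seed` helper of our own naming (the real scripts would additionally seed NumPy and PyTorch):

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Pin the randomness we control (illustrative helper).
    A real training run would also call numpy.random.seed(seed),
    torch.manual_seed(seed), and torch.cuda.manual_seed_all(seed)."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

# Re-seeding reproduces the exact same draws, e.g. for a data shuffle.
set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]
assert a == b
```

Combined with pinned dependency versions and scripted data downloads, this is what lets anyone rerun the pipeline end to end and get the same model.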

Math & Model

We leveraged the power of transfer learning, where a pre-trained model \( f_{\text{BERT}}(x; \theta) \) is fine-tuned on our dataset:

\[ \text{Prediction} = \text{Softmax}(\text{Attention}(\text{BiLSTM}(f_{\text{BERT}}(x; \theta)))) \]

where \( x \) is the input text and \( \theta \) are the BERT parameters, all of which are updated during training.
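The composition above can be mirrored numerically. In this sketch, the BiLSTM-over-BERT output is replaced by a random hidden-state matrix and the attention and classifier weights are toy parameters, so only the shape of the computation (not the learned values) matches the formula:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for BiLSTM(f_BERT(x)): one sequence of 8 tokens, hidden size 16.
H = rng.normal(size=(8, 16))

# Attention over tokens: scores -> weights -> weighted sum (context vector).
w_att = rng.normal(size=(16,))   # toy attention parameters
alpha = softmax(H @ w_att)       # one weight per token, sums to 1
context = alpha @ H              # (16,) attention-pooled representation

# Final linear layer + softmax gives the fake/real distribution.
W_out = rng.normal(size=(16, 2))
prediction = softmax(context @ W_out)
```

The attention weights `alpha` are also what gives the model its interpretability: they indicate which tokens the pooled representation leaned on.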


TruthCheck is our contribution to the fight against misinformation—open, transparent, and built for the community.
