Inspiration

In today’s fast-paced digital era, information spreads in the blink of an eye, but misinformation often spreads even faster. We noticed that existing fact-checkers either rely on slow, manual review processes or give a blunt, binary "True" or "False" verdict. But news is rarely black and white; it exists on a spectrum of reliability. We were inspired to build a tool that treats misinformation detection as a matter of confidence rather than absolute truth. By giving users a nuanced probability score instead of a rigid label, we want to encourage critical thinking and empower readers to make their own informed media choices.

What it does

Fake News Probability Checker acts as a digital lie detector for written media. Users simply paste an article's URL or a block of text into our web interface. Within seconds, the application analyzes the text's linguistic patterns, emotional tone, and structural formatting to output a Probability of Deception Score (e.g., "There is an 87% chance this article is highly biased or fabricated"). It also provides a breakdown of the text, highlighting specific "red flags" like excessive use of sensationalist language, capitalization anomalies, or poor source attribution.

How we built it

We took a full-stack approach, bridging advanced Natural Language Processing (NLP) with an intuitive web interface.

The Dataset & Preprocessing: We aggregated data from reputable sets (like the ISOT Fake News and LIAR datasets). We cleaned the text using standard NLP pipelines: tokenization, removing stop words, stemming, and lemmatization using NLTK/SpaCy.

The Machine Learning Model: We utilized Word Embeddings (like Word2Vec) and TF-IDF to translate text into numerical vectors. To calculate the percentage, we deployed a probabilistic classification model. For instance, using Logistic Regression, the probability $P$ that an article with label $y$ is fake (where
$y = 1$) given its feature vector $x$ is calculated using the sigmoid function:

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^T x + b)}}$$

(where $w$ represents the learned weights mapped to linguistic features, and $b$ is the bias).

The Architecture: We exposed our trained model via a FastAPI / Flask backend. The frontend was built using React to ensure a snappy, asynchronous user experience, allowing for real-time text analysis without page reloads.

Challenges we ran into

The Satire Trap: Teaching a machine to distinguish between malicious fake news and intentional satire (like The Onion) was incredibly difficult. Algorithms struggle with sarcasm, so we had to carefully expand our training data to include a "satire" subset.

Overfitting to Topics: Initially, our model started memorizing specific political names and current events rather than learning the actual patterns of deceptive writing. We had to heavily regularize our model and apply strict cross-validation.

Web Scraping Latency: Building a reliable web scraper that could bypass paywalls or cookie pop-ups to extract article text from a URL quickly without timing out the user's request required significant optimization.

Accomplishments that we're proud of

Successfully integrating a complex Python machine-learning pipeline into a seamless, user-friendly web application.

Achieving a robust accuracy rate (over 85% on our testing data) while minimizing false positives.

Building a system that actually outputs a continuous mathematical probability rather than a hardcoded, binary classification, which involved deep-diving into the math behind neural networks and logistic regression.

What we learned

Advanced NLP Techniques: We leveled up our understanding of how to transform unstructured text into mathematical representations that a machine can understand.

Model Deployment: We learned how to containerize and deploy ML models so they can run inference quickly in a production environment.

AI Ethics: We had deep discussions about the responsibility of labeling content as "fake."
We learned that transparency, meaning showing the user why the model made its decision, is just as important as the model's accuracy.

What's next for Fake News Probability Checker

We have big plans for scaling this tool:

Browser Extension: Building a Chrome/Firefox extension that automatically calculates a probability score in the background as you scroll through Twitter/X, Reddit, or Facebook.

Explainable AI (XAI): Implementing explanation methods like LIME or SHAP to highlight the exact words and sentences in the article that contributed most to the fake news score.

Multi-lingual Support: Training the model on non-English datasets to help combat the spread of misinformation globally.
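
Appendix: a scoring sketch

To make the logistic-regression scoring step from "How we built it" concrete, here is a minimal, standard-library-only sketch. It is not our production pipeline. The three hand-crafted "red flag" features, the `SENSATIONAL_WORDS` list, and the `WEIGHTS` and `BIAS` values are all assumptions chosen for illustration; a real model would learn its weights from data such as ISOT or LIAR and would use TF-IDF or embedding features instead.

```python
import math
import re
from typing import List

# Hypothetical lexicon of sensationalist terms (illustrative, not a real word list).
SENSATIONAL_WORDS = {"shocking", "unbelievable", "exposed", "miracle", "secret"}

# Illustrative, hand-picked parameters; a trained model would learn these.
WEIGHTS = [6.0, 8.0, 5.0]
BIAS = -1.5


def extract_features(text: str) -> List[float]:
    """Map raw text to a small feature vector x of red-flag ratios."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)  # avoid division by zero on empty input
    all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    sensational = sum(1 for w in words if w.lower() in SENSATIONAL_WORDS)
    exclamations = text.count("!")
    return [all_caps / n, sensational / n, exclamations / n]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fake_probability(text: str, weights: List[float], bias: float) -> float:
    """Compute P(y = 1 | x) = sigmoid(w^T x + b) for the given text."""
    x = extract_features(text)
    z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    samples = [
        "The city council approved the budget on Tuesday.",
        "SHOCKING secret EXPOSED!!! You won't believe this miracle!",
    ]
    for sample in samples:
        p = fake_probability(sample, WEIGHTS, BIAS)
        print(f"{p:.0%} chance of deception: {sample}")
```

Because the output is a probability rather than a label, the interface can present it as a percentage and leave the final judgment to the reader. The per-feature values returned by `extract_features` could also feed the "red flags" breakdown.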