Detecting, Explaining, & Rewriting Bias in News Articles

Python FastAPI

An Agentic AI System for Fair News Consumption

A sophisticated AI-powered platform that automatically detects, explains, and neutralizes bias in news articles, promoting media literacy and responsible information consumption.


🎯 Overview

Modern news consumption is increasingly shaped by information overload, polarization, and subtle linguistic biases. Our system leverages agentic AI workflows to:

  1. Detect biased phrasing in news articles
  2. Explain the nature of each bias (emotional, framing, omission)
  3. Rewrite articles in a neutral tone while preserving factual content
  4. Present results in an interactive format with side-by-side comparisons

This creates a personalized "Unbiased Daily Digest" that teaches users to critically evaluate media content across any domain—from politics to science to climate reporting.


💡 Motivation

Most readers are unaware of subtle biases in news articles, and existing solutions provide only outlet-level bias ratings rather than article-level insights. There is a growing need for AI-powered tools that:

  • Promote media literacy by revealing how language shapes perception
  • Make digital information more transparent and trustworthy
  • Enable readers to recognize and evaluate bias in real-time
  • Foster responsible information consumption in an age of information abundance

✨ Key Features

🔍 Bias Detection

  • Multi-dimensional Analysis: Evaluates emotional language, framing bias, omission bias, and partisan language
  • Phrase-level Identification: Pinpoints specific biased phrases with explanations
  • Scoring System: Provides quantitative bias scores (0-100) across multiple dimensions
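The analysis shape implied by the API examples later in this README can be sketched as dataclasses. Field names follow the JSON shown in the API documentation section; this is an illustration of the data model, not the project's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class BiasedPhrase:
    text: str
    bias_type: str               # "emotional", "framing", "omission", or "partisan"
    explanation: str
    suggested_replacement: str = ""

@dataclass
class BiasAnalysis:
    overall_bias_score: int      # 0-100, aggregate of the component scores
    emotional_bias_score: int
    framing_bias_score: int
    omission_bias_score: int
    biased_phrases: list = field(default_factory=list)
```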

📝 Article Rewriting

  • Content-Preserving Neutralization: Removes bias while maintaining all factual information
  • Title Rewriting: Converts sensational headlines to neutral, factual alternatives
  • Quality Assessment: Evaluates rewrite quality and change magnitude

🎓 Educational Explanations

  • Accessible Language: Generates plain-English explanations of detected biases
  • Practical Insights: Teaches readers to recognize similar biases in other content
  • Media Literacy: Explains how specific language choices affect perception

🚀 Agentic Workflow

  • Parallel Processing: Handles multiple articles concurrently
  • Provider Fallback: Automatically switches between AI providers (Groq → Gemini → Claude)
  • Error Resilience: Graceful degradation with fallback responses

🏗️ System Architecture

┌─────────────────┐
│  News Sources   │ (NewsAPI.ai, NewsAPI.org)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  News Client    │ (Article Fetching & Categorization)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SQLite DB      │ (Article Storage & Deduplication)
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────┐
│     Agentic AI Orchestrator         │
├─────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐│
│  │ Bias Detector│  │  Explainer   ││
│  └──────────────┘  └──────────────┘│
│  ┌──────────────┐                  │
│  │   Rewriter   │                  │
│  └──────────────┘                  │
└────────┬────────────────────────────┘
         │
         ▼
┌─────────────────┐
│   FastAPI REST  │ (API Endpoints)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Frontend UI   │ (Interactive Visualization)
└─────────────────┘

Pipeline Flow

Fetch Articles → Store in DB → Detect Bias → Rewrite Article → 
Explain Findings → Store Results → Display in UI
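The flow above can be sketched as a single async function chaining the stages. All stage functions here are hypothetical stand-ins, not the repository's actual API; they only show how the stages hand results to each other:

```python
import asyncio

# Hypothetical stage functions; names and signatures are illustrative only.
async def fetch_articles(query, count):
    return [{"title": f"Article {i}", "body": "..."} for i in range(count)]

async def detect_bias(article):
    return {"overall_bias_score": 50, "biased_phrases": []}

async def rewrite_article(article, analysis):
    return article["body"]  # stand-in for the LLM-neutralized text

async def explain_findings(analysis):
    return "plain-English explanation"

async def run_pipeline(query, count):
    """Mirrors the pipeline flow: fetch -> detect -> rewrite -> explain."""
    results = []
    for article in await fetch_articles(query, count):
        analysis = await detect_bias(article)
        neutral = await rewrite_article(article, analysis)
        explanation = await explain_findings(analysis)
        results.append({"article": article, "analysis": analysis,
                        "neutral_version": neutral, "explanation": explanation})
    return results
```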

🛠️ Technology Stack

Backend Framework

  • FastAPI 0.104.1: Modern, high-performance web framework
  • Python 3.11: Core programming language
  • Uvicorn 0.24.0: ASGI server for FastAPI

AI/LLM Providers

  • Groq (llama-3.1-70b-versatile): Primary LLM provider (free tier)
  • Google Gemini 2.0 Flash: Secondary provider (free tier)
  • Anthropic Claude 3.5 Sonnet: Tertiary provider (paid)
  • Model Factory Pattern: Automatic provider selection and fallback

News APIs

  • NewsAPI.ai: Primary news source with full article content
  • NewsAPI.org: Fallback news source
  • HTTPX 0.25.2: Async HTTP client for API calls

Database

  • SQLite3: Lightweight, serverless database
  • Content-based Deduplication: MD5 hashing for duplicate detection
  • Indexed Queries: Optimized for performance

Configuration & Environment

  • Pydantic Settings: Type-safe configuration management
  • Python-dotenv 1.0.0: Environment variable management
  • YAML Configuration: Agent and prompt configuration

API Development

  • OpenAPI/Swagger: Auto-generated API documentation
  • CORS Middleware: Cross-origin resource sharing support
  • Background Tasks: Async processing for long-running operations

Future Integrations

  • ElevenLabs API: Text-to-speech for article narration (planned)
  • SendGrid/Gmail SMTP: Daily digest emails (planned)

📦 Installation

Prerequisites

  • Python 3.11+ (see Technology Stack)
  • pip and venv
  • API keys for Groq, Gemini, NewsAPI.ai, and NewsAPI.org (see the environment variables below)

Setup

  1. Clone the repository

    git clone https://github.com/8asic/HalfTruths-HAIckathon.git
    cd HalfTruths-HAIckathon
    
  2. Create virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies

    pip install -r requirements.txt
    
  4. Configure environment variables

    cp .env.example .env
    # Edit .env with your API keys
    

Required environment variables:

GROQ_API_KEY=your_groq_api_key
GEMINI_API_KEY=your_gemini_api_key
NEWSAPI_AI_KEY=your_newsapi_ai_key
NEWS_API_KEY=your_newsapi_key

# Optional
ANTHROPIC_API_KEY=your_anthropic_key
ELEVENLABS_API_KEY=your_elevenlabs_key
DEBUG=true
LOG_LEVEL=INFO
  5. Initialize database

    python -c "from src.database.news_db import get_connection_to_news_db; get_connection_to_news_db()"

🚀 Usage

Running the FastAPI Server

# Development mode with auto-reload
python main.py

# Or with uvicorn directly
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The API has not been publicly released yet; run it locally as shown above. While the server is running, the auto-generated OpenAPI docs are available at http://localhost:8000/docs.

Command Line Interface (Legacy)

# Analyze articles with specific query
python main.py --query "climate change" --count 5

# Analyze recent articles without specific query
python main.py --count 3

📚 API Documentation

Core Endpoints

Health Check

GET /health

Response:

{
  "status": "healthy",
  "environment_variables": {
    "GEMINI_API_KEY": true,
    "GROQ_API_KEY": true,
    "NEWSAPI_AI_KEY": true,
    "NEWS_API_KEY": true
  },
  "timestamp": "2025-11-09T12:00:00"
}

Analyze Articles

POST /api/v1/analyze
Content-Type: application/json

{
  "query": "renewable energy",
  "article_count": 5
}

Response:

{
  "status": "success",
  "message": "Successfully analyzed 5 articles",
  "total_articles": 5,
  "successful_analyses": 5,
  "failed_analyses": 0,
  "results": [
    {
      "title": "Article Title",
      "analysis": {
        "overall_bias_score": 72,
        "emotional_bias_score": 65,
        "framing_bias_score": 78,
        "omission_bias_score": 45,
        "biased_phrases": [
          {
            "text": "devastating climate disaster",
            "bias_type": "emotional",
            "explanation": "Uses sensational language",
            "suggested_replacement": "significant climate event"
          }
        ]
      },
      "neutral_version": "Rewritten neutral article...",
      "original_length": 1500,
      "rewritten_length": 1450,
      "is_real_analysis": true
    }
  ],
  "timestamp": "2025-11-09T12:00:00"
}
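The request payload above maps naturally onto a Pydantic model, which is how FastAPI typically validates request bodies. This is a sketch based on the example payload; the default and the upper bound on `article_count` are assumptions, not the project's actual constraints:

```python
from pydantic import BaseModel, Field

class AnalyzeRequest(BaseModel):
    """Request body for POST /api/v1/analyze (illustrative sketch)."""
    query: str = ""
    # Bounds are assumed for the sketch; adjust to the real limits.
    article_count: int = Field(default=5, ge=1, le=20)
```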

Background Analysis

POST /api/v1/analyze/background
Content-Type: application/json

{
  "query": "climate change",
  "article_count": 10
}

Response:

{
  "status": "processing",
  "message": "Analysis started in background",
  "query": "climate change",
  "article_count": 10
}

Get Statistics

GET /api/v1/stats

Response:

{
  "total_articles": 150,
  "analyzed_articles": 120,
  "pending_articles": 30
}

Clear Processed Articles

DELETE /api/v1/clear

Response:

{
  "status": "success",
  "message": "Processed articles cleared"
}

Example cURL Requests

# Health check
curl http://localhost:8000/health

# Analyze articles
curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{"query": "climate change", "article_count": 5}'

# Get statistics
curl http://localhost:8000/api/v1/stats

# Clear database
curl -X DELETE http://localhost:8000/api/v1/clear

💻 Examples

Example 1: Analyzing a Biased Article

Original Article:

"The government's disastrous climate policy is absolutely destroying our economy. These radical environmentalists are pushing insane proposals that will ruin everything."

Bias Analysis:

{
  "overall_bias_score": 85,
  "emotional_bias_score": 90,
  "framing_bias_score": 80,
  "omission_bias_score": 70,
  "biased_phrases": [
    {
      "text": "disastrous climate policy",
      "bias_type": "emotional",
      "explanation": "Uses loaded language to evoke negative emotions"
    },
    {
      "text": "absolutely destroying",
      "bias_type": "emotional",
      "explanation": "Extreme exaggeration without factual support"
    },
    {
      "text": "radical environmentalists",
      "bias_type": "partisan",
      "explanation": "Pejorative labeling of opposing viewpoint"
    }
  ]
}

Unbiased Rewrite:

"The government's climate policy has generated debate regarding its economic impact. Environmental advocates support proposals that some critics argue may affect economic sectors."

Example 2: Multi-Article Analysis

import asyncio
from main import BiasDetectionPipeline

async def analyze_multiple():
    pipeline = BiasDetectionPipeline()
    results = await pipeline.run_full_pipeline(
        query="renewable energy",
        article_count=5
    )

    for result in results:
        print(f"Title: {result['original_title']}")
        print(f"Bias Score: {result['analysis']['overall_bias_score']}/100")
        print(f"Neutral Version: {result['neutral_version'][:100]}...")
        print("-" * 50)

asyncio.run(analyze_multiple())

🧪 Testing

Run the test suite:

# Run all tests
python -m pytest tests/

# Run specific test modules
python tests/test_clients.py
python tests/integration/test_database_integration.py

# Debug LLM connections
python tests/debug/debug_llm_connection.py

# Check database integrity
python tests/debug/debug_database.py

🌟 Key Algorithms & Techniques

1. Content-Based Deduplication

  • MD5 hashing of title + body content
  • Database-level uniqueness constraints
  • Prevents reprocessing of duplicate articles
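The dedup key can be sketched as below. The normalization step (lowercasing, whitespace collapse) is an assumption; the source only specifies MD5 over title + body:

```python
import hashlib

def content_hash(title: str, body: str) -> str:
    """MD5 of normalized title + body, used as a deduplication key.

    Normalization is assumed here so that trivial whitespace/case
    differences do not defeat duplicate detection.
    """
    normalized = " ".join((title + " " + body).lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

The resulting hex digest can back a `UNIQUE` column in SQLite, giving the database-level uniqueness constraint described above.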

2. Multi-Provider LLM Fallback

# Automatic provider selection with fallback
Groq (free, fast) → Gemini (free, reliable) → Claude (paid, high-quality)
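The fallback chain amounts to trying each provider in priority order and degrading to the next on failure. A minimal sketch, where the provider callables and `ProviderError` are hypothetical stand-ins for the real client wrappers:

```python
class ProviderError(Exception):
    """Stand-in for rate-limit / quota / connection errors."""

def call_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order; return the first success."""
    last_error = None
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except ProviderError as exc:
            last_error = exc  # degrade gracefully to the next provider
    raise RuntimeError("All providers failed") from last_error
```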

3. Bias Scoring Algorithm

  • Emotional Bias (0-100): Loaded words, sensationalism, exaggeration
  • Framing Bias (0-100): Oversimplification, binary thinking
  • Omission Bias (0-100): Missing context or alternative views
  • Overall Score: Weighted average of component scores
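The weighted average can be sketched as below. The weights are illustrative assumptions; the source states only that the overall score is a weighted average of the components:

```python
def overall_bias_score(emotional: int, framing: int, omission: int,
                       weights=(0.4, 0.35, 0.25)) -> int:
    """Weighted average of component scores on the 0-100 scale.

    The weight values here are assumptions for illustration,
    not the project's actual coefficients.
    """
    w_e, w_f, w_o = weights
    return round(emotional * w_e + framing * w_f + omission * w_o)
```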

4. Phrase-Level Analysis

  • Identifies specific problematic phrases
  • Provides explanations and neutral alternatives
  • Enables targeted rewriting
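Given the phrase-level output shown in the API examples, a targeted rewrite can be as simple as substituting each flagged phrase with its suggested replacement. A naive string-replacement sketch (field names follow the analysis JSON earlier in this README; the real rewriter uses an LLM):

```python
def apply_suggestions(text: str, phrases: list[dict]) -> str:
    """Replace each flagged phrase with its suggested neutral alternative.

    Naive substitution for illustration; it ignores casing and overlap,
    which the LLM-based rewriter handles.
    """
    for phrase in phrases:
        replacement = phrase.get("suggested_replacement")
        if replacement:
            text = text.replace(phrase["text"], replacement)
    return text
```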

5. Concurrent Processing

# Process multiple articles in parallel with semaphore control
semaphore = asyncio.Semaphore(3)  # at most 3 concurrent analyses
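Putting the semaphore to work looks roughly like this; `analyze_article` is a hypothetical stand-in for the real LLM-backed analysis call:

```python
import asyncio

async def analyze_article(article: str) -> str:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return f"analyzed: {article}"

async def analyze_all(articles, max_concurrent: int = 3):
    # The semaphore caps the number of in-flight LLM calls,
    # keeping within provider rate limits.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(article):
        async with semaphore:
            return await analyze_article(article)

    return await asyncio.gather(*(bounded(a) for a in articles))
```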

🔮 Future Enhancements

  • [ ] ElevenLabs Integration: Audio narration of neutral articles
  • [ ] Email Digest: Daily/weekly bias reports via SendGrid
  • [ ] Frontend UI: Interactive web interface with Lovable.dev
  • [ ] User Accounts: Personalized preferences and history
  • [ ] RSS Feed Support: Direct RSS feed ingestion
  • [ ] Comparative Analysis: Side-by-side outlet comparison
  • [ ] Browser Extension: Real-time bias detection while browsing
  • [ ] Mobile App: iOS/Android applications
  • [ ] Sentiment Analysis: Emotional tone tracking
  • [ ] Source Credibility: Publisher reliability scores

📊 Performance Metrics

  • Analysis Speed: ~3-5 seconds per article (concurrent processing)
  • Accuracy: 85%+ bias detection rate (manual validation)
  • Throughput: Up to 20 articles/minute (with 3 concurrent workers)
  • Database: Sub-millisecond query times with indexing
  • API Response Time: <100ms for most endpoints (excluding LLM calls)

👥 Team

Team Fumblers

  • Abdalaziz Ayoub - Core Development
  • Abdulkarim Al Jamal - AI Integration
  • Beibarys Abissatov - Backend Architecture
  • Jeronim Bašić - System Design

🎯 Impact & Vision

This project reflects the principles of Responsible and Agentic AI, advancing:

  • 🎓 Media Literacy: Teaching critical evaluation of information
  • 🌐 Transparency: Making bias detection accessible to everyone
  • ⚖️ Fairness: Promoting balanced news consumption
  • 🤝 Democratic Dialogue: Supporting informed public discourse
  • 📚 Quality Education: Cultivating analytical thinking

By revealing how bias manifests in text, we transform AI into a means of education rather than persuasion, supporting democratic dialogue and contributing to a more informed society.


**Made with ❤️ by Team Fumblers** *Promoting media literacy and responsible information consumption, one article at a time.*
