TruthLens

🌍 Inspiration

Over 52% of U.S. adults get news from social media, where misinformation spreads rapidly (Pew Research). Political content today is shaped by polarization, AI-generated misinformation, and narrative framing.

Most tools try to declare what’s “true” or “false.”
TruthLens reveals narrative structure instead.

We quantify:

  • Cross-source semantic agreement
  • Narrative divergence
  • Emotional intensity
  • Source diversity

🧠 What It Does

Paste a headline or URL and TruthLens:

  • Scrapes parallel coverage across publishers
  • Generates 768-dimensional semantic embeddings
  • Clusters articles using Elasticsearch kNN
  • Computes a transparent confidence score
  • Visualizes narrative groups and bias distribution

The dashboard includes:

- **Confidence Score (0–100)**
- **Narrative Divergence**
- **Bias Spectrum**
- **Emotional Framing**
- **Source Diversity**

⚙️ How We Built It

1️⃣ Bright Data

  • SERP scraping
  • Article HTML extraction
  • Parallel coverage retrieval

2️⃣ Jina Embeddings (v3)

  • 768-dimensional vectors
  • Semantic representation of full article text

3️⃣ Elasticsearch (Elastic Cloud)

  • dense_vector storage
  • kNN vector search
  • Similarity scoring
  • Greedy clustering
  • Metric aggregation
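As a rough illustration of the storage and search steps listed above, the sketch below shows what a `dense_vector` mapping and a kNN search body could look like for 768-dimensional embeddings. The field names (`embedding`, `url`, `publisher`, `title`), the cosine similarity choice, and the `buildKnnSearchBody` helper are assumptions made for this example; the project's real index mapping is not shown in this write-up.

```typescript
// Assumed embedding size, taken from the 768-dimensional vectors described above.
const EMBEDDING_DIMS = 768;

// Hypothetical index mappings: field names are illustrative, not the project's actual schema.
const articleIndexMappings = {
  properties: {
    url: { type: "keyword" },
    publisher: { type: "keyword" },
    title: { type: "text" },
    embedding: {
      type: "dense_vector",
      dims: EMBEDDING_DIMS,
      index: true,          // required for approximate kNN search
      similarity: "cosine", // assumption: cosine similarity between articles
    },
  },
} as const;

interface KnnSearchBody {
  knn: {
    field: string;
    query_vector: number[];
    k: number;
    num_candidates: number;
  };
  _source: string[];
}

// Builds the body of a kNN search request for the k articles nearest to queryVector.
function buildKnnSearchBody(queryVector: number[], k: number = 10): KnnSearchBody {
  if (queryVector.length !== EMBEDDING_DIMS) {
    throw new Error(`expected ${EMBEDDING_DIMS} dimensions, got ${queryVector.length}`);
  }
  return {
    knn: {
      field: "embedding",
      query_vector: queryVector,
      k,
      // num_candidates must be at least k; a larger pool trades speed for recall.
      num_candidates: Math.max(k * 10, 100),
    },
    _source: ["url", "publisher", "title"],
  };
}
```

The returned object would be sent as the body of a search request against the article index. Validating the vector length up front catches embeddings produced with a different dimension setting before they reach the cluster.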

4️⃣ Deterministic Scoring

confidence =
(0.6 × agreement) +
(0.2 × diversity) +
(0.1 × (1 − emotional_intensity)) +
(0.1 × domain_age_factor)

Fully transparent. No opaque LLM reasoning.
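
To make the formula concrete, here is a minimal TypeScript sketch of it. It assumes every input is already normalized to the range 0 to 1 and that the dashboard's 0–100 score is the weighted sum multiplied by 100 and rounded; the names `ScoreInputs`, `clamp01`, and `confidenceScore` are hypothetical and chosen for this example only.

```typescript
// Assumption: each signal is normalized to [0, 1] before scoring.
interface ScoreInputs {
  agreement: number;          // cross-source semantic agreement
  diversity: number;          // source diversity
  emotionalIntensity: number; // higher means more emotionally charged framing
  domainAgeFactor: number;    // higher means older, more established domains
}

// Keeps a value inside [0, 1] so one bad input cannot push the score out of range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Applies the weights from the formula above and scales to 0–100.
function confidenceScore(inputs: ScoreInputs): number {
  const agreement = clamp01(inputs.agreement);
  const diversity = clamp01(inputs.diversity);
  const emotionalIntensity = clamp01(inputs.emotionalIntensity);
  const domainAgeFactor = clamp01(inputs.domainAgeFactor);

  const raw =
    0.6 * agreement +
    0.2 * diversity +
    0.1 * (1 - emotionalIntensity) +
    0.1 * domainAgeFactor;

  // Assumption: the dashboard value is the rounded percentage of the weighted sum.
  return Math.round(raw * 100);
}

const example = confidenceScore({
  agreement: 0.8,
  diversity: 0.5,
  emotionalIntensity: 0.3,
  domainAgeFactor: 1.0,
});
console.log(example); // 0.48 + 0.10 + 0.07 + 0.10 = 0.75, so this prints 75
```

Because the weights sum to 1.0, the score is bounded by construction, and the contribution of each term can be read directly from its weight.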

🚧 Challenges

  • Choosing the right vector similarity threshold
  • Handling inconsistent article HTML structures
  • Designing a neutral, defensible scoring formula
  • Avoiding political labeling while preserving usefulness
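
The threshold challenge is easier to see with a small example. The sketch below is one plausible form of greedy threshold clustering, run in memory on toy 2-dimensional vectors instead of through Elasticsearch: each article joins the first cluster whose seed article is at least `threshold` similar, and otherwise starts a new cluster. The function names and the seed-based comparison rule are assumptions for illustration, not the project's exact algorithm.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Greedy single-pass clustering. Each cluster is a list of article indices,
// and the first index in a cluster acts as its seed.
function greedyCluster(vectors: number[][], threshold: number): number[][] {
  const clusters: number[][] = [];
  for (let i = 0; i < vectors.length; i++) {
    const home = clusters.find(
      (cluster) => cosineSimilarity(vectors[cluster[0]], vectors[i]) >= threshold
    );
    if (home) {
      home.push(i);
    } else {
      clusters.push([i]);
    }
  }
  return clusters;
}

// Toy embeddings: articles 0 and 1 point one way, articles 2 and 3 another.
const toyVectors: number[][] = [
  [1, 0],
  [0.9, 0.1],
  [0, 1],
  [0.1, 0.9],
];

console.log(greedyCluster(toyVectors, 0.8));   // [[0, 1], [2, 3]]
console.log(greedyCluster(toyVectors, 0.999)); // [[0], [1], [2], [3]]
console.log(greedyCluster(toyVectors, 0.1));   // [[0, 1, 3], [2]]
```

The three runs show the trade-off: too strict a threshold splits every article into its own narrative, while too loose a threshold merges articles that tell different stories. Because the pass is greedy, the result can also depend on the order in which articles arrive.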

🏆 Accomplishments

  • Built a full ingest → embed → index → cluster → score → visualize pipeline
  • Deep integration with Elasticsearch vector search
  • Real-time parallel media coverage analysis
  • Transparent, reproducible scoring system
  • Production-ready dashboard UI

📚 What We Learned

  • Vector search is powerful for narrative detection
  • Agreement across independent sources is measurable
  • Emotional intensity often correlates with divergence
  • Deterministic AI builds trust

🚀 What’s Next

  • 🌐 World heatmap of coverage origins
  • 📈 Historical divergence tracking
  • 🔔 Narrative shift detection
  • 🧩 Chrome extension overlay

🔧 Full Technology List

Frontend

  • Next.js (App Router)
  • React
  • Tailwind CSS
  • Lucide Icons

Backend

  • Next.js API Routes
  • TypeScript

Search & Storage

  • Elasticsearch (Elastic Cloud)
  • dense_vector
  • kNN search
  • Vector similarity scoring
  • Index mappings
  • Google Cloud
  • Aggregations

Embeddings

  • Jina Embeddings v3 (768 dimensions)

Data Acquisition

  • Bright Data API
  • SERP API
  • Web Scraper API

Analysis & Scoring

  • Custom sentiment analysis logic
  • Greedy threshold clustering algorithm
  • Deterministic confidence scoring formula
  • Elastic Cloud

Built With

  • bright-data-api
  • bright-data-serp-api
  • bright-data-web-scraper-api
  • custom-sentiment-analysis-logic
  • deterministic-confidence-scoring-formula
  • elasticsearch-(elastic-cloud)
  • elasticsearch-aggregations
  • elasticsearch-dense-vector
  • elasticsearch-index-mappings
  • elasticsearch-knn-vector-search
  • environment-based-secret-management-(.env)
  • greedy-threshold-clustering-algorithm
  • jina-embeddings-v3-(768-dimensional-embeddings)
  • lucide-icons
  • mcp
  • next.js
  • next.js-api-routes
  • node.js
  • npm
  • react
  • tailwind-css
  • typescript