About Our Project: AI-Powered Fact Checker (VeriFact AI)

Repo: https://github.com/A-Justice-League/ai-powered-fact-checker/
Demo (deployed): https://myaipoweredfactchecker.vercel.app/


Inspiration

Misinformation spreads fast — social posts, screenshots, and long articles can contain dozens of factual claims that are difficult for a single person to verify quickly. The idea for VeriFact AI came from wanting a simple, fast tool you can use before you share something: paste text or upload a screenshot and get an explainable verdict with sources. For the hackathon we wanted to demonstrate multimodal verification (text + images), real search grounding (live web results), and clear explainability so the user can trust the verdicts.

Two practical drivers:

  • A need for explainable AI: decisions should show why a claim was judged true/false and which sources back that judgment.
  • The opportunity to leverage modern multimodal LLMs (Gemini 3) plus live web grounding (Google Search) to reduce hallucination and provide citation links.

What we learned

  • Prompt engineering matters. Strict JSON-output prompts (with schema examples) dramatically simplify downstream parsing and reduce errors.
  • Grounding metadata is gold. The groundingMetadata (search queries, groundingChunks, groundingSupports) coming from the Gemini+Google Search tool is the reliable bridge between natural language assertions and verifiable sources.
  • OCR is practical but noisy. OCR (Tesseract / Google Vision) extracts most text from screenshots, but layout and image quality cause character and segmentation errors which propagate to claim extraction.
  • UX polish increases trust. Small features — visible source domains, claim-level explanations, and a credibility gauge — make the product feel trustworthy to users and judges.
  • Performance/quotas are a real constraint. Live searches and LLM calls can be rate-limited; caching common queries is essential for a demo-ready product.

How the project is built (high level)

Tech stack

  • Frontend: React / TypeScript + Tailwind CSS (custom color palette from the logo).
  • Backend: FastAPI (Python) with Uvicorn.
  • LLM: Google Gemini 3 (using the Gemini API, with the google_search tool enabled).
  • OCR: Tesseract (open source) or Google Vision (cloud; higher accuracy); a minimal OCR sketch follows this list.
  • Deployment: Vercel (frontend) + a backend host or serverless functions.
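
For the screenshot path, the OCR step itself is small. Below is a minimal sketch assuming the pytesseract wrapper around Tesseract (the Google Vision path would swap in its client library); the extracted text then goes through the same claim-extraction prompt as pasted text:

# Minimal OCR sketch for the screenshot path, assuming the pytesseract
# wrapper around Tesseract. Google Vision would replace this with a call
# to its client library.
from PIL import Image
import pytesseract

def extract_text(image_path: str) -> str:
    """Run Tesseract over an uploaded screenshot and return the raw text."""
    image = Image.open(image_path)
    # --psm 6 assumes one uniform block of text; multi-column screenshots
    # may need a different page-segmentation mode (one of the OCR pain
    # points noted under Challenges).
    return pytesseract.image_to_string(image, config="--psm 6")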

Key implementation details

1) Strong prompt pattern (strict JSON + grounding indices)

We instruct Gemini to return strict JSON describing each claim and to reference grounding chunk indices (so we can map indices → URLs server-side):

You are a fact-checking assistant. Analyze the following text and output STRICT JSON: an array of objects with keys:
- claim: short claim text
- verdict: "TRUE" | "FALSE" | "UNSURE"
- explanation: 1–2 sentence explanation
- grounding_support_indices: array of groundingChunk indices (for linking to sources)

TEXT:
---
<user text here>
---
Respond with only valid JSON.

When the request is made with the google_search tool enabled, the response includes groundingMetadata.groundingChunks — each chunk contains web.uri and web.title. The grounding_support_indices map to that array.
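
A condensed sketch of that call using the google-genai Python SDK (the model id, key handling, and fence-stripping fallback below are placeholders, not the exact project code):

# Sketch of the grounded Gemini call; reads GEMINI_API_KEY from the
# environment. The model id is a placeholder for the Gemini model the app uses.
import json
from google import genai
from google.genai import types

client = genai.Client()

def fact_check(user_text: str, prompt_template: str):
    response = client.models.generate_content(
        model="<gemini-model-id>",  # placeholder
        contents=prompt_template.replace("<user text here>", user_text),
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    # The strict-JSON prompt means the text should parse directly; stripping
    # accidental ```json fences makes the parse a little more forgiving.
    raw = (response.text or "").strip().removeprefix("```json").removesuffix("```")
    claims = json.loads(raw)

    # groundingMetadata.groundingChunks -> (uri, title) pairs for the UI
    chunks = response.candidates[0].grounding_metadata.grounding_chunks or []
    sources = [(c.web.uri, c.web.title) for c in chunks]
    return claims, sources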


2) Example API flow (simplified)

Request (POST /analyze-text)

{
  "text": "The Eiffel Tower is 324 meters tall and was opened in 1889."
}

Backend:

  • Calls Gemini with tools=[google_search] and the strict JSON prompt.
  • Receives candidates[0].text (the JSON) and candidates[0].groundingMetadata.
  • Parses the JSON and resolves grounding_support_indices → groundingChunks[i].web.uri.
  • Returns structured JSON to the frontend (claims + verdicts + source list); a sketch of this flow follows the response example below.

Response (simplified)

{
  "claims": [
    {
      "claim": "The Eiffel Tower is 324 meters tall",
      "verdict": "TRUE",
      "explanation": "Official height including antennas is 324m; UNESCO and official sources confirm.",
      "sources": ["https://www.toureiffel.paris/en", "https://en.wikipedia.org/wiki/Eiffel_Tower"]
    },
    {
      "claim": "It opened in 1889",
      "verdict": "TRUE",
      "explanation": "The tower was completed for the 1889 Exposition Universelle.",
      "sources": ["https://en.wikipedia.org/wiki/Eiffel_Tower"]
    }
  ],
  "credibility_score": 100
}
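
A simplified sketch of how this flow can be wired into the FastAPI backend (illustrative names, not the exact project code; gemini_fact_check() stands in for the grounded call sketched earlier and returns a mocked payload here, which is also how we ran quick local tests):

# Simplified POST /analyze-text endpoint. gemini_fact_check() is a mocked
# stand-in for the grounded Gemini call; swap in the real call in production.
import json
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AnalyzeRequest(BaseModel):
    text: str

def gemini_fact_check(text: str):
    """Mocked stand-in returning (raw model JSON, groundingChunks)."""
    raw = json.dumps([{
        "claim": "The Eiffel Tower is 324 meters tall",
        "verdict": "TRUE",
        "explanation": "Official height including antennas is 324m.",
        "grounding_support_indices": [0],
    }])
    chunks = [{"web": {"uri": "https://www.toureiffel.paris/en"}}]
    return raw, chunks

def resolve_sources(claims, chunks):
    # grounding_support_indices -> groundingChunks[i].web.uri
    for claim in claims:
        indices = claim.pop("grounding_support_indices", [])
        claim["sources"] = [chunks[i]["web"]["uri"]
                            for i in indices if 0 <= i < len(chunks)]
    return claims

@app.post("/analyze-text")
def analyze_text(req: AnalyzeRequest):
    raw_json, chunks = gemini_fact_check(req.text)
    claims = resolve_sources(json.loads(raw_json), chunks)
    # The credibility_score from the next section would be attached here too.
    return {"claims": claims}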

Credibility scoring (math)

A simple, transparent score used in the app:

Let \(N\) be the total number of extracted claims and let each claim \(i\) have a verdict score \(v_i\), where:

  • \(v_i = 1\) for TRUE
  • \(v_i = 0\) for FALSE
  • \(v_i = 0.5\) for UNSURE

The basic credibility score is:

\[ \text{score} = 100 \cdot \frac{\sum_{i=1}^{N} v_i}{N} \]

A stronger weighted formulation uses source-reliability weights \(w_{i,j}\) for each source \(j\) supporting claim \(i\):

\[ \text{score} = 100 \cdot \frac{1}{N} \sum_{i=1}^{N} v_i \cdot \frac{\sum_j w_{i,j}}{\sum_j w_{i,j} + \epsilon} \]

where \(w_{i,j}\) encodes domain trust (e.g., high for recognized news organizations, low for personal blogs) and \(\epsilon\) avoids division by zero.

This math is simple, explainable, and easy to show in the UI or documentation.
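
For reference, the same two formulas translate directly to Python (function names and the example weights below are illustrative):

# Direct translation of the basic and weighted credibility scores above.
VERDICT_VALUE = {"TRUE": 1.0, "FALSE": 0.0, "UNSURE": 0.5}

def basic_score(verdicts):
    # score = 100 * (sum of v_i) / N
    return 100 * sum(VERDICT_VALUE[v] for v in verdicts) / len(verdicts)

def weighted_score(claims, eps=1e-6):
    # Each claim: {"verdict": ..., "source_weights": [w_i1, w_i2, ...]}
    # score = 100 * sum_i( v_i * sum_j(w_ij) / (sum_j(w_ij) + eps) ) / N
    total = 0.0
    for claim in claims:
        v = VERDICT_VALUE[claim["verdict"]]
        w = sum(claim.get("source_weights", []))
        total += v * w / (w + eps)
    return 100 * total / len(claims)

print(basic_score(["TRUE", "TRUE"]))  # 100.0, matching the example response
print(weighted_score([
    {"verdict": "TRUE", "source_weights": [0.9, 0.8]},   # well-sourced claim
    {"verdict": "UNSURE", "source_weights": [0.3]},      # weakly sourced claim
]))  # ~75.0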


Challenges faced

  1. Parsing & Structured Output
  • Ensuring the model returns valid JSON reliably required careful prompt constraints and examples. Without that, parsing natural text outputs into structured claims was brittle.
  1. Grounding & Source Mapping
  • Mapping LLM text segments → groundingChunks required careful handling of indices and fallback strategies when indices were missing or partial.
  1. OCR quality & layout
  • Screenshots with multi-column articles, low resolution, or embedded images produced noisy OCR text. This caused some claims to be truncated or mis-extracted.
  1. Latency & Quotas
  • Gemini calls with live Google Search grounding have higher latency and quota limits. For a hackathon demo we added caching for repeated queries and used mocked responses for rapid local testing.
  1. Avoiding Hallucinations
  • Despite grounding, LLMs can still hallucinate. We mitigated this by showing the grounding results (queries used and source links) prominently, and by scoring claims conservatively (favoring UNSURE if evidence is weak).
  1. Ethical considerations
  • Displaying search results and snippets must respect copyright and terms of service. We limited quoted snippets and provided links instead.
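
The caching in challenge 4 can be as small as an in-memory TTL map keyed on the normalized input text (a sketch of the idea rather than the exact implementation; a production version would likely move this to Redis or a similar store):

# Minimal in-memory cache for repeated fact-check queries, so demo runs of
# the same text skip the Gemini + search round trip.
import hashlib
import time

_CACHE = {}              # key -> (timestamp, result)
TTL_SECONDS = 60 * 60    # keep results for an hour

def _key(text):
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def cached_analyze(text, analyze_fn):
    """Return a fresh cached result if present, else call analyze_fn
    (the real Gemini + search pipeline) and remember its output."""
    k = _key(text)
    hit = _CACHE.get(k)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    result = analyze_fn(text)
    _CACHE[k] = (time.time(), result)
    return result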

Final thoughts

VeriFact AI was built to be pragmatic: combine a modern LLM (Gemini 3) with live web grounding to minimize hallucinations and provide verifiable citations. The hackathon sprint taught that explainability, stable prompts, and graceful UX for noisy OCR are the three pillars of a usable fact-checking tool. With a few production polish items (source weighting, OCR improvements, caching for performance), this project can be scaled into a reliable utility for journalists, students, and everyday users who want to check facts quickly and confidently.

Built With

React · TypeScript · Tailwind CSS · FastAPI (Python) · Google Gemini API · Tesseract / Google Vision · Vercel