SiteSage: Verified Intelligence for the Web

Neural Ninjas


Inspiration

SiteSage was inspired by a recurring problem we faced while studying, researching, and navigating complex digital content.

Important information is often buried deep inside long websites, government portals, research documents, and multi-hour YouTube lectures. While large language models are powerful, they frequently hallucinate, over-explain, or fabricate context when information is missing. This behavior is unacceptable in environments where trust, verification, and precision are critical.

We wanted to build a system that prioritizes honesty over fluency — an AI that confidently answers when information exists and explicitly says when it does not.

This need for trustworthy, source-grounded intelligence led to the creation of SiteSage.


What It Does

SiteSage is a browser-native AI extension that enables users to interact with websites, documents, and videos in a precise, verifiable, and context-aware manner.

It allows users to:

  • Ask questions directly on any website or document
  • Receive answers strictly grounded in the visible content
  • Instantly verify sources and references
  • Avoid hallucinated or fabricated responses

A core feature of SiteSage is its Chatbot Controller, which provides two modes:

  • Learning Mode — allows contextual explanations for better understanding
  • Organizational Mode — enforces strict verification and refuses to answer when information is missing

Organizational Mode in particular is what supports trust, clarity, and hallucination-free answers in professional and organizational settings.


How We Built It

SiteSage is implemented as a Chrome-compatible browser extension that operates directly within the user’s browsing environment.

Context Extraction

The system dynamically captures and processes:

  • Website DOM content
  • Uploaded documents
  • YouTube video transcripts
  • User-selected snapshots and notes

All extracted content is normalized into a unified semantic context used for response generation.
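
As a rough illustration of this normalization step, the sketch below folds text from several source types into one list of uniformly shaped chunks. It is a minimal sketch under stated assumptions, not SiteSage's actual code: the names RawSource, ContextChunk, and buildContext, the whitespace-only cleanup, and the fixed word-count chunking are all hypothetical.

```typescript
// Hypothetical types and names; none of them come from SiteSage itself.
type SourceKind = "dom" | "document" | "transcript" | "snapshot";

interface RawSource {
  kind: SourceKind;
  origin: string; // URL, file name, or video ID
  text: string;   // text as produced by the extractor for this source
}

interface ContextChunk {
  id: string;     // stable ID so an answer can point back to its source
  kind: SourceKind;
  origin: string;
  text: string;
}

// Collapse runs of whitespace so text from different extractors looks uniform.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Split normalized text into chunks of at most maxWords words.
function chunkWords(text: string, maxWords: number): string[] {
  const size = Math.max(1, Math.floor(maxWords));
  const words = text.split(" ").filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Merge every source into a single list of chunks with a common shape.
function buildContext(sources: RawSource[], maxWords = 120): ContextChunk[] {
  const context: ContextChunk[] = [];
  sources.forEach((source, sourceIndex) => {
    const pieces = chunkWords(normalizeText(source.text), maxWords);
    pieces.forEach((piece, pieceIndex) => {
      context.push({
        id: `${source.kind}-${sourceIndex}-${pieceIndex}`,
        kind: source.kind,
        origin: source.origin,
        text: piece,
      });
    });
  });
  return context;
}
```

Because each chunk keeps its kind and origin, any answer built from it can cite where the text came from, which is what source verification needs.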

Dual-Mode Intelligence

Every response generated by SiteSage follows the constraint:

Response ⊆ Extracted Context

In strict organizational mode, if required information is absent:

Answer = ∅

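This architectural boundary is designed to prevent hallucination: when the context does not support an answer, the system returns nothing rather than generating unsupported content.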
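
The constraint above can be sketched with a purely extractive responder: the answer is always a passage copied from the extracted context, and organizational mode returns null (standing in for Answer = ∅) when no passage matches well enough. This is an illustrative sketch only. The keyword-overlap scoring is a crude stand-in for whatever retrieval and language model SiteSage actually uses, and the names respond, Passage, and Reply, the 0.5 threshold, and the learning-mode fallback are assumptions.

```typescript
// Hypothetical names throughout; keyword overlap stands in for the real model.
type Mode = "learning" | "organizational";

interface Passage {
  id: string;
  text: string;
}

interface Reply {
  answer: string | null; // null models "Answer = ∅"
  sources: string[];     // IDs of the passages the answer was copied from
  note?: string;
}

// Lowercase, split on non-alphanumeric characters, drop very short tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
}

// Fraction of question terms that also appear in the passage.
function coverage(question: string, passage: string): number {
  const questionTerms = tokenize(question);
  if (questionTerms.length === 0) return 0;
  const passageTerms = tokenize(passage);
  const hits = questionTerms.filter((t) => passageTerms.indexOf(t) !== -1);
  return hits.length / questionTerms.length;
}

function respond(
  question: string,
  context: Passage[],
  mode: Mode,
  threshold = 0.5
): Reply {
  let best: Passage | null = null;
  let bestScore = 0;
  for (const passage of context) {
    const score = coverage(question, passage.text);
    if (score > bestScore) {
      best = passage;
      bestScore = score;
    }
  }

  // Supported answer: copied verbatim from the context, so Response ⊆ Context.
  if (best !== null && bestScore >= threshold) {
    return { answer: best.text, sources: [best.id] };
  }

  // Strict mode: refuse rather than guess.
  if (mode === "organizational") {
    return { answer: null, sources: [], note: "Not found in the extracted context." };
  }

  // Learning mode (assumed behavior): surface the closest passage, clearly labeled.
  if (best !== null) {
    return {
      answer: best.text,
      sources: [best.id],
      note: "Closest related passage; not a verified answer.",
    };
  }
  return { answer: null, sources: [], note: "Not found in the extracted context." };
}
```

The design point is that refusal is an explicit return value rather than something left to the model's judgment, so the caller can render "not found" differently from a grounded answer.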

YouTube Snapshot-to-PDF System

We developed a specialized learning tool for YouTube videos that enables deep, revision-oriented note creation.

During video playback, users can capture snapshots that include:

  • Video timestamp
  • Screenshot frame
  • User annotations

Each snapshot is saved directly within the chat interface, creating a continuous learning trail. Throughout the video, users can review all captured snapshots, edit annotations, or delete unnecessary ones.

At the end of the video, users finalize their selection and generate a structured PDF containing only the chosen snapshots and notes. This supports effective revision while avoiding information overload.
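
The capture, review, and finalize flow described above might be modeled roughly as follows. This is a sketch under assumptions rather than the extension's real data model: the Snapshot fields, the SnapshotTrail class, the choice to mark new snapshots as selected by default, and the formatTimestamp helper are all hypothetical, and PDF rendering itself is left out.

```typescript
// Hypothetical data model for the snapshot trail; PDF rendering is out of scope.
interface Snapshot {
  id: number;
  timestampSeconds: number; // position in the video when captured
  frameDataUrl: string;     // screenshot frame, e.g. a data URL
  annotation: string;       // user note attached to the frame
  selected: boolean;        // whether it goes into the final PDF
}

class SnapshotTrail {
  private snapshots: Snapshot[] = [];
  private nextId = 1;

  // Record a new snapshot; assumed to be selected until the user deselects it.
  capture(timestampSeconds: number, frameDataUrl: string, annotation = ""): Snapshot {
    const snapshot: Snapshot = {
      id: this.nextId++,
      timestampSeconds,
      frameDataUrl,
      annotation,
      selected: true,
    };
    this.snapshots.push(snapshot);
    return snapshot;
  }

  private findById(id: number): Snapshot | undefined {
    return this.snapshots.filter((s) => s.id === id)[0];
  }

  // Returns false when no snapshot has the given ID.
  editAnnotation(id: number, annotation: string): boolean {
    const snapshot = this.findById(id);
    if (!snapshot) return false;
    snapshot.annotation = annotation;
    return true;
  }

  setSelected(id: number, selected: boolean): boolean {
    const snapshot = this.findById(id);
    if (!snapshot) return false;
    snapshot.selected = selected;
    return true;
  }

  remove(id: number): boolean {
    const before = this.snapshots.length;
    this.snapshots = this.snapshots.filter((s) => s.id !== id);
    return this.snapshots.length < before;
  }

  // Selected snapshots in video order: the input a PDF generator would take.
  finalize(): Snapshot[] {
    return this.snapshots
      .filter((s) => s.selected)
      .sort((a, b) => a.timestampSeconds - b.timestampSeconds);
  }
}

// Render seconds as m:ss, or h:mm:ss for videos longer than an hour.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n: number) => (n < 10 ? "0" + n : String(n));
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}
```

Sorting by timestamp at finalize time means the PDF follows the order of the video even if the user captured snapshots while scrubbing back and forth.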

Multilingual Reasoning

SiteSage supports cross-lingual understanding:

Query (English) → Content (any language) → Answer (English)

This ensures accessibility without compromising accuracy.


Challenges We Ran Into

Eliminating Hallucination

The most significant challenge was ensuring the system does not generate content when information is missing. This required strict context enforcement and curbing the model's tendency to fill gaps with generated text.

Balancing Flexibility and Precision

Designing an AI system that supports both exploratory learning and strict organizational verification led to the development of the dual-mode Chatbot Controller.

Browser-Level Integration

Injecting features such as real-time transcript analysis, snapshot capture, and chat-based storage into live web environments required careful performance optimization.

Information Overload Control

Allowing users to capture many snapshots while still providing a final review and deletion stage was crucial to maintaining usability.


Accomplishments That We're Proud Of

  • Built a zero-hallucination, source-grounded AI system
  • Successfully implemented dual intelligence modes
  • Created an end-to-end YouTube learning and PDF generation workflow
  • Achieved seamless browser-level integration

What We Learned

Through this project, we learned that:

  • Trust is more important than fluency in AI systems
  • Hallucination can be reduced through architectural constraints
  • Users value transparency over overly confident responses
  • Browser-native AI opens powerful new interaction paradigms

What's Next for SiteSage

Our next steps include:

  • Expanding document format support
  • Improving performance on large-scale websites
  • Enhancing collaborative and organizational features
  • Scaling multilingual capabilities

We aim to evolve SiteSage into a universal layer of verified intelligence for the web.
