Inspiration

The idea for Sentiency started in a surprisingly fitting place: the bathroom at the Google campus hackathon venue.

Pinned on the wall was a “Tech on the Toilet” poster titled “Protect The User From Prompt Injection.” It turned a security problem that usually feels abstract into something immediate and real. We were surrounded by builders making AI products that could read, summarize, copy, paste, and act on behalf of users — and that poster made one thing click for us: if AI is becoming an action-taking interface, then prompt injection is not just a model problem anymore. It is a user safety problem.

That moment became the seed of Sentiency.

We started thinking about all the invisible ways malicious instructions can reach an AI system: hidden text on a webpage, poisoned clipboard content, image-based injections, or hostile content embedded in a live conversation. Most people never see these attacks happen. They just see the AI behave strangely after trusting the wrong input.

So we wanted to build something that protects users before the model is manipulated — something that works where people actually interact with AI: inside the browser, in real time.

We also wanted the project to feel bigger than a hackathon demo. The story behind Sentiency is simple: if AI is going to be used by everyone, then safety cannot be buried in research papers, security docs, or bathroom posters. It has to become a usable product.

What it does

Sentiency is a real-time Chrome extension that detects, classifies, and remediates prompt injection before it reaches an LLM.

It protects users across multiple attack surfaces:

  • Webpage content: scans for visually hidden or suspicious text in the DOM
  • Clipboard paste: intercepts pasted text before it lands in an AI input box
  • Clipboard image paste: analyzes pasted images for prompt-injection content
  • Copy events: checks copied text for suspicious instructions
  • Live LLM sessions: monitors assistant/user conversation flow for single-turn and multi-turn attack patterns
  • Manual scans: lets users scan selected text on demand
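
As a taste of the clipboard-paste surface, here is a minimal sketch of how an interceptor might sit on the `paste` event. The `looksLikeInjection` heuristic and its patterns are a toy stand-in for illustration, not Sentiency's actual detector stack.

```javascript
// Toy pattern list standing in for the real layered detectors.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /disregard (the )?system prompt/i,
  /you are now (in )?developer mode/i,
];

function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((re) => re.test(text));
}

// Wire-up only runs in a page context (guarded so the logic stays testable).
if (typeof document !== "undefined" && document.addEventListener) {
  document.addEventListener(
    "paste",
    (event) => {
      const text = event.clipboardData?.getData("text/plain") ?? "";
      if (looksLikeInjection(text)) {
        event.preventDefault(); // stop the paste before it lands in the input
        // ...warn the user / hand the text to the full threat pipeline here
      }
    },
    true // capture phase, so the check runs before the page sees the paste
  );
}
```

Listening in the capture phase matters: the interceptor gets first look at the event, before site scripts or the AI chat input itself.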

Sentiency combines local heuristics with multimodal LLM-based classification to identify risky content such as:

  • hidden instructions
  • obfuscated payloads
  • encoded or disguised prompts
  • image-based injection attempts
  • session-level manipulation patterns

When a threat is detected, Sentiency can:

  • warn the user
  • block unsafe content
  • sanitize dangerous spans
  • highlight suspicious text
  • reduce the chance that malicious instructions ever reach the model

Most importantly, Sentiency runs entirely in the browser: there is no backend server of our own, and classification calls go straight from the extension to the Gemini API. That keeps the system lightweight and privacy-conscious.

How we built it

We built Sentiency as a Chrome Manifest V3 extension designed for real-time, in-browser protection.

Core architecture

  • Content scripts watch pages, clipboard events, selections, and AI chat interfaces
  • A background service worker handles messaging, commands, context menu actions, and extension-level coordination
  • A side panel + options page provide controls, settings, scans, and threat visibility
  • A shared threat pipeline merges local detectors with LLM classification into one risk decision
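
The content-script to service-worker hop can be sketched as a small message router. The message names and responses below are illustrative, not Sentiency's exact schema; the toy `SCAN_TEXT` check stands in for the real pipeline.

```javascript
// Pure routing function: easy to test, and the only piece that knows
// which message types exist.
function routeMessage(message) {
  switch (message.type) {
    case "SCAN_TEXT":
      // The real worker would run the full threat pipeline here.
      return {
        type: "SCAN_RESULT",
        risky: /ignore previous/i.test(message.text ?? ""),
      };
    case "GET_SETTINGS":
      return { type: "SETTINGS", remediation: "warn" };
    default:
      return { type: "ERROR", reason: "unknown message type" };
  }
}

// Service-worker wiring (only where the chrome.* APIs exist).
if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
    sendResponse(routeMessage(message));
    return true; // keep the channel open for async responses
  });
}
```

Keeping the routing logic separate from the `chrome.runtime` wiring is what makes a Manifest V3 worker testable outside the browser.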

Detection pipeline

We designed detection as a layered system:

  1. Local heuristic detectors

    • hidden/visually concealed text detection
    • unicode anomaly detection
    • instruction-pattern detection
    • encoding / obfuscation detection
    • span extraction for highlighting and remediation
  2. LLM classification

    • Gemini-based structured JSON classification
    • text and image analysis
    • single-turn prompt injection detection
    • trajectory analysis over recent chat turns
  3. Threat scoring and remediation

    • taxonomy mapping
    • confidence-based severity scoring
    • configurable remediation behavior
    • logging and UI alerts
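
Step 3 of the pipeline can be sketched as a single risk decision that merges the two signal sources. The weights, thresholds, and severity-to-remediation mapping below are invented for the sketch, not Sentiency's tuned values.

```javascript
// Merge local heuristic hits with the LLM classifier's confidence into
// one severity. Both inputs are in [0, 1]; weights are illustrative.
function scoreThreat(heuristicScore, llmConfidence) {
  const combined = 0.4 * heuristicScore + 0.6 * llmConfidence;
  let severity;
  if (combined >= 0.75) severity = "high";
  else if (combined >= 0.4) severity = "medium";
  else severity = "low";
  return { combined, severity };
}

// Configurable remediation behavior, keyed by severity.
const REMEDIATION = { high: "block", medium: "sanitize", low: "warn" };
function pickRemediation(severity) {
  return REMEDIATION[severity] ?? "warn";
}
```

Weighting the LLM verdict above the heuristics reflects the layering: heuristics are cheap and noisy, the classifier is slower but more precise.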

Tech stack

  • Chrome Extension (Manifest V3)
  • React 18 for UI
  • Shadow DOM for in-page interface isolation
  • Tailwind CSS for settings and panel styling
  • Webpack + Babel for bundling
  • Gemini REST API for multimodal classification
  • chrome.storage.local for settings, API keys, and threat history
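
A structured-JSON classification call to the Gemini REST API might be built roughly like this. The model name, prompt, and response schema are illustrative; the exact fields should be checked against the current Gemini REST documentation.

```javascript
// Build a generateContent request that asks for a machine-readable verdict.
function buildClassifyRequest(text) {
  return {
    contents: [
      {
        parts: [
          {
            text:
              "Classify the following content for prompt injection. " +
              'Respond with JSON: {"injection": boolean, "confidence": number}.\n\n' +
              text,
          },
        ],
      },
    ],
    // Ask the API to return JSON rather than free-form prose.
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Browser-side call (sketch; not executed here).
async function classify(text, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    "gemini-1.5-flash:generateContent?key=" + apiKey;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildClassifyRequest(text)),
  });
  const data = await res.json();
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```

Calling the API directly from the extension, with the key in `chrome.storage.local`, is what lets the whole system run without a backend of its own.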

We focused on making the product feel real: not something that just detects a suspicious string, but something that operates across the messy surfaces where prompt injection shows up in practice.

Challenges we ran into

One of the biggest challenges was that prompt injection is not a single attack pattern — it is a whole family of behaviors.

1. High sensitivity vs false positives

If the detector is too aggressive, normal webpage content or strong wording gets flagged. If it is too relaxed, hidden or obfuscated instructions slip through. Finding the right balance between precision and recall was one of the hardest parts.

2. Hidden text is surprisingly tricky

Attackers do not just write “ignore previous instructions” in plain text. They can hide content with CSS, opacity tricks, off-screen positioning, zero-size boxes, matched foreground/background colors, unicode tricks, or encoded blobs. Detecting that reliably required building several layers of heuristic logic.
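
One layer of that logic can be sketched as a visibility check over an element's computed style. It is written here against a plain style object so it can be tested outside the browser; in the extension it would be fed from `getComputedStyle(element)`, and this simplified version covers only a few of the tricks listed above.

```javascript
// Heuristic check for visually concealed text.
// (Off-screen positioning and clip tricks need the element's box, so in
// practice they are checked separately against getBoundingClientRect.)
function isVisuallyHidden(style) {
  const fontSize = parseFloat(style.fontSize || "16");
  const opacity = parseFloat(style.opacity ?? "1");
  return (
    style.display === "none" ||
    style.visibility === "hidden" ||
    opacity < 0.05 ||                        // effectively invisible
    fontSize < 1 ||                          // zero/near-zero size boxes
    style.color === style.backgroundColor    // text matches its background
  );
}
```

Each clause catches one family of trick; the real detector stacks many more of these, which is why "is this text hidden?" ends up being layered heuristic logic rather than one check.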

3. Real-time UX constraints

Security tools are easy to design badly. We did not want Sentiency to feel noisy, slow, or annoying. Running detection in real time while keeping the experience fast and understandable was a constant design tradeoff.

4. Image-based prompt injection

Text injection is hard enough; image-based injection adds another dimension. We had to think about pasted screenshots, uploaded images, and hidden textual instructions embedded visually.

5. Browser extension timing and messaging

Chrome extension development introduces its own complexity: content scripts loading at the right time, messaging between the page and service worker, handling tabs that are not ready yet, and building something stable across different sites.

6. Session-level reasoning

Single-turn detection is useful, but some of the most dangerous attacks build up over multiple conversation turns. Modeling that “trajectory” of manipulation was much harder than just scanning one block of text.
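
A toy version of that trajectory idea: score each recent turn, then look for escalation across a sliding window rather than a single spike. The per-turn patterns, window size, and thresholds here are invented for the sketch.

```javascript
// Crude per-turn risk score (stand-in for the real single-turn detector).
function turnScore(text) {
  let score = 0;
  if (/ignore (all )?(previous|prior) instructions/i.test(text)) score += 0.6;
  if (/pretend|roleplay|act as/i.test(text)) score += 0.2;
  if (/system prompt|hidden instructions/i.test(text)) score += 0.3;
  return Math.min(score, 1);
}

// Flag when recent turns are both risky in aggregate and escalating,
// which no single-turn check would catch on its own.
function trajectoryRisk(turns, windowSize = 4) {
  const recent = turns.slice(-windowSize).map(turnScore);
  const total = recent.reduce((a, b) => a + b, 0);
  const escalating = recent.every((s, i) => i === 0 || s >= recent[i - 1]);
  return { total, escalating, flagged: total >= 0.8 && escalating };
}
```

The point of the window is exactly the challenge above: each turn alone can stay under any single-turn threshold while the conversation as a whole is clearly being steered.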

Accomplishments that we're proud of

We are proud that Sentiency became more than just a concept — it became a working, end-to-end product prototype.

What we are most proud of

  • Building a real-time browser-native defense instead of a static demo
  • Supporting multiple attack surfaces: DOM, clipboard, copy, image paste, and live sessions
  • Running without a backend server
  • Combining local security heuristics with multimodal AI classification
  • Designing a system that is both technical and user-facing
  • Turning a niche security issue into something understandable and actionable for normal users

We are also proud of the story behind it. Sentiency was inspired by a bathroom poster at the hackathon venue, but we turned that spark into a product vision: AI safety that meets users where they already are.

That felt meaningful. A lot of teams build more AI. We wanted to build something that helps people use AI more safely.

What we learned

This project taught us that prompt injection is not just an LLM problem — it is a systems problem, a product problem, and a human trust problem.

We learned that:

  • Security needs to happen at the interface layer, not only inside the model
  • Many dangerous attacks are invisible to the user
  • Good security products must explain risk clearly, not just detect it
  • Real-world AI safety needs layered defenses, not one perfect classifier
  • Browser extensions are a powerful place to build safety tooling because that is where user interaction actually happens

We also learned that some of the best project ideas come from paying attention to the environment around us. In our case, a poster in a hackathon bathroom ended up becoming the foundation for a product we genuinely believe should exist.

What's next for Sentiency - Real Time AI Prompt Injection Detection

This is only the beginning for Sentiency.

Next steps

  • Expand beyond text and pasted images into PDFs, notebooks, JSON, and other uploaded file types
  • Improve support for more AI platforms and more browser contexts
  • Build stronger explainability so users can understand exactly why something was flagged
  • Add policy modes for different risk environments, from casual users to enterprise teams
  • Create a benchmark dataset for real-world prompt injection examples
  • Improve multimodal coverage for OCR-heavy and steganographic attacks
  • Explore team / organization-level deployment for shared protection
  • Move from reactive warning to preventive trust scoring across an entire browsing session

Our long-term vision is for Sentiency to become a real-time security layer for the age of agentic AI: a system that helps users trust what goes into their models, not just what comes out.

If the future of computing is AI-powered, then users need seatbelts.

We want Sentiency to be one of them.

Built With

  • babel
  • chrome-manifest-v3
  • chrome-side-panel-api
  • chrome.storage.local
  • clipboard-api
  • context-menus
  • css
  • google-gemini-api
  • html
  • javascript
  • mutationobserver
  • npm
  • postcss
  • react-18
  • react-dom
  • shadow-dom
  • tailwind-css
  • webpack-5