Inspiration

The idea for Sentiency started in a surprisingly fitting place: the bathroom at the Google campus hackathon venue.

Pinned on the wall was a “Tech on the Toilet” poster titled “Protect The User From Prompt Injection.” It turned a security problem that usually feels abstract into something immediate and real. We were surrounded by builders making AI products that could read, summarize, copy, paste, and act on behalf of users — and that poster made one thing click for us: if AI is becoming an action-taking interface, then prompt injection is not just a model problem anymore. It is a user safety problem.

That moment became the seed of Sentiency.

We started thinking about all the invisible ways malicious instructions can reach an AI system: hidden text on a webpage, poisoned clipboard content, image-based injections, or hostile content embedded in a live conversation. Most people never see these attacks happen. They just see the AI behave strangely after trusting the wrong input.

So we wanted to build something that protects users before the model is manipulated — something that works where people actually interact with AI: inside the browser, in real time.

We also wanted the project to feel bigger than a hackathon demo. The story behind Sentiency is simple: if AI is going to be used by everyone, then safety cannot be buried in research papers, security docs, or bathroom posters. It has to become a usable product.

What it does

Sentiency is a real-time Chrome extension that detects, classifies, and remediates prompt injection before it reaches an LLM.

It protects users across multiple attack surfaces:

  • Webpage content: scans for visually hidden or suspicious text in the DOM
  • Clipboard paste: intercepts pasted text before it lands in an AI input box
  • Clipboard image paste: analyzes pasted images for prompt-injection content
  • Copy events: checks copied text for suspicious instructions
  • Live LLM sessions: monitors assistant/user conversation flow for single-turn and multi-turn attack patterns
  • Manual scans: lets users scan selected text on demand
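
As a taste of the clipboard-paste surface, here is a minimal sketch of how an interceptor might sit on the `paste` event. The `looksLikeInjection` heuristic and its patterns are a toy stand-in for illustration, not Sentiency's actual detector stack.

```javascript
// Toy pattern list standing in for the real layered detectors.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /disregard (the )?system prompt/i,
  /you are now (in )?developer mode/i,
];

function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((re) => re.test(text));
}

// Wire-up only runs in a page context (guarded so the logic stays testable).
if (typeof document !== "undefined" && document.addEventListener) {
  document.addEventListener(
    "paste",
    (event) => {
      const text = event.clipboardData?.getData("text/plain") ?? "";
      if (looksLikeInjection(text)) {
        event.preventDefault(); // stop the paste before it lands in the input
        // ...warn the user / hand the text to the full threat pipeline here
      }
    },
    true // capture phase, so the check runs before the page sees the paste
  );
}
```

Listening in the capture phase matters: the interceptor gets first look at the event, before site scripts or the AI chat input itself.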

Sentiency combines local heuristics with multimodal LLM-based classification to identify risky content such as:

  • hidden instructions
  • obfuscated payloads
  • encoded or disguised prompts
  • image-based injection attempts
  • session-level manipulation patterns

When a threat is detected, Sentiency can:

  • warn the user
  • block unsafe content
  • sanitize dangerous spans
  • highlight suspicious text
  • reduce the chance that malicious instructions ever reach the model

Most importantly, Sentiency runs entirely in the browser: there is no backend server of our own, and classification calls go straight from the extension to the Gemini API. That keeps the system lightweight and privacy-conscious.

How we built it

We built Sentiency as a Chrome Manifest V3 extension designed for real-time, in-browser protection.

Core architecture

  • Content scripts watch pages, clipboard events, selections, and AI chat interfaces
  • A background service worker handles messaging, commands, context menu actions, and extension-level coordination
  • A side panel + options page provide controls, settings, scans, and threat visibility
  • A shared threat pipeline merges local detectors with LLM classification into one risk decision
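
The content-script to service-worker hop can be sketched as a small message router. The message names and responses below are illustrative, not Sentiency's exact schema; the toy `SCAN_TEXT` check stands in for the real pipeline.

```javascript
// Pure routing function: easy to test, and the only piece that knows
// which message types exist.
function routeMessage(message) {
  switch (message.type) {
    case "SCAN_TEXT":
      // The real worker would run the full threat pipeline here.
      return {
        type: "SCAN_RESULT",
        risky: /ignore previous/i.test(message.text ?? ""),
      };
    case "GET_SETTINGS":
      return { type: "SETTINGS", remediation: "warn" };
    default:
      return { type: "ERROR", reason: "unknown message type" };
  }
}

// Service-worker wiring (only where the chrome.* APIs exist).
if (typeof chrome !== "undefined" && chrome.runtime?.onMessage) {
  chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
    sendResponse(routeMessage(message));
    return true; // keep the channel open for async responses
  });
}
```

Keeping the routing logic separate from the `chrome.runtime` wiring is what makes a Manifest V3 worker testable outside the browser.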

Detection pipeline

We designed detection as a layered system:

  1. Local heuristic detectors

    • hidden/visually concealed text detection
    • unicode anomaly detection
    • instruction-pattern detection
    • encoding / obfuscation detection
    • span extraction for highlighting and remediation
  2. LLM classification

    • Gemini-based structured JSON classification
    • text and image analysis
    • single-turn prompt injection detection
    • trajectory analysis over recent chat turns
  3. Threat scoring and remediation

    • taxonomy mapping
    • confidence-based severity scoring
    • configurable remediation behavior
    • logging and UI alerts
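
Step 3 of the pipeline can be sketched as a single risk decision that merges the two signal sources. The weights, thresholds, and severity-to-remediation mapping below are invented for the sketch, not Sentiency's tuned values.

```javascript
// Merge local heuristic hits with the LLM classifier's confidence into
// one severity. Both inputs are in [0, 1]; weights are illustrative.
function scoreThreat(heuristicScore, llmConfidence) {
  const combined = 0.4 * heuristicScore + 0.6 * llmConfidence;
  let severity;
  if (combined >= 0.75) severity = "high";
  else if (combined >= 0.4) severity = "medium";
  else severity = "low";
  return { combined, severity };
}

// Configurable remediation behavior, keyed by severity.
const REMEDIATION = { high: "block", medium: "sanitize", low: "warn" };
function pickRemediation(severity) {
  return REMEDIATION[severity] ?? "warn";
}
```

Weighting the LLM verdict above the heuristics reflects the layering: heuristics are cheap and noisy, the classifier is slower but more precise.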

Tech stack

  • Chrome Extension (Manifest V3)
  • React 18 for UI
  • Shadow DOM for in-page interface isolation
  • Tailwind CSS for settings and panel styling
  • Webpack + Babel for bundling
  • Gemini REST API for multimodal classification
  • chrome.storage.local for settings, API keys, and threat history
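
A structured-JSON classification call to the Gemini REST API might be built roughly like this. The model name, prompt, and response schema are illustrative; the exact fields should be checked against the current Gemini REST documentation.

```javascript
// Build a generateContent request that asks for a machine-readable verdict.
function buildClassifyRequest(text) {
  return {
    contents: [
      {
        parts: [
          {
            text:
              "Classify the following content for prompt injection. " +
              'Respond with JSON: {"injection": boolean, "confidence": number}.\n\n' +
              text,
          },
        ],
      },
    ],
    // Ask the API to return JSON rather than free-form prose.
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Browser-side call (sketch; not executed here).
async function classify(text, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    "gemini-1.5-flash:generateContent?key=" + apiKey;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildClassifyRequest(text)),
  });
  const data = await res.json();
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```

Calling the API directly from the extension, with the key in `chrome.storage.local`, is what lets the whole system run without a backend of its own.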

We focused on making the product feel real: not something that just detects a suspicious string, but something that operates across the messy surfaces where prompt injection shows up in practice.

Challenges we ran into

One of the biggest challenges was that prompt injection is not a single attack pattern — it is a whole family of behaviors.

1. High sensitivity vs false positives

If the detector is too aggressive, normal webpage content or strong wording gets flagged. If it is too relaxed, hidden or obfuscated instructions slip through. Finding the right balance between precision and recall was one of the hardest parts.

2. Hidden text is surprisingly tricky

Attackers do not just write “ignore previous instructions” in plain text. They can hide content with CSS, opacity tricks, off-screen positioning, zero-size boxes, matched foreground/background colors, unicode tricks, or encoded blobs. Detecting that reliably required building several layers of heuristic logic.
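
One layer of that logic can be sketched as a visibility check over an element's computed style. It is written here against a plain style object so it can be tested outside the browser; in the extension it would be fed from `getComputedStyle(element)`, and this simplified version covers only a few of the tricks listed above.

```javascript
// Heuristic check for visually concealed text.
// (Off-screen positioning and clip tricks need the element's box, so in
// practice they are checked separately against getBoundingClientRect.)
function isVisuallyHidden(style) {
  const fontSize = parseFloat(style.fontSize || "16");
  const opacity = parseFloat(style.opacity ?? "1");
  return (
    style.display === "none" ||
    style.visibility === "hidden" ||
    opacity < 0.05 ||                        // effectively invisible
    fontSize < 1 ||                          // zero/near-zero size boxes
    style.color === style.backgroundColor    // text matches its background
  );
}
```

Each clause catches one family of trick; the real detector stacks many more of these, which is why "is this text hidden?" ends up being layered heuristic logic rather than one check.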

3. Real-time UX constraints

Security tools are easy to design badly. We did not want Sentiency to feel noisy, slow, or annoying. Running detection in real time while keeping the experience fast and understandable was a constant design tradeoff.

4. Image-based prompt injection

Text injection is hard enough; image-based injection adds another dimension. We had to think about pasted screenshots, uploaded images, and hidden textual instructions embedded visually.

5. Browser extension timing and messaging

Chrome extension development introduces its own complexity: content scripts loading at the right time, messaging between the page and service worker, handling tabs that are not ready yet, and building something stable across different sites.

6. Session-level reasoning

Single-turn detection is useful, but some of the most dangerous attacks build up over multiple conversation turns. Modeling that “trajectory” of manipulation was much harder than just scanning one block of text.
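
A toy version of that trajectory idea: score each recent turn, then look for escalation across a sliding window rather than a single spike. The per-turn patterns, window size, and thresholds here are invented for the sketch.

```javascript
// Crude per-turn risk score (stand-in for the real single-turn detector).
function turnScore(text) {
  let score = 0;
  if (/ignore (all )?(previous|prior) instructions/i.test(text)) score += 0.6;
  if (/pretend|roleplay|act as/i.test(text)) score += 0.2;
  if (/system prompt|hidden instructions/i.test(text)) score += 0.3;
  return Math.min(score, 1);
}

// Flag when recent turns are both risky in aggregate and escalating,
// which no single-turn check would catch on its own.
function trajectoryRisk(turns, windowSize = 4) {
  const recent = turns.slice(-windowSize).map(turnScore);
  const total = recent.reduce((a, b) => a + b, 0);
  const escalating = recent.every((s, i) => i === 0 || s >= recent[i - 1]);
  return { total, escalating, flagged: total >= 0.8 && escalating };
}
```

The point of the window is exactly the challenge above: each turn alone can stay under any single-turn threshold while the conversation as a whole is clearly being steered.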

Accomplishments that we're proud of

We are proud that Sentiency became more than just a concept — it became a working, end-to-end product prototype.

What we are most proud of

  • Building a real-time browser-native defense instead of a static demo
  • Supporting multiple attack surfaces: DOM, clipboard, copy, image paste, and live sessions
  • Running without a backend server
  • Combining local security heuristics with multimodal AI classification
  • Designing a system that is both technical and user-facing
  • Turning a niche security issue into something understandable and actionable for normal users

We are also proud of the story behind it. Sentiency was inspired by a bathroom poster at the hackathon venue, but we turned that spark into a product vision: AI safety that meets users where they already are.

That felt meaningful. A lot of teams build more AI. We wanted to build something that helps people use AI more safely.

What we learned

This project taught us that prompt injection is not just an LLM problem — it is a systems problem, a product problem, and a human trust problem.

We learned that:

  • Security needs to happen at the interface layer, not only inside the model
  • Many dangerous attacks are invisible to the user
  • Good security products must explain risk clearly, not just detect it
  • Real-world AI safety needs layered defenses, not one perfect classifier
  • Browser extensions are a powerful place to build safety tooling because that is where user interaction actually happens

We also learned that some of the best project ideas come from paying attention to the environment around us. In our case, a poster in a hackathon bathroom ended up becoming the foundation for a product we genuinely believe should exist.

What's next for Sentiency - Real Time AI Prompt Injection Detection

This is only the beginning for Sentiency.

Next steps

  • Expand beyond text and pasted images into PDFs, notebooks, JSON, and other uploaded file types
  • Improve support for more AI platforms and more browser contexts
  • Build stronger explainability so users can understand exactly why something was flagged
  • Add policy modes for different risk environments, from casual users to enterprise teams
  • Create a benchmark dataset for real-world prompt injection examples
  • Improve multimodal coverage for OCR-heavy and steganographic attacks
  • Explore team / organization-level deployment for shared protection
  • Move from reactive warning to preventive trust scoring across an entire browsing session

Our long-term vision is for Sentiency to become a real-time security layer for the age of agentic AI: a system that helps users trust what goes into their models, not just what comes out.

If the future of computing is AI-powered, then users need seatbelts.

We want Sentiency to be one of them.

Built With

  • babel
  • chrome-manifest-v3
  • chrome-side-panel-api
  • chrome.storage.local
  • clipboard-api
  • context-menus
  • css
  • google-gemini-api
  • html
  • javascript
  • mutationobserver
  • npm
  • postcss
  • react-18
  • react-dom
  • shadow-dom
  • tailwind-css
  • webpack-5