Inspiration

In today’s digital world, images are the new data leak. We screen-share medical records, email scanned contracts, and upload customer forms to the cloud without thinking. While we have mature firewalls for text, our visual data is largely unprotected.

A single accidental screenshot of a patient chart can violate HIPAA regulations and cost a hospital millions. I realized that existing solutions were either manual (too slow for real work) or relied on dumb "keyword matching" OCR that failed on complex, real-world documents.

This gap inspired GuardVision—a system built on the belief that privacy should be intelligent, instant, and invisible.

What it does

GuardVision is an intelligent privacy layer that "sees" and sanitizes sensitive data before it ever leaves your device or network. It operates on a Dual-Mode Architecture to solve the "Privacy Trilemma" of Speed, Accuracy, and Compliance:

  1. Individual Mode (Live MVP)
  • For Everyone: Researchers, developers, and everyday users.
  • Instant Redaction: Users drag-and-drop an image, and within seconds, Gemini 1.5 Flash identifies and redacts 18+ types of PII (Personally Identifiable Information).
  • Visual Reasoning: Unlike traditional regex tools, it detects complex entities like handwritten signatures, faces, and ID cards, understanding the context of where they appear on a page.
  2. Enterprise Mode (Architectural Vision)
  • For Hospitals & Banks: Designed for high-volume, regulated environments.
  • Zero-Trust Pipeline: We have architected a scalable backend (FastAPI + Redis + Celery) to handle bulk processing of thousands of records.
  • Audit-Ready: This mode (currently in roadmap) combines AI reasoning with deterministic rule-based checking (Microsoft Presidio) to create immutable audit logs for every redacted pixel, ensuring HIPAA/GDPR compliance.
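
One way to picture the audit log is as a hash chain, where each entry commits to the one before it. The sketch below is illustrative only and is not the roadmap implementation: the `append_entry` and `verify_log` helpers, the record fields, and the choice of SHA-256 are all assumptions. Note that this makes the log tamper-evident rather than strictly immutable; true immutability would also need write-once storage.

```python
import hashlib
import json

# Hypothetical starting value for the chain (an assumption, not from the project).
GENESIS_HASH = "0" * 64


def _digest(prev_hash, record):
    # Serialize deterministically so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a redaction record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_log(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    # Field names here are made up for illustration.
    append_entry(log, {"file": "scan_001.png", "entity": "SSN", "box": [120, 80, 150, 310]})
    append_entry(log, {"file": "scan_001.png", "entity": "SIGNATURE", "box": [700, 90, 780, 400]})
    print(verify_log(log))
```

Editing any earlier record changes its hash, which breaks every later link, so a verifier can detect the change without trusting the service that wrote the log.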

How we built it (Gemini Integration)

We didn't just use Gemini as a chatbot; we used it as a Spatial Reasoning Engine.

  • Multimodal Analysis: We feed raw images (screenshots, documents, photos) directly into Gemini 1.5 Flash.
  • Coordinate Mapping: We engineered a prompt that forces the model to return precise bounding box coordinates [ymin, xmin, ymax, xmax] on a 0-1000 scale, effectively turning the LLM into a specialized object detection model for privacy concepts.
  • Contextual Intelligence: Traditional OCR sees a 9-digit number. Gemini sees a "Social Security Number" because of the surrounding context (e.g., "SSN:" label, form layout). This drastically reduces false positives while catching things regex misses.
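
Before anything can be redacted, the normalized boxes have to be mapped back to pixels. Here is a minimal sketch of that step, assuming the model returns integer [ymin, xmin, ymax, xmax] values on the 0-1000 scale described above. The `to_pixel_box` name and the rounding policy are our own illustration, not the project's actual code.

```python
def to_pixel_box(box_1000, image_width, image_height, scale=1000):
    """Map [ymin, xmin, ymax, xmax] on a 0..scale grid to (left, top, right, bottom) pixels.

    Integer arithmetic is used throughout. The top-left corner is rounded down
    and the bottom-right corner is rounded up, so rounding can only grow the
    box, never shrink it.
    """
    ymin, xmin, ymax, xmax = box_1000

    def clamp(value):
        # Guard against model output that falls slightly outside the grid.
        return max(0, min(scale, value))

    # Tolerate swapped corners by sorting each axis.
    x0, x1 = sorted((clamp(xmin), clamp(xmax)))
    y0, y1 = sorted((clamp(ymin), clamp(ymax)))

    left = (x0 * image_width) // scale
    top = (y0 * image_height) // scale
    right = -((-x1 * image_width) // scale)    # ceiling division
    bottom = -((-y1 * image_height) // scale)  # ceiling division
    return (left, top, right, bottom)


if __name__ == "__main__":
    # A box covering part of a 1920x1080 screenshot.
    print(to_pixel_box([100, 200, 300, 400], 1920, 1080))
```

Rounding outward is a deliberate choice for a privacy tool: covering one pixel too many is harmless, while covering one too few can leak part of a character.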

Tech Stack: The frontend is a React/Vite app for speed, while the proposed enterprise backend leverages Python, Redis, and MinIO for robust data handling.

Challenges we ran into

  • Hallucinations vs. Safety: LLMs can sometimes be too confident. A near-miss on a bounding box means exposing half a name. We solved this by implementing a safety buffer algorithm that programmatically expands the returned coordinates by a calculated percentage, ensuring complete coverage of sensitive text even if the model is slightly off.
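
The safety buffer idea can be sketched in a few lines. This is a simplified illustration, not our exact algorithm: the `expand_box` name and the 10% default margin are assumptions chosen for the example, and the box is taken to be (left, top, right, bottom) in pixels.

```python
def expand_box(box, image_width, image_height, margin_percent=10):
    """Grow a (left, top, right, bottom) pixel box by a percentage of its size.

    The padding is rounded up so that even small boxes gain at least one pixel
    when margin_percent > 0, and the result is clamped to the image bounds.
    """
    left, top, right, bottom = box

    # Ceiling division keeps the padding from rounding down to zero.
    pad_x = ((right - left) * margin_percent + 99) // 100
    pad_y = ((bottom - top) * margin_percent + 99) // 100

    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(image_width, right + pad_x),
        min(image_height, bottom + pad_y),
    )


if __name__ == "__main__":
    # A 200x40 pixel box around a name on a 1000x1000 page.
    print(expand_box((100, 50, 300, 90), 1000, 1000))
```

Scaling the padding with the box size means a long name gets more slack than a short ID number, while the clamp keeps the redaction rectangle valid near the page edges.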

  • The "Context" Problem: Distinguishing between a "Doctor's Name" (public info) and a "Patient's Name" (private info) on the same medical form was tough. We had to iterate on our system prompts to help Gemini understand the role of different entities on a document.

Accomplishments that we're proud of

  • Real-Time Performance: Achieving under 2 seconds of latency for full-page analysis and redaction using Gemini 1.5 Flash.
  • Dual-Architecture Design: Thinking beyond the hackathon to design a system that could actually work in a hospital—splitting the "fast path" for users from the "audit path" for compliance.
  • AI-Native Approach: Showing that multimodal AI can catch context-dependent PII that traditional keyword-matching OCR misses.

What we learned

  • We learned that privacy is a User Experience problem. If redaction is hard, people won't do it. By making it instantaneous and intelligent, we remove the friction, making safety the default option rather than an afterthought.

What's next for GuardVision

  • Enterprise Implementation: Fully building out the async processing pipeline (Redis/Celery) described in our roadmap.
  • Local Processing: Moving inference to on-device models (like Gemini Nano) for a fully offline solution in which images never leave the device.
  • PACS Integration: Plugins for medical imaging software to redact DICOM files automatically.
