Inspiration

We were struck by how much sensitive information can leak from seemingly ordinary content. While reviewing the GenAI Location Privacy challenge in the TikTok TechJam 2025 Information Document, we tested it ourselves: we snapped a photo near NUS and gave it to ChatGPT-5. The model's location prediction was eerily accurate. That experiment showed us how easily privacy can be compromised, whether through hidden location cues in photos or personal information in text. To raise awareness and give users control, we set out to build a tool that reveals and defuses these hidden risks.

What it does

InvisiScan lets users upload photos or text and automatically analyzes them for privacy risks.

  • For images, it detects location-revealing cues (such as road signs or landmarks), highlights the risky regions, masks them, and scrubs EXIF metadata.
  • For text, it identifies and redacts personally identifiable information (PII) such as names, addresses, or ID numbers.

The result: safer content that reduces the chance of unintended leaks while keeping everything else intact.

How we built it

We combined three layers:

  1. LLM Geo-Hypotheses — Gemini 2.5 Flash suggests likely location cues from the image.
  2. GroundingDINO Detection — Open-vocabulary object detection finds those cues in the image and returns bounding boxes.
  3. Privacy Transform — EXIF metadata is stripped and smart pixelation masks the highlighted cues.
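
The Privacy Transform step can be sketched with Pillow: re-encode the pixel data into a fresh image to drop EXIF, then mosaic each flagged bounding box. The function names and block size here are illustrative, not our exact implementation:

```python
from PIL import Image

def strip_exif(img: Image.Image) -> Image.Image:
    """Copy only the pixel data into a fresh image, dropping all metadata."""
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    return clean

def pixelate_region(img: Image.Image, box: tuple, block: int = 16) -> Image.Image:
    """Mosaic the region inside `box` (left, top, right, bottom) in place."""
    region = img.crop(box)
    w, h = region.size
    # Downscale then upscale with nearest-neighbour to get the mosaic effect.
    small = region.resize((max(1, w // block), max(1, h // block)), Image.NEAREST)
    img.paste(small.resize((w, h), Image.NEAREST), box)
    return img
```

Re-encoding rather than editing tags guarantees that GPS coordinates and other metadata cannot survive the transform.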

For text, we built a hybrid PII pipeline combining spaCy NER with intelligent regex patterns to catch sensitive data.
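
The merge logic of that hybrid pipeline can be sketched as follows, assuming NER spans (e.g. from spaCy's `doc.ents`) are passed in as character offsets; the regex patterns shown are illustrative stand-ins for the fuller set:

```python
import re

# Illustrative patterns; the real pipeline uses a larger set.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "NRIC": re.compile(r"\b[STFG]\d{7}[A-Z]\b"),  # Singapore ID format
}

def redact(text: str, ner_spans=()) -> str:
    """Merge regex hits with NER spans (start, end, label) and mask them."""
    spans = list(ner_spans)
    for label, pat in PII_PATTERNS.items():
        for m in pat.finditer(text):
            spans.append((m.start(), m.end(), label))
    # Apply right-to-left so earlier offsets stay valid after substitution.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

Regex catches well-formatted identifiers that NER models miss, while NER catches free-form names and addresses that no pattern can anticipate.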

We stitched it all together with FastAPI endpoints, Playwright helpers for data scraping, and Pydantic models for clean orchestration.

Challenges we ran into

  1. PII data was being misclassified, so we combined spaCy NER with regex for higher accuracy.
  2. LLM predictions for geo-location were sometimes unreliable, requiring tuning and filtering.
  3. GroundingDINO initially produced inaccurate bounding boxes, so we added box-size filtering and cue prioritization.
  4. Balancing hackathon development with heavy coursework made time management a constant challenge.
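
The box-size filtering from challenge 3 can be sketched as follows (the thresholds are illustrative):

```python
def filter_boxes(boxes, img_w, img_h, max_area_frac=0.5, min_score=0.3):
    """Drop low-confidence boxes and boxes covering most of the frame;
    huge boxes usually mean the detector latched onto the whole scene."""
    kept = []
    for (x0, y0, x1, y1, score) in boxes:
        area_frac = ((x1 - x0) * (y1 - y0)) / (img_w * img_h)
        if score >= min_score and area_frac <= max_area_frac:
            kept.append((x0, y0, x1, y1, score))
    # Prioritize the most confident cues first.
    return sorted(kept, key=lambda b: b[4], reverse=True)
```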

Accomplishments that we're proud of

  • Built a working end-to-end pipeline in hackathon time that processes both photos and text.
  • Achieved strong cue detection, with ~50% of LLM guesses falling within 100 km of the ground truth in early tests.
  • Designed the system to be privacy-first, re-encoding images to strip EXIF and redacting PII by default.
  • Pulled this off under intense academic schedules, showcasing teamwork and rapid problem-solving.

What we learned

  • Privacy leaks extend beyond GPS tags—subtle visual cues and personal text data can be just as revealing.
  • Open-vocabulary detection like GroundingDINO pairs really well with LLM-inferred phrases.
  • PII detection benefits from hybrid methods rather than relying on a single model.
  • Building for privacy means balancing accuracy with usability—protecting without over-masking.

What's next for InvisiScan

  1. Integrating state-of-the-art segmentation models such as SAM for tighter, shape-aware masking.
  2. Adding inpainting models to naturally fill masked regions instead of pixelating them.
  3. Building an iterative loop to re-analyze masked photos/text and verify risk reduction.
  4. Integrating the Google Street View API for more accurate results.
