RedactKit - Share Text, Not Secrets

Share personal info without exposing sensitive data - chat safely with AI using locally-masked text, get your original secrets back when copying responses.

  • Event: TikTok TechJam 2025 Hackathon
  • Track: #7 - Privacy Meets AI: Building a Safer Digital Future
  • Team Kopibara: Supachod Trakansirorut, Phanuphat Srisukhawasu
  • Repositories: Swift | Python

Problem Statement

In today’s digital interactions, users often share sensitive personal information, known as Personally Identifiable Information (PII), with AI assistants, inadvertently exposing themselves to potential privacy risks.

RedactKit addresses this critical concern by providing a robust solution that detects and redacts PII locally on the user’s device, ensuring that no sensitive data is transmitted to external AI services. This approach upholds user privacy without compromising the functionality and benefits of AI assistance.

Project Overview

RedactKit is composed of two integrated components working in synergy:

  1. Python Machine Learning Pipeline: This component is responsible for training bespoke models specifically tuned to identify various types of PII. By leveraging lightweight machine learning models, it produces highly accurate detection models ready for deployment.
  2. iOS SwiftUI Application: This user-facing app applies the trained model to perform real-time PII detection and redaction directly on the device. It provides a seamless and interactive interface for users, ensuring sensitive data is masked immediately as text is entered or shared. It currently supports iOS, iPadOS, macOS, and visionOS.

RedactKit uses a combination of Named Entity Recognition (NER) and machine learning algorithms to identify multiple PII categories, achieving an F1 score of over 95% on test datasets.

Core Feature: Intelligent Restoration ✨

RedactKit also includes a unique functionality that allows previously redacted PII to be injected back into the chatbot’s responses when users copy text. This ensures that while the PII is protected during interaction and transmission, it can be restored in final output if required for subsequent use, maintaining both privacy and convenience.
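The mask-then-restore round trip described above can be illustrated with a minimal Python sketch. It is not the app's actual implementation: the placeholder format `[EMAIL_n]`, the function names `mask` and `restore`, and the use of a single e-mail regex in place of the trained model are all assumptions made for illustration.

```python
import re

# Hypothetical detector: a single e-mail regex stands in for the real model.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask(text):
    """Replace each detected e-mail with a numbered placeholder.

    Returns the masked text and a placeholder -> original mapping that
    stays on the device.
    """
    mapping = {}   # placeholder -> original value
    seen = {}      # original value -> placeholder (reuse for repeats)

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original values back into text that still has placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


masked, mapping = mask("Write to alice@example.com today.")
reply = "Done, I drafted a note to [EMAIL_1]."   # pretend chatbot response
restored = restore(reply, mapping)
```

Only `masked` would leave the device in this sketch; `mapping` is kept locally and applied when the user copies the response.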

Supported PII Types

Type                    | Examples                        | Detection Method
Person Names            | John Doe, Sarah Smith           | NER-based recognition
Email Addresses         | user@example.com                | Regex combined with NER
Phone Numbers           | (555) 123-4567, +1-800-555-0199 | Regex combined with NER
Physical Addresses      | 123 Main St, Anytown, CA 90210  | NER-based recognition
Social Security Numbers | 123-45-6789                     | Regex combined with NER
Credit Card Numbers     | 4111-1111-1111-1111             | Regex combined with NER
Dates                   | 01/15/1990, Jan 15, 1990        | Regex combined with NER
API Keys                | sk-xxxx, AKIAxxxx               | Regex-based validation
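As a rough illustration of the regex side of the detection methods listed above, the following Python sketch matches a few of the structured PII types. The patterns, the `PATTERNS` table, and the `find_pii` helper are illustrative assumptions rather than the project's actual rules; the Luhn checksum is a standard check added here to filter out digit runs that merely look like card numbers.

```python
import re

# Illustrative patterns only; the project's real expressions are not shown in this write-up.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "API_KEY": re.compile(r"\b(?:sk-[A-Za-z0-9]{20,}|AKIA[A-Z0-9]{16})\b"),
}


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def find_pii(text):
    """Return (start, end, label) tuples for every match, sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if label == "CREDIT_CARD" and not luhn_valid(match.group(0)):
                continue  # looks like a card number but fails the checksum
            hits.append((match.start(), match.end(), label))
    return sorted(hits)
```

Names and physical addresses have no fixed shape, which is why the table assigns them to NER rather than to patterns like these.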

Future of RedactKit

During this hackathon, we explored multiple ideas and use cases for RedactKit. Due to time constraints, some features remain as future improvements. Upcoming versions of RedactKit will focus on:

  1. Cross-platform support – Extending compatibility to Android (via Kotlin), and tailoring the UI on iPadOS and macOS to different use cases.
  2. Internationalization – Adding multilingual support beyond English.
  3. Expanded PII coverage – Moving beyond common PII types to include items such as passport numbers and flight details.
  4. Additional file formats – Supporting images and videos using OCR and object detection to identify and redact PII directly within visual content.

Development Matters

APIs Utilized

The OpenAI API was used to generate over 3,000 synthetic text samples containing PII placeholders. Faker then filled those placeholders with fake PII values, which increased the diversity and quality of the training data for the detection models.
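The placeholder-filling step can be sketched as follows. This is a simplified illustration, not the project's pipeline: the `{NAME}`-style placeholder syntax, the BIO label scheme, and the `fill_template` helper are assumptions, and a small hard-coded value pool stands in for Faker so the example runs on the standard library alone.

```python
import random
import re

# Stand-in value pools; the project used Faker to produce these values.
FAKE_VALUES = {
    "NAME": ["John Doe", "Sarah Smith"],
    "EMAIL": ["user@example.com"],
}

# Assumed placeholder syntax: an upper-case type name in curly braces.
PLACEHOLDER_RE = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template, rng):
    """Fill placeholders with fake values and return (tokens, BIO labels).

    Tokens are split on whitespace; words that come from a filled
    placeholder are tagged B-<TYPE> / I-<TYPE>, everything else "O".
    """
    tokens, labels = [], []
    position = 0
    for match in PLACEHOLDER_RE.finditer(template):
        # Plain text between the previous placeholder and this one.
        for word in template[position:match.start()].split():
            tokens.append(word)
            labels.append("O")
        kind = match.group(1)
        for i, word in enumerate(rng.choice(FAKE_VALUES[kind]).split()):
            tokens.append(word)
            labels.append(("B-" if i == 0 else "I-") + kind)
        position = match.end()
    # Plain text after the last placeholder.
    for word in template[position:].split():
        tokens.append(word)
        labels.append("O")
    return tokens, labels


tokens, labels = fill_template(
    "My name is {NAME} and my email is {EMAIL}", random.Random(0)
)
```

Because the labels are produced at the same time as the text, every generated sample arrives already annotated, with no manual labelling pass.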

Assets

  • Visual Components: Includes a professionally designed logo, app icons tailored for iOS, and SwiftUI interface components built with accessibility and an inclusive color palette in mind.
  • Machine Learning Assets: CoreML model packaged as PIIDetectionModel.mlpackage, NeuroBERT-Mini tokenizer for token handling, plus curated training datasets and model checkpoints.

Libraries and Frameworks

Python / Machine Learning

Faker~=37.6.0  
openai~=1.102.0  
evaluate~=0.4.5  
datasets~=4.0.0  
transformers~=4.56.0  
coremltools~=8.3.0  
numpy~=2.3.2  
torch~=2.8.0  
matplotlib~=3.10.6

iOS / Swift

  • SwiftUI for declarative UI development
  • CoreML for on-device ML inference
  • Natural Language framework for text processing
  • SwiftData for data persistence
  • Foundation for core functionalities

Architecture

Machine Learning Pipeline

  1. Data Generation: Synthetic PII training datasets are created using a combination of OpenAI-generated samples and Faker library templates.
  2. Model Training: Fine-tuning the NeuroBERT-Mini model on the generated PII datasets to achieve optimal detection performance.
  3. Model Conversion: Transforming the trained PyTorch model into CoreML format for efficient on-device deployment.
  4. Validation: The converted model is evaluated on held-out test data, where it reaches an F1 score of 99%.
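For readers unfamiliar with the F1 metric reported in the validation step, here is a minimal sketch of an entity-level F1 computation. How the project actually scored its model is not stated here, so the exact-match span comparison and the `span_f1` helper below are assumptions for illustration.

```python
def span_f1(gold, predicted):
    """Entity-level F1 over (start, end, label) spans.

    A predicted span counts as correct only if its start, end, and
    label all match a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


gold = [(0, 8, "NAME"), (20, 36, "EMAIL"), (40, 51, "SSN")]
predicted = [(0, 8, "NAME"), (20, 36, "EMAIL"), (60, 70, "PHONE")]
score = span_f1(gold, predicted)  # 2 of 3 correct on both sides
```

In this toy case precision and recall are both 2/3, so the F1 score is also 2/3; a redaction tool cares about recall in particular, since a missed span means PII leaves the device.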

iOS Application

  1. Model Integration: The CoreML model is embedded within the app, enabling seamless real-time PII inference without network dependency.
  2. Text Processing: Implements tokenization and analysis in real-time to detect and redact PII as users interact with the app.
  3. User Interface: Built with SwiftUI to provide a smooth user experience enhanced by appealing animations.
  4. Data Persistence: Utilizes SwiftData to securely manage user data within the app environment.
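The text-processing step above has to turn per-token predictions into redacted text. The app does this in Swift; the Python sketch below only illustrates the general idea, assuming the model emits BIO tags per token together with character offsets. The `redact` helper and the `[LABEL]` replacement format are hypothetical.

```python
def redact(text, tokens):
    """Replace each tagged entity in `text` with a [LABEL] marker.

    `tokens` is a list of (start, end, tag) tuples, where `tag` is "O"
    or a BIO tag such as "B-NAME" / "I-NAME" and the offsets index
    into `text`.
    """
    spans = []
    previous_label = None
    for start, end, tag in tokens:
        if tag == "O":
            previous_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "I" and previous_label == label:
            # Continuation token: extend the span that is currently open.
            spans[-1] = (spans[-1][0], end, label)
        else:
            spans.append((start, end, label))
        previous_label = label
    # Replace from right to left so earlier offsets stay valid.
    for start, end, label in reversed(spans):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


sentence = "Call John Doe at 555-0199"
predictions = [
    (0, 4, "O"), (5, 9, "B-NAME"), (10, 13, "I-NAME"),
    (14, 16, "O"), (17, 25, "B-PHONE"),
]
redacted = redact(sentence, predictions)
```

Merging adjacent tokens into one span before replacing keeps a multi-word name from turning into several separate markers.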
