ANONIFY

Anonify: A secure, customizable redaction tool that anonymizes data while preserving structure, enabling privacy across multiple formats.


Inspiration --> Our motivation for developing Anonify stemmed from the increasing need for data privacy in today’s digital world. With the rise in data breaches and privacy concerns, we wanted to create a tool that would empower users to protect sensitive information while maintaining data usability.

What it does --> Anonify is a secure tool that allows users to redact, mask, or anonymize sensitive data across various formats. It provides customizable redaction levels and detects personally identifiable information (PII) using spaCy. The application runs locally via Docker for enhanced data security, and it supports synthetic data generation for safe data sharing and compliance with privacy regulations.
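To illustrate what customizable redaction levels could look like, here is a minimal sketch that replaces detected spans either with an entity label or with a length-preserving mask. The function name `redact`, the level names `"label"` and `"mask"`, and the `(start, end, label)` span format are assumptions made for this example, not Anonify's actual interface.

```python
from typing import List, Tuple

# (start, end, label) with `end` exclusive; a hypothetical span format.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span], level: str = "label") -> str:
    """Redact the given character spans while keeping surrounding text intact.

    level="label": replace each span with its entity label, e.g. [PERSON].
    level="mask":  replace letters and digits with '*', keeping punctuation
                   and length so the layout of the text is preserved.
    """
    if level not in ("label", "mask"):
        raise ValueError(f"unknown redaction level: {level!r}")

    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already redacted
        pieces.append(text[cursor:start])
        if level == "label":
            pieces.append(f"[{label}]")
        else:
            original = text[start:end]
            pieces.append("".join("*" if ch.isalnum() else ch for ch in original))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Alice met Bob on 2024-09-29."
    found = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 27, "DATE")]
    print(redact(sample, found, "label"))  # [PERSON] met [PERSON] on [DATE].
    print(redact(sample, found, "mask"))   # ***** met *** on ****-**-**.
```

The mask level keeps the string length unchanged, which is one simple way to preserve document structure; the label level is easier to read but shifts character offsets.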

How we built it --> We developed Anonify using React for the frontend, integrated spaCy for detecting personally identifiable information (PII), and packaged it as a Docker image so it can run locally.

Challenges we ran into --> We faced challenges in ensuring accurate PII detection across various formats and in balancing complex functionality with a user-friendly design.

Accomplishments that we're proud of --> We are proud of creating a functional prototype that effectively redacts sensitive information while maintaining user control and data security.

What we learned --> We learned about effective collaboration in a team and the importance of user feedback in development.

What's next for ANONIFY --> We aim to support more document formats, enhance the synthetic data generation features, and improve usability and security. We also plan to add image redaction so the tool can be applied to more media types.

Built With

Submitted to

Code for Impact

Created by

1. Developed a Backend Server with Flask

Built a Flask-based server to handle user requests, providing an endpoint for uploading PDF files and processing them for text extraction and entity recognition.

2. Implemented PDF Text Extraction using PyPDF

Utilized PyPDF to extract textual data from PDF documents uploaded by the user, handling multi-page documents and preparing the text for further processing.
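The extraction step might look roughly like the sketch below. It assumes "PyPDF" refers to the `pypdf` package and its `PdfReader` class; the function names `extract_pdf_text` and `join_pages` are hypothetical.

```python
from typing import BinaryIO, Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Combine per-page text into one string, skipping empty pages."""
    cleaned = [(text or "").strip() for text in page_texts]
    return "\n\n".join(text for text in cleaned if text)


def extract_pdf_text(stream: BinaryIO) -> str:
    """Extract text from every page of a PDF opened in binary mode."""
    # Imported lazily so join_pages can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(stream)
    return join_pages(page.extract_text() for page in reader.pages)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(extract_pdf_text(handle))
```

Scanned PDFs that contain only images yield little or no text this way, so they would need OCR before entity detection.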

3. Text Preprocessing

Cleaned and structured the extracted text to ensure it was ready for analysis by removing unnecessary characters and formatting.
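As an illustration of this kind of cleanup, the sketch below normalizes Unicode, rejoins words hyphenated across line breaks, drops control characters, and collapses whitespace. The specific rules and the name `clean_text` are assumptions; the cleaning steps actually used in the project are not documented here.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF before running NER on it."""
    # NFKC turns ligatures and non-breaking spaces into plain characters.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "informa-\ntion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop control characters, keeping tab (\x09) and newline (\x0a).
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line, then reduce 3+ consecutive newlines to one blank line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    messy = "Sensitive  informa-\ntion\x0c\n\n\n\nJohn\tDoe "
    print(clean_text(messy))
```

Cleaning changes character offsets, so any spans reported by later steps refer to the cleaned text rather than the original PDF text.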

4. Integrated Named Entity Recognition (NER) with spaCy

Integrated spaCy's pre-trained NER model to automatically detect and identify entities (such as names, dates, organizations) from the extracted text.
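A rough sketch of the detection step follows. It assumes the small English model `en_core_web_sm` is installed; the label selection in `PII_LABELS` and the function names are hypothetical choices for this example.

```python
from functools import lru_cache
from typing import Dict, Iterable, List, Set

# Assumed subset of spaCy entity labels treated as sensitive.
PII_LABELS: Set[str] = {"PERSON", "ORG", "DATE", "GPE"}


@lru_cache(maxsize=2)
def _load_model(name: str):
    # Imported lazily; loading a model is slow, so the result is cached.
    import spacy

    return spacy.load(name)


def detect_entities(text: str, model: str = "en_core_web_sm") -> List[Dict]:
    """Run spaCy NER and return entities as plain dictionaries."""
    doc = _load_model(model)(text)
    return [
        {
            "text": ent.text,
            "start": ent.start_char,
            "end": ent.end_char,
            "label": ent.label_,
        }
        for ent in doc.ents
    ]


def select_pii(entities: Iterable[Dict], labels: Set[str] = PII_LABELS) -> List[Dict]:
    """Keep only the entities whose label is in the chosen set."""
    return [entity for entity in entities if entity["label"] in labels]


if __name__ == "__main__":
    found = detect_entities("Ritika Keshri started this project on Sep 29, 2024.")
    for entity in select_pii(found):
        print(entity)
```

The `start` and `end` character offsets are what a redaction step needs to replace each entity in place. Pre-trained models miss some entities and mislabel others, so results should be reviewed before sharing redacted documents.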

Swaraat Chatterjee
Ritika Keshri
Nabeel Wasif
Aditya Raj

Updates

Ritika Keshri started this project — Sep 29, 2024 09:39 AM EDT
