Inspiration

In today's data-driven world, ensuring privacy compliance is critical. Many organizations struggle to efficiently remove sensitive data from documents while maintaining their usability. RedactAI was born out of the need for a seamless, automated solution that protects personally identifiable information (PII) and adheres to regulations like GDPR.

What it does

RedactAI extracts text from documents using Azure AI Computer Vision, then scans it for sensitive data with Azure AI Language Services. It replaces each piece of PII with a generic placeholder and shows the original and sanitized versions side by side for review. The redacted document is stored securely in Azure Blob Storage for future use.
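Azure AI Language's PII detection reports each entity with an offset, a length, and a category, so the placeholder step can be done in one left-to-right pass over the extracted text. The following is a minimal sketch, not RedactAI's actual code. `PiiEntity` and `replace_with_placeholders` are hypothetical names, the `[CATEGORY]` placeholder format is an assumption, and offsets are assumed to be Python string (code point) indices into the OCR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiEntity:
    """One detected PII span (hypothetical shape modeled on offset/length/category results)."""
    offset: int    # start index of the entity in the source text
    length: int    # number of characters the entity covers
    category: str  # e.g. "Person", "Email", "PhoneNumber"


def replace_with_placeholders(text: str, entities: list[PiiEntity]) -> str:
    """Return a copy of `text` with each entity replaced by a generic [CATEGORY] tag."""
    parts: list[str] = []
    cursor = 0
    # Process entities in offset order so every slice refers to the original text.
    for ent in sorted(entities, key=lambda e: e.offset):
        if ent.offset < cursor:
            continue  # skip a span that overlaps one already replaced
        parts.append(text[cursor:ent.offset])      # untouched text before the entity
        parts.append(f"[{ent.category.upper()}]")  # generic placeholder
        cursor = ent.offset + ent.length
    parts.append(text[cursor:])  # remaining text after the last entity
    return "".join(parts)
```

For example, `replace_with_placeholders("Contact Jane Doe at jane@example.com.", [PiiEntity(8, 8, "Person"), PiiEntity(20, 16, "Email")])` yields `"Contact [PERSON] at [EMAIL]."`. Building the output from slices, rather than editing the string in place, keeps later offsets valid even though placeholders differ in length from the text they replace.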

How we built it

  • Frontend: Developed using React to provide an intuitive interface for document upload and comparison.
  • Backend: Built with FastAPI to process document submissions and communicate with Azure services.
  • AI Processing: Leveraged Azure AI Computer Vision for OCR and Azure AI Language Services for PII detection and redaction.
  • Storage: Used Azure Blob Storage to securely store redacted documents.
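The backend flow described above (OCR, then PII redaction, then storage, returning both texts for the comparison view) can be sketched as one orchestration function. This is an illustrative sketch under stated assumptions, not the project's code: `process_document`, `RedactionResult`, and the `Storage` protocol are hypothetical, the Azure calls are passed in as plain callables rather than real SDK clients, and the `redacted/<name>.txt` blob naming is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class RedactionResult:
    original_text: str  # OCR output, shown on one side of the comparison view
    redacted_text: str  # sanitized text, shown on the other side
    blob_name: str      # where the redacted copy was stored


class Storage(Protocol):
    """Hypothetical minimal interface standing in for a Blob Storage container."""
    def upload(self, name: str, data: bytes) -> None: ...


def process_document(
    file_name: str,
    file_bytes: bytes,
    extract_text: Callable[[bytes], str],  # stand-in for the Computer Vision OCR call
    redact: Callable[[str], str],          # stand-in for PII detection + placeholder replacement
    storage: Storage,                      # stand-in for the Blob Storage client
) -> RedactionResult:
    """Run one uploaded document through OCR, redaction, and storage."""
    original = extract_text(file_bytes)
    redacted = redact(original)
    blob_name = f"redacted/{file_name}.txt"  # assumed naming scheme
    storage.upload(blob_name, redacted.encode("utf-8"))
    return RedactionResult(original, redacted, blob_name)
```

Injecting the three stages keeps the pipeline testable with in-memory fakes and no Azure credentials; a FastAPI route could read the uploaded file's bytes, call `process_document` with real service wrappers, and return both texts to the React frontend.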

Challenges we ran into

  • Fine-tuning PII detection to minimize false positives and false negatives.
  • Integrating multiple Azure AI services while maintaining performance and accuracy.
  • Ensuring redacted text remains readable and structured properly within the document.
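One common lever for the false-positive/false-negative trade-off is a confidence threshold on detected entities, since Azure AI Language returns a confidence score with each PII entity. The sketch below is an assumption about how such tuning might look, not what RedactAI does: `Detection` and `select_for_redaction` are hypothetical, and the 0.8 threshold and the always-redact categories are illustrative values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    text: str
    category: str
    confidence: float  # 0.0-1.0 score reported by the detector


def select_for_redaction(
    detections: list[Detection],
    threshold: float = 0.8,  # illustrative value, not a recommendation
    always_redact: frozenset[str] = frozenset({"Email", "PhoneNumber"}),
) -> list[Detection]:
    """Keep detections at or above `threshold`, plus any in an always-redact category.

    Raising the threshold drops more low-confidence hits (fewer false positives);
    the always-redact set keeps low-confidence hits in categories judged too
    sensitive to let through (fewer false negatives for those categories).
    """
    return [
        d for d in detections
        if d.confidence >= threshold or d.category in always_redact
    ]
```

With the defaults, a `Person` hit at 0.4 confidence is left in place while an `Email` hit at 0.5 is still redacted; tuning then comes down to choosing the threshold and the category set against a labeled sample of documents.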

Accomplishments that we're proud of

  • Successfully built an end-to-end pipeline for secure document redaction.
  • Seamlessly integrated Azure AI services for OCR, PII detection, and redaction.
  • Created a user-friendly web application with real-time document comparison.

What we learned

  • Best practices for implementing AI-powered text extraction and redaction.
  • Optimizing cloud-based services for efficiency and scalability.
  • Balancing accuracy and usability when replacing sensitive data in documents.

What's next for RedactAI

  • Support More File Types: Extend compatibility to formats like DOCX and TXT.
  • Enhanced Data Scrubbing: Improve PII detection precision and reinsert redacted text into the original document more seamlessly.
  • Analytics Dashboard: Provide insights into scrubbed data types and trends.
  • User Authentication: Implement access control to protect sensitive information.
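The planned analytics dashboard would need aggregate counts of which PII categories were scrubbed. As a hedged sketch of that aggregation step only (the function name `summarize_categories` is hypothetical, and the input is assumed to be one list of detected category names per document), Python's `collections.Counter` is enough:

```python
from collections import Counter
from typing import Iterable


def summarize_categories(categories_per_document: Iterable[Iterable[str]]) -> Counter[str]:
    """Count how often each PII category was scrubbed across a batch of documents."""
    totals: Counter[str] = Counter()
    for categories in categories_per_document:
        totals.update(categories)  # adds 1 per occurrence of each category name
    return totals
```

Calling `summarize_categories([["Person", "Email"], ["Person"], []]).most_common(1)` gives `[("Person", 2)]`; storing these per-day counts would be one way to chart trends over time.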

Built With

  • azure
  • azure-ai-computer-vision
  • azure-ai-language-services
  • azure-blob-storage
  • fastapi
  • python
  • react