Inspiration
We initially considered building a playful, kid-friendly privacy tool themed around the Cookie Monster. It was fun and engaging, but ultimately we pivoted to something more universally applicable: AnonymousDoc AI. This direction allowed us to tackle real-world privacy challenges in education, healthcare, and legal contexts, where sensitive documents need to be redacted quickly and responsibly.
What It Does
AnonymousDoc AI is a privacy-first document redaction tool that:
Accepts PDF uploads via a clean, intuitive interface
Analyzes text using NLP and regex to detect sensitive entities (names, emails, locations, etc.)
Scores the document’s sensitivity level
Previews both the original and redacted versions
Generates downloadable redacted PDFs with visual overlays
It’s designed to be modular, teachable, and hackathon-ready — perfect for rapid prototyping, contributor onboarding, or CTF-style challenges.
How We Built It
Frontend: Started with vanilla HTML/CSS/JS, then migrated to React + TypeScript using Vite for modularity and maintainability
Backend: FastAPI with spaCy for entity detection, PyMuPDF for PDF parsing and redaction, and CORS middleware for seamless integration
Workflow: File upload → NLP analysis → sensitivity scoring → redaction overlay → export
We emphasized reusable logic, clean state management, and pedagogical clarity throughout.
Challenges We Ran Into
Getting PyMuPDF to reliably extract and redact text across diverse PDF formats
Handling image-based PDFs with no extractable text
Managing CORS and multipart form data in a local dev environment
Designing a scoring rubric that balances direct and indirect identifiers
Debugging async file handling between frontend and backend
Each challenge became a learning artifact — documented, modularized, and turned into flashcards or checklists for future contributors.
Accomplishments That We're Proud Of
Built a fully functional privacy tool from scratch in a short time frame with little experience
Migrated from vanilla JS to React + TypeScript with reusable components
Created a backend that not only analyzes but transforms documents
Designed a scoring system that’s explainable and extensible
Documented every major bug fix and architectural decision for teaching
What We Learned
How to bridge frontend-backend workflows with FormData and streaming responses
How to use spaCy and regex together for layered entity detection
How to scaffold modular components that are both functional and pedagogically useful
How to turn technical setbacks into reusable learning resources
How to design for trust, transparency, and privacy in UX
What's Next for AnonymousDoc AI
Expand entity detection with multilingual models and OCR for image-based PDFs
Build a contributor dashboard with flashcards, checklists, and onboarding flows
Integrate with secure cloud storage for document retention and audit trail
Expend to more file types ( etc docx)
Built With
- cors
- css
- fastapi
- html
- javascript
- pymupdf
- react
- spacy
- streamingresponse
- typescript
Log in or sign up for Devpost to join the conversation.