Inspiration
We were discussing how easy it is to accidentally share documents containing sensitive information online, and we wanted an app that would censor all private information in a document with the click of a button!
What it does
DocuShield takes a Microsoft Word document as input, uses Natural Language Processing to detect potentially sensitive information, and removes it. The app then exports a censored copy of the document.
How we built it
We used Python as our programming language and GitHub for version control. Originally, our program used regex filters to find patterns in text, but we found this limiting. A more dynamic approach was to use the Natural Language Processing provided by the spaCy Python library. We then used the tkinter library to create a simple GUI, and docx to manipulate Microsoft Word documents.
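As a rough illustration of the NLP approach described above, here is a minimal sketch of entity-based redaction. It is not DocuShield's actual code: the function names `redact` and `spacy_spans`, the set of labels treated as sensitive, the `[LABEL]` placeholder (the app itself removes the text), and the `en_core_web_sm` model name are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# (start_char, end_char, label), the same shape spaCy exposes per entity.
Span = Tuple[int, int, str]

# Assumption: which entity labels count as "sensitive" is a project choice.
SENSITIVE_LABELS = {"PERSON", "GPE", "ORG", "DATE"}


def redact(text: str, spans: Iterable[Span], labels=SENSITIVE_LABELS) -> str:
    """Replace each sensitive span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        # Skip labels we do not treat as sensitive, and overlapping spans.
        if label not in labels or start < cursor:
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def spacy_spans(text: str, model: str = "en_core_web_sm") -> List[Span]:
    """Collect entity spans with spaCy (needs spaCy and the model installed)."""
    import spacy  # imported lazily so redact() works without spaCy

    nlp = spacy.load(model)
    return [(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]


# Hand-written spans stand in for what a pre-trained model might return.
sample = "Alice Smith moved to Paris in 2019."
print(redact(sample, [(0, 11, "PERSON"), (21, 26, "GPE")]))
```

With spaCy installed, `redact(text, spacy_spans(text))` would run the same replacement on whatever entities the model finds. Keeping the replacement step separate from the model makes it easy to swap in a different detector later.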
Challenges we ran into
Our biggest challenge was getting the different Python libraries we used to install and work properly on each of our laptops. This became a problem because we were forced to change Python versions, which led to a lot of headaches in the early hours of the morning. Our second biggest challenge was finding a dynamic and flexible way to detect sensitive information in a document without explicitly coding every pattern to look for.
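To show why explicitly coded patterns felt limiting, here is a small sketch of a regex-only filter. The patterns and the name `redact_with_patterns` are hypothetical examples, not the filters we actually wrote.

```python
import re

# Hypothetical patterns: each one has to be written and maintained by hand.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_with_patterns(text: str) -> str:
    """Replace every match of every known pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_with_patterns("Contact Jane Doe at jane.doe@example.com or 555-123-4567."))
# The email and phone number are replaced, but the name "Jane Doe" is left
# untouched because no regex describes what a person's name looks like.
```

Names, places, and organizations have no fixed shape, so covering them this way means an ever-growing list of patterns.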
Accomplishments that we're proud of
- Starting the project with "hard-coded" regex filters, and ending it with an AI that covers more patterns in documents than we could've ever imagined.
- Building a solid GUI for the user to navigate easily.
- Being able to break the project into separate portions and multitask by using GitHub.
What we learned
- Spend time setting up all coding environments before the event begins
- How to leverage Python packages to do just about anything!
- That creating and modifying PDF files through Python is incredibly difficult
What's next for DocuShield
- Training a custom model for the app to use instead of a pre-trained model from the spaCy library
- Nuking Linux-Machine in celebration!!!