Under European law, court proceedings have to be anonymized before they are published or used in audits. Currently, this is done manually by interns & lawyers. Software like [Redacted] could drastically reduce the time this takes.
What it does
We present [Redacted], a desktop app for anonymizing documents. [Redacted] suggests the personally identifiable information that needs to be removed before a document is ready for publishing. You can also manually change any of the annotations that [Redacted] suggests. These corrections are fed back into the model to improve it.
All this happens on your local machine (cross-platform on macOS, Windows & GNU/Linux), without any data leaving your machine for the cloud (or the big corporations).
How we built it
We built a cross-platform desktop app using Electron. The frontend is developed in React, with components for annotating PDF documents. A machine learning model built with spaCy suggests the personally identifiable information to remove. The UI lets you change these suggestions, and your corrections are used to improve the model through online learning. Once the document is approved, the anonymized version can be exported for use. The whole backend, including serving the model, is written in Python with Flask.
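To make the review step concrete, here is a minimal sketch of how suggested spans, reviewer corrections and the export could fit together. It is not our actual backend code: the names `Span`, `apply_corrections`, `redact` and `to_training_record` are hypothetical, the labels are illustrative, and the example offsets are written by hand rather than produced by the spaCy model. In practice the spans might come from a spaCy `Doc` via `ent.start_char`, `ent.end_char` and `ent.label_`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A character span flagged as personally identifiable (hypothetical type)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity label, e.g. "PERSON" or "ORG"


def apply_corrections(suggested, removed=(), added=()):
    """Merge reviewer edits into the model's suggestions, sorted by position."""
    kept = [s for s in suggested if s not in set(removed)]
    return sorted(set(kept) | set(added), key=lambda s: s.start)


def redact(text, spans):
    """Replace every approved span with a [LABEL] placeholder."""
    parts = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        parts.append(text[cursor:span.start])  # untouched text before the span
        parts.append(f"[{span.label}]")        # placeholder for the removed text
        cursor = span.end
    parts.append(text[cursor:])                # untouched text after the last span
    return "".join(parts)


def to_training_record(text, spans):
    """Build a (text, annotations) pair in the entity-offset shape used for NER data."""
    return (text, {"entities": [(s.start, s.end, s.label) for s in spans]})


text = "Anna Muster sued Acme AG in Zurich."

# Assumed model output: one correct span, one mislabelled span, one miss.
suggested = [Span(0, 11, "PERSON"), Span(28, 34, "ORG")]

# Assumed reviewer edits made in the UI.
approved = apply_corrections(
    suggested,
    removed=[Span(28, 34, "ORG")],
    added=[Span(17, 24, "ORG"), Span(28, 34, "LOC")],
)

print(redact(text, approved))  # [PERSON] sued [ORG] in [LOC].
print(to_training_record(text, approved))
```

Records like the one from `to_training_record` are the kind of input an incremental model update could consume. How the updates are scheduled and batched is left out of this sketch.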
Challenges we ran into
- Getting the annotations to work on PDFs was very challenging.
Accomplishments that we're proud of
- Getting the cross-platform desktop app to work.
- The tool is easy to use & close to commercial annotation software in terms of UX.
What we learned
- Anonymizing legal documents is labor-intensive & mostly manual.
What's next for [REDACTED]
- Testing it out on real-world use cases for Kellerhals Carrard, Swisslex, Schellenberg Wittmer & NKF.