Inspiration

We are uneasy about how freely AI systems can access our published work and data. Personal experience drove us to act: AI systems have reproduced our code and presented it as their own, scraped our social media without consent to build detailed, easily accessible profiles, and taken work from artists, musicians, and writers. We wanted to build a tool that lets creators convert their work into a format that AI cannot easily access, plagiarize, or recreate.

What it does

SNaiL is a web application that makes PDF documents harder for AI training pipelines to consume. Users upload a PDF and download a processed version that looks identical to the original but carries invisible watermark text embedded in the document. These watermarks act as provenance signals intended to steer AI models away from the content, making it significantly harder for unauthorized AI systems to process and learn from it.

How we built it

We built SNaiL as a Flask web application with OIDC authentication (Okta/Auth0), per-user file isolation, and a PDF processing pipeline. The core of the pipeline uses PyMuPDF to preserve exact page layouts while injecting invisible watermark text into the document's text layer. We implemented atomic file operations, SQLite storage for document metadata, and filesystem-based session persistence. Each user's documents stay isolated from every other account's through upload, processing, and download.

Challenges we ran into

The biggest challenge was developing a PDF processing method that preserves visual fidelity while still deterring AI consumption. We experimented with glyph substitution, but it could break on complex PDFs, and generative AI was often still able to process the document or simply ignore our watermarks. Finding a technique that is invisible to humans yet still affects generative AI was a hurdle of its own. Managing file persistence and atomic operations while keeping processing fast also required careful optimization.
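The atomic file operations mentioned above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: `atomic_write_bytes` is a hypothetical helper that writes to a temporary file in the destination directory and then renames it into place, so a reader never sees a half-written PDF.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(dest: Path, data: bytes) -> None:
    """Write `data` to `dest` so that readers see the old file or the new one,
    never a partial write. Hypothetical helper for illustration."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory (same filesystem) as the
    # destination for os.replace to be an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, dest)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

The `fsync` call trades some speed for durability; a pipeline tuned for throughput might skip it and accept the risk of losing a file on power failure.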

Accomplishments that we're proud of

We built a production-ready application that preserves each document's visual appearance while adding protective measures. Our authentication system is vendor-neutral and supports multiple OIDC providers. We achieved per-user isolation, with no data leakage between accounts. Most importantly, we built a tool that empowers creators to take a stand against unauthorized AI training while keeping their documents usable and professional in appearance.
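One common way to get the kind of per-user isolation described above is to derive every storage path from the authenticated user's identifier and to reject anything that could escape that user's directory. The sketch below is an assumption about how this could look, not a description of SNaiL's code: `user_file_path` and the hashed directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def user_file_path(storage_root: Path, user_id: str, filename: str) -> Path:
    """Return a path for `filename` inside a directory private to `user_id`.

    Hypothetical helper. The user id (for example an OIDC subject such as
    "auth0|abc123") is hashed so that it is always a safe directory name.
    """
    if not user_id:
        raise ValueError("missing user id")
    root = Path(storage_root).resolve()
    user_dir = root / hashlib.sha256(user_id.encode("utf-8")).hexdigest()

    # Keep only the final path component, dropping any directories the
    # client supplied (e.g. "../../etc/passwd" becomes "passwd").
    safe_name = Path(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError("invalid filename")

    candidate = (user_dir / safe_name).resolve()
    # Defense in depth: the resolved path must still sit under user_dir.
    if user_dir not in candidate.parents:
        raise ValueError("path escapes user directory")
    return candidate
```

Because the directory name depends only on the identity supplied by the login provider, two accounts uploading a file with the same name end up in different directories.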

What we learned

We gained deep insights into PDF structure manipulation and learned how to balance security with usability. Working with OIDC authentication taught us about modern identity management, while implementing file isolation showed us the importance of secure multi-tenant design. We also learned about the complex relationship between document format preservation and content protection, discovering that simple solutions often work better than complex ones.

What's next for SNaiL

We plan to expand support beyond PDFs to include images, videos, and other creative formats. Future versions will incorporate more sophisticated protection techniques, including advanced glyph substitution with embedded fonts for maximum compatibility. Our ultimate goal is to establish SNaiL as the standard tool for content creators who want to maintain control over how AI systems interact with their work.
