Inspiration

In today’s data-driven world, students conducting research face the challenge of harnessing the power of AI analytics without compromising individual privacy. Inspired by how healthcare institutions anonymise patient data for secure analytics, we realised similar techniques could be applied to academic research and surveys on personal topics. By anonymising survey data, students can explore sensitive issues and gain meaningful insights about their communities without jeopardising anyone’s privacy. This approach enhances research while fostering open participation and protecting identities, which inspired us to create Veracrypt, a platform that makes privacy-preserving data analytics accessible to everyone.

What it does

Veracrypt is a privacy-preserving data analytics platform that protects sensitive information while enabling powerful insights.

  1. Synthetic Data Generation: Upload datasets for analysis without exposing raw data. Veracrypt creates a synthetic dataset that preserves statistical accuracy and correlations while protecting individual identities.
  2. Secure Survey System: Conduct encrypted, time-limited surveys. Responses are anonymised and converted into synthetic data for safe statistical analysis, ensuring respondent privacy.

By blending synthetic data, encryption, and secure session management, Veracrypt makes privacy-preserving analytics accessible and safe.

How we built it

Tech Stack

  • Frontend: Streamlit with custom styling using CSS and Tailwind for a responsive interface
  • Backend: Python with PyMongo for database interaction, python-dotenv for environment management, cryptography for encryption, and dnspython for resolving MongoDB Atlas (mongodb+srv) connection strings
  • Database: MongoDB Atlas for scalable cloud-based storage
  • Synthetic Data Generation: Techniques including Pairwise Pearson Correlation, Kernel Density Estimation (KDE), Inverse Transform Sampling, Iterative Adjustment with Dynamic Step Size, and Min-Max Clipping for accurate data simulation
  • Survey System: Built using Streamlit for real-time interactivity, enhanced with custom CSS and Tailwind for improved design
  • Encryption: Symmetric encryption powered by Fernet from the Cryptography library for secure data storage and transmission
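
To make the synthetic data steps listed above more concrete, here is a minimal single-column sketch that chains Kernel Density Estimation, Inverse Transform Sampling, and Min-Max Clipping using NumPy. The function name, the grid size, and the rule-of-thumb bandwidth are assumptions for illustration. The correlation-preserving iterative adjustment between columns is not shown, and the column is assumed to be numeric and non-constant.

```python
import numpy as np


def synthesize_column(values, n_samples, rng, grid_size=512):
    """Draw synthetic values for one numeric column.

    Steps: Gaussian KDE on a grid -> numeric CDF -> inverse transform
    sampling -> min-max clipping to the observed range.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()

    # Normal-reference rule-of-thumb bandwidth (an assumed choice).
    bandwidth = 1.06 * values.std(ddof=1) * len(values) ** (-1 / 5)

    # Evaluate the (unnormalised) KDE on a grid slightly wider than the data.
    grid = np.linspace(lo - 3 * bandwidth, hi + 3 * bandwidth, grid_size)
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)

    # Accumulate the density into a CDF scaled to run from 0 to 1.
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])

    # Inverse transform sampling: push uniform draws through the inverse CDF.
    u = rng.uniform(0.0, 1.0, size=n_samples)
    samples = np.interp(u, cdf, grid)

    # Min-max clipping keeps synthetic values inside the observed range.
    return np.clip(samples, lo, hi)


# Hypothetical usage with a generated stand-in for a real "age" column.
rng = np.random.default_rng(seed=0)
real_ages = rng.normal(loc=35, scale=8, size=1_000)
synthetic_ages = synthesize_column(real_ages, n_samples=1_000, rng=rng)
```

Sampling through the smoothed CDF rather than resampling rows means no synthetic value is copied directly from a respondent, although this sketch on its own makes no formal privacy guarantee.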

Challenges we ran into

  • Achieving statistical accuracy in synthetic data while ensuring complete anonymity
  • Ensuring accurate correlation between categorical and numerical data to preserve key relationships in the synthetic dataset
  • Finding and testing manipulation techniques to determine the most effective methods for different types of datasets
  • Implementing a secure key management system for encrypted surveys
  • Optimising performance for large datasets while applying privacy-preserving transformations
  • Enforcing stringent security protocols to keep users safe

Accomplishments that we're proud of

  • Achieved near-perfect statistical correlation between original and synthetic datasets, preserving data utility while ensuring privacy
  • Developed a sophisticated encryption system with a zero-knowledge architecture for maximum security
  • Built an innovative session management system for time-limited survey access and controlled data collection
  • Created a seamless workflow for secure survey creation and response collection, enhancing user engagement and data security

What we learned

  • Advanced privacy-preserving data transformation techniques for generating high-quality synthetic datasets
  • Best practices in cryptographic security, particularly in key management and encryption protocols
  • The importance of user safety and privacy in building trust for privacy-focused applications
  • Strategies to balance data utility with privacy protection, ensuring meaningful insights without exposing sensitive information
  • Effective methods for validating synthetic data quality to maintain statistical relevance and accuracy
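
One way to check synthetic data quality along these lines is sketched below: compare the pairwise Pearson correlation matrices of the real and synthetic tables, and compare each column's distribution. The two-sample Kolmogorov-Smirnov statistic used for the distribution check is an assumption of this example rather than a method named in the write-up, and both function names are hypothetical.

```python
import numpy as np


def correlation_gap(real, synthetic):
    """Largest absolute difference between the two Pearson correlation
    matrices (rows are records, columns are numeric variables)."""
    real_corr = np.corrcoef(np.asarray(real, dtype=float), rowvar=False)
    synth_corr = np.corrcoef(np.asarray(synthetic, dtype=float), rowvar=False)
    return float(np.max(np.abs(real_corr - synth_corr)))


def ks_statistic(real_col, synth_col):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a real column and its synthetic counterpart."""
    real_col = np.sort(np.asarray(real_col, dtype=float))
    synth_col = np.sort(np.asarray(synth_col, dtype=float))
    points = np.concatenate([real_col, synth_col])
    real_cdf = np.searchsorted(real_col, points, side="right") / len(real_col)
    synth_cdf = np.searchsorted(synth_col, points, side="right") / len(synth_col)
    return float(np.max(np.abs(real_cdf - synth_cdf)))
```

Both functions return 0 when the synthetic data matches the real data exactly on that measure. Values close to 0 suggest utility is preserved, though what counts as acceptable depends on the dataset and has to be chosen per project.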

What's next for Veracrypt

  • Supporting complex data types such as images, text, and time-series data for broader use cases
  • Developing automated privacy impact assessments to simplify compliance with data protection regulations
  • Creating an API for seamless integration with enterprise systems and workflows
  • Building advanced visualisation tools for better validation and understanding of synthetic data
  • Expanding survey features with customisable privacy settings to fit diverse research needs
  • Launching a marketplace for pre-trained privacy-preserving models, making it easier for users to implement secure AI solutions
