- Chat interface with a sample text input and DeepSeek response (light mode).
- Chat interface with a sample text input and DeepSeek response (dark mode).
- Document upload interface, which accepts multiple files at a time (supports .txt, .pdf, .xlsx, and .docx files).
- Account details interface with functionality to create a group, request to join a group, and accept requests from users.
- User sign-up interface, shown here with an invalid password entry.
- User login interface.
- Sample data stored in the users table of the PostgreSQL database.
- Sample data stored in the groups table of the PostgreSQL database.
- Sample logging output from the docucrypt-encipher-server-1 container.
- Docker containers.
- Docker images.
- Docker volumes.
A Comprehensive Overview of a Classification-Aware Document Management System
The project is a secure, classification-aware document retrieval and querying system. It combines a Rust backend, a React and Tauri frontend, and a locally hosted large language model to handle sensitive documents while enforcing authentication, encryption, and clear boundaries between classification levels.
Core Architecture
At the heart of the system is a set of containerized services, with a Rust API server built on the Axum framework coordinating between them. The backend is divided into four primary components:
- Axum-based API Server
  - Handles all HTTP requests and routing
  - Implements async request processing
  - Manages authentication and authorization
  - Coordinates between services
- Ollama LLM Service
  - Runs the deepseek-r1:1.5b language model
  - Provides document summarization
  - Handles natural language queries
  - Supports GPU acceleration for improved performance
- PostgreSQL Database
  - Stores user credentials and metadata
  - Manages group relationships
  - Maintains classification levels
  - Handles document metadata storage
- Nginx Reverse Proxy
  - Manages SSL/TLS termination
  - Handles request routing
  - Provides load balancing capabilities
  - Implements security headers and CORS
This modular design simplifies maintenance and scaling and lets each component be optimized for its specific function. On the frontend, React and Tauri are combined to build a cross-platform application for Windows, macOS, and mobile devices. Pairing React with Tailwind CSS yields a clean, user-friendly interface.
Security Implementation
Security is a central concern in this project. The system employs JWT-based authentication, where tokens include essential information such as:
- Subject claim (representing the username)
- Expiration timestamp
- HMAC-SHA256 signature created using a secret key
This makes each token both verifiable and time-bound. The authentication flow works as follows:
- Passwords are hashed with Argon2id using a randomly generated salt.
- On successful login, a token with a 24-hour expiration is issued.
- A middleware layer validates the token on every request to a protected route, so only users holding a valid token reach sensitive endpoints.
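The token lifecycle above can be sketched with only the standard library. This is a minimal illustration, not the project's code: `Claims`, `issue_claims`, `bearer_token`, and `is_unexpired` are hypothetical names, and HMAC-SHA256 signing and verification are deliberately left out (in practice a crate such as `jsonwebtoken` handles encoding, signing, and `exp` validation).

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Token lifetime described in the text: 24 hours, in seconds.
const TOKEN_TTL_SECS: u64 = 24 * 60 * 60;

/// The claims named in the text: `sub` (username) and `exp` (expiry, Unix seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct Claims {
    pub sub: String,
    pub exp: u64,
}

/// Current Unix time in seconds.
pub fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs()
}

/// Build the claims for a freshly issued token (hypothetical helper).
pub fn issue_claims(username: &str, now: u64) -> Claims {
    Claims { sub: username.to_string(), exp: now + TOKEN_TTL_SECS }
}

/// Extract the raw token from an `Authorization: Bearer <token>` header value.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|t| !t.is_empty())
}

/// Time-bound check a middleware would run after the signature has been
/// verified (signature verification itself is omitted in this sketch).
pub fn is_unexpired(claims: &Claims, now: u64) -> bool {
    now < claims.exp
}
```

In an Axum middleware, the equivalent flow would be: read the `Authorization` header, extract the token, verify its signature, and reject the request with 401 if the claims have expired.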
Encryption and Data Protection
Beyond authentication, encryption plays a crucial role in safeguarding data:
- Password Security: Passwords are hashed with Argon2id, using a freshly generated salt for each password, and verified against the stored hash at login.
- Data Protection: Sensitive data is encrypted using AES-256, coupled with secure key management practices.
Hashing protects stored credentials and AES-256 protects document content, while data in transit is protected separately by TLS termination at the Nginx proxy.
Document Processing and the GraphRAGManager
The core document processing engine is encapsulated in the GraphRAGManager, which implements a Retrieval-Augmented Generation (RAG) system over a document similarity graph. Key features include:
- Multi-classification support: Documents are segregated by classification levels such as "UNCLASSIFIED", "CUI", "SECRET", and "TOPSECRET". Each classification level is managed within its own graph database instance.
- Embedding Generation: The system uses the all-MiniLM-L12-v2 model to generate document embeddings and employs cosine similarity calculations via the ndarray library, with a similarity threshold set at 0.7.
- Graph Database Implementation: Built on RocksDB, it establishes vertex-edge relationships for document similarity, storing metadata, embeddings, and relationships efficiently.
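The similarity test can be illustrated in plain Rust. This sketch assumes embeddings arrive as `f32` slices of equal length (the project uses `ndarray`, where the dot product would be `a.dot(&b)`), and it treats a score of exactly 0.7 as passing, since the write-up does not say whether the threshold is inclusive. `cosine_similarity` and `is_related` are hypothetical names.

```rust
/// Similarity threshold stated in the text.
pub const SIMILARITY_THRESHOLD: f32 = 0.7;

/// Cosine similarity of two equal-length embeddings: dot(a, b) / (|a| * |b|).
/// Returns 0.0 when either vector has zero norm, to avoid dividing by zero.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Whether two documents are similar enough to be linked in the graph.
pub fn is_related(a: &[f32], b: &[f32]) -> bool {
    cosine_similarity(a, b) >= SIMILARITY_THRESHOLD
}
```

Cosine similarity ignores vector magnitude, so two documents are compared by the direction of their embeddings rather than by length-dependent scale.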
Detailed Graph and Query Processing
The GraphRAGSystem handles graph construction and query processing as follows:
- Document Vertex Structure: Each document is assigned a unique UUID and its embeddings are stored as float32 vectors.
- Similarity Graph Construction: Relationships are established using weighted edges based on cosine similarity, with bidirectional connections and threshold-based pruning (SIMILARITY_THRESHOLD = 0.7).
- Query Processing Pipeline: The system generates embeddings for query text, performs a k-nearest neighbor search (TOP_K = 5), aggregates relevant context for LLM prompting, and applies classification-aware filtering to ensure appropriate access.
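A simplified in-memory version of the graph construction and query steps is sketched below. It is an assumption-laden stand-in: `SimilarityGraph` and `classified_query` are hypothetical names, vertex ids are `u64` rather than UUIDs, storage is a `HashMap` rather than RocksDB, and the k-nearest-neighbor search is brute force. Access filtering assumes one graph per classification level, consistent with the multi-classification design described earlier, and only searches graphs whose level appears in the caller's clearance list.

```rust
use std::collections::HashMap;

pub const SIMILARITY_THRESHOLD: f32 = 0.7;
pub const TOP_K: usize = 5;

/// Hypothetical in-memory stand-in for one classification level's graph.
#[derive(Default)]
pub struct SimilarityGraph {
    embeddings: HashMap<u64, Vec<f32>>,
    /// Adjacency list: vertex id -> (neighbor id, edge weight = cosine similarity).
    edges: HashMap<u64, Vec<(u64, f32)>>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl SimilarityGraph {
    /// Insert a document (ids assumed unique) and link it in both directions to
    /// every existing vertex whose similarity clears the threshold; weaker pairs
    /// are pruned, i.e. get no edge at all.
    pub fn add_document(&mut self, id: u64, embedding: Vec<f32>) {
        let mut new_edges = Vec::new();
        for (&other, other_emb) in &self.embeddings {
            let w = cosine(&embedding, other_emb);
            if w >= SIMILARITY_THRESHOLD {
                new_edges.push((other, w));
            }
        }
        // Bidirectional: add the reverse edge on each neighbor.
        for &(other, w) in &new_edges {
            self.edges.entry(other).or_default().push((id, w));
        }
        self.edges.insert(id, new_edges);
        self.embeddings.insert(id, embedding);
    }

    pub fn neighbors(&self, id: u64) -> &[(u64, f32)] {
        self.edges.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Brute-force k-nearest-neighbor search against a query embedding.
    pub fn top_k(&self, query: &[f32], k: usize) -> Vec<(u64, f32)> {
        let mut scored: Vec<(u64, f32)> = self
            .embeddings
            .iter()
            .map(|(&id, emb)| (id, cosine(query, emb)))
            .collect();
        // Highest similarity first; ties broken by id for deterministic output.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        scored.truncate(k);
        scored
    }
}

/// Search only the graphs for levels the user is cleared for, then keep the
/// overall top-k across those levels.
pub fn classified_query(
    graphs: &HashMap<&str, SimilarityGraph>,
    clearances: &[&str],
    query: &[f32],
) -> Vec<(u64, f32)> {
    let mut hits: Vec<(u64, f32)> = clearances
        .iter()
        .filter_map(|level| graphs.get(level))
        .flat_map(|g| g.top_k(query, TOP_K))
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.truncate(TOP_K);
    hits
}
```

Brute-force scoring costs one similarity computation per stored document per query; for larger collections an approximate nearest-neighbor index would be the usual replacement.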
Zero-Shot Classification with RustBERT
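To further enhance document categorization and query routing, zero-shot classification is integrated using RustBERT. Zero-shot classification scores text against candidate labels the model was not specifically trained on, so the tag set can change without retraining. In this system it: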
- Categorizes documents into predefined tags.
- Supports multi-label classification scenarios.
- Is utilized during both document ingestion and query processing to maintain accurate categorization and routing.
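RustBERT's zero-shot pipeline produces a score for each candidate label; turning those scores into tags is a separate decision. The sketch below shows one plausible post-processing step, assuming scores in [0, 1]: multi-label tagging keeps every label above a cutoff, while query routing takes the single best label. `TagScore`, `select_tags`, `route`, and the 0.5 cutoff are hypothetical; the write-up does not state the threshold used.

```rust
/// A candidate tag and the score the zero-shot model assigned to it.
/// In multi-label mode scores are independent and need not sum to 1.
#[derive(Debug, Clone, PartialEq)]
pub struct TagScore {
    pub tag: String,
    pub score: f64,
}

/// Hypothetical cutoff; not taken from the project.
pub const TAG_THRESHOLD: f64 = 0.5;

/// Multi-label tagging: keep every tag whose score clears the cutoff, best first.
pub fn select_tags(scores: &[TagScore], threshold: f64) -> Vec<String> {
    let mut kept: Vec<&TagScore> = scores.iter().filter(|s| s.score >= threshold).collect();
    kept.sort_by(|a, b| b.score.total_cmp(&a.score));
    kept.into_iter().map(|s| s.tag.clone()).collect()
}

/// Single-label routing: pick the highest-scoring tag, if any.
pub fn route(scores: &[TagScore]) -> Option<&str> {
    scores
        .iter()
        .max_by(|a, b| a.score.total_cmp(&b.score))
        .map(|s| s.tag.as_str())
}
```

Using the same scores for both purposes lets ingestion attach several tags to a document while a query is still sent down a single path.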
API Endpoints and Database Schema
A robust set of API endpoints supports core functionalities:
- Authentication Endpoints:
  - /register (POST): Handles user registration with Argon2 password hashing and stores credentials in PostgreSQL.
  - /login (POST): Validates credentials, issues JWT tokens, and implements rate limiting.
- Protected Routes:
  - /query (POST): Manages LLM queries with classification awareness, incorporates timeout mechanisms, and returns structured responses with timing metrics.
  - /users/clearance (PUT): Updates user clearance levels with role-based access control.
- Group Management Endpoints:
  - /groups/users/add (POST): Adds a user to a group (admin only).
  - /groups/users/join (POST): Lets a user join a group with the group password.
  - /groups/admins/promote (POST): Promotes a user to group admin.
  - /groups/tags (GET): Retrieves group tags.
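The write-up says `/login` is rate limited but gives neither the algorithm nor the limits. A fixed-window counter per client is one simple option, sketched here with a hypothetical `LoginRateLimiter` type; the key (IP address or username), the attempt cap, and the window length are all assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window limiter keyed by client (hypothetical; e.g. IP or username).
pub struct LoginRateLimiter {
    max_attempts: u32,
    window: Duration,
    /// key -> (start of the current window, attempts made in it)
    windows: HashMap<String, (Instant, u32)>,
}

impl LoginRateLimiter {
    pub fn new(max_attempts: u32, window: Duration) -> Self {
        Self { max_attempts, window, windows: HashMap::new() }
    }

    /// Record an attempt at `now`; returns false if the request should be rejected.
    pub fn allow(&mut self, key: &str, now: Instant) -> bool {
        let entry = self.windows.entry(key.to_string()).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.max_attempts {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

A rejected attempt would typically map to HTTP 429 Too Many Requests. Fixed windows permit a burst straddling a window boundary; a sliding window or token bucket smooths that out at the cost of more state per client.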
The PostgreSQL database schema includes tables for users and groups, with fields that support unique identification, password storage, group memberships, and clearance level arrays for classifications (Unclassified, CUI, Secret, TopSecret).
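Because clearances are stored as an array rather than a single rank, an access check reduces to membership. The following sketch assumes that model; the `Classification` enum, `User` struct, and `can_access` function are hypothetical, and the parser accepts both the "TOPSECRET" spelling used for the graph stores and the "TopSecret" spelling used in the schema.

```rust
/// Classification levels listed in the schema description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Classification {
    Unclassified,
    Cui,
    Secret,
    TopSecret,
}

impl Classification {
    /// Parse a level label case-insensitively, ignoring spaces.
    pub fn parse(s: &str) -> Option<Self> {
        match s.to_ascii_uppercase().replace(' ', "").as_str() {
            "UNCLASSIFIED" => Some(Self::Unclassified),
            "CUI" => Some(Self::Cui),
            "SECRET" => Some(Self::Secret),
            "TOPSECRET" => Some(Self::TopSecret),
            _ => None,
        }
    }
}

/// Hypothetical row shape for the users table: clearances held as an array.
pub struct User {
    pub username: String,
    pub clearances: Vec<Classification>,
}

/// A user may see a document only if its level appears in their clearance array.
pub fn can_access(user: &User, doc_level: Classification) -> bool {
    user.clearances.contains(&doc_level)
}
```

If clearances were instead hierarchical (a Secret holder implicitly reading CUI), deriving `PartialOrd` on the enum, which orders variants by declaration, and comparing levels would replace the membership test.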
LLM Integration and Infrastructure
The Ollama LLM service and the deployment infrastructure are configured as follows:
- LLM Integration: Utilizes the deepseek-r1:1.5b model, containerized with GPU support and optimized through custom initialization scripts, connection pooling, and health checks.
- Infrastructure: Employs Docker multi-stage builds, persistent volume management, and GPU passthrough configurations. Nginx is used for TLS termination, reverse proxy functionality, and custom domain routing.
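Health checks on a model-serving container are commonly paired with retries while the model loads. The following is a minimal sketch of a polling loop with capped exponential backoff; `wait_until_healthy` is a hypothetical helper, and `probe` stands in for whatever check the deployment actually uses (for example, an HTTP request to the Ollama service).

```rust
use std::thread;
use std::time::Duration;

/// Poll `probe` until it returns true or attempts run out, doubling the delay
/// after each failure up to `cap`. Returns the number of attempts used.
pub fn wait_until_healthy<F>(
    mut probe: F,
    max_attempts: u32,
    initial: Duration,
    cap: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    let mut delay = initial;
    for attempt in 1..=max_attempts {
        if probe() {
            return Ok(attempt);
        }
        // Do not sleep after the final failed attempt.
        if attempt < max_attempts {
            thread::sleep(delay);
            delay = (delay * 2).min(cap);
        }
    }
    Err(format!("service not healthy after {max_attempts} attempts"))
}
```

Capping the delay keeps a slow-starting model from pushing the wait interval out indefinitely while still backing off from a service that is not yet ready.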
Performance Optimizations and System Resilience
Performance is optimized at several layers:
- Database: Utilizes connection pooling, prepared statements, and transaction management.
- Memory Management: Embeddings are stored efficiently, and shared data is passed between tasks through reference-counted smart pointers (Arc) instead of being copied.
- Async Implementation: Based on the Tokio runtime with efficient task scheduling and timeout mechanisms.
These optimizations ensure the system remains responsive and scalable.
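In Tokio, a timeout like the one described above is usually expressed with `tokio::time::timeout`, which wraps a future and returns an error if it does not finish in time. To stay dependency-free, this sketch shows the same idea with a standard-library thread and channel; `with_timeout` is a hypothetical helper, not the project's API.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `job` on a worker thread and stop waiting after `limit`.
/// Blocking-thread analogue of `tokio::time::timeout`.
pub fn with_timeout<T, F>(limit: Duration, job: F) -> Result<T, &'static str>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the receiver is gone; ignore the send error.
        let _ = tx.send(job());
    });
    // Errors on timeout, or if the worker panicked before sending.
    rx.recv_timeout(limit).map_err(|_| "no result within the time limit")
}
```

Unlike dropping a Tokio future, which cancels it, the spawned thread here keeps running after the timeout; its late result is simply discarded.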
Security, Error Handling, and Monitoring
The project emphasizes robust security measures across multiple layers:
- Network Security: Enforced through TLS, CORS configurations, and secure headers.
- Access Control: Based on role-based permissions and classification levels.
- Data Security: Provided by secure password hashing and signed (HMAC-SHA256) tokens.
- Error Handling: Features comprehensive error types, structured responses, logging, and tracing.
- Monitoring: Utilizes health check endpoints, a tracing layer, and detailed performance metrics to ensure continuous oversight.
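Structured error handling of this kind is commonly modeled in Rust as a single error enum that maps each variant to an HTTP status. The sketch below is hypothetical: the variant names are inferred from failure modes mentioned in this write-up, and in an Axum server the same mapping would usually live in an `IntoResponse` implementation.

```rust
use std::fmt;

/// Hypothetical application error type.
#[derive(Debug)]
pub enum AppError {
    Unauthorized,
    InsufficientClearance { required: String },
    RateLimited,
    LlmTimeout,
    /// Internal detail is kept for logging and never shown to the client.
    Internal(String),
}

impl AppError {
    /// HTTP status code each error maps to.
    pub fn status(&self) -> u16 {
        match self {
            AppError::Unauthorized => 401,
            AppError::InsufficientClearance { .. } => 403,
            AppError::RateLimited => 429,
            AppError::LlmTimeout => 504,
            AppError::Internal(_) => 500,
        }
    }
}

/// Client-facing messages; internal details are deliberately omitted.
impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Unauthorized => write!(f, "missing or invalid token"),
            AppError::InsufficientClearance { required } => {
                write!(f, "requires {required} clearance")
            }
            AppError::RateLimited => write!(f, "too many requests"),
            AppError::LlmTimeout => write!(f, "language model did not respond in time"),
            AppError::Internal(_) => write!(f, "internal server error"),
        }
    }
}

impl std::error::Error for AppError {}
```

Keeping internal details out of `Display` means the message can be returned to clients safely while the full value is sent to the tracing layer.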
Backend Conclusion
In summary, this classification-aware document management system combines rigorous security practices, natural language processing, and efficient document retrieval. Its containerized service architecture, encryption, and per-level classification boundaries are designed for managing sensitive documents, while modern frameworks and the performance optimizations described above keep the system secure, scalable, and responsive.
Frontend Architecture and Implementation
The classification-aware document management system's frontend is built with a modern, hybrid approach that seamlessly combines web flexibility with native performance. By integrating React with Tauri, the application delivers a robust desktop experience that capitalizes on the rich ecosystem of web technologies while providing the enhanced security and efficiency of native applications.
Advanced React Integration
Leveraging React 18 features such as concurrent rendering and automatic batching, the frontend keeps the user interface smooth and responsive. Global state such as user authentication and theme preferences is handled through React Context, while local component state manages specific UI interactions. This separation of concerns keeps the code clean and state updates efficient throughout the application.
Consistent Styling with Tailwind CSS
Tailwind CSS is employed as the primary styling framework, offering a utility-first approach that accelerates development without sacrificing consistency. Custom design tokens and components, configured via Tailwind’s system, ensure that the interface adheres to organizational branding and accessibility standards. With responsive design built into Tailwind’s breakpoint utilities, the application delivers a fluid experience across various devices and screen sizes.
Robust Security Measures
Security is a core pillar of the frontend architecture. All API communications are managed through a dedicated service layer that handles authentication tokens, implements retry logic, and gracefully manages errors. The Tauri integration further strengthens security by enforcing strict permissions, limiting system access, and reducing exposure to potential vulnerabilities. Additionally, WebAssembly modules are utilized for resource-intensive tasks like document preprocessing and local search, ensuring that performance is maintained without compromising security.
Feature-Rich User Experience
The frontend offers a comprehensive set of features designed for efficiency and ease of use:
- Document Viewer: A sophisticated viewer that clearly marks document classification levels.
- Real-Time Collaboration: Tools that enable seamless interaction and document sharing among users.
- Advanced Search: A powerful search interface that respects user clearance levels.
- Document Management: Intuitive controls for uploading, categorizing, and managing documents, with clear visual indicators for classification and access restrictions.
- Notification System: Real-time alerts for document updates, classification changes, and system events keep users well-informed.
Performance and Offline Capabilities
To maintain a consistently high performance, the frontend incorporates progressive loading techniques that ensure responsiveness even during complex or resource-heavy operations. Offline functionality is achieved through the strategic use of local storage and IndexedDB, allowing users to access previously loaded documents without network connectivity while maintaining strict security for classified information.
Streamlined Development Workflow
The development lifecycle is supported by a comprehensive testing suite that includes:
- Unit Tests: Implemented with Jest for isolated component validation.
- Integration Tests: Leveraging Testing Library to ensure cohesive interactions.
- End-to-End Tests: Using Playwright to simulate real-world usage scenarios.
Moreover, the build process is optimized using Vite, which enables rapid development cycles and efficient production builds across all supported platforms.
Future Works
This section outlines potential future enhancements for the classification-aware document management system, focusing on advanced encryption techniques, intelligent monitoring, and additional security and performance improvements.
Fully Homomorphic Encryption (FHE)
One of the most promising avenues for future work is the integration of Fully Homomorphic Encryption (FHE) into the system. FHE allows computations to be performed directly on encrypted data without the need for decryption, thereby preserving confidentiality throughout data processing. By applying FHE on top of AES-256 encryption, the system can:
- Enable Secure Data Analytics: Perform operations and analytics on sensitive datasets without exposing raw data, ensuring compliance with strict data protection regulations.
- Enhance Privacy: Allow third-party computations (such as AI model training or querying) without compromising the underlying classified information.
- Facilitate Secure Multi-Party Computation: Support scenarios where data from multiple sources must be analyzed jointly while maintaining strict isolation between different clearance levels.
AI-Driven Log Monitoring and Anomaly Detection
Another future enhancement involves leveraging AI to monitor system logs and identify potential misuse. An AI model can be trained to analyze query patterns and flag suspicious activities based on the following criteria:
- Detection of Unauthorized Data Access: Monitor logs for queries that have a high probability of accessing information beyond a user's clearance level.
- Real-Time Alerts for Bad Actors: Automatically flag and alert administrators when abnormal query behaviors or repeated attempts to access restricted data are detected.
- Adaptive Learning: Continuously refine the model using feedback and historical data, improving its accuracy in distinguishing between legitimate queries and potential security breaches.
This proactive approach to log analysis can help administrators swiftly identify and mitigate risks, enhancing overall system security.
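Before any model is trained, the "repeated attempts to access restricted data" criterion can be expressed as a simple rule. This is a rule-based baseline, not the AI model proposed above: `LogEntry` and `flag_users` are hypothetical, and the log is assumed to record whether each request was denied for insufficient clearance and to be sorted by timestamp.

```rust
use std::collections::{HashMap, VecDeque};

/// One access-log entry: who asked, when (Unix seconds), and whether the
/// request was denied for exceeding the user's clearance.
pub struct LogEntry<'a> {
    pub user: &'a str,
    pub ts: u64,
    pub denied: bool,
}

/// Flag any user with at least `max_denials` denied requests inside a sliding
/// window of `window_secs`. Entries are assumed sorted by timestamp.
pub fn flag_users<'a>(log: &[LogEntry<'a>], window_secs: u64, max_denials: usize) -> Vec<&'a str> {
    let mut recent: HashMap<&str, VecDeque<u64>> = HashMap::new();
    let mut flagged: Vec<&'a str> = Vec::new();
    for e in log.iter().filter(|e| e.denied) {
        let q = recent.entry(e.user).or_default();
        q.push_back(e.ts);
        // Drop denials that have fallen out of the window.
        while let Some(&oldest) = q.front() {
            if e.ts.saturating_sub(oldest) >= window_secs {
                q.pop_front();
            } else {
                break;
            }
        }
        if q.len() >= max_denials && !flagged.contains(&e.user) {
            flagged.push(e.user);
        }
    }
    flagged
}
```

A learned model would replace the fixed `max_denials` and `window_secs` with scores derived from historical query patterns, but a baseline like this gives the alerts something to be measured against.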
Enhanced Encryption Architecture
To further secure data throughout its lifecycle, future work may include a layered encryption strategy:
- AES-256 Encryption: Continue to encrypt all uploaded data with AES-256 to ensure strong baseline security.
- FHE Overlay: Implement FHE on top of AES-256 to enable secure computations and data processing without decryption.
- Key Management Innovations: Explore advanced key management solutions, such as hardware security modules (HSMs) or decentralized key distribution systems, to safeguard encryption keys against unauthorized access.
Additional Future Directions
Beyond encryption and log monitoring, several other improvements could be considered:
- Blockchain-Based Audit Logging: Use blockchain technology to create immutable audit trails for all document transactions and access logs, ensuring transparency and tamper-resistance.
- Federated Learning for User Behavior Analysis: Implement federated learning techniques to analyze user interactions and behavioral patterns across distributed systems, while preserving individual privacy.
- Decentralized Identity Management: Integrate decentralized identity frameworks to enhance user authentication and authorization, reducing reliance on centralized systems.
- Enhanced Container Security: Adopt container security best practices such as runtime monitoring, vulnerability scanning, and automated patch management to further protect the microservices architecture.
- Improved Scalability and Resilience: Continue optimizing system performance through distributed processing, dynamic load balancing, and real-time system health monitoring.
By exploring these future directions, the system can continue to evolve into a more secure, intelligent, and resilient platform capable of handling increasingly sophisticated security challenges in a dynamic digital environment.
Built With
- docker
- indradb
- javascript
- postgresql
- react
- rust
- tailwind
- tauri