Inspiration
We all know the pain of a red "Disk Space Full" warning. When you try to clean it up, the default OS tools are inadequate, and third-party "cleaners" are often bloated black boxes that secretly delete things they shouldn't. We wanted to build a transparent, lightning-fast, and surgically precise system decluttering tool: one that doesn't just guess what to delete, but uses cryptographic hashing and OS-level intelligence to reclaim your hard drive safely.
What it does
DeepDrive is a professional-grade, 3-tier system decluttering utility:
- Visual Disk Mapper: An interactive Plotly sunburst chart that recursively maps your directory tree, letting you drill down into massive folders to see exactly what is eating your space.
- Smart Deduplicator: A cryptographic engine that finds identical files scattered across your drive. It groups duplicates, calculates the total wasted space, and lets you bulk-delete the clones while keeping the originals.
- Windows Deep Clean: A custom OS-level sweeper. It features a "Ghost Hunter" that detects orphaned AppData folders left behind by uninstalled software, alongside an aggressive Temp Cache flusher.
How we built it
We architected DeepDrive with a decoupled backend/frontend model.
- The Backend: A high-performance Python FastAPI server. Instead of relying on heavy third-party packages, we leveraged Python's standard libraries (os, hashlib, winreg, difflib) to interact directly with the operating system at a low level.
- The Frontend: We opted for Vanilla JavaScript, CSS3, and HTML5. Avoiding heavy frontend frameworks allowed us to heavily optimize DOM rendering and keep the UI blazing fast.
- The Math: To make the Deduplicator incredibly fast, we built a 3-Stage Mathematical Funnel:
  - Size Grouping: We group files by their exact byte size in a hash map ($O(1)$ per file, $O(n)$ overall) and instantly eliminate every file whose size is unique.
  - 4KB Fast-Hash: We read only the first 4,096 bytes of surviving files and generate an MD5 hash, leveraging the avalanche effect to drop non-matches instantly. Files that differ only after the first 4KB pass this stage and are caught by the next one.
  - Full Cryptographic Hash: We stream remaining files in 64KB chunks to generate a full SHA-256 signature. SHA-256 has $2^{256}$ (roughly $1.16 \times 10^{77}$) possible digests and no known collisions, so two files with the same size and the same digest are exact clones for all practical purposes. A strict guarantee would still require a byte-by-byte comparison.
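The three-stage funnel described above can be sketched with the standard library alone. This is a minimal illustration rather than DeepDrive's actual code: the names `fast_hash`, `full_hash`, `regroup`, and `find_duplicates` are hypothetical, and unreadable files are simply skipped.

```python
import hashlib
import os
from collections import defaultdict

FAST_BYTES = 4096        # prefix size for the cheap first-pass hash
CHUNK_BYTES = 64 * 1024  # streaming chunk size for the full hash


def fast_hash(path):
    """MD5 of the first 4 KB only: a cheap pre-filter, not proof of equality."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(FAST_BYTES)).hexdigest()


def full_hash(path):
    """SHA-256 of the whole file, streamed in 64 KB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_BYTES), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regroup(groups, key_func):
    """Split each candidate group by key_func, keeping subgroups of 2+ files."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            try:
                buckets[key_func(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def find_duplicates(paths):
    """Return lists of paths that share size, 4 KB prefix hash, and SHA-256."""
    groups = [list(paths)]
    for stage in (os.path.getsize, fast_hash, full_hash):
        groups = regroup(groups, stage)
    return groups
```

Each stage only sees the files that survived the previous one, so the expensive full read is reserved for files that already match on size and on their first 4 KB.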
Challenges we ran into
- The Browser Freezing: Initially, when our backend found thousands of duplicates, iterating over the JSON and individually appending elements to the DOM caused the browser's main thread to lock up. We solved this by implementing Batch String Rendering—compiling the entire UI payload into a single massive HTML string in the background and injecting it into the DOM exactly once.
- The AppData Ghost Dilemma: We wanted to find folders left behind by uninstalled software, but AppData folders are rarely named exactly like their registry entries (e.g., the registry says "Discord PTB", but the folder is "discord"). We solved this by building a bi-directional fuzzy string matcher using the Ratcliff/Obershelp algorithm to calculate similarity ratios, successfully bridging the gap between Registry keys and filesystem names.
- The NTFS Bottleneck: Windows is notoriously slow at directory traversal on NTFS drives. We optimized our crawler to completely ignore files smaller than 1MB. This sped up scan times by 10x, because tracking thousands of 4KB cache files doesn't free up meaningful space anyway.
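As an illustration of the size-threshold crawl, here is a minimal sketch built on `os.scandir`. The function name `scan_large_files` and its parameters are hypothetical; the 1 MB default mirrors the threshold mentioned above, and the sketch assumes that unreadable directories should be skipped silently.

```python
import os

MIN_SIZE = 1024 * 1024  # 1 MB threshold from the text


def scan_large_files(root, min_size=MIN_SIZE):
    """Yield (path, size) for regular files of at least min_size bytes."""
    stack = [root]  # explicit stack avoids deep recursion on large trees
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            size = entry.stat(follow_symlinks=False).st_size
                            if size >= min_size:
                                yield entry.path, size
                    except OSError:
                        continue  # entry vanished or is locked
        except OSError:
            continue  # PermissionError, missing directory, and similar
```

Symlinks are not followed here, which keeps the walk from looping or double-counting; whether DeepDrive does the same is not stated in the writeup.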
Accomplishments that we're proud of
We are incredibly proud of our Safety Architecture. Building a tool that deletes files is inherently dangerous. We implemented dynamic OS detection to permanently block scanning of critical paths (like C:\Windows or Linux /boot). We also built hardcoded whitelists into the Ghost Hunter to ensure core Microsoft packages are never touched, even if fuzzy matching flags them.
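To show how a whitelist can override the fuzzy matcher, here is a small sketch using `difflib.SequenceMatcher`, whose matching is based on the Ratcliff/Obershelp approach. Everything specific in it is an assumption for illustration: the `PROTECTED_FOLDERS` entries, the 0.6 threshold, and the function names are not taken from DeepDrive.

```python
from difflib import SequenceMatcher

# Hypothetical whitelist: the project's real list is not given in the text.
PROTECTED_FOLDERS = {"microsoft", "packages", "windows"}
THRESHOLD = 0.6  # assumed similarity cutoff


def similarity(a, b):
    """Best SequenceMatcher ratio over both argument orders, case-insensitive."""
    a, b = a.casefold().strip(), b.casefold().strip()
    if not a or not b:
        return 0.0
    # ratio() can depend on argument order, so compare in both directions.
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    return max(forward, backward)


def is_orphan(folder_name, installed_names, threshold=THRESHOLD):
    """True if the folder matches no installed program and is not protected."""
    if folder_name.casefold() in PROTECTED_FOLDERS:
        return False  # the whitelist wins, whatever the matcher says
    return all(similarity(folder_name, name) < threshold
               for name in installed_names)
```

For example, `is_orphan("discord", ["Discord PTB"])` is false under these assumptions because the two names score about 0.78, while a whitelisted folder such as "Microsoft" is never reported regardless of the registry contents.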
What we learned
We gained a massive appreciation for low-level OS architecture. We learned how to programmatically traverse the Windows Registry, how to handle the WOW6432Node for 32-bit vs 64-bit software architecture, and how to safely catch PermissionError exceptions when the OS locks files that are currently in use.
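The registry traversal can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: it reads only the two `HKEY_LOCAL_MACHINE` Uninstall keys (the native view and the `WOW6432Node` view used for 32-bit software on 64-bit Windows), the function name is hypothetical, and on non-Windows systems it returns an empty list because `winreg` is unavailable there.

```python
import sys

UNINSTALL_SUBKEYS = (
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall",
    r"SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall",
)


def installed_program_names():
    """Collect DisplayName values from both Uninstall registry keys."""
    if sys.platform != "win32":
        return []  # winreg only exists on Windows
    import winreg

    names = set()
    for subkey in UNINSTALL_SUBKEYS:
        try:
            root = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey)
        except OSError:
            continue  # key missing, e.g. no WOW6432Node on 32-bit Windows
        with root:
            subkey_count = winreg.QueryInfoKey(root)[0]
            for index in range(subkey_count):
                try:
                    with winreg.OpenKey(root, winreg.EnumKey(root, index)) as app:
                        name, _ = winreg.QueryValueEx(app, "DisplayName")
                        names.add(str(name))
                except OSError:
                    continue  # entry without a DisplayName, or access denied
    return sorted(names)
```

Per-user installs recorded under `HKEY_CURRENT_USER` are not covered by this sketch.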
What's next for DeepDrive
- The "Time Machine" Stale File Detector: Flagging massive files that haven't been accessed or modified in over 365 days.
- Top 50 Space Hogs: A radar feature to instantly flag the largest single files on the drive.
- Recycle Bin API Integration: Routing deleted files directly to the OS Recycle Bin (send2trash) instead of permanent vaporization for an ultimate "Undo" failsafe.
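The first two planned features could start from something like the sketch below. Since they are not built yet, the helper names, the 365-day default, and the `(path, size)` input shape are all assumptions. Note also that last-access times are not reliably updated on every system, so the staleness check is only a heuristic.

```python
import heapq
import os
import time

SECONDS_PER_DAY = 86400


def largest_files(files, n=50):
    """Return the n largest (path, size) pairs, biggest first."""
    return heapq.nlargest(n, files, key=lambda item: item[1])


def is_stale(path, max_age_days=365, now=None):
    """True if the file was neither accessed nor modified within max_age_days."""
    now = time.time() if now is None else now
    stats = os.stat(path)
    newest = max(stats.st_atime, stats.st_mtime)
    return now - newest > max_age_days * SECONDS_PER_DAY
```

`heapq.nlargest` avoids sorting the full file list, which matters when a scan returns hundreds of thousands of entries.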
Built With
- fastapi
- python

