Inspiration

We live in a world drowning in files. My own downloads folder had more than 1,500 files when I started this project. Existing tools organize files by name only, or rely on generic AI agents that quickly lose consistency in large folder structures. We wanted something smarter: a system that organizes files the way humans naturally do.

What it does

Solar Flow is an AI-powered file organization system that solves three core problems:

  1. Deep File Structure Understanding
    • Problem: Current tools rely on file names and fail with complex folder structures
    • Our Solution: A tree-based approach with a custom database powered by embeddings
    • Innovation: A lightweight indexing agent that summarizes files and generates embeddings locally, running efficiently in the background

  2. Intelligent Content Search
    • Problem: Users re-download files because they can’t recall names or keywords
    • Our Solution: Lightweight vector + keyword search, enabling discovery by both content and metadata
    • Benefit: Find files by what’s inside them, not just what they’re called

  3. Human-like Organization Behavior
    • Problem: Tools don’t adapt to personal workflows
    • Our Solution: An organizer agent trained to mimic user behavior and suggest personalized structures
    • Result: Organization that feels intuitive and natural
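The hybrid search idea in item 2 can be illustrated with a small sketch. It assumes each file already has a text summary and an embedding vector, and it blends cosine similarity with a simple keyword-overlap score using a weight `alpha`. The `Doc` type, the `alpha = 0.7` default, and the overlap formula are illustrative assumptions, not Solar Flow's actual scoring algorithm.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Doc:
    path: str
    text: str               # assumed: a summary produced by the indexing agent
    embedding: list[float]  # assumed: vector from some embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def tokens(s: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def keyword_score(query: str, doc: Doc) -> float:
    """Fraction of query terms found in the document's text or path."""
    q = tokens(query)
    if not q:
        return 0.0
    d = tokens(doc.text) | tokens(doc.path)
    return len(q & d) / len(q)


def hybrid_search(query: str, query_vec: list[float], docs: list[Doc],
                  alpha: float = 0.7, k: int = 5) -> list[tuple[float, str]]:
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine(query_vec, d.embedding)
         + (1 - alpha) * keyword_score(query, d), d.path)
        for d in docs
    ]
    scored.sort(reverse=True)  # highest combined score first
    return scored[:k]
```

Raising `alpha` favors meaning over exact wording; lowering it helps when a user remembers a specific term, such as part of a file name.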

How we built it

  • Architecture: Async pipeline (File → Parser → Summarization → Embeddings → Database → Search)
  • Backend: Python (Quart, SQLite + vector extensions)
  • Frontend: React 19 + Vite + Tailwind CSS with real-time progress tracking
  • AI Integration: Supports both local models (Gemma 3 4B via Ollama) and cloud models (OpenAI GPT-4, GPT-5, and gpt-oss), plus Whisper for audio/video transcription
  • File Support: 30+ formats using Spotlight (macOS) with MarkItDown fallback
  • Search: Hybrid semantic + keyword scoring algorithm
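The async pipeline above can be sketched with `asyncio` queues and one task per stage, so that a slow call for one file (for example, an LLM summary) does not block parsing of the next. The stage functions `parse`, `summarize`, and `embed` are hypothetical stubs standing in for the real parser, language model, and embedding model, and the final stage collects results and prints progress instead of writing to SQLite.

```python
import asyncio


# Hypothetical stage functions; each only transforms a dict so the sketch runs offline.
async def parse(path: str) -> dict:
    await asyncio.sleep(0)  # placeholder for I/O-bound text extraction
    return {"path": path, "text": f"contents of {path}"}


async def summarize(item: dict) -> dict:
    await asyncio.sleep(0)  # placeholder for an LLM call
    item["summary"] = item["text"][:40]
    return item


async def embed(item: dict) -> dict:
    await asyncio.sleep(0)  # placeholder for an embedding-model call
    item["embedding"] = [float(len(item["summary"]))]
    return item


async def stage(fn, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Apply fn to each item from inbox and forward the result until a None sentinel arrives."""
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)  # propagate shutdown to the next stage
            break
        await outbox.put(await fn(item))


async def run_pipeline(paths: list[str]) -> list[dict]:
    q_in, q_parsed, q_summ, q_out = (asyncio.Queue() for _ in range(4))
    workers = [
        asyncio.create_task(stage(parse, q_in, q_parsed)),
        asyncio.create_task(stage(summarize, q_parsed, q_summ)),
        asyncio.create_task(stage(embed, q_summ, q_out)),
    ]
    for p in paths:
        await q_in.put(p)
    await q_in.put(None)

    results = []
    while (item := await q_out.get()) is not None:
        results.append(item)  # stand-in for the database stage
        print(f"progress: {len(results)}/{len(paths)} {item['path']}")
    await asyncio.gather(*workers)
    return results
```

Running `asyncio.run(run_pipeline(["a.txt", "b.pdf"]))` prints one progress line per file. Giving a slow stage several workers would require sending one sentinel per worker.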

Challenges we ran into

  • Time: As UW–Madison undergrads balancing research, startups, and internships, time was tight. AI tools like ChatGPT and Claude helped us accelerate brainstorming and coding.
  • Learning Curve: We were beginners in agent design and database architecture, so we spent significant time learning fundamentals.
  • Resources: No high-end GPUs. We optimized by using Mac hardware acceleration and carefully selecting efficient models.
  • Consistency: Ensuring agents behaved reliably required careful prompt engineering and rich context.
  • Compatibility: Supporting multiple embedding models forced us to design flexible, interconnected database tables.
  • Storage vs. Accuracy: Balancing speed, storage, and precision in search was a constant tradeoff.
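One way to support multiple embedding models, sketched here with Python's standard-library `sqlite3` module, is to keep vectors in their own table keyed by `(file_id, model_id)`, so adding a model adds rows rather than columns. The table names, the model names used in the usage note, and storing vectors as packed float32 BLOBs (instead of through a vector extension) are assumptions for illustration.

```python
import sqlite3
from array import array

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id      INTEGER PRIMARY KEY,
    path    TEXT UNIQUE NOT NULL,
    summary TEXT
);
CREATE TABLE IF NOT EXISTS models (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,   -- embedding model identifier
    dim  INTEGER NOT NULL        -- vector length this model produces
);
CREATE TABLE IF NOT EXISTS embeddings (
    file_id  INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    model_id INTEGER NOT NULL REFERENCES models(id),
    vector   BLOB NOT NULL,      -- float32 values packed with array('f')
    PRIMARY KEY (file_id, model_id)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def model_id(conn: sqlite3.Connection, name: str, dim: int) -> int:
    """Register a model on first use and reject vectors of the wrong length."""
    conn.execute("INSERT OR IGNORE INTO models (name, dim) VALUES (?, ?)", (name, dim))
    mid, stored_dim = conn.execute(
        "SELECT id, dim FROM models WHERE name = ?", (name,)).fetchone()
    if stored_dim != dim:
        raise ValueError(f"{name} expects {stored_dim}-d vectors, got {dim}")
    return mid


def add_embedding(conn: sqlite3.Connection, path: str, summary: str,
                  model: str, vec: list[float]) -> None:
    mid = model_id(conn, model, len(vec))
    conn.execute("INSERT OR IGNORE INTO files (path, summary) VALUES (?, ?)", (path, summary))
    fid = conn.execute("SELECT id FROM files WHERE path = ?", (path,)).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (file_id, model_id, vector) VALUES (?, ?, ?)",
        (fid, mid, array("f", vec).tobytes()),
    )


def vectors_for(conn: sqlite3.Connection, model: str) -> dict[str, list[float]]:
    """Return {path: vector} for every file embedded with the given model."""
    rows = conn.execute(
        """SELECT f.path, e.vector FROM embeddings e
           JOIN files f ON f.id = e.file_id
           JOIN models m ON m.id = e.model_id
           WHERE m.name = ?""", (model,)).fetchall()
    return {path: array("f", blob).tolist() for path, blob in rows}
```

Storing the same file under a hypothetical `"model-small"` and `"model-large"` yields one `files` row and two `embeddings` rows. Because each model records its own dimension, a mismatched vector is rejected up front, and switching models is just a different `WHERE m.name = ?` filter.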

Accomplishments that we're proud of

  • Working Prototype: A fully functional system built under time and resource constraints
  • Research Potential: A foundation for publishing research on LLM performance in file organization tasks
  • User Experience: Enhanced usability with AI-generated folder icons for visual navigation
  • Experimentation: Learned how to design agents, structure embeddings, and optimize pipelines in practice

What we learned

  • How to design consistent, context-aware AI agents
  • Building async pipelines with real-time progress tracking
  • Designing flexible vector databases that support multiple models
  • Integrating multi-modal AI for text, audio, and image content
  • Balancing performance, accuracy, and resource constraints

What's next for Solar Flow -- A file organization master

  • Native GUI: A SwiftUI-based macOS app
  • Model Fine-tuning: Fine-tuning Gemma models for indexing and OpenAI's gpt-oss models for organization
  • Codebase Rewrite: Improving maintainability and scalability
  • More File Types: Extending to specialized formats
  • Advanced Features: Folder watching, automated workflows, and batch operations
  • Academic Publication: Documenting our findings for the research community
