🌌 Strand OS: The Spatial Mnemonic Engine

A 3D Multimodal Knowledge Agent powered by Gemini 1.5 Pro & Multimodal Live.

💡 Inspiration

In an era of information explosion, we are drowning in data but starving for wisdom. Traditional note-taking is linear and flat, yet human memory is associative and spatial. Inspired by the "Chiral Network" in Death Stranding and the ancient "Method of Loci" (the memory palace), we built Strand OS. We asked a simple question: what if your knowledge weren't a list of files, but a 3D wilderness you could navigate, explore, and inhabit? Strand OS transforms the text box into a living galaxy, leveraging spatial cognition to rebuild the way we learn.

🚀 What it does

Strand OS is a local-first, multimodal cognitive co-pilot that visualizes complex information as a persistent 3D neural network.

  • Multimodal Live Interaction: Powered by Gemini Multimodal Live, the agent doesn't just "read" text; it "sees" your 3D graph via vision snapshots and "speaks" to you in real time, guiding your journey through the knowledge desert.
  • Dual-Layer RAG Distillation: Our secret sauce. Instead of indexing messy raw text, Strand uses Gemini to "distill" data into atomic Neural Fragments before they enter the vector store (ChromaDB), markedly improving retrieval precision.
  • Spatial Navigation & Radar: Concepts are manifested as 3D nodes. The AI automatically maps etymological, phonetic, and semantic "bridges" between ideas, creating a visible topology of thought.
  • Gamified SRS Expeditions: An integrated spaced-repetition (SRS) algorithm turns review sessions into "Intergalactic Missions," driving long-term retention through tactical progression.

🛠️ How we built it

  • The AI Core: Google Gemini 1.5 Pro & Flash via Vertex AI for multimodal reasoning, vision analysis, and knowledge distillation.
  • 3D Rendering Pipeline: React 18 and Three.js (@react-three/fiber). We implemented custom Simplex-noise terrain generation and raycasting for physics-based node snapping.
  • The Brain (Backend): A high-performance FastAPI server managing a dual-database architecture: SQLModel (SQLite) for relational logic and ChromaDB for semantic vector embeddings.
  • Native Shell: Electron for a dedicated desktop experience, with Google Cloud Storage (GCS) integration for secure, cross-device knowledge archiving.

🧠 Challenges we ran into

  • Spatial Physics: Nodes would often "clip" through the 3D terrain. We solved this with raycaster-based buoyancy logic that lets nodes float dynamically above the shifting Simplex-noise topography.
  • Multimodal Latency: Real-time vision/voice loops demand extreme efficiency. We engineered a "dual-speed gear" architecture: instant heuristic-based feedback for UI interactions, while Gemini's heavy-lifting "deep scans" run as asynchronous background tasks.
  • Semantic Disambiguation: Standard RAG often suffers from hallucinated connections. We refined our pipeline by injecting Jieba-based keyword filtering and language-specific metadata to ensure every "Neural Link" is logically sound.

🏆 Accomplishments that we're proud of

  • Proprietary Distillation Paradigm: Moving beyond standard chunking to a "Distilled RAG" model significantly reduced LLM hallucinations in our testing.
  • 1:1 Data Synergy: Robust, strong consistency between our relational schema and vector store via a shared embedding_id that binds each SQLite row to its vector.
  • The UX of Memory: An interface that feels less like software and more like an extension of the user's mind: a true "Cognitive OS."

🎓 What we learned

  • Spatial UI Boundaries: We discovered that level of detail (LOD) isn't just for graphics; it's for cognition. We learned to prune visual noise to keep the user's focus on core concepts.
  • Agency over Interaction: Transitioning from a chatbot to an agent requires a fundamental shift in how state is managed between the AI and the 3D environment.
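The "Distilled RAG" ingestion and the embedding_id binding described above can be sketched roughly as follows. This is a simplified illustration, not the project's real code: the distill() stub stands in for the actual Gemini call, and a plain dict stands in for ChromaDB; the fragments table and function names are assumptions.

```python
import sqlite3
import uuid


def distill(raw_text: str) -> list[str]:
    # Stand-in for the Gemini distillation call: Strand OS asks the model to
    # rewrite raw input into atomic "Neural Fragments". Here we naively split
    # on sentence boundaries so the pipeline runs without an API key.
    return [s.strip() for s in raw_text.split(".") if s.strip()]


def ingest(conn: sqlite3.Connection, vector_store: dict, raw_text: str) -> None:
    """Distill raw text, then write each fragment to both stores,
    bound 1:1 by a shared embedding_id."""
    for fragment in distill(raw_text):
        embedding_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO fragments (embedding_id, text) VALUES (?, ?)",
            (embedding_id, fragment),
        )
        # The vector store receives the SAME id, so any semantic hit can be
        # joined back to its relational row (and vice versa).
        vector_store[embedding_id] = fragment  # stand-in for a ChromaDB add
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fragments (embedding_id TEXT PRIMARY KEY, text TEXT)")
vector_store: dict[str, str] = {}
ingest(conn, vector_store, "Loci means place. The Method of Loci uses spatial memory.")
```

Because both writes share one id, a semantic lookup in the vector store and a relational query over the SQLite schema always resolve to the same fragment, which is the consistency property the "1:1 Data Synergy" point claims.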
🗺️ What's next for Strand OS

  • Semantic Nebula (Macro Clustering): Using DBSCAN to group thousands of nodes into massive, high-level "Knowledge Clouds."
  • Collaborative Chiral Networks: Allowing multiple users to bridge their 3D galaxies for collective brainstorming and shared learning.
  • Visual Evolution: Tying the 3D environment's shaders and lighting to the user's mastery level: as you learn, your world literally becomes brighter and more detailed.
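As a rough illustration of the planned "Semantic Nebula" step, a minimal DBSCAN pass over 3D node positions might look like the sketch below. This is a toy, self-contained version for clarity; a real implementation would more likely cluster embedding vectors with a library such as scikit-learn, and the eps/min_pts values here are arbitrary.

```python
import math


def dbscan(points: list[tuple], eps: float, min_pts: int) -> list[int]:
    """Minimal DBSCAN over 3D node positions. Returns one label per point;
    -1 marks noise (nodes left outside any Knowledge Cloud)."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels: list = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1  # i is a core point: start a new Knowledge Cloud
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: absorbed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also core: keep growing the cloud
    return labels


# Two tight groups of nodes and one stray node in 3D space.
nodes = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
         (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5),
         (20, 20, 20)]
labels = dbscan(nodes, eps=0.5, min_pts=2)
```

With this input, the two dense groups fall into two separate clusters and the distant stray node is labeled noise, which is exactly the macro-grouping behavior the "Knowledge Clouds" feature describes.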

Built With

  • AI/LLM: Gemini 1.5 Pro (via Google GenAI SDK), Sentence-Transformers (embedding)
  • Languages: Python 3.11, TypeScript, JavaScript
  • Frameworks & libraries: FastAPI (backend), React 18 (frontend), Three.js (3D engine), Tailwind CSS, Vite, Zustand (state)
  • Databases: SQLModel (SQLite), ChromaDB (vector store)
  • Cloud services: Google Cloud Run, Firebase Hosting, Google Cloud Storage (GCS)
  • Platforms & tools: Electron (desktop packaging)