Inspiration
We've all scrolled back to a photo from months ago — a building, a sign, a mural — and felt the story behind it slipping away. The image survived, but the context didn't. We wanted a tool that lets you point at the thing that mattered and pull the thread back, using the when and where already embedded in your photos.
What it does
Memory Vault lets you import photos, draw bounding boxes to segment objects directly in the browser using SlimSAM, and ask Claude questions about each segment. The model receives the cropped object alongside the photo's date, location, and camera metadata, so answers stay grounded in your actual trip — not generic internet guesses. Segments and conversations persist across sessions, and you can save crops to a freeform artboard for arranging, resizing, and sharing.
How we built it
The frontend is React + TypeScript on Vite, connected to a FastAPI backend that handles photo import, EXIF extraction, and reverse geocoding. Segmentation runs entirely in-browser via SlimSAM (~22 MB) through Hugging Face Transformers.js, with the model cached for instant reloads. Segments and Q&A are stored client-side in IndexedDB via Dexie.js. Claude is called through OpenRouter with the masked crop encoded as a base64 PNG plus the photo's context. The canvas view uses framer-motion for draggable nodes and SVG bezier connectors linking photos to crops to chat cards.
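The masking step before the OpenRouter call can be sketched as a pure function (names here are illustrative, not our actual module): given an RGBA pixel buffer and the decoder's binary mask, zero the alpha channel outside the mask so the background is transparent in the encoded PNG.

```typescript
// Hedged sketch: apply a binary segmentation mask as the alpha channel
// of an RGBA pixel buffer, so background pixels become fully transparent.
// `pixels` is width*height*4 bytes (as from CanvasRenderingContext2D.getImageData);
// `mask` is width*height values where non-zero means "inside the segment".
export function applyMaskAlpha(
  pixels: Uint8ClampedArray,
  mask: Uint8Array,
): Uint8ClampedArray {
  if (pixels.length !== mask.length * 4) {
    throw new Error("pixel buffer and mask dimensions disagree");
  }
  const out = new Uint8ClampedArray(pixels); // copy; keep the source intact
  for (let i = 0; i < mask.length; i++) {
    if (mask[i] === 0) out[i * 4 + 3] = 0; // alpha := 0 outside the mask
  }
  return out;
}
```

The result is drawn back onto a canvas and exported as a PNG data URL, whose base64 payload goes to the model.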
Challenges we ran into
Getting SlimSAM to perform well in the browser meant enabling SharedArrayBuffer, which requires COOP/COEP headers and careful Vite configuration — @huggingface/transformers also breaks Vite's dependency pre-bundling and had to be excluded from optimizeDeps. Coordinating a hybrid architecture where photos live on the server while segments and masks live in IndexedDB meant keeping two data layers in sync without leaking either abstraction into the UI. Applying the binary mask as an alpha channel to produce clean crops for Claude also took iteration to get right.
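The Vite fixes amount to only a few lines; a sketch of the shape they took (not our exact config — plugin list and the rest of the file are omitted):

```typescript
// vite.config.ts — sketch of the two fixes described above.
import { defineConfig } from "vite";

export default defineConfig({
  // COOP/COEP make the page cross-origin isolated, which is what
  // unlocks SharedArrayBuffer for multithreaded WASM inference.
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
  // The library ships WASM and worker assets that Vite's dependency
  // pre-bundling mangles, so it must be excluded.
  optimizeDeps: {
    exclude: ["@huggingface/transformers"],
  },
});
```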
Accomplishments that we're proud of
The entire segmentation pipeline — model loading, image encoding, mask decoding, and crop extraction — runs in the browser with no server-side ML. The artboard archives view lets you arrange your saved crops freely, turning analysis into something you'd actually want to keep. And the conversation feels genuinely useful: Claude's answers are anchored to what you segmented and when you were there, not hallucinated context.
What we learned
Browser-based ML is surprisingly viable at this scale — SlimSAM at 22 MB is a fraction of SAM 2's 163 MB and produces indistinguishable results for landmark-level objects. We also learned that grounding an LLM with real metadata (date, GPS coordinates, camera model) dramatically improves answer quality over just sending a raw image. Strict service-layer separation (UI never touches the DB or API directly) saved us from a class of bugs that usually plague hackathon code.
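The "grounding" is mechanically simple: fold the photo's metadata into a textual preamble sent alongside the crop. A sketch of that step, with illustrative field names rather than our exact schema:

```typescript
// Hedged sketch: build a context preamble from photo metadata.
// Field names are illustrative, not our actual data model.
interface PhotoContext {
  takenAt?: string;                   // ISO date from EXIF
  place?: string;                     // reverse-geocoded locality
  gps?: { lat: number; lon: number }; // EXIF GPS coordinates
  camera?: string;                    // EXIF make/model
}

export function buildContextPreamble(ctx: PhotoContext): string {
  const lines: string[] = [
    "You are answering questions about the segmented object in the attached image.",
  ];
  if (ctx.takenAt) lines.push(`Photo taken: ${ctx.takenAt}`);
  if (ctx.place) lines.push(`Location: ${ctx.place}`);
  if (ctx.gps) lines.push(`GPS: ${ctx.gps.lat.toFixed(5)}, ${ctx.gps.lon.toFixed(5)}`);
  if (ctx.camera) lines.push(`Camera: ${ctx.camera}`);
  lines.push("Ground your answer in this context; say so when unsure.");
  return lines.join("\n");
}
```

Missing fields are simply omitted, so the model never sees placeholder values it might latch onto.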
Built With
- BigDataCloud reverse geocoding
- Dexie.js (IndexedDB)
- EXIF via exifr / server-side metadata
- FastAPI
- framer-motion
- httpx
- OpenRouter API
- Pillow
- Python
- React
- SlimSAM (Xenova/slimsam-77-uniform)
- Transformers.js
- TypeScript
- uvicorn
- Vite