Inspiration

We were inspired by a future where wearable cameras like Meta Ray-Bans, GoPros, VR headsets, and other always-available capture devices can record the world from your point of view. These devices can already collect pieces of your day, but most of that footage disappears into a camera roll where it is hard to search, revisit, or understand.

Memento started from the idea of an occipital layer for your life: a visual memory system that indexes your day as it happens. Instead of treating videos as flat files, Memento turns them into spatial, searchable, living memories that you can walk through, ask questions about, and build on over time.

What it does

Memento turns raw first-person or everyday video into an interactive 3D memory. A user uploads video from a phone, Meta Ray-Bans, GoPro, VR headset, or any other camera, and the system extracts frames, reconstructs the scene as a Gaussian splat, identifies key moments, and places those moments as interactive orbs inside the 3D environment. Each orb connects to analysis like timestamps, labels, summaries, tags, thumbnails, transcripts, and related context.

Memento also stores these moments in a persistent personal memory graph backed by Backboard. Each captured moment becomes a structured node with temporal metadata, spatial coordinates, Cloudinary asset URLs, semantic summaries, and links to related memories. Users can then query those memories over time and receive daily recaps or answers through Pingram-powered SMS and emails.
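The shape of one of these moment nodes can be sketched as a small dataclass. The field names (`timestamp_s`, `position`, and so on) are our own illustration of the structure described above, not Backboard's actual schema:

```python
from dataclasses import dataclass, field

# Field names are illustrative assumptions, not Backboard's real schema.
@dataclass
class MemoryMoment:
    moment_id: str
    session_id: str                 # which upload/session this moment came from
    timestamp_s: float              # seconds into the source video
    position: tuple                 # (x, y, z) in the reconstructed splat scene
    cloudinary_urls: dict           # e.g. {"thumbnail": ..., "clip": ...}
    summary: str = ""
    tags: list = field(default_factory=list)
    related: list = field(default_factory=list)  # ids of linked moments

m = MemoryMoment(
    moment_id="m1", session_id="day-2024-11-02",
    timestamp_s=812.4, position=(1.2, 0.3, -4.7),
    cloudinary_urls={"thumbnail": "https://res.cloudinary.com/demo/t.jpg"},
    summary="Standing in front of the museum entrance",
    tags=["museum", "outdoor"],
)
print(m.summary)
```

Because every downstream consumer (the 3D viewer, the graph sync, the SMS layer) reads the same node shape, keeping it explicit like this is what makes the handoffs between services tractable.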

How we built it

We built Memento as a full media-to-memory pipeline. The backend uses FastAPI to manage uploads, sessions, pipeline state, and generated artifacts. The frontend uses React to provide a control surface for uploading video, inspecting each processing stage, previewing frames, and browsing the spatial memory.
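Under the FastAPI endpoints, the core bookkeeping is per-session pipeline state. A minimal stdlib sketch of that state machine (the stage names here are illustrative, not our exact internal names):

```python
import uuid

# Illustrative stage names; the real pipeline tracks similar checkpoints.
STAGES = ["uploaded", "frames_extracted", "poses_estimated",
          "splat_generated", "moments_indexed", "complete"]

class Session:
    def __init__(self):
        self.id = uuid.uuid4().hex
        self.stage = STAGES[0]
        self.artifacts = {}  # finished stage -> output paths / URLs

    def advance(self, artifact=None):
        """Record what the current stage produced, then move to the next stage."""
        i = STAGES.index(self.stage)
        if artifact is not None:
            self.artifacts[self.stage] = artifact
        if i + 1 < len(STAGES):
            self.stage = STAGES[i + 1]
        return self.stage

s = Session()
s.advance("raw.mp4")  # uploaded -> frames_extracted
print(s.stage)
```

The frontend's stage-by-stage inspection view is essentially a read-out of this state plus the artifacts dictionary.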

For 3D reconstruction, we extract video frames with ffmpeg, estimate camera poses with COLMAP, optionally use MASt3R-SfM for stronger visual matching and reconstruction support, and generate Gaussian splats with OpenSplat. We then map important timestamps back to camera positions so key moments can become spatial orbs inside the reconstructed scene.
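The glue logic can be sketched as command builders plus the timestamp-to-pose lookup. Sampling frames at a fixed rate is what makes the mapping clean: a moment at `t` seconds lands near frame `round(t * fps)`. The OpenSplat invocation is omitted because its flags vary by build; the pose table below is toy data:

```python
def ffmpeg_frames_cmd(video, out_dir, fps=2):
    """Sample frames at a fixed rate so timestamps map cleanly to frame indices."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            f"{out_dir}/frame_%05d.jpg"]

def colmap_cmd(image_dir, workspace):
    """COLMAP's automatic pipeline: feature extraction, matching, sparse SfM."""
    return ["colmap", "automatic_reconstructor",
            "--image_path", image_dir, "--workspace_path", workspace]

def pose_for_timestamp(t_seconds, fps, poses):
    """Map a video timestamp to the nearest extracted frame's camera position.
    `poses` maps frame index (1-based, matching frame_%05d) -> (x, y, z)."""
    idx = max(1, round(t_seconds * fps))
    return poses.get(idx)

poses = {1: (0, 0, 0), 2: (0.1, 0, -0.4), 3: (0.2, 0.1, -0.9)}
print(pose_for_timestamp(1.1, fps=2, poses=poses))  # nearest frame index 2
```

Each moment's orb is then anchored at the camera position of its nearest frame, which is an approximation but a good one at 2 frames per second.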

For video understanding, we use Twelve Labs to analyze uploaded videos, identify important moments, generate timestamped labels and summaries, and enrich each memory with semantic context.

Cloudinary is the media backbone of the project. We use it to store and serve the heavy visual artifacts that make each memory useful: the generated splat files, frame thumbnails, extracted moment stills, and short video clips tied to each memory node. Instead of keeping assets trapped on a local machine, Cloudinary gives every reconstructed scene and moment a durable URL that can be attached directly to graph data, rendered in the frontend, shared across devices, and reused by downstream agents. This makes Memento feel like a real media product rather than a local demo folder.
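The key design choice is deterministic `public_id`s, so a graph node and a re-upload always point at the same asset. The naming scheme below is our own convention; the upload call uses Cloudinary's standard Python SDK and assumes configured credentials:

```python
def asset_public_id(session_id, moment_id, kind):
    """Deterministic Cloudinary public_id (naming scheme is our convention),
    so graph nodes and re-uploads always resolve to the same asset."""
    return f"memento/{session_id}/{moment_id}/{kind}"

def upload_moment_asset(path, public_id):
    """Upload one artifact and return the durable URL stored on the graph node.
    Requires the `cloudinary` package and configured credentials."""
    import cloudinary.uploader
    result = cloudinary.uploader.upload(
        path, public_id=public_id, resource_type="auto", overwrite=True)
    return result["secure_url"]

print(asset_public_id("day-2024-11-02", "m1", "thumbnail"))
```

`resource_type="auto"` lets the same helper handle thumbnails, clips, and splat files without branching on media type.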

Backboard is the structured memory layer. We use it to model each day, session, and moment as graph-shaped data instead of disconnected JSON blobs. A moment node can link to its source session, neighboring moments, Cloudinary assets, semantic tags, timestamps, spatial positions, and future user annotations. That graph structure is what lets Memento compound over time: today's memories are not isolated outputs; they become part of a growing personal knowledge system that can be queried, summarized, and connected across days.
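An in-memory stand-in for the graph shape we sync to Backboard looks roughly like this; the node and relation vocabulary (`next`, `in_session`) is illustrative, not Backboard's API:

```python
from collections import defaultdict

class MemoryGraph:
    """In-memory stand-in for the graph shape we sync to Backboard.
    Relation names here are illustrative, not Backboard's API."""
    def __init__(self):
        self.nodes = {}                # node id -> properties dict
        self.edges = defaultdict(set)  # (src, relation) -> {dst, ...}

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def link(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def neighbors(self, node_id, relation):
        return self.edges[(node_id, relation)]

g = MemoryGraph()
g.add_node("m1", summary="museum entrance", timestamp_s=812.4)
g.add_node("m2", summary="dinosaur hall", timestamp_s=1190.0)
g.link("m1", "next", "m2")
g.link("m1", "in_session", "day-2024-11-02")
print(g.neighbors("m1", "next"))
```

Queries like "what happened after dinner" become graph traversals (`next` hops from a time-matched node) rather than scans over flat files.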

Pingram is the interaction layer that makes the memory graph immediately useful. Instead of forcing users to open a dashboard, we use Pingram to expose Memento through SMS. The goal is that a user can text questions like "What did I see at the museum yesterday?" or "Remind me what happened after dinner" and get a response synthesized from their Backboard memory graph with links to Cloudinary-hosted media. Pingram also supports the outbound side of the experience: daily recaps, important moments, and memory reminders can be pushed to the user in the same place they already receive messages.
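Delivery itself goes through Pingram's API; the grounding logic on our side, turning graph query results into a short reply with media links, can be sketched like this (the character budget and field names are our assumptions):

```python
SMS_LIMIT = 320  # two concatenated segments; our own budget, not a Pingram limit

def compose_sms(moments):
    """Turn graph query results into a short, grounded SMS reply.
    `moments` are dicts with a 'summary' and a Cloudinary 'url'."""
    if not moments:
        return "I couldn't find any memories matching that."
    lines = [f"{m['summary']}: {m['url']}" for m in moments]
    return "\n".join(lines)[:SMS_LIMIT]

reply = compose_sms([
    {"summary": "Museum entrance, 2:14pm",
     "url": "https://res.cloudinary.com/demo/m1.jpg"},
])
print(reply)
```

Keeping every line anchored to a summary-plus-URL pair is what keeps replies grounded in the graph instead of drifting into generic chatbot text.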

Together, Cloudinary, Backboard, and Pingram form the core product loop: Cloudinary stores the visual proof, Backboard organizes it into long-term memory, and Pingram turns that memory into a conversational interface.

Challenges we ran into

The hardest part was connecting very different systems into one coherent flow. Wearable-style video capture, frame extraction, COLMAP reconstruction, optional MASt3R-SfM reconstruction, OpenSplat output, Twelve Labs moment detection, Cloudinary asset storage, Backboard graph syncing, and Pingram SMS all have different data shapes and failure modes.

A major challenge was designing the handoff between media, graph data, and messaging. Cloudinary assets need stable identifiers and URLs. Backboard nodes need enough structure to preserve provenance, timestamps, spatial coordinates, and relationships. Pingram messages need to be short, useful, and grounded in the graph instead of feeling like generic chatbot responses. Getting those boundaries right was just as important as the 3D reconstruction itself.

We also had to work around hardware constraints. Instead of relying on CUDA-heavy tooling, we designed the pipeline around tools that could run locally, including OpenSplat, COLMAP, and MASt3R-SfM in CPU-compatible setups.

Accomplishments that we're proud of

We are proud that Memento is more than a 3D viewer. It connects wearable and first-person video, Twelve Labs video understanding, COLMAP and MASt3R-SfM reconstruction, OpenSplat Gaussian splats, Cloudinary-hosted media, Backboard memory graphs, and Pingram SMS into one system where each moment becomes both something you can see and something you can query later.

We are especially proud of the Cloudinary, Backboard, and Pingram integration because it turns a technical pipeline into an actual user experience. Cloudinary makes the media portable and production-ready. Backboard gives the memories durable structure and relationships. Pingram makes the whole system accessible through a simple text message.

We are also proud of building toward a real occipital layer: a system that can take continuous visual capture from devices like Meta Ray-Bans, GoPros, or VR headsets and turn it into indexed, spatial memories instead of forgotten footage.

What we learned

We learned that memories need both place and meaning. A 3D scene makes a memory feel immersive, but the knowledge graph makes it useful over time. COLMAP, MASt3R-SfM, and OpenSplat gave us the spatial structure; Twelve Labs gave us semantic understanding; Cloudinary gave us durable media infrastructure; Backboard gave us a graph-shaped memory model; and Pingram helped us turn that graph into a conversational product.

We also learned that the most valuable part of a memory system is the connection between layers. A Cloudinary URL by itself is just media. A Backboard node by itself is just data. A Pingram message by itself is just text. But when they are connected, a user can ask a question, retrieve a structured memory, and receive the exact visual context that supports the answer.

Most importantly, we learned that personal memory tools should not just store data. They should help people rediscover context.

What's next for Memento

Next, we want to complete the full Pingram SMS loop so users can text Memento questions like "What happened at the event yesterday?" and get answers from their Backboard memory graph with Cloudinary links to the relevant clips, frames, or splat scenes.

We also want to deepen the Backboard graph model so memories can connect across days, people, places, recurring events, and user annotations. On the Cloudinary side, we want to improve asset transformations, previews, and delivery so each memory has optimized thumbnails, clips, and shareable scene links. On the Pingram side, we want to support proactive daily recaps, follow-up questions, and memory reminders.

Longer term, Memento could become an always-on memory layer for wearable cameras and spatial devices: a searchable, spatial, AI-assisted archive that indexes your day as it happens and grows with you.

Built With

FastAPI, React, ffmpeg, COLMAP, MASt3R-SfM, OpenSplat, Twelve Labs, Cloudinary, Backboard, Pingram
