Inspiration
Memories as a whole are spatial, and one of the most powerful and nostalgic ways of capturing these memories is through photographs. They are the closest humans get to reliving their experiences, but we wondered if it was possible to get one step closer.
What it does
We came up with a way to relive your memories, using Gaussian Splatting to reconstruct a 3D scene from a natural-language description, photos, or a video of the memory. Rem lets you move through the 3D space, which a creative agent customizes to feel as close as possible to how the moment felt.
How we built it
🎨 Frontend: Next.js 16 (App Router) + React 19 + Tailwind, with a three-screen flow (ingest → loading → 3D viewer)
🎙️ Voice input: browser-native Web Speech API for live transcription, so our backend never handles raw audio (some browsers may still route recognition through their own speech service)
🧠 Agent-based personalization system: Gemini + Pika MCP power a lightweight agent pipeline that transforms raw inputs into a consistent memory representation and guided reconstruction.
We use two main components:
- Structured memory agents (Gemini sub-agents)
  - vision: extracts structured understanding from uploaded photos
  - analyzer: combines past memories + current input into a recurring “world summary”
  - extractor: isolates the relevant slice of that world for the new memory
  - persona: builds a persistent, evolving visual identity across memories
These help maintain consistency across reconstructions.
- Creative + tool-agent layer (Pika MCP)
  The system also includes a creative agent that generates the cinematic direction for each memory reconstruction. It takes the scene context and persona and translates them into a coherent visual style for the output.
  Supporting tools include:
  - fix_look: re-grades video (lighting, palette, mood, clothing, accessories) while preserving geometry and identity
  - music: selects or generates audio to maintain emotional continuity
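The hand-off between the structured memory agents can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual code: the `Memory` dataclass, the function bodies, and the use of plain label strings in place of Gemini calls are all hypothetical stand-ins chosen to show the data flow.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """One user memory (hypothetical shape): a description plus labels for its photos."""
    description: str
    photo_labels: list[str] = field(default_factory=list)


def vision(memory: Memory) -> list[str]:
    # Stand-in for the vision sub-agent; a real version would call a vision model.
    return sorted(set(memory.photo_labels))


def analyzer(past: list[Memory], current: Memory) -> set[str]:
    # Merge past memories and the current input into a recurring "world summary".
    world: set[str] = set()
    for memory in [*past, current]:
        world.update(vision(memory))
    return world


def extractor(world: set[str], current: Memory) -> list[str]:
    # Keep only the slice of the world that the new memory's description mentions.
    text = current.description.lower()
    return sorted(label for label in world if label.lower() in text)


def persona(past: list[Memory], current: Memory) -> dict[str, int]:
    # Track how often each element recurs, as a crude proxy for a visual identity.
    counts: dict[str, int] = {}
    for memory in [*past, current]:
        for label in vision(memory):
            counts[label] = counts.get(label, 0) + 1
    return counts


def build_context(past: list[Memory], current: Memory) -> dict:
    # The scene slice and persona are what the creative layer would receive.
    world = analyzer(past, current)
    return {"scene": extractor(world, current), "persona": persona(past, current)}
```

In this sketch, the dictionary returned by `build_context` plays the role of the "scene context and persona" that the creative agent consumes.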
🌎 3D rendering: Three.js + gsplat for real-time Gaussian Splat rendering. Splats are trained from key points extracted via Structure-from-Motion (SfM) with COLMAP → Gaussian initialization → rasterization and gradient descent → Gaussian densification and pruning.
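The densification and pruning step at the end of that pipeline can be illustrated with a heavily simplified Python sketch. It assumes isotropic Gaussians and a precomputed gradient magnitude per Gaussian; the `Gaussian` dataclass, the function name, and the threshold values are illustrative assumptions, not the settings used in our training runs or gsplat's API.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    position: tuple[float, float, float]
    scale: float      # isotropic scale, a simplification of the full 3D covariance
    opacity: float
    grad_norm: float  # accumulated positional gradient magnitude (assumed given)


def densify_and_prune(gaussians, grad_threshold=0.0002, scale_threshold=0.01,
                      min_opacity=0.005, split_factor=1.6, rng=None):
    """One simplified round of adaptive density control (threshold values are assumptions)."""
    rng = rng or random.Random(0)
    result = []
    for g in gaussians:
        if g.opacity < min_opacity:
            continue  # prune: nearly transparent Gaussians contribute little
        if g.grad_norm < grad_threshold:
            result.append(g)  # well-fitted region: keep unchanged
        elif g.scale <= scale_threshold:
            # Under-reconstructed region: clone the small Gaussian and let
            # later gradient steps move the copies apart.
            clone = replace(g, grad_norm=0.0)
            result.extend([clone, clone])
        else:
            # Over-reconstructed region: split the large Gaussian into two
            # smaller ones sampled around the original position.
            for _ in range(2):
                position = tuple(rng.gauss(c, g.scale) for c in g.position)
                result.append(replace(g, position=position,
                                      scale=g.scale / split_factor, grad_norm=0.0))
    return result
```

In a real trainer this step runs periodically between gradient-descent iterations, which is one reason training time grows with the number of input frames.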
🔄 Pipeline: user input → personalization → 3D memory reconstruction
💾 Storage: Redis-backed job/scene store with an in-memory fallback, so the whole app degrades gracefully without infra
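A minimal sketch of the fallback idea, written in Python for illustration (the app itself is a Next.js project, so this is not the actual store). The `JobStore` class is hypothetical; it only assumes a client object exposing `get(key)` and `set(key, value)`, which is the shape of a redis-py client.

```python
import json
from typing import Any, Optional


class JobStore:
    """Job store that prefers a Redis-like client and falls back to a dict (hypothetical)."""

    def __init__(self, client: Optional[Any] = None) -> None:
        self._client = client                # anything with get(key) / set(key, value)
        self._memory: dict[str, str] = {}    # in-memory fallback

    def set(self, job_id: str, job: dict) -> None:
        payload = json.dumps(job)
        if self._client is not None:
            try:
                self._client.set(job_id, payload)
                return
            except Exception:
                # A real version would catch the client's connection errors only.
                self._client = None          # stop retrying; degrade to memory
        self._memory[job_id] = payload

    def get(self, job_id: str) -> Optional[dict]:
        payload = None
        if self._client is not None:
            try:
                payload = self._client.get(job_id)
            except Exception:
                self._client = None
        if payload is None:
            payload = self._memory.get(job_id)
        return None if payload is None else json.loads(payload)
```

Constructing `JobStore()` with no client gives a purely in-memory store, which is the "no infra" mode; passing a Redis client enables persistence without changing any calling code.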
📈 Observability: OpenTelemetry tracing into Arize AX, plus an LLM-as-judge evaluator with a feedback loop for grading hotspot quality
Challenges we ran into
🎥 Consistent scene generation for 3D Gaussian Splatting (3DGS): Our initial approach relied on fully generated videos from Midjourney. However, 3DGS depends on Structure-from-Motion, which requires smooth camera motion and consistent scene geometry across frames. Generated videos frequently introduced temporal inconsistencies that degraded reconstruction quality.
To address this, we shifted toward real photos and videos while using Pika MCP to apply controlled personalization. This preserved the consistency required for reconstruction while still allowing creative modifications.
💸 Tool costs: We also experimented with Veo3-generated videos, which produced significantly better temporal consistency. Unfortunately, the cost of generating sufficient video data quickly exhausted our available credits. With greater resources, we believe a fully generative memory reconstruction pipeline could become feasible.
⏳ Training times: Training each splat took a significant amount of time (at least 30 minutes) and would sometimes hang for much longer if the input had many photos or frames. This led us to spend a lot of effort trying different training inputs, from generated videos to generated mesh viewpoints. In the end, we realized that sampling every other frame of a video could significantly speed up the process with minimal impact on visual quality.
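The frame-sampling trick is simple enough to show directly. This is a minimal Python sketch assuming the video has already been decoded into an ordered sequence of frames; the `sample_frames` helper and its `stride` parameter are hypothetical names.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_frames(frames: Sequence[T], stride: int = 2) -> list[T]:
    """Keep every `stride`-th frame, starting with the first (stride=2 keeps every other frame)."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(frames[::stride])
```

With the default stride of 2, the reconstruction step sees roughly half as many images while consecutive kept frames still overlap heavily, which is what Structure-from-Motion needs to match features between views.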
Accomplishments that we're proud of
🎮 UI: We built a Three.js-powered viewer that successfully captures the feeling of stepping back into a memory rather than simply viewing media.
✨ 3DGS Quality: Despite having only a single day to develop and iterate, we achieved surprisingly strong reconstruction quality. We were especially excited to reconstruct a live human subject with limited distortion, since dynamic people are traditionally challenging for Gaussian Splatting yet are central to many memories.
What we learned
We learned how to design systems with long-latency AI pipelines involving video generation, scene reconstruction, and personalization. We also gained a much deeper understanding of 3D Gaussian Splatting, particularly the importance of input consistency and data quality.
Most importantly, we explored how creative agents can personalize experiences rather than simply generate content.
What's next for Rem
We plan on expanding Rem so that users can traverse multiple memories at once by grouping them. For example, someone who went to Florida on vacation could upload their photos of the beach, the southernmost point of the continental US, and Disney World separately, then group them to navigate from one scene to another.
Another large area we could explore is making memories easier to share. We will likely make Rem a platform where users can share their memories and information hotspots with others, who can then add their own memories to create more hotspots, turning a scene into a multi-layered reconstruction.
It's like Harry Potter's Pensieve, where multiple memories are being layered and pulled out of one's brain. Like Dumbledore said, "I sometimes find, and I am sure you know the feeling, that I simply have too many thoughts and memories crammed into my mind."
Rem gives those memories a place to live.
Built With
- agents
- anthropic
- arize
- claude
- gaussian-splatting
- gemini
- gsplat
- javascript
- next.js
- node.js
- pika
- python
- react
- redis
- sharp
- supabase
- tailwindcss
- three.js
- typescript
- web-speech-api
