Inspiration

Dashcams are everywhere, but they suffer from a critical limitation: they are flat. When reviewing footage, whether for accident reconstruction, preserving a scenic drive, or training autonomous systems, you are locked into a single, static 2D perspective. You can’t look around a blind corner, see how close a car actually was, or view the road from a truck’s height versus a go-kart’s. We wanted to break the frame. We asked: what if you could step inside the video and drive it again?

What it does

Memory Lane is a full-stack spatial computing pipeline that transforms standard 2D video (MP4) into a 3D Gaussian Splat scene.

Upload: Users drag and drop raw dashcam footage.

Reconstruction: Our cloud pipeline processes the video using COLMAP (Structure from Motion) to calculate camera poses and trains a Gaussian Splat model on an NVIDIA H100 GPU.

Analysis: The pipeline runs YOLOv8 object detection to tag key elements (cars, stop signs, trucks) and maps them onto the 3D timeline.

Experience: The user receives a link to a web viewer where they can:

Free Roam: Fly around the scene like a drone.

Ghost Drive: Re-drive the original path with a smoothed, physics-interpolated camera.

Shift Perspective: Instantly toggle between vehicle heights (e.g., see the road from a Truck vs. a Sedan).

Jump to Events: Click "Stop Sign" to teleport instantly to that moment in 3D space.
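The event jump can be sketched in a few lines (an illustrative simplification, not our exact viewer code; the function name and plain-list inputs are assumptions): each YOLOv8 detection carries a video timestamp, and the viewer binary-searches the sorted camera-pose timestamps for the pose to teleport to.

```python
import bisect

def pose_index_for_event(event_time_s: float, pose_times_s: list[float]) -> int:
    """Map a detection timestamp to the nearest reconstructed camera pose.

    pose_times_s must be sorted ascending (one entry per COLMAP frame).
    """
    i = bisect.bisect_left(pose_times_s, event_time_s)
    if i == 0:
        return 0
    if i == len(pose_times_s):
        return len(pose_times_s) - 1
    # Pick whichever neighbouring pose is closer in time.
    before, after = pose_times_s[i - 1], pose_times_s[i]
    return i if (after - event_time_s) < (event_time_s - before) else i - 1

# A "Stop Sign" detected at t = 12.4 s teleports to the pose at t = 12.5 s.
pose_index_for_event(12.4, [0.0, 6.2, 12.5, 18.7])  # → 2
```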

How we built it

Frontend: Built with React and Vite. We used Three.js and @mkkellogg/gaussian-splats-3d for the rendering engine. We implemented a custom camera controller that interpolates the original COLMAP trajectory to create a "rail-shooter" style driving mechanic.
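The rail-camera idea looks roughly like this (a Python sketch of what our Three.js controller does; the helper names and the fixed look-ahead step are assumptions for illustration): sample a Catmull-Rom spline through the recorded COLMAP positions, and use a finite-difference tangent along the spline as the camera's facing direction.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom point between p1 and p2 for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t3)

def camera_sample(path, u):
    """Position and unit tangent at parameter u in [0, 1] along the whole path."""
    path = np.asarray(path, dtype=float)
    n = len(path) - 1
    seg = min(int(u * n), n - 1)        # which spline segment u falls in
    t = u * n - seg                     # local parameter inside that segment
    idx = lambda k: min(max(k, 0), n)   # clamp control points at the ends
    p0, p1, p2, p3 = (path[idx(seg + k)] for k in (-1, 0, 1, 2))
    pos = catmull_rom(p0, p1, p2, p3, t)
    # Finite-difference tangent: the direction the "ghost car" is heading.
    behind = catmull_rom(p0, p1, p2, p3, max(t - 1e-3, 0.0))
    ahead = catmull_rom(p0, p1, p2, p3, min(t + 1e-3, 1.0))
    tangent = ahead - behind
    nrm = np.linalg.norm(tangent)
    return pos, (tangent / nrm) if nrm > 0 else tangent
```

Because the spline passes through every recorded pose, the ghost drive stays on the original path; offsetting `pos` perpendicular to `tangent` is what makes the lane-change "virtual steering" possible.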

Backend: Powered by Modal. We orchestrated a serverless pipeline that spins up different GPU instances for different tasks: an H100 for the heavy Splatfacto training (Nerfstudio) and a T4 for the YOLOv8 object detection.


Data Pipeline: We wrote custom Python scripts to convert coordinate systems between OpenCV (computer vision standard) and Three.js (web graphics standard), ensuring the 3D world didn't appear upside down or mirrored.
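The core of that conversion is a single constant matrix. A simplified numpy sketch (hedged: our actual scripts also handle scene scaling and the world up-axis, omitted here): OpenCV cameras look down +Z with +Y down, while Three.js cameras look down -Z with +Y up, so a camera-to-world pose is converted by flipping the camera's local Y and Z axes.

```python
import numpy as np

# Flips the camera's local Y (down -> up) and Z (forward -> backward) axes.
CV_TO_GL = np.diag([1.0, -1.0, -1.0, 1.0])

def opencv_pose_to_threejs(pose_cv: np.ndarray) -> np.ndarray:
    """Convert a 4x4 camera-to-world pose from OpenCV to Three.js convention."""
    return pose_cv @ CV_TO_GL
```

Right-multiplying re-labels the camera's own axes while leaving its world position untouched; getting this side of the multiplication wrong is exactly what produces mirrored or upside-down scenes.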

Challenges we ran into

The 2GB Memory Wall: Browsers have strict memory limits for WebAssembly. Our 30-second scenes (100MB+ splat files) were crashing the viewer instantly. We had to implement Cross-Origin Isolation (COOP/COEP headers) to unlock SharedArrayBuffer, allowing the browser to multi-thread the sorting of millions of Gaussians without crashing.
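For local testing, the two isolation headers can be added with a tiny dev server. This is an illustrative sketch rather than our deployed configuration (in production the same headers are set at the hosting layer):

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that opts every response into cross-origin isolation."""

    def end_headers(self):
        # Both headers are required before browsers expose SharedArrayBuffer,
        # which the splat sorter needs for multi-threaded WASM.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Bind the dev server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler)
```

Note that `require-corp` also forces every cross-origin asset the page loads (including the splat file itself) to carry a CORS or CORP header, which is why the headers have to be planned end to end.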

Coordinate Hell: Nerfstudio exports scenes with Y-down (camera coordinates), while the web uses Y-up. For a while, our cars were driving in the sky upside down. We had to implement a matrix transformation step in the backend to normalize the trajectory data before it hit the frontend.

Performance vs. Quality: Training a high-quality splat takes time. We optimized the pipeline to use a high-powered H100 for a short burst (training) and cheaper T4s for inference (detection), balancing cost and speed.

Accomplishments that we're proud of

Seamless "Ghost Mode": We managed to make the driving feel smooth, not jittery. By interpolating the camera path and calculating the tangent vectors, we created a "virtual steering" mechanic that lets you change lanes in a pre-recorded video.

Browser-Based Rendering: Getting a 100MB+ Gaussian Splat scene to run at 60fps in a Chrome tab is a massive technical win.

Instant Perspective Switching: Being able to toggle from "Go-Kart" height to "Truck" height instantly changes how you perceive the speed and danger of the road.

What we learned

Gaussian Splatting is the future of video. It bridges the gap between video and 3D geometry without the heavy polygon count of traditional meshes.

Browser Security is complex. We learned a lot about HTTP headers (Cross-Origin-Embedder-Policy) and how modern browsers sandbox heavy computation.

Cloud Orchestration: Using Modal to chain disparate tools (ffmpeg, COLMAP, Nerfstudio, YOLO) into a single synchronous flow was a masterclass in backend architecture.

What's next for Memory Lane

Lidar Integration: Combining visual splats with depth data for millimeter-accurate measurements.

Semantic Segmentation: Instead of just bounding boxes, we want to color-code the 3D cloud itself (e.g., make the road surface red and sidewalks blue).

VR Support: Since it's already in Three.js, adding WebXR support to let users "sit" in the driver's seat in Virtual Reality is our next logical step.
