Inspiration
Travel memories often live only in photos. We wanted a way to step back into those moments instead of just viewing them on a screen. Inspired by the emotional storytelling of anime worlds, we imagined turning real travel photos into immersive spaces people can walk through again.
CityWalk was created to transform personal memories into explorable virtual worlds.
What it does
CityWalk converts travel photos into an immersive anime-style VR environment.
Users upload photos and, optionally, a travel journal. Our system reconstructs the location as a stylized 3D world that can be explored in first person with a PICO VR headset. AI-generated music and narration enhance the atmosphere, letting users revisit meaningful places and share the experience with friends.
How we built it
We built CityWalk using a pipeline that combines multiple AI and immersive technologies:
- Gemini for scene understanding and extracting spatial information from photos
- 360° panorama generation to reconstruct the environment
- World Labs Marble (Gaussian splatting) for real-time 3D rendering of stylized worlds
- ElevenLabs for AI-generated music and narration
- PICO VR for immersive exploration and interaction
Together, these stages transform a set of photos into a navigable 3D world.
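A minimal sketch of how stages like these might be chained, assuming each stage is an async function that reads a shared context object and returns the fields it adds. The `runPipeline` helper, the stage names, and the stub bodies are all hypothetical; they stand in for the real service calls and are not CityWalk's actual code.

```javascript
// Hypothetical sketch: stage names and data shapes are assumptions for illustration.

// Run the stages in order, threading a shared context object through each one.
async function runPipeline(stages, input) {
  let context = { ...input };
  for (const [name, stage] of stages) {
    try {
      // Each stage returns only the fields it adds; earlier fields are preserved.
      context = { ...context, ...(await stage(context)) };
    } catch (err) {
      // Tag the failure with the stage name so a broken step is easy to locate.
      throw new Error(`Stage "${name}" failed: ${err.message}`);
    }
  }
  return context;
}

// Stub stages standing in for scene understanding, panorama generation,
// world building, and audio generation. Real versions would call external APIs.
const stages = [
  ["describeScene", async ({ photos }) => ({ scene: { photoCount: photos.length } })],
  ["generatePanorama", async ({ scene }) => ({ panorama: `panorama-from-${scene.photoCount}-photos` })],
  ["buildWorld", async ({ panorama }) => ({ world: { source: panorama } })],
  ["generateAudio", async ({ journal }) => ({ audio: { narrated: Boolean(journal) } })],
];
```

Calling `runPipeline(stages, { photos: ["a.jpg", "b.jpg"], journal: "trip notes" })` resolves to one object holding the original input plus `scene`, `panorama`, `world`, and `audio`. Wrapping each stage's error with its name is one way to make a multi-service pipeline easier to debug under time pressure.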
Challenges we ran into
A major challenge was reconstructing a coherent 3D environment from limited user photos. Images often vary in angle, lighting, and coverage.
Another difficulty was balancing stylization and realism: the world had to look anime-inspired while still preserving recognizable structures from the original location.
Integrating multiple AI systems into a smooth pipeline within the limited hackathon time was also challenging.
Accomplishments that we're proud of
We successfully built an end-to-end system that turns personal photos into immersive VR experiences.
We're especially proud of combining world modeling, Gaussian splatting, generative audio, and VR interaction into a single creative application that makes memories feel alive.
What we learned
We learned how powerful multimodal AI systems can be when they combine vision, audio, and spatial reconstruction.
We also gained experience integrating generative AI with real-time VR rendering, and learned how important pipeline coordination is when connecting multiple AI models.
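One coordination pattern that can help when several models are involved is to run steps that do not depend on each other at the same time and to collect every failure in one place. The sketch below assumes, purely for illustration, that world building and audio generation are independent; `runBranches` and the branch names are hypothetical.

```javascript
// Hypothetical sketch: the branch functions are placeholders, not real API calls.

// Run independent branches concurrently and report every failure, not just the first.
async function runBranches(branches) {
  const names = Object.keys(branches);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(names.map(async (name) => branches[name]()));
  const results = {};
  const failures = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      results[names[i]] = outcome.value;
    } else {
      const reason = outcome.reason;
      failures.push(`${names[i]}: ${reason instanceof Error ? reason.message : String(reason)}`);
    }
  });
  if (failures.length > 0) {
    throw new Error(`Branches failed: ${failures.join("; ")}`);
  }
  return results;
}
```

`Promise.allSettled` waits for every branch to finish, so a slow or failing audio call does not hide the outcome of the world-building call.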
What's next for CityWalk
Next, we want to improve scene reconstruction so larger environments can be generated from fewer photos.
We also plan to add multiplayer exploration, more artistic styles, and better memory storytelling features to turn CityWalk into a platform for creating and sharing immersive memories.
Built With
- express.js
- gaussian
- gemini
- javascript
- multer
- musicgen
- node.js
- openai-api
- pico-4
- replicate
- sharp
- sparkjs
- splat
- three.js
- vercel
- vite
- web-audio-api
- webxr
- world-labs-marble-api