Inspiration
What if Pokémon GO was also a social experience that encourages your curiosity? A large part of today's short-form social media involves being glued to your phone, passively scrolling with constantly shifting attention – one of the main causes of the rise in learning difficulties worldwide. GeminiAtlas goes against every aspect of the "doomscroll". It encourages you to go out and explore the world, to be a curious explorer, to connect with people across time and space, and to live and grow through your learning.
What the project does
GeminiAtlas is a Snap Spectacles Lens that transforms your everyday environment into a personalized, social, AI-based learning experience. It is built on several pillars.
First, I align the sky with you: the Constellations Lens. This augmented-reality Snap Lens overlays constellations directly onto the real stars you see. With Lens Studio, I integrated 3 key controls:
- Location: The user changes country or city. The sky recalculates in real time.
- Compass: The Lens orients with you. Look north and it displays Ursa Major.
- Voice control: The user says "Orion" or "Cassiopeia". The Lens calls up the constellation and highlights it.
The result: the sky is no longer abstract. It becomes an interactive astronomy textbook.
Second, I read for you: the Bookworm Lens. Bookworm is a Lens for Snap Spectacles. Its principle is simple: the user frames a book with their hands. With Lens Studio, GeminiAtlas analyzes the cover live. In 2 seconds, it surfaces the author, a summary, key themes, and similar books. The text leaves the cover and becomes a reading card you view at a glance. The result: a library becomes a living database.
Third, I let you see the Globe. This is the heart of GeminiAtlas. Every card you capture with Constellations or Bookworm has a position, a subject, a memory. I organize them spatially around you. You turn your head: Noguchi in Seattle. Another turn: innovation in Cotonou. Your long-term memory becomes a landscape. The more you capture, the more the experience becomes your best learning partner.
- Capture and remix! Pinch any element in your environment with two hands, and GeminiAtlas turns it into a short, surprising Curiosity Card based on one of your chosen interests, picked at random. You can then talk with the AI assistant about each card to learn more.
- Explore and share! Cards are geolocated and designed to be shared. Join your hands in front of your chest, a symbolic gesture that "discovers the world", to send out a signal that reads your position and reveals a universe of cards left by you and others.
- Relive and remember! A living, evolving memory palace in the palm of your hand, where all your saved cards are stored. AI organizes everything spatially, by subject, relevance, and what it remembers about you. The more you capture, the more the experience understands how you learn.
- Collect and compete! Take on the challenge and learn together by facing your friends in a quiz game. Questions are randomly generated from the cards you and other players have captured; the player with the most cards and the best knowledge wins. An AI host with a quirky sense of humor reads each question out loud and reacts to your answers.
How we built it
GeminiAtlas is a Spectacles Lens built with Lens Studio. All model calls go through Snap's Remote Service Gateway (RSG). The assistant and the battle host converse over Gemini Live; image understanding and battle question generation use OpenAI Vision and GPT-4o.
From an architecture perspective, the Lens consists of many subsystems coordinated by a set of global.* singletons, which let prefabs reach shared state. Each conversational agent connects to Gemini Live over RSG, streaming speech in 24 kHz PCM, with controlled handoffs so that only one live session is active at a time. The capture pipeline crops the region framed by the two-hand gesture and sends it to OpenAI Vision for a caption. The globe-to-map transition is handled by a state machine that aligns the sphere and the flat map by matching their ground footprints. Battle mode is managed by the host on SpectaclesSyncKit, with questions generated from captured cards (GPT-4o) and broadcast between the two players.
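The "only one live session at a time" rule can be illustrated with a small arbiter. This is a minimal sketch, not the project's actual code: the class name LiveSessionArbiter, the owner labels, and the fakeSession stand-in are all hypothetical, and a real Gemini Live session over RSG would be opened and closed asynchronously.

```javascript
// Hypothetical arbiter: every name here is illustrative, not taken from GeminiAtlas.
class LiveSessionArbiter {
  constructor() {
    this.active = null; // { owner, session } for the single live slot, or null
    this.log = [];      // ordered record of open/close events, handy for debugging
  }

  // Give the single live slot to `owner`, closing whoever held it before.
  // `open` is a callback returning an object with a close() method.
  acquire(owner, open) {
    if (this.active && this.active.owner === owner) {
      return this.active.session; // already holding the slot
    }
    this.release();
    const session = open();
    this.active = { owner, session };
    this.log.push(`open:${owner}`);
    return session;
  }

  // Close the current session, if any.
  release() {
    if (!this.active) return;
    this.active.session.close();
    this.log.push(`close:${this.active.owner}`);
    this.active = null;
  }
}

// Stand-in for a real live session (assumption: the real one exposes some close call).
function fakeSession(name) {
  return { name, closed: false, close() { this.closed = true; } };
}

const arbiter = new LiveSessionArbiter();
const assistant = arbiter.acquire("assistant", () => fakeSession("assistant"));
const host = arbiter.acquire("battleHost", () => fakeSession("battleHost"));
```

Closing the previous session before opening the next one keeps the handoff explicit, so two agents never hold a live session at once.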
Challenges we faced
Most of the difficulties were about coordination: between sessions, devices, and clocks. The main issue was that RSG only keeps the most recent Gemini session active (opening two Gemini sessions at once drops the connection), so our voice agents had to explicitly take turns on a single slot. The shared microphone is destructive and can lock up: a failed start leaves the provider "started but out of service", so we implemented a stop-restart cycle and a watchdog for recovery. Multiplayer was a real challenge: because device clocks aren't comparable, the leaderboard had to be based on the server clock. In addition, UIKit buttons misled us: their touch event handlers are only wired up at launch, so disabling a button at startup makes it permanently unusable.
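The stop-restart cycle and watchdog for the microphone can be sketched as follows. This is a hedged illustration under stated assumptions: MicWatchdog, its options, and the fake microphone are hypothetical names, and the clock is passed in explicitly so the logic stays deterministic and testable without timers.

```javascript
// Hypothetical watchdog: restarts a mic provider that has stopped delivering audio.
// `mic` is assumed to expose start() and stop(); this is not a real Lens Studio API.
class MicWatchdog {
  constructor(mic, { timeoutMs = 2000, maxRestarts = 3 } = {}) {
    this.mic = mic;
    this.timeoutMs = timeoutMs;     // silence longer than this counts as a stall
    this.maxRestarts = maxRestarts; // give up after this many consecutive restarts
    this.lastFrameAt = null;        // time of the last audio frame (or last start)
    this.restarts = 0;
  }

  start(now) {
    this.mic.start();
    this.lastFrameAt = now;
  }

  // Call whenever an audio frame arrives: the provider is healthy again.
  onFrame(now) {
    this.lastFrameAt = now;
    this.restarts = 0;
  }

  // Call periodically. Returns "idle", "ok", "restarted", or "failed".
  tick(now) {
    if (this.lastFrameAt === null) return "idle";
    if (now - this.lastFrameAt < this.timeoutMs) return "ok";
    if (this.restarts >= this.maxRestarts) return "failed";
    this.mic.stop();  // full stop first, so a half-started provider is reset
    this.mic.start();
    this.restarts += 1;
    this.lastFrameAt = now;
    return "restarted";
  }
}

// Demo with a fake mic that only records calls.
const calls = [];
const fakeMic = { start() { calls.push("start"); }, stop() { calls.push("stop"); } };
const watchdog = new MicWatchdog(fakeMic, { timeoutMs: 2000, maxRestarts: 2 });
watchdog.start(0);
const r1 = watchdog.tick(1000); // still inside the timeout window
const r2 = watchdog.tick(2500); // no frames for 2.5 s, so stop and restart
watchdog.onFrame(2600);         // audio is flowing again
const r3 = watchdog.tick(3000);
```

In a Lens, tick would be driven by a periodic update event with the current time; capping consecutive restarts avoids an endless restart loop when the provider never recovers.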
Achievements we're proud of
We're proud that the assistant feels like a character, not just a chatbot: a strong personality, an expressive sphere that reacts to sound, real-time text captions, all running over the gateway with no on-device intervention. The globe-to-map navigation is exactly as we imagined: a single continuous gesture, from orbit to a street you can walk, with a transition that is invisible by design. Card discovery feels truly magical: a plane wave sweeps across the room and cards appear wherever it touches. And we shipped a real two-player quiz, latency-tolerant, with an adaptive AI host that knows not to mock a player who's already struggling.
What we learned
The common thread of everything we learned: decouple thought from voice and make the deterministic parts boring. Our line banks, tool declarations, and query logic are fully independent of any model and easily testable; agents only own the live session. We learned to freeze what never changes and only generate what is personal: the 37 fixed Cosmos cards send pre-written questions instantly and for free, while only user-captured cards require an API call. We learned to maintain pure, deterministic placement calculations: each map scatter is seeded by the map ID, so markers never drift between frames, zooms, or sessions.
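Seeding a scatter by ID can be sketched as below. This is a minimal illustration, not the project's code: the FNV-1a string hash and the mulberry32 generator are swapped-in, well-known techniques, and the function names, example IDs, and disc-shaped layout are assumptions.

```javascript
// FNV-1a 32-bit hash: turns an ID string into a stable integer seed.
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Scatter `count` marker offsets inside a disc of `radius`.
// The layout depends only on `id`, so it is identical on every frame and session.
function scatter(id, count, radius) {
  const rand = mulberry32(hashString(id));
  const points = [];
  for (let i = 0; i < count; i++) {
    const angle = rand() * 2 * Math.PI;
    const r = radius * Math.sqrt(rand()); // sqrt gives uniform density over the disc
    points.push({ x: r * Math.cos(angle), y: r * Math.sin(angle) });
  }
  return points;
}

// Hypothetical IDs for illustration only.
const first = scatter("card-tokyo-01", 5, 10);
const again = scatter("card-tokyo-01", 5, 10);
const other = scatter("card-seattle-02", 5, 10);
```

Because the function never calls Math.random() and holds no shared state, the same ID always yields the same offsets, which is what keeps markers from drifting.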
Next steps for GeminiAtlas
More people, more curiosity. Right now, the globe offers Tokyo, Seattle, and Los Angeles with a universe of 37 cards and 13 initial interests. The next step is to expand the map and library to many more cities and themes, for a rich and varied discovery experience. We want to deepen the social dimension: richer profiles, the ability to follow people whose cards you like, and themed journeys through a neighborhood. We'd like to grow Battle mode beyond two players, to small-group games and themed decks. In the long term, we want GeminiAtlas to become a living, shared atlas of human curiosity: every wall, every dish, and every street corner annotated by someone who found it interesting, waiting for the next passerby. To conclude: GeminiAtlas + Snap Spectacles + Lens Studio = learning by moving, sharing, and playing. The buzzer is done waiting. Ready to start?
Built With
- ar
- google-gemini-api
- google-vision-api-computer-vision
- javascript
- lua
- snap-lens-studio
- snap-spectacles-api-agnes-ai-sdk/api
- web-speech-api
