Overview
Echo: Everyday Elsewhere is a narrative-driven AI companion app that helps users rediscover their physical surroundings through interactive stories and visual transformations.
The app simulates a physical device connected to Echo, a traveler from the year 2500. By scanning real-world objects, the user helps Echo stabilize their timeline, receiving "Temporal Postcards" (AI-generated images based on the user’s scan with personalized messages) that reveal how their city evolved in an alternative timeline.
Unlike traditional AI companions that rely on text-heavy dialogue, Echo uses an asymmetric interaction design. The user's primary interaction is scanning, reducing cognitive load while deepening the emotional bond. The user isn't just "chatting"; they are providing vital "sensor data" to help a friend in another timeline.
Gemini 3 Integration
Echo is powered by a dual-agent Gemini 3 pipeline.
In the core Scanning operation, when the user scans an object, a Director Agent (gemini-3-flash-preview) analyzes the image with Multimodal Reasoning to (1) interpret what’s present, (2) verify mission completion with a short reason, and (3) emit structured output (a JSON payload containing verification result, possible failure reason, input image description, and a prompt for image transformation), all in one call. For geo-aware missions, the Director additionally uses Google Search Grounding to identify likely landmarks/locations and attach brief historical/cultural context, blending the real-world information into the narrative.
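The Director's structured JSON payload might be modeled as below. This is an illustrative sketch: the field names (`verified`, `failureReason`, `imageDescription`, `transformPrompt`) and the validator are assumptions, not the app's actual schema, which is defined in its prompts.

```typescript
// Hypothetical shape of the Director Agent's structured output.
// Field names are illustrative; the real schema lives in the app's prompts.
interface DirectorPayload {
  verified: boolean;           // did the scan fulfill the mission?
  failureReason?: string;      // short reason when verification fails
  imageDescription: string;    // what the Director saw in the scan
  transformPrompt: string;     // brief for the postcard image generator
}

// Minimal runtime validator used before handing the payload to the
// image-generation and Actor steps; returns null on malformed output.
function validateDirectorPayload(raw: unknown): DirectorPayload | null {
  if (typeof raw !== "object" || raw === null) return null;
  const p = raw as Record<string, unknown>;
  if (typeof p.verified !== "boolean") return null;
  if (typeof p.imageDescription !== "string") return null;
  if (typeof p.transformPrompt !== "string") return null;
  if (p.failureReason !== undefined && typeof p.failureReason !== "string") return null;
  return p as unknown as DirectorPayload;
}
```

Validating model output at the boundary like this keeps a single multimodal call reliable even when the model occasionally drifts from the requested format.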
Next, the scan becomes a “Temporal Postcard” via gemini-3-pro-image-preview (image-and-text-to-image), preserving structural consistency while applying the destination timeline’s visual style.
Finally, an Actor Agent, Echo (gemini-3-flash-preview), generates the conversational response using the full chat history and the image descriptions, plus Thought Signatures that carry forward emotional state and key story beats to keep Echo consistent across turns.
With Gemini 3, Echo reliably validates image-based missions, branches the story from real-world visual context, grounds Tourist mode in real places, and generates consistent postcards and in-character narration.
Impact
Echo turns the camera into a lightweight discovery engine—starting with a focused wedge and expanding into adjacent markets.
- Wedge: Urban newcomers. People who relocate often experience “urban disconnection”: they traverse the same streets daily without feeling connected to their neighborhood. Echo turns commutes into micro-adventures through scan-based missions and collectible “Temporal Postcards,” helping a city feel familiar and like home.
- Expansion: Travel & tourism. Every traveler is temporarily “new to a city.” The same scan-and-transform loop becomes a richer alternative to standard travel photography: grounded context (via Search Grounding) + a unique, shareable artifact for each location.
- Platform: Education & entertainment. The core mechanic, scan a real object → receive a grounded narrative + transformed visual, generalizes to classroom scavenger hunts, city-scale puzzle/escape experiences, and cultural/museum installations.
Story Settings
Echo is a time-traveler from a solarpunk utopia in the year 2500, where humanity and nature exist in perfect harmony. Echo navigates shifting timelines, but they always need a "quantum anchor" in a different timeline to stabilize their jumps.
As an operator in 2026, the user uses their mobile device to assist.
- The Scan: Echo requests a specific object. When the user scans it in their timeline, it creates Timeline Resonance.
- The Bridge: This resonance opens a quantum tunnel, allowing the user's low-end device to sync with Echo’s high-end terminal.
- The Reward: In exchange for the sync, Echo sends back a "Postcard": a visual transformation of the user's scan into Echo's timeline, accompanied by a text message.
Presets
Echo offers four distinct interaction modes. Each leverages different aspects of Gemini 3's generative capabilities.
To switch presets, use the key combo UP+UP+DOWN+DOWN+ACTION.
| Preset | Narrative | Target | Geo-Aware | Core Value |
|---|---|---|---|---|
| Story: The Time Glitch | Scripted | Fixed Objects | No | Narrative Immersion |
| Micro-adventure Companion | Dynamic | Outdoor Objects | No | Exploration |
| Tourist Companion | Geo-Aware | Landmarks/Scenes | Yes | Grounded Discovery |
| Temporal Scavenger Hunt | Dynamic | Theme-based | No | Creative Challenge |
- Story: The Time Glitch: The Narrative Hook. A scripted introduction to Echo's story. Echo is stranded; the user is the anchor. Best for demonstrating Echo's character consistency.
- Micro-adventure Companion: The Discovery Engine. Uses Gemini 3’s Multimodal Reasoning to turn any outdoor object into a narrative branch. The timeline Echo visits next is determined by the visual context of the user's scan.
- Tourist Companion: The Reality Bender. Integrates Google Search Grounding and Geolocation. Gemini identifies the location of the user's scan, retrieves historical/cultural facts, and shows that exact spot in 2500.
- Temporal Scavenger Hunt: The Creative Stress-Test. An experimental mode where the requested objects are determined by the timeline (e.g., "Find an orange object when Echo is in a desert world").
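The preset-switch key combo mentioned above can be recognized with a small sequence matcher. This is a minimal sketch, assuming illustrative key names; the app's actual input handling may differ.

```typescript
// Sketch of a preset-unlock sequence detector.
// Key names are illustrative, matching the combo described in the text.
const COMBO = ["UP", "UP", "DOWN", "DOWN", "ACTION"] as const;

function makeComboDetector(combo: readonly string[] = COMBO) {
  let progress = 0; // how many keys of the combo have matched so far
  // Returns true exactly when the final key of the combo is pressed in order.
  return (key: string): boolean => {
    progress = key === combo[progress] ? progress + 1 : key === combo[0] ? 1 : 0;
    if (progress === combo.length) {
      progress = 0; // reset so the combo can be triggered again
      return true;
    }
    return false;
  };
}
```

A closure-based detector like this needs no global state and can be wired directly to the device's button handlers.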
Technical Architecture
The app is built with Google AI Studio. It is a backend-less, mobile-first web app: it runs entirely in the browser and talks directly to Gemini 3, with no custom server required.
Permissions
The app requests the following permissions:
- Camera: used to capture images as "scans" for Echo.
- Geolocation: used to infer the user's location, improving the accuracy of landmark identification for geo-aware missions.
UI Components
Echo’s interface is designed to feel like a retro handheld device with a few tactile controls.
- Screen: The main display for Echo’s messages, the live camera preview during scans, and the Postcard viewer.
- Oscilloscope: A lightweight “system status” visualizer (e.g., sync/connection state, pulse activity, and image generation progress) rendered as waveforms.
- Mode Buttons: Switch between the three core modes.
  - Message: read Echo’s messages
  - Scan: capture the target object
  - Archive: browse saved Temporal Postcards
- Action Button: Context-sensitive “action” button
  - Message Mode: send a Pulse / advance the interaction
  - Scan Mode: capture the scan
  - Archive Mode: open/close a postcard, or confirm postcard selection
- D-Pad: Navigation control
  - Scroll messages / move selection
  - Zoom in Scan Mode
Story Structures
A "Preset" contains "Stages" (timelines), and each "Stage" contains "Missions".
Mission types:
- CUTSCENE: message-only narrative beats.
- SCAN: standard object verification missions.
- SCAN_LOCATION: geo-aware missions that trigger location identification and Search Grounding.
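The Preset → Stage → Mission hierarchy could be modeled roughly as follows. Field names and the `nextMission` helper are illustrative assumptions, not the app's actual data model.

```typescript
// Illustrative model of the Preset -> Stage -> Mission hierarchy.
type MissionType = "CUTSCENE" | "SCAN" | "SCAN_LOCATION";

interface Mission {
  id: string;
  type: MissionType;
  target?: string; // object to verify; unused for CUTSCENE missions
}

interface Stage {
  timeline: string; // a Stage corresponds to one timeline Echo visits
  missions: Mission[];
}

interface Preset {
  name: string;
  stages: Stage[];
}

// Advance to the next mission, rolling over to the next stage when the
// current stage is finished; returns null when the preset is complete.
function nextMission(
  preset: Preset,
  stageIdx: number,
  missionIdx: number
): { stageIdx: number; missionIdx: number } | null {
  const stage = preset.stages[stageIdx];
  if (!stage) return null;
  if (missionIdx + 1 < stage.missions.length) {
    return { stageIdx, missionIdx: missionIdx + 1 };
  }
  return stageIdx + 1 < preset.stages.length
    ? { stageIdx: stageIdx + 1, missionIdx: 0 }
    : null;
}
```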
Gemini 3 Features
The dual-agent architecture is powered by Gemini 3, using advanced features to bridge the gap between timelines:
- Multimodal Reasoning: Gemini 3 analyzes 2026 scans to determine whether the object fulfills a mission and what it would look like in Echo's timeline.
- Thought Signature: The Thought Signature ensures Echo’s emotional state remains consistent across turns.
- Thinking Level: Thinking level is set to LOW/MEDIUM/HIGH on different generation tasks based on the task complexity, balancing the generation quality and latency.
- Google Search Grounding: Powering Echo to identify real-world landmarks and retrieve historical/cultural facts.
- Nano-Banana Pro: Performing the text-and-image-to-image operation, maintaining structural consistency while applying a distinct aesthetic.
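The per-task thinking-level tuning could be centralized in a small lookup like the one below. The task names are assumptions; the MEDIUM/HIGH split for normal vs. geo-aware scans follows the tuning described later in this document.

```typescript
// Illustrative mapping from pipeline task to Gemini thinking level.
// Geo-aware analysis is the most complex task, so it gets the highest level.
type ThinkingLevel = "LOW" | "MEDIUM" | "HIGH";

const THINKING_LEVELS: Record<string, ThinkingLevel> = {
  timelineCreation: "LOW",   // assumption: lightweight creative selection
  actorResponse: "MEDIUM",   // Echo's conversational replies
  scanAnalysis: "MEDIUM",    // normal scan verification
  geoScanAnalysis: "HIGH",   // landmark identification + Search Grounding
};

function thinkingLevelFor(task: string): ThinkingLevel {
  return THINKING_LEVELS[task] ?? "MEDIUM"; // fall back to a balanced level
}
```

Keeping the mapping in one place makes it easy to retune the latency/quality trade-off without touching individual call sites.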
Dual-agent workflow
The system divides labor between two specialized agents to maintain high performance and narrative depth:
The Director Agent
The Director Agent serves as the "world-builder" and the "engine". Its output is compiled into the prompt for the Actor Agent. It performs the following tasks:
- Timeline Creation (gemini-3-flash-preview): Determines the destination (city, year, aesthetics) using message history and scans as context.
- World Creation (gemini-3-flash-preview): Generates a detailed description of the world with a distinct visual style.
- Image Analysis (gemini-3-flash-preview): Verifies the user's scan and generates a detailed description of the scan plus guidance for the image generator. In geo-aware missions, it performs deeper analysis on a high-resolution image to identify the location and search for historical facts. Normal scans use the MEDIUM thinking level, while geo-aware scans use HIGH.
- Image Generation (gemini-3-pro-image-preview): Transforms the user's scan into a structurally consistent image set in Echo's timeline.
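The step where Director output is compiled into the Actor's prompt might look like the sketch below. The `DirectorBrief` fields and the wording of the compiled prompt are hypothetical, shown only to illustrate the hand-off between the two agents.

```typescript
// Hypothetical compilation of Director output into the Actor Agent's prompt.
interface DirectorBrief {
  timeline: string;         // destination chosen by Timeline Creation
  worldDescription: string; // output of World Creation
  scanDescription: string;  // output of Image Analysis
  verified: boolean;        // scan verification result
}

function compileActorPrompt(brief: DirectorBrief): string {
  const outcome = brief.verified
    ? "The scan succeeded; react with relief and describe what you now see."
    : "The scan failed; gently ask the operator to try again.";
  return [
    `You are Echo, a traveler currently in: ${brief.timeline}.`,
    `World around you: ${brief.worldDescription}`,
    `The operator's scan showed: ${brief.scanDescription}`,
    outcome,
  ].join("\n");
}
```

Separating "what happened" (Director) from "how Echo reacts" (Actor) keeps the narrative voice consistent while the world state stays machine-verifiable.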
The Actor Agent (Echo)
The Actor Agent plays the role of Echo. It uses gemini-3-flash-preview with a medium thinking level, keeping the full message history for personality and narrative consistency, guided by instructions from the Director Agent and the Story Script. It uses Thought Signatures to remember specific emotional beats and visual details across turns.
Inspiration
The project is inspired by my own experience as an immigrant in Finland. Suffering from "urban disconnection" and tough winters, I wanted a way to reconnect with my city, starting from the neighborhood I pass through every day.
Learnings
I learned that a single, carefully structured Gemini multimodal call can handle mission verification, scan interpretation, and generating a constrained stylization brief—reducing orchestration overhead and keeping the experience fast. I also learned that reliability comes from enforcing structure (clear pass/fail outputs, bounded fields, and deterministic formatting). Finally, building in AI Studio made iteration extremely fast for a backend-less web app, and tools like Stitch (UI iteration) and Veo (demo B-roll) helped polish the presentation.
Implementation
The app was built and iterated in AI Studio. The architecture, storylines, and key prompts are manually designed and engineered. I also did intensive testing and debugging to ensure the product aligns with my vision.
The UI was designed with Nano-Banana and Stitch.
Limitations & Challenges
- Quota & Rate Limit Management: Rapid iteration of image generation and geo-aware scans frequently hit free-tier thresholds, requiring aggressive optimization of API calls and careful tuning of Gemini’s thinking levels to maximize efficiency.
- Backend-less Debugging & Observability: Operating without server-side logs necessitated a robust client-side observability framework, using structured console grouping and response inspection to reproduce and resolve failures reliably.
- Temporal Latency & Round-trip Handshakes: The multi-stage "Unified Probe" and image generation pipeline creates a processing bottleneck; while the UI design masks this delay, true real-time resonance is limited by current API speeds.
- Structural Fidelity in Visual Translation: Maintaining strict structural consistency between 2026 photos and an alternative timeline remains a challenge even with prompt enforcement.
- Spatial Grounding & GPS Drift: In urban environments without clear visual hints, GPS inaccuracy can lead to minor "signal mismatches" where the identified landmark shifts from the user's actual target.
- Preset Management & State Persistence: Switching presets during active Gemini calls is not yet handled gracefully, and the current architecture clears message history and postcards upon entering a new mission state.
- Simplification of Storyboard Architecture: The current mission structure is technically dense, requiring further abstraction to allow non-technical users to easily create and upload their own custom storylines.
- Narrative-Integrated Error Handling: While technical errors are caught, they need further refinement to be delivered through Echo’s persona.
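The client-side observability approach mentioned above (structured console grouping plus response inspection) could be sketched like this. The helper's name and the trace record shape are assumptions for illustration.

```typescript
// Minimal sketch of a client-side observability helper: wraps a pipeline
// step in a labeled console group and records timing for later inspection.
interface StepRecord {
  step: string;
  ms: number;
  ok: boolean;
}

const trace: StepRecord[] = [];

async function observed<T>(step: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  console.group(`[echo] ${step}`); // groups nested logs under this step
  try {
    const result = await fn();
    trace.push({ step, ms: Date.now() - start, ok: true });
    return result;
  } catch (err) {
    trace.push({ step, ms: Date.now() - start, ok: false });
    console.error(err); // surface the failure inside the step's group
    throw err;
  } finally {
    console.groupEnd();
  }
}
```

Because every Gemini call is funneled through one wrapper, a failed scan can be reproduced by reading the in-memory `trace` instead of server logs.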
Credits
- Demo Video Music: Soft Ambient Background Music - HitsLab / Instant Crush - Corbyn Kites
- UI Design: Nano-Banana, Google Stitch
- AI Video Generation in Demo: Veo-3
- Development Platform: Google AI Studio
Built With
- gemini-3
- google-ai-studio
- react
- typescript