Inspiration

The world is full of stories waiting to be told — in every building, city block, and landmark. But that knowledge is buried in Wikipedia articles and static text. We wanted discovery to feel like cinema. What if your phone could narrate the world around you, like a personal BBC documentary crew in your pocket? That's LORE: point, speak, and watch history come alive.


What it does

LORE transforms any location, question, or moment into a real-time AI-generated documentary through four distinct modes:

  • LoreMode — Point your camera and speak a question simultaneously. Gemini fuses your live visual surroundings with your spoken intent, unlocking alternate history scenarios grounded in your real backdrop.

  • VoiceMode — Speak any topic and receive a fully interleaved documentary: AI narration, generated illustrations, and Veo 3.1 cinematic video clips flowing together. Interrupt and follow up at any time.

  • SightMode — Point your camera at any monument or building. LORE recognises it via Gemini Live vision and GPS context, then streams a narrated documentary with documentary-style illustrations in real time.

  • GPS Walking Tour — Walk through your city. LORE tracks your position, auto-triggers narrations as you approach landmarks, and shares the live map with Gemini every few seconds so it can see your route and surroundings. Includes full Google Directions navigation.
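The walking tour's triggering logic boils down to a distance check against nearby landmarks plus the periodic location turns mentioned above. Here is a minimal sketch of that loop, written in Python for brevity (the app itself is Flutter); the landmark list and the `send_text_turn` callback are hypothetical stand-ins:

```python
import math

# Hypothetical landmark data; the app resolves nearby landmarks from GPS/places context.
LANDMARKS = [{"name": "Brandenburg Gate", "lat": 52.5163, "lng": 13.3777, "radius_m": 120}]

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two GPS coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))

class WalkingTour:
    def __init__(self, send_text_turn):
        self.send_text_turn = send_text_turn  # injects a silent text turn into the Live session
        self.triggered = set()

    def on_position(self, lat, lng):
        # Share the current position every few seconds as a "[GPS: lat, lng]" text turn,
        # so narration stays spatially grounded without interrupting the conversation.
        self.send_text_turn(f"[GPS: {lat:.5f}, {lng:.5f}]")
        # Auto-trigger a narration the first time the user enters a landmark's radius.
        for lm in LANDMARKS:
            dist = haversine_m(lat, lng, lm["lat"], lm["lng"])
            if lm["name"] not in self.triggered and dist <= lm["radius_m"]:
                self.triggered.add(lm["name"])
                self.send_text_turn(f"The user has reached {lm['name']}; begin a short narrated introduction.")
```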


How we built it

The architecture is a Flutter mobile app that talks over WebSocket to three Cloud Run microservices:

  • Gemini Live Proxy (port 8090) — a transparent bidirectional WebSocket proxy to the Gemini Live API on Vertex AI. Handles ADC token refresh and routes Gemini tool calls to the downstream services. All four modes run through this single proxy.

  • Nano Illustrator (port 8091) — HTTP service generating documentary-style illustrations via Gemini 3.1 Flash Image Preview.

  • Veo Generator (port 8092) — HTTP service for async cinematic video generation via Veo 3.1, polled until completion.
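Because Veo jobs complete asynchronously, anything calling the Veo Generator follows a submit-then-poll pattern while narration and illustrations keep streaming. A rough sketch of that loop, assuming `httpx` and hypothetical `/generate` and `/jobs/{id}` routes (the service's real routes may differ):

```python
import asyncio
import httpx  # assumption: an async HTTP client is available

VEO_SERVICE = "http://localhost:8092"  # Veo Generator (Cloud Run in production)

async def generate_video(prompt: str) -> str:
    """Submit a Veo job, then poll until the clip is ready."""
    async with httpx.AsyncClient(timeout=30) as http:
        resp = await http.post(f"{VEO_SERVICE}/generate", json={"prompt": prompt})
        resp.raise_for_status()
        job_id = resp.json()["job_id"]

        # Veo 3.1 typically needs 30-60 s, so poll at a relaxed interval while
        # narration and illustrations keep filling the experience.
        while True:
            await asyncio.sleep(5)
            status = (await http.get(f"{VEO_SERVICE}/jobs/{job_id}")).json()
            if status["state"] == "done":
                return status["video_url"]
            if status["state"] == "failed":
                raise RuntimeError(status.get("error", "video generation failed"))
```

The finished clip is then slotted into the documentary at the next natural break, as described under Challenges below.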

The Flutter app streams live camera frames (1 fps JPEG), PCM microphone audio, and GPS coordinates simultaneously over WebSocket. Gemini Live handles all three inputs in one session — vision, voice, and location text context — and fires generate_image and generate_video tool calls that the proxy routes to the appropriate service. Results stream back interleaved: audio narration, images, and video woven into a single documentary experience.
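The proxy's core job is small: mint a Vertex AI access token from ADC, relay frames and audio untouched, and intercept tool calls so they can be fanned out to the illustration and video services. Below is a condensed sketch of that routing. It assumes `httpx` and generic WebSocket connection objects; the downstream route names are hypothetical, and the toolCall/toolResponse shapes reflect the Gemini Live WebSocket messages as we use them:

```python
import asyncio
import json

import google.auth
import google.auth.transport.requests
import httpx

ILLUSTRATOR_URL = "http://localhost:8091"   # Nano Illustrator
VEO_URL = "http://localhost:8092"           # Veo Generator

def vertex_access_token() -> str:
    """Mint a short-lived Vertex AI token from ADC (used as the Bearer token when
    dialing the upstream Gemini Live WebSocket), so credentials never reach the client."""
    creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
    creds.refresh(google.auth.transport.requests.Request())
    return creds.token

async def route_tool_call(tool_call: dict, upstream) -> None:
    """Dispatch generate_image / generate_video calls and answer Gemini with a toolResponse."""
    responses = []
    async with httpx.AsyncClient(timeout=120) as http:
        for call in tool_call.get("functionCalls", []):
            if call["name"] == "generate_image":
                r = await http.post(f"{ILLUSTRATOR_URL}/illustrate", json=call["args"])  # hypothetical route
            elif call["name"] == "generate_video":
                r = await http.post(f"{VEO_URL}/generate", json=call["args"])            # hypothetical route
            else:
                continue
            responses.append({"id": call["id"], "name": call["name"], "response": r.json()})
    # The toolResponse goes back to Gemini; delivering the generated asset to the client is elided here.
    await upstream.send(json.dumps({"toolResponse": {"functionResponses": responses}}))

async def pump(client, upstream) -> None:
    """Transparent bidirectional relay between the Flutter client and Gemini Live."""
    async def client_to_gemini():
        async for msg in client:          # camera JPEGs, PCM audio chunks, GPS text turns
            await upstream.send(msg)
    async def gemini_to_client():
        async for msg in upstream:        # audio, transcripts, tool calls
            data = json.loads(msg)
            if "toolCall" in data:
                await route_tool_call(data["toolCall"], upstream)
            await client.send(msg)
    await asyncio.gather(client_to_gemini(), gemini_to_client())
```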


Challenges we ran into

  • Three streams, one session — Getting Gemini Live to cleanly handle simultaneous camera frames, PCM audio, and injected GPS text turns without confusing the model required careful message sequencing and rate control.

  • Async video orchestration — Veo 3.1 takes 30–60 seconds per clip. We had to design a pattern where narration and illustrations fill the experience while the video generates in the background and then slots in at a natural break.

  • Interleaved streaming UI — Rendering live audio playback, chat transcript, streaming images, and inline video players simultaneously in Flutter without jank required careful state management with Riverpod.

  • GPS spatial grounding — Keeping narration spatially accurate as the user walks meant silently injecting [GPS: lat, lng] text turns into the Gemini Live session without interrupting the conversational flow.

  • WebSocket reconnection — We implemented a 30-second ring buffer on the proxy so clients can reconnect mid-session without losing context or messages.
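The reconnection buffer itself is just a time-windowed queue of outbound messages. A minimal sketch of the idea (per-client cursors and message IDs are left out):

```python
import time
from collections import deque

class ReplayBuffer:
    """Keeps the last `window_s` seconds of proxy-to-client messages so a client
    whose WebSocket drops can reconnect mid-session and catch up."""

    def __init__(self, window_s: float = 30.0):
        self.window_s = window_s
        self._items: deque[tuple[float, bytes]] = deque()

    def append(self, message: bytes) -> None:
        now = time.monotonic()
        self._items.append((now, message))
        # Evict anything older than the window.
        while self._items and now - self._items[0][0] > self.window_s:
            self._items.popleft()

    def replay(self, since: float | None = None):
        """Yield buffered messages, optionally only those newer than `since`."""
        for ts, message in self._items:
            if since is None or ts > since:
                yield message
```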


Accomplishments that we're proud of

  • LoreMode — The fusion of live camera vision plus voice into alternate history scenarios is genuinely new.

  • A single Gemini Live session handles vision, audio, tool calls, and GPS context simultaneously across all four modes.

  • Full end-to-end pipeline from "point camera at a building" to "narrated documentary with illustrations and cinematic video" working in real time.

  • The GPS mode's live map sharing — Gemini can literally see your route, surroundings, and active navigation on screen.

  • Clean microservices architecture: three independently scalable Cloud Run services, each with a single responsibility.


What we learned

  • The Gemini Live API's native audio mode is extraordinarily powerful for real-time experiences — barge-in, interruptions, and long conversational context all work naturally.

  • Veo 3.1 with native audio produces genuinely cinematic output that elevates the documentary feel beyond anything we had previously seen from text-to-video.

  • WebSocket proxying for Gemini Live on Cloud Run is the right architecture — it keeps Vertex AI auth (ADC) server-side while keeping the mobile client lightweight and stateless.

  • Streaming multi-modal UI (text + images + video + live audio) in Flutter is achievable but demands a disciplined unidirectional data flow.


What's next for LORE

  • Branch Documentaries — Tap any claim during narration to instantly branch into a sub-documentary on that topic, up to 3 levels deep.

  • Historical Character Encounters — Converse with AI-powered historical figures at relevant locations (Marcus Aurelius at the Colosseum, da Vinci in Florence).

  • Chronicle Export — Generate illustrated PDFs of any documentary session with citations, timestamps, and generated imagery.

  • Depth Dial — Adjust content complexity from Explorer (casual) to Scholar to Expert (academic depth).

  • Multilingual support — 24 languages with cultural adaptation.

  • Google Play and App Store release.
