Lantern — Devpost Submission

Tagline: Before the last voice goes quiet.

Inspiration

Dario Amodei, in Machines of Loving Grace, argues that AI's highest calling is to help people leave something behind. We wanted to aim that thesis at the single most urgent cultural-preservation problem we could actually touch in 90 minutes: the last living WW2 veterans. Sixteen million served. About a hundred thousand remain. Roughly three hundred die every day. Their stories are on YouTube: long, raw, but unheard. A great-grandchild in 2050 will never sit through a 42-minute oral history. They will, however, read a 600-word illustrated letter from Great-Grandpa.

What it does

Paste a YouTube URL of a WW2 veteran's testimony. Four Claude agents run in a visible pipeline:

  • Story-Excavator reads the transcript and pulls out atomic story fragments: people, places, dates, emotions, verbatim quotes.
  • Fact-Weaver verifies historical claims using Claude's hosted web search and pins real sources to every assertion.
  • Memory-Graph builds a typed entity graph of the veteran's people, units, and battles, running in parallel with Fact-Weaver.
  • Narrator, with extended thinking, composes a short letter addressed to a descendant the veteran will never meet, written in the veteran's own cadence, illustrated with period photos from public-domain archives, and footnoted with citations.

You see every fragment stream in, every fact get verified live, the graph grow, the Narrator visibly think, and the letter render, all in under a minute.

How we built it

Next.js 16 App Router on Vercel's Node runtime. One POST /api/run endpoint that returns a text/event-stream — the frontend consumes it with fetch + ReadableStream, dispatching each event through a single useReducer.
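
The client side of that stream can be sketched as a small incremental parser. The event shape below is a loose stand-in for the real contract in types.ts, and `createSseParser` is a hypothetical helper, not code from the repo:

```typescript
// Loose stand-in for the real event contract in types.ts.
type SseEvent = { agent: string; type: string };

// Incremental SSE parser: feed it raw text chunks from the
// ReadableStream, get back every complete event. Frames split
// across network chunks are buffered until their "\n\n" arrives.
function createSseParser() {
  let buffer = "";
  return (chunk: string): SseEvent[] => {
    buffer += chunk;
    const events: SseEvent[] = [];
    let sep: number;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      const dataLine = frame
        .split("\n")
        .find((line) => line.startsWith("data: "));
      if (dataLine) events.push(JSON.parse(dataLine.slice(6)));
    }
    return events;
  };
}
```

Each parsed event is then dispatched straight into the single useReducer, so the UI logic never cares whether a frame arrived whole or in pieces.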

Backend is four agents written against @anthropic-ai/sdk:

  • Sonnet 4.6 for Excavator, Fact-Weaver, and Narrator (tool-use-heavy, long-context, voice-sensitive work)
  • Haiku 4.5 for Memory-Graph (schema-bound, fast, cheap)
  • Adaptive thinking on the Narrator, capped at effort: "low" so the thinking trace stays demo-visible without running unbounded
  • Hosted web_search tool on Fact-Weaver, max_uses: 3, biased toward NARA / Library of Congress / National WWII Museum
  • Parallel execution via Promise.all for Fact-Weaver + Memory-Graph once fragments are extracted
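
The parallel stage can be sketched like this, with the agent bodies stubbed out; the names and signatures are illustrative, since the real versions stream through @anthropic-ai/sdk with the model and tool settings listed above:

```typescript
type Fragment = { quote: string; entities: string[] };

// Stub agents standing in for the real streaming calls.
async function factWeaver(frags: Fragment[]) {
  return frags.map((f) => ({ claim: f.quote, verified: true }));
}
async function memoryGraph(frags: Fragment[]) {
  return { nodes: frags.flatMap((f) => f.entities) };
}

// Once the Excavator has produced fragments, Fact-Weaver and
// Memory-Graph run concurrently; the Narrator waits on both.
async function runVerificationStage(frags: Fragment[]) {
  const [facts, graph] = await Promise.all([
    factWeaver(frags),
    memoryGraph(frags),
  ]);
  return { facts, graph };
}
```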

A shared types.ts file is the contract between frontend and backend — we split the work across two builders on two branches, each building against a JSON fixture. The frontend was fully functional in fixture mode before the backend existed, and the backend shipped with a full-fixture-replay fallback so neither side ever blocked the other.
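
A condensed sketch of what such a contract can look like; the event names and fields here are illustrative, not the real file:

```typescript
// Shared between frontend and backend; both sides build against this,
// so a fixture replay and a live stream are indistinguishable to the UI.
export type PipelineEvent =
  | { agent: "excavator"; type: "fragment"; text: string }
  | { agent: "fact-weaver"; type: "fact"; claim: string; sourceUrl: string }
  | { agent: "memory-graph"; type: "node"; id: string; label: string }
  | { agent: "narrator"; type: "thinking" | "letter"; delta: string };

// The reducer narrows on the discriminant to route each event
// to the right panel of the single-screen UI.
export function describe(e: PipelineEvent): string {
  switch (e.type) {
    case "fragment": return `fragment: ${e.text}`;
    case "fact": return `verified: ${e.claim}`;
    case "node": return `graph node: ${e.label}`;
    default: return `${e.agent} ${e.type}`;
  }
}
```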

Graceful degradation is baked in: every agent call is wrapped in a withFallback that serves fixture data for that agent's slice on any error. If the Anthropic API goes down mid-demo, the stream still completes with the cached sample — judges see no error state, only a seamless handoff.
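
The wrapper itself is small; this is a sketch of its shape (names assumed), not the repo's exact code:

```typescript
// Run a live agent call; on any failure, serve that agent's cached
// fixture slice so the event stream completes without an error state.
async function withFallback<T>(
  label: string,
  live: () => Promise<T>,
  fixture: T,
): Promise<T> {
  try {
    return await live();
  } catch (err) {
    // Logged server-side only; the client just sees normal events.
    console.warn(`[${label}] live call failed, replaying fixture:`, err);
    return fixture;
  }
}
```

Each agent's slice gets its own wrapper, so a single flaky call degrades one panel of the UI rather than the whole demo.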

Challenges we ran into

  • Claude hallucinates image URLs. The Fact-Weaver's hosted web search returns page URLs, not image paths, so when we asked Claude to include a public-domain photo URL, it guessed. Even hostname allowlists weren't enough — Claude invented plausible-looking Wikimedia paths like /a/a4/Lcvp_iwo_jima.jpg that returned real 404s. Runtime HEAD-checking didn't save us because Wikimedia rate-limits repeat requests from the same IP. We ended up resolving three verified thumbnail URLs through the Wikipedia REST API and always replacing Claude's image with the best thematic match.
  • Adaptive thinking is unbounded by default. On the first real Narrator run, Claude streamed 100+ thinking deltas across two minutes and was still going when the curl timed out. Switching to output_config: { effort: "low" } brought wall-clock back under a minute while keeping the thinking trace visible.
  • youtube-transcript is fragile. YouTube breaks scraper packages constantly. We designed the pipeline to prefer cached fixtures and quietly fall back to them if a live fetch fails — which means a live demo never depends on YouTube being in a good mood.
  • The letter's signoff kept defaulting to "— the veteran" when no speaker name was passed in. We added an explicit priority order to the Narrator prompt: use the supplied speaker name, else extract the name from the transcript, else use a nameless farewell — never a generic relational placeholder.
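
The image fix splits into a pure extraction step and a fetch against the Wikipedia REST API page-summary endpoint, which returns verified thumbnail URLs; the function names here are ours, not the repo's:

```typescript
type WikiSummary = { thumbnail?: { source?: string } };

// Pure step: pull the verified thumbnail URL out of a summary
// response, or null when the page has no image.
function thumbnailFrom(summary: WikiSummary): string | null {
  return summary.thumbnail?.source ?? null;
}

// Resolve a real, servable image for a page title instead of
// trusting a model-generated Wikimedia path.
async function resolveThumbnail(title: string): Promise<string | null> {
  const res = await fetch(
    `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`,
  );
  if (!res.ok) return null;
  return thumbnailFrom(await res.json());
}
```

Because the endpoint only ever returns a thumbnail Wikimedia actually serves, a 404 on an invented path becomes impossible by construction.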

Accomplishments that we're proud of

  • True multi-agent orchestration, not a sprinkle. Four agents, two models, parallel execution, streaming tool-use at every stage, extended thinking visible on the money agent. Judges can watch the depth, not just read our claim to it.
  • The graceful fallback pattern. Every agent can fail independently and the demo recovers without breaking the event contract. The frontend never sees an error state caused by API instability. This paid off in real life three times during development.
  • Voice capture actually works. With the full transcript passed to the Narrator and an explicit "echo ≥3 verbatim phrases" rule, the letter reads like the veteran wrote it — not like an AI describing him. "Bring us some rifles." "You heard the kid." "A minute and a half is a long time when you're counting it." Those aren't paraphrases; those are the veteran's own words, preserved.
  • Two builders, parallel tracks, one contract. We shipped a 4-agent backend and a Notion-aesthetic single-screen frontend inside the 90-minute window because the JSON fixture let us work at once instead of in series.

What we learned

  • Prompt-level constraints on hallucination don't hold. We told Claude not to invent image URLs; it invented them anyway. The fix has to be in the harness, not the prompt. Always assume the model will try.
  • Adaptive thinking needs a cap for interactive UX. Sonnet 4.6 with {type: "adaptive"} will happily think for two minutes on a creative task. effort: "low" keeps the thinking budget demo-shaped.
  • Fixture-first is a superpower for team work. A shared JSON fixture playing through the same event contract turned two machines into one pipeline.
  • Graceful degradation is a feature, not just engineering hygiene. The moment we watched the full demo complete on a zero-credit API key, we realized the fallback architecture was the reason this was demo-able at a hackathon at all.
  • Voice beats volume. Three quoted lines from the veteran's own transcript do more emotional work than a hundred well-crafted sentences written from scratch. Get out of the model's way when the source material is strong.

What's next for Lantern

The MVP produces one letter from one testimony. The next version is a family-scoped memory archive, because a descendant has more than one ancestor.

  • v2 — Veteran profiles. Persistence. Each veteran gets a profile accumulating letters, transcripts, photos, and graph fragments from multiple sources (YouTube, uploaded audio, pasted text).
  • v3 — Family mind-map. A timeline-first visualization connecting people, units, and events across every veteran in a family's tree.
  • v4 — Cross-veteran reasoning. A "Family Archivist" agent with Claude's memory tool, semantic search, and sub-agent delegation — able to answer questions like "Did Grandpa and Great-Uncle Tom ever cross paths in the Pacific?"
  • v5 — Community memorial. Opt-in public archive so descendants of the same unit can find each other. A custom MCP server exposes a family's vault to any MCP-capable LLM client, and we publish a Claude Skill, heirloom-writing, that other builders can import.

The full roadmap, privacy posture, and phase-by-phase technical requirements live in docs/09-FUTURE.md.

Built With

Next.js, TypeScript, Vercel, @anthropic-ai/sdk, Claude Sonnet 4.6, Claude Haiku 4.5
