Inspiration

Every family has a box of photos somewhere. Faded prints, creased Polaroids, old black-and-white pictures that nobody can fully explain anymore. The people who lived those moments are getting older, and the stories behind those photos often live only in their memories. When they're gone, the photo becomes just a photo.

In my own life, this became very real when I lost my grandmother. Before she passed away, she was diagnosed with severe dementia. My family lived thousands of miles away, so most of the time we saw her through FaceTime calls. I remember moments when she would pick up the phone and just look confused. Sometimes she didn’t recognize my mom, her own daughter. Sometimes she didn’t recognize us either.

Watching that happen was really painful, especially for my mom. There were so many stories my mom still wanted to ask her about. Memories from when she was growing up, stories about our family, little things that only my grandmother would know. But as time went on, those stories slowly disappeared with her.

That experience made me realize something: preserving memories is just as important as capturing them.

Inspiration for "EVA", the AI companion

Part of the inspiration for this project actually came from an episode of Black Mirror called Eulogy. In the episode, the main character is guided by an AI companion that helps him walk through memories from his past. The AI doesn’t just show him information; it acts more like a presence that accompanies him through those moments, helping him remember and reflect.

What I found interesting about that episode is that it shows advanced technology in a positive way. Instead of being something dystopian or scary, the AI becomes a tool that helps someone reconnect with memories and understand their past.

That idea stuck with me.

For Living Memory, I wanted to capture a similar feeling. I didn’t want the experience to just feel like browsing a photo gallery or a timeline of events. I wanted it to feel like something was guiding you through your memories, almost like a companion walking with you through your family’s story.

That’s where EVA, the AI companion, came from. EVA isn’t just there to answer questions. The idea is that she guides you down memory lane, helping surface stories, connect photos to moments, and make the experience feel more alive.

What it does

Living Memory is a web application that helps families preserve the stories behind their photographs before those stories are lost forever.

At the center of the experience is EVA — an AI companion powered by Amazon Nova 2 Sonic — who guides you through your memories the way a thoughtful friend would. EVA doesn't just answer questions. She listens, asks questions of her own, and helps draw out the stories you've been meaning to tell.

The core experience works like this: you hold up a physical photograph to your camera. EVA sees it through Nova 2 Lite's multimodal vision, recognizes what's in the frame — the people, the setting, the era — and starts a natural voice conversation about it. "That looks like it was taken somewhere warm. Who are the two people on the left?" From that conversation, Living Memory automatically generates narrations, stories, and a beautifully formatted storybook that can be shared with the whole family.

Key features include:

  • Voice conversations with EVA — real-time speech-to-speech using Nova 2 Sonic, with a vision bridge that lets EVA "see" what's in front of the camera
  • Auto photo capture — detects when a physical photo is held up, uses Nova 2 Lite to find its corners, and extracts a clean digital scan in real time
  • Story and narration generation — Nova 2 Lite synthesizes conversations into rich narrations, per-photo captions, and full family storybooks
  • Photo stylization — Nova Canvas transforms old, faded, or damaged photos into vivid restored versions or artistic interpretations
  • Fact extraction — every photo gets structured metadata: who, what, when, where, and why — building a searchable family memory archive (a sketch of this record follows the list)
  • An intro experience narrated by EVA through Amazon Polly, setting the emotional tone before the user enters the app
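
To make the fact-extraction feature concrete, here is a minimal sketch of the kind of structured record each photo could carry. The field names are our illustration, not a fixed schema:

```typescript
// Illustrative shape for a photo's extracted facts (field names are hypothetical).
interface PhotoFacts {
  who: string[];        // people identified in the photo or named in conversation
  what: string;         // the event or activity depicted
  when: string | null;  // a date or era, when one can be inferred
  where: string | null; // the place, when recognizable or mentioned
  why: string | null;   // the story behind the moment, drawn from the conversation
}
```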

How we built it

Living Memory is a full-stack web application built with Next.js/TypeScript, Supabase, and Tailwind CSS. The AI backbone runs on Amazon Nova via AWS Bedrock, with Amazon Polly for narration:

  • Nova 2 Lite handles all multimodal reasoning — photo analysis, corner detection for auto-capture, fact extraction, story generation, narration writing, and the vision bridge that gives EVA context about what she's "looking at"

  • Nova 2 Sonic powers EVA's real-time voice conversations through a custom bidirectional WebSocket server (server.mjs) built on Node.js with HTTP/2 streaming via the AWS SDK. Because Nova Sonic is audio-only, we built a vision bridge that runs frames through Nova 2 Lite every few seconds and injects the visual description as non-interactive context — so EVA can naturally reference what she sees when the user speaks

  • Nova Canvas handles photo stylization and visual transformation

  • Amazon Polly (generative engine, Ruth voice) narrates the intro experience as EVA

The camera pipeline captures frames every 1.5 seconds, debounces them, and on meaningful changes sends them to Nova 2 Lite for analysis. Extracted context is silently loaded into the Nova Sonic session so EVA speaks from genuine awareness rather than scripted responses. Conversations are persisted to Supabase alongside photo metadata, so storybook and narration generation have full context across an entire session.
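
As a rough illustration of that pipeline, here is a browser-side sketch of the capture-and-debounce loop. describeFrame() and injectVisualContext() are hypothetical stand-ins for the Nova 2 Lite call and the Sonic session forwarding, and the change-detection heuristic is simplified:

```typescript
// Sketch of the capture-and-debounce loop (simplified; the declared helpers
// are hypothetical stand-ins for the real pipeline).
declare function describeFrame(bytes: Uint8Array): Promise<string>; // Nova 2 Lite call
declare function injectVisualContext(description: string): void;    // into the Sonic session

const CAPTURE_INTERVAL_MS = 1500;
const CHANGE_THRESHOLD = 12; // mean per-pixel delta that counts as a "meaningful" change

let previousPixels: Uint8ClampedArray | null = null;

function meaningfulChange(current: Uint8ClampedArray): boolean {
  if (!previousPixels) return true;
  let total = 0;
  for (let i = 0; i < current.length; i += 4) {
    total += Math.abs(current[i] - previousPixels[i]); // red channel is enough here
  }
  return total / (current.length / 4) > CHANGE_THRESHOLD;
}

function startCaptureLoop(video: HTMLVideoElement) {
  const canvas = document.createElement("canvas");
  canvas.width = 320; // downsampled: plenty for change detection and analysis
  canvas.height = 240;
  const ctx = canvas.getContext("2d")!;

  setInterval(async () => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
    if (!meaningfulChange(pixels)) return; // debounce near-identical frames
    previousPixels = new Uint8ClampedArray(pixels);

    const blob: Blob = await new Promise((resolve) =>
      canvas.toBlob((b) => resolve(b!), "image/jpeg", 0.8)
    );
    const bytes = new Uint8Array(await blob.arrayBuffer());
    injectVisualContext(await describeFrame(bytes));
  }, CAPTURE_INTERVAL_MS);
}
```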

Challenges we ran into

Getting Nova Sonic to work in a web environment was the biggest technical challenge. Nova Sonic uses a bidirectional HTTP/2 streaming protocol that can't be called directly from the browser. We had to build a custom WebSocket proxy server that translates browser WebSocket messages into the Bedrock bidirectional stream protocol — managing the full event lifecycle (sessionStart, promptStart, contentStart, audioInput, contentEnd, promptEnd, sessionEnd) and keeping the stream alive through silence keepalives.
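
For a sense of what the proxy manages, here is a stripped-down sketch of the per-session event order. The payload fields are heavily simplified; the real events carry inference configuration, audio format settings, and unique content identifiers:

```typescript
// Per-session event order the proxy enforces (payloads heavily simplified).
type SonicEvent = { event: Record<string, unknown> };

function* sessionEvents(promptName: string, audioChunks: string[]): Generator<SonicEvent> {
  yield { event: { sessionStart: {} } };
  yield { event: { promptStart: { promptName } } };
  yield { event: { contentStart: { promptName, type: "AUDIO" } } };
  for (const content of audioChunks) {
    yield { event: { audioInput: { promptName, content } } }; // base64 audio from the browser
  }
  yield { event: { contentEnd: { promptName } } };
  yield { event: { promptEnd: { promptName } } };
  yield { event: { sessionEnd: {} } };
}
```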

Bridging Nova Sonic's audio-only nature with a visual experience required significant creative problem-solving. Unlike Gemini Live, which natively supports multimodal video input, Nova Sonic doesn't see images at all. We built a vision bridge that intercepts camera frames, analyzes them with Nova 2 Lite, and injects the descriptions as background context into the voice session — giving EVA the ability to reference what she "sees" without Nova Sonic ever receiving an image.
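
The analysis step of that bridge — a possible implementation of the describeFrame() helper sketched earlier — might look something like this, using the Bedrock Converse API from the AWS SDK for JavaScript. The model ID shown is the current Nova Lite identifier and is a placeholder for whatever Nova 2 Lite ships under:

```typescript
import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// One vision-bridge analysis step: describe a camera frame so the description
// can be injected into the voice session as background context.
async function describeFrame(jpegBytes: Uint8Array): Promise<string> {
  const response = await client.send(
    new ConverseCommand({
      modelId: "amazon.nova-lite-v1:0", // placeholder; substitute the Nova 2 Lite model ID
      messages: [
        {
          role: "user",
          content: [
            { image: { format: "jpeg", source: { bytes: jpegBytes } } },
            {
              text: "Briefly describe what is in front of the camera: any photograph being held up, the people in it, the setting, and the apparent era.",
            },
          ],
        },
      ],
    })
  );
  return response.output?.message?.content?.[0]?.text ?? "";
}
```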

Keeping conversations natural and turn-based took careful tuning. Early on, every visual context update triggered EVA to respond, causing her to monologue without giving the user space to speak. The fix was marking context injections as non-interactive (interactive: false in the Bedrock event protocol) so EVA absorbs visual information silently and only responds when the user speaks.
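
In sketch form, a context injection then looks like a short non-interactive text content block. The textInput event name and field layout here are assumptions following the simplified shapes above:

```typescript
// Hypothetical non-interactive context injection (field layout simplified).
function contextInjectionEvents(promptName: string, visualDescription: string) {
  return [
    // interactive: false lets the model absorb the content silently rather
    // than treating it as a user turn that deserves a spoken reply.
    { event: { contentStart: { promptName, type: "TEXT", interactive: false } } },
    { event: { textInput: { promptName, content: visualDescription } } },
    { event: { contentEnd: { promptName } } },
  ];
}
```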

Accomplishments that we're proud of

We're proud that the experience actually feels the way we imagined it. When you hold up an old photograph and EVA starts a real conversation about the people in it — asking questions, remembering details across the session, and then turning that into a story — it feels genuinely moving in a way that a simple photo upload form never could.

The Nova Sonic voice integration working end-to-end in a browser, through a custom WebSocket proxy with real-time bidirectional streaming, is something we're particularly proud of technically. It's not a trivial integration, and getting the event protocol right took real persistence.

We're also proud of the auto-capture pipeline — using Nova 2 Lite to detect photo corners in real time, extract a clean scan from a held-up photograph, and immediately begin a conversation about it creates a seamless physical-to-digital experience that feels like magic when it works.

What we learned

We learned that building with a speech-to-speech model is fundamentally different from building with a text or chat model. Nova Sonic has its own protocol, its own constraints, and its own way of thinking about conversation turns. Working with it properly required understanding the full bidirectional streaming event lifecycle rather than treating it like a simple API call.

We also learned how powerful Nova 2 Lite's multimodal capabilities are for a task like this. The model doesn't just identify objects — it understands context, infers relationships between people, and picks up on emotional and historical details in photographs that make the generated narrations feel personal rather than generic.

Perhaps most importantly, we learned that the hardest part of building something emotionally meaningful is not the technology. It's the design — knowing when to let the AI speak, when to let silence breathe, and how to make a user feel safe enough to share something they've never written down before.

What's next for Living Memory

The most meaningful next step is family collaboration — right now Living Memory is a personal experience, but memories belong to families. We want to enable multiple family members to contribute to the same album, add their own voices and perspectives to the same photograph, and collaboratively build a shared record that no single person could complete alone.

We also want to explore long-term memory for EVA — giving her persistent knowledge of your family across sessions, so she can make connections between photographs taken decades apart and surface stories that you didn't know were related.

On the preservation side, we want to add export to physical formats — printed storybooks, narrated video slideshows, and audio recordings of the conversations themselves, so that the record survives beyond any single app or platform.

Finally, we believe this could have real impact in elder care settings — helping families reconnect with relatives experiencing memory loss, giving caregivers a structured way to document life histories, and creating something meaningful out of what is often a painful and isolating experience. That's the version of Living Memory we most want to build.

Built With

  • Next.js / TypeScript
  • Tailwind CSS
  • Supabase
  • Amazon Nova 2 Sonic, Nova 2 Lite, and Nova Canvas (via AWS Bedrock)
  • Amazon Polly
  • Node.js (custom WebSocket proxy server)
