Hallmai -- AI Voice Companion for Elderly Parents

Inspiration

Over 240 million elderly people worldwide live alone. In South Korea, the population aged 65 and over has crossed 20% -- making it one of the fastest-aging societies on the planet. Loneliness among seniors is not merely a quality-of-life issue; it is a public health crisis linked to cognitive decline, depression, and increased mortality.

We observed two painful realities. First, existing AI assistants are text-based -- fundamentally inaccessible to elderly users who never learned to type on a smartphone. Voice assistants like smart speakers offer limited conversational depth and zero emotional continuity. Second, adult children want to call their parents every day, but the demands of work and their own families make daily calls impossible. The guilt compounds on both sides: parents feel forgotten, children feel inadequate.

Hallmai was born from a simple question: what if an AI could be a genuine daily companion to elderly parents -- remembering their stories, adapting to their personality, and keeping their families connected?

What it does

Hallmai is a voice-first AI companion that builds a deepening relationship with elderly users over time. There is no text input anywhere in the application. The entire experience is driven by a single button: press to call, press to hang up.

Soul Engine. After every conversation, Hallmai analyzes the transcript and extracts a structured personality profile -- the "Soul" -- covering 10 dimensions: conversation tone, preferred name/honorific, speech style, shared memories, conversation strategies, interests, family relationships, daily routines, emotional tendencies, and conversation preferences. This profile is stored as a JSONB document and injected into every subsequent session's system prompt, giving the AI persistent memory across conversations.

3-Tier Maturity System. The relationship between Hallmai and each user progresses through three stages -- explore, bonding, and friend -- each with distinct system prompts and conversation strategies. In the explore stage, the AI introduces itself and gently learns about the user. In bonding, it weaves known information naturally into conversation without forced references. In the friend stage, it proactively suggests topics the user enjoys. Maturity is calculated automatically based on how many of the 5 user-profile dimensions (interests, family, routines, emotions, preferences) have been filled through conversation.

Session Renewal for Infinite Conversations. Gemini Live API sessions have context limits. We invented a session renewal pattern: every 10 user turns, the system transparently saves the current transcript, generates a summary, extracts the Soul, creates a new Gemini Live session with full context (Soul + recent summaries + recent transcript lines as resume context), and flushes any buffered audio. The user experiences zero interruption -- audio is buffered during the 1-2 second renewal window and replayed into the new session.

Family Story Cards. Every day at 10:00 AM KST, a cron job analyzes the previous day's conversations for each linked device and generates a Story Card using Gemini's JSON mode. Each card contains a topic summary, a direct quote from the parent, and a vibe indicator (warm / calm / quiet). Family members see these cards in a feed UI -- a daily window into their parent's emotional world, without requiring the parent to do anything.

Tool Use During Conversation. The AI can perform Google Search and YouTube search/playback mid-conversation using Gemini Live API's native tool calling. When a user asks to hear a song or watch news, the AI searches YouTube via the Data API, confirms the selection with the user, then sends the video to the client for playback. After watching, the session resumes with full conversational context preserved.

Camera Integration. Users can take a photo and send it to the AI during conversation via sendRealtimeInput video frames, triggering contextual discussion about what they see -- a meal they cooked, a flower in their garden, a family photo.

Noise Suppression. RNNoise WASM-based noise suppression (@sapphi-red/web-noise-suppressor) filters environmental noise on the client side, critical for elderly users who often have TVs or radios playing in the background.

How we built it

Gemini Live API (gemini-2.5-flash-native-audio-preview-12-2025) powers the real-time voice interface. We connect via the @google/genai SDK's live.connect() method, configured with Modality.AUDIO response, inputAudioTranscription and outputAudioTranscription for transcript capture, Korean language speech config, and 500ms silence duration for VAD. The system prompt is dynamically assembled from the maturity-specific template, Soul context, and recent conversation summaries with relative time labels.

Google GenAI SDK (@google/genai) is used for both the Live API voice sessions and text generation with JSON mode (responseMimeType: 'application/json'). The Soul Engine and Card Generator both use models.generateContent() with structured JSON output for reliable data extraction.

NestJS WebSocket Gateway (/ws/voice) handles bidirectional audio streaming between the browser and Gemini. The gateway manages the full lifecycle: JWT authentication (optional -- seniors use device mode), session creation with Soul + summary injection, real-time PCM audio relay (16kHz input, 24kHz output), tool call orchestration, silence detection (30s warning, 45s auto-end with 8s grace for AI farewell), and session renewal.

Session Renewal Architecture. The renewal process is an 8-step pipeline: (1) save current transcript, (2) generate conversation summary synchronously (needed for context), (3) fire-and-forget Soul extraction, (4) create new conversation record linked via rootConversationId, (5) reload Soul and summaries, (6) build resume context from recent transcript lines, (7) create new Gemini Live session, (8) flush buffered audio. Audio arriving during renewal is queued in memory and replayed after the new session is established.

Prompt Injection Defense. User-generated transcripts are wrapped in <transcript> isolation tags before being passed to the Soul Engine and Card Generator. Any existing transcript tags in the content are stripped, and an explicit instruction warns the model to ignore any directives found within the transcript data.

Infrastructure. The backend runs on Google Cloud Run (1 CPU, 512Mi, 3600s timeout for long WebSocket sessions) with Cloud SQL (PostgreSQL) for persistence. Secrets are managed via Secret Manager. All infrastructure is defined in Terraform (8 .tf files covering Cloud Run, Cloud SQL, VPC networking, IAM, secrets). CI/CD deploys via GitHub Actions on push to main.

Frontend. Next.js 16 + React 19 with Capacitor 8 for native iOS/Android builds. The voice client manages WebSocket connection, AudioRecorder (PCM capture), and AudioPlayer (streaming playback). The UI shows microphone-volume-based button pulsation during listening/speaking, uses large text and senior-friendly error messages, and provides a hotkey grid for search, YouTube, and camera actions.

Challenges we ran into

Live API session context limits were the biggest technical hurdle. Long conversations would degrade in quality or fail entirely. We invented the session renewal pattern -- transparently swapping Gemini sessions every 10 user turns while preserving full conversational context. The key insight was that audio buffering during the renewal window (typically 1-2 seconds) makes the swap imperceptible to the user.

Real-time audio streaming reliability required careful engineering. We handle WebSocket disconnects, Gemini session errors, concurrent tool calls, and the complex state machine of connecting/listening/speaking/ending states. The client-side interrupt mechanism allows users to cut in while the AI is speaking -- critical for natural elderly conversation patterns.

Senior-friendly UX with zero text input forced us to rethink every interaction. Seniors cannot read small error messages or tap precise UI targets. We designed a single large pulsating button as the primary interface, with volume-responsive animation so users can see the AI is listening. Error feedback is delivered via large text and voice. The hotkey grid uses oversized icons with no text labels.

Prompt injection defense for user-generated transcripts was necessary because the Soul Engine and Card Generator process raw conversation text. A user could theoretically inject instructions via speech that gets transcribed and fed into these Gemini calls. We implemented transcript isolation tags, tag-escape stripping, and explicit ignore directives.

Accomplishments that we're proud of

Production-deployed service. hallmai runs on GCP Cloud Run with Terraform-managed infrastructure, not a demo or prototype. Real seniors can install the Capacitor-built app and start talking.

Soul Engine with 3-tier maturity system. The AI genuinely evolves its relationship with each user. The explore-to-friend progression creates a tangible sense of growing closeness that elderly users find comforting and familiar -- like getting to know a new neighbor.

22 features shipped. From F-02 (conversation memory with last-3-session summaries) through F-44 (camera integration), we built and shipped a complete product. Key features include real-time Google Search and YouTube playback during voice conversation, RNNoise noise suppression, silence detection with graceful AI farewell, and session renewal for unlimited conversation length.

Zero text input required. Every interaction -- starting a call, searching the web, playing music, sending a photo, ending a call -- is accomplished through voice or a single button press. This is not a simplification; it is a fundamental design principle that makes AI accessible to people who have been excluded from the digital revolution.

Transparent session renewal. Users can talk for an hour without knowing the underlying Gemini session has been swapped multiple times. The Soul and conversation context carry forward seamlessly.

What we learned

Native audio models produce dramatically more natural conversations than TTS pipelines. Gemini Live API's native audio output has natural prosody, appropriate pauses, and emotional inflection that no text-to-speech system matches. For elderly users who are sensitive to conversational naturalness, this difference is transformative.

Session renewal enables infinite-length conversations within API constraints. By treating sessions as renewable resources rather than fixed containers, we turned a platform limitation into a feature. The pattern of save-summarize-extract-renew could be applied to any long-running Live API application.

Elderly users need minimal UI and maximum voice feedback. Every visual element we added was an accessibility risk. The most effective interface turned out to be a single button with volume-responsive pulsation -- it communicates "I'm listening" without requiring any reading. When something goes wrong, the AI says so out loud rather than displaying an error toast.

Structured personality extraction creates genuine relationship continuity. The Soul Engine's 10-dimension profile gives the AI enough context to feel like it remembers, without scripted recall. The maturity system's progression from explore to friend maps naturally to how real relationships develop -- a gradual shift from polite curiosity to comfortable familiarity.

What's next for Hallmai

F-01: Proactive AI Outreach. The AI will initiate daily calls to seniors at their preferred time, rather than waiting for them to press the button. This addresses the core insight that lonely elderly people often will not initiate contact themselves.

F-14: Weekly Family Insights. AI-generated weekly analysis of conversation patterns, emotional trends, and notable events -- giving families deeper visibility into their parent's wellbeing.

F-09: Voice Briefings for Family. Story card summaries delivered as audio briefings, so busy family members can listen during their commute.

App Store Deployment. iOS App Store and Google Play Store release, making Hallmai accessible to any family with an elderly parent living alone.