Inspiration

We all have moments worth remembering every day -a great meal, a funny conversation, a beautiful sunset. But by the time we get home, the details start to fade. We want to keep a diary, but ==who has the time to sit down and write one?==

What if you could just talk about your day, and your diary would write itself?

What it does

HARU is an AI voice diary that feels like chatting with a close friend. You talk naturally, and HARU:

  • :speech_balloon: Listens and saves moments -automatically captures key experiences with emotions, timestamps, and current weather
  • :art: Generates illustrations -transforms your photos into warm watercolor illustrations with your avatar as the character
  • :book: Writes your diary -compiles all moments into a beautifully formatted diary with bold highlights, marker effects, and embedded illustrations
  • :brain: Remembers everything -recalls past conversations using hybrid RAG search across all your diary entries

All with Human-in-the-Loop approval -no costly API calls happen without your explicit consent.

How we built it

Component Technology
Voice Conversation Gemini Live API (gemini-live-2.5-flash-native-audio) via ADK
Image Generation Vertex AI (gemini-3-pro-image-preview)
Embeddings / RAG Vertex AI (gemini-embedding-001) + pgvector
Diary Writing Vertex AI (gemini-2.5-flash)
Agent Framework Google ADK (Agent Development Kit)
Backend Python / FastAPI / uvicorn
Database PostgreSQL 16 + pgvector
Frontend React 19 + TypeScript + Vite PWA
Hosting Google Cloud Compute Engine (Seoul) + Caddy

Voice Engine: Real-time bidirectional voice streaming via ADK's run_live(). The agent has a distinct personality -like a bright, playful friend -and naturally speaks Korean, English, or Japanese.

Tool Calling: Six ADK tools run silently during conversation:

  • [x] save_moment -auto-capture diary fragments
  • [x] generate_image -watercolor illustration (HITL)
  • [x] generate_diary -compile daily diary (HITL)
  • [x] edit_moment -update saved moments
  • [x] recall_memories -RAG search past conversations
  • [x] learn_about_user -build user profile

Memory (RAG): RRF hybrid search combining pg_trgm keyword matching and pgvector semantic similarity with ==Reciprocal Rank Fusion== for cross-session recall.

Frontend: React 19 PWA with AudioWorklet for PCM audio capture, Canvas-based chromakey for mascot animation, and a recursive markup parser for rich diary rendering with notebook-style CSS.

Challenges we ran into

  1. Audio sync -Getting bidirectional voice streaming with auto-mute release timed to playback completion required tracking empty audio frames in AudioWorklet
  2. Infinite tool call loops -After HITL rejection, the agent would repeatedly call the same tool. Solved with triple defense: prompt rules + rejection hints + code guards
  3. Multi-language consistency -The agent would respond in the wrong language. Fixed by injecting language directives into every system hint and making all tool returns language-aware
  4. Session stability -Implemented transparent session resumption via ADK's SessionResumptionConfig with automatic context compression for unlimited conversation length

Accomplishments that we're proud of

  • Natural conversation flow -Haru genuinely feels like talking to a friend, not an AI assistant. The persona system with natural fillers and reactions makes conversations feel alive.
  • Human-in-the-Loop design -A floating approval UI that prevents expensive API calls without user consent, solving a real problem with autonomous agents.
  • End-to-end diary pipeline -From voice conversation to illustrated, beautifully formatted diary entries -all in one seamless flow.
  • RRF hybrid search -Combining keyword and semantic search with Reciprocal Rank Fusion for accurate cross-session memory recall.
  • Solo hackathon project -Built the entire stack (backend, frontend, agent, deployment) in one week.

What we learned

  • Gemini Live API's native audio model produces remarkably natural conversations when given the right persona
  • Human-in-the-Loop ==greatly improves trust== when agents call costly APIs -users want to feel in control
  • ADK's tool calling works best when tool descriptions are precise and system instructions are explicit about when NOT to call tools

What's next for HARU -AI Voice Diary

  • [ ] Live camera streaming - Haru watches your day like a video call, automatically captures memorable scenes, and turns them into illustrations
  • [ ] Voice-based diary playback (read diary entries aloud)
  • [ ] Shared diaries between friends/family
  • [ ] Mood analytics and weekly/monthly summaries

Built With

Share this project:

Updates