Inspiration
We all have moments worth remembering every day -a great meal, a funny conversation, a beautiful sunset. But by the time we get home, the details start to fade. We want to keep a diary, but ==who has the time to sit down and write one?==
What if you could just talk about your day, and your diary would write itself?
What it does
HARU is an AI voice diary that feels like chatting with a close friend. You talk naturally, and HARU:
- :speech_balloon: Listens and saves moments -automatically captures key experiences with emotions, timestamps, and current weather
- :art: Generates illustrations -transforms your photos into warm watercolor illustrations with your avatar as the character
- :book: Writes your diary -compiles all moments into a beautifully formatted diary with bold highlights, marker effects, and embedded illustrations
- :brain: Remembers everything -recalls past conversations using hybrid RAG search across all your diary entries
All with Human-in-the-Loop approval -no costly API calls happen without your explicit consent.
How we built it
| Component | Technology |
|---|---|
| Voice Conversation | Gemini Live API (gemini-live-2.5-flash-native-audio) via ADK |
| Image Generation | Vertex AI (gemini-3-pro-image-preview) |
| Embeddings / RAG | Vertex AI (gemini-embedding-001) + pgvector |
| Diary Writing | Vertex AI (gemini-2.5-flash) |
| Agent Framework | Google ADK (Agent Development Kit) |
| Backend | Python / FastAPI / uvicorn |
| Database | PostgreSQL 16 + pgvector |
| Frontend | React 19 + TypeScript + Vite PWA |
| Hosting | Google Cloud Compute Engine (Seoul) + Caddy |
Voice Engine: Real-time bidirectional voice streaming via ADK's run_live(). The agent has a distinct personality -like a bright, playful friend -and naturally speaks Korean, English, or Japanese.
Tool Calling: Six ADK tools run silently during conversation:
- [x]
save_moment-auto-capture diary fragments - [x]
generate_image-watercolor illustration (HITL) - [x]
generate_diary-compile daily diary (HITL) - [x]
edit_moment-update saved moments - [x]
recall_memories-RAG search past conversations - [x]
learn_about_user-build user profile
Memory (RAG): RRF hybrid search combining pg_trgm keyword matching and pgvector semantic similarity with ==Reciprocal Rank Fusion== for cross-session recall.
Frontend: React 19 PWA with AudioWorklet for PCM audio capture, Canvas-based chromakey for mascot animation, and a recursive markup parser for rich diary rendering with notebook-style CSS.
Challenges we ran into
- Audio sync -Getting bidirectional voice streaming with auto-mute release timed to playback completion required tracking empty audio frames in AudioWorklet
- Infinite tool call loops -After HITL rejection, the agent would repeatedly call the same tool. Solved with triple defense: prompt rules + rejection hints + code guards
- Multi-language consistency -The agent would respond in the wrong language. Fixed by injecting language directives into every system hint and making all tool returns language-aware
- Session stability -Implemented transparent session resumption via ADK's
SessionResumptionConfigwith automatic context compression for unlimited conversation length
Accomplishments that we're proud of
- Natural conversation flow -Haru genuinely feels like talking to a friend, not an AI assistant. The persona system with natural fillers and reactions makes conversations feel alive.
- Human-in-the-Loop design -A floating approval UI that prevents expensive API calls without user consent, solving a real problem with autonomous agents.
- End-to-end diary pipeline -From voice conversation to illustrated, beautifully formatted diary entries -all in one seamless flow.
- RRF hybrid search -Combining keyword and semantic search with Reciprocal Rank Fusion for accurate cross-session memory recall.
- Solo hackathon project -Built the entire stack (backend, frontend, agent, deployment) in one week.
What we learned
- Gemini Live API's native audio model produces remarkably natural conversations when given the right persona
- Human-in-the-Loop ==greatly improves trust== when agents call costly APIs -users want to feel in control
- ADK's tool calling works best when tool descriptions are precise and system instructions are explicit about when NOT to call tools
What's next for HARU -AI Voice Diary
- [ ] Live camera streaming - Haru watches your day like a video call, automatically captures memorable scenes, and turns them into illustrations
- [ ] Voice-based diary playback (read diary entries aloud)
- [ ] Shared diaries between friends/family
- [ ] Mood analytics and weekly/monthly summaries
Built With
- caddy
- fastapi
- gemini-live-api
- google-adk
- google-cloud-compute-engine
- pgvector
- postgresql
- python
- react
- tailwind
- typescript
- vertex-ai
- vite
Log in or sign up for Devpost to join the conversation.