Inspiration

We receive fragmented information every day: an idea in the shower, something interesting at the grocery store, a half-remembered conversation. These fleeting moments hold real value but vanish before we can act on them. Existing productivity tools require manual input, which creates more work. For AI to be truly personal, it needs to experience what we experience and know what we have thought, and it needs a way to help us capture our world quickly and collaboratively. Gemini 3's multimodal capabilities made this possible within a single model family.

What it does

A voice-first mobile web app with four surfaces. Chat: real-time voice conversations via the Gemini Live API where a background agent pipeline extracts knowledge, detects actions, and tracks emotions, with evidence trails and three permission levels (Suggest, Draft & Review, Execute). Hub: AI-prioritized daily focus and proactive insights via Gemini 3 Flash with Code Execution. Reflect: conversation history, extracted claims, goals with milestones, and AI-detected conflict review. Settings: privacy and per-action-type permissions.
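The three permission levels can be pictured as a small gate that decides how far a detected action is allowed to go. The following is a minimal sketch under stated assumptions, not the app's actual code: the action type names (`reminder`, `email`), the `DEFAULT_PERMISSIONS` table, and the `plan_action` helper are all hypothetical, and unknown action types are assumed to fall back to the most conservative level.

```python
from enum import Enum


class PermissionLevel(Enum):
    """The three levels named in the description."""
    SUGGEST = 1
    DRAFT_AND_REVIEW = 2
    EXECUTE = 3


# Hypothetical per-action-type settings; the real app's action types are not documented here.
DEFAULT_PERMISSIONS = {
    "reminder": PermissionLevel.EXECUTE,
    "email": PermissionLevel.DRAFT_AND_REVIEW,
}


def plan_action(action_type, permissions=DEFAULT_PERMISSIONS):
    """Return how a detected action should be handled for its action type."""
    # Assumption: anything without an explicit setting is only suggested.
    level = permissions.get(action_type, PermissionLevel.SUGGEST)
    if level is PermissionLevel.EXECUTE:
        return "execute"
    if level is PermissionLevel.DRAFT_AND_REVIEW:
        return "draft_for_review"
    return "suggest_only"
```

Defaulting to Suggest for unlisted action types keeps a newly introduced action from running before the user has chosen a level for it.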

How we built it

Four Gemini models power 15+ integration points. The Gemini Live API handles low-latency bidirectional voice with Voice Activity Detection and barge-in. Gemini 3 Flash powers text intelligence using structured output (JSON Schema) and thinking levels (low/medium/high), plus Google Search and URL Context tools for autonomous actions. Gemini 3 Pro drives high-fidelity vision with media_resolution control and Code Execution for photo analysis, with Flash as a fallback. Gemini Embedding (gemini-embedding-001) enables semantic deduplication and search. A blackboard agent pipeline with thought signatures processes turns in real time.
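Semantic deduplication with embeddings usually comes down to comparing vectors by cosine similarity and dropping items that sit too close to something already stored. Here is a minimal sketch, assuming the embedding vectors have already been computed (for example by gemini-embedding-001) and are passed in as plain lists of floats. The `deduplicate` helper and the 0.9 threshold are illustrative assumptions, not values taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(items, threshold=0.9):
    """Keep the first item of each group of near-duplicates.

    items: list of (text, vector) pairs with precomputed embeddings.
    threshold: hypothetical cutoff; similarity at or above it counts as a duplicate.
    """
    kept = []
    for text, vec in items:
        # Keep the item only if it is dissimilar to everything kept so far.
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]
```

The threshold is the part that needs tuning: too low and distinct facts get merged, too high and paraphrases pile up as separate entries.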

Challenges we ran into

Live API text-only fallback, AudioWorklet in headless browsers, extraction timing without finality signals, embedding threshold tuning, and multi-model fallback for resilience.
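Multi-model fallback can be sketched as an ordered chain of callables that is tried until one succeeds. This is only an illustration under assumptions: the `call_with_fallback` helper is hypothetical, and each model is represented by a plain Python function so that no real SDK call is implied.

```python
from typing import Callable, List, Sequence, Tuple


def call_with_fallback(
    models: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each (name, callable) in order and return (name, result) from the first success."""
    errors: List[str] = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{name}: {exc}")
    # Nothing succeeded: surface every failure so the caller can log them.
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the name of the model that answered makes it possible to record when the fallback path was taken.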

Accomplishments that we're proud of

15+ Gemini integration points across four models: a deep multi-model architecture rather than a chat wrapper. Real-time knowledge extraction during conversation, with evidence trails. Automated E2E testing in which Gemini judges its own extraction quality.

What we learned

Gemini 3's structured output with thinking levels dramatically improves extraction quality. Embedding-based deduplication is essential for knowledge accumulation. The blackboard agent pattern scales well for real-time processing.
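In the blackboard pattern, independent agents read from and write to a shared store, and each one acts only when the data it needs is present. The following is a toy sketch under assumptions, not the project's pipeline: the `Blackboard` class, the two agents, and the naive sentence splitting are hypothetical stand-ins for the model-backed agents.

```python
class Blackboard:
    """Shared store that agents read from and post results to."""

    def __init__(self):
        self.entries = {}

    def post(self, key, value):
        self.entries[key] = value


def extract_claims(board):
    """Hypothetical agent: split the turn into sentence-level claims."""
    turn = board.entries.get("turn")
    if turn and "claims" not in board.entries:
        board.post("claims", [s.strip() for s in turn.split(".") if s.strip()])
        return True
    return False


def detect_actions(board):
    """Hypothetical agent: flag claims that look like reminder requests."""
    claims = board.entries.get("claims")
    if claims is not None and "actions" not in board.entries:
        board.post("actions", [c for c in claims if c.lower().startswith("remind me")])
        return True
    return False


def run_pipeline(board, agents):
    """Keep offering the board to every agent until none of them can contribute."""
    progressed = True
    while progressed:
        progressed = False
        for agent in agents:
            if agent(board):
                progressed = True
    return board.entries
```

Because each agent checks the board for its own preconditions, the order in which agents are registered does not matter, and new agents can be added without changing existing ones.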

What's next for Second-Self

Google Calendar and Gmail integrations, push notifications, PWA offline support, multi-device sync, and wearable integration.
