⚠️ Critical Notice: Post-Submission Platform Update

Please be aware that a sudden specification change occurred in AI Studio after the submission deadline, which caused unexpected breaking changes to our hosted prototype.

While we have applied a quick compatibility patch to the AI Studio version to reflect these changes, it may still be unstable. To experience the project as originally intended, we strongly recommend cloning our GitHub repository and running the code locally.

We appreciate your understanding regarding this external platform issue.

Inspiration

Every night, we place the smartest device in human history on our bedside table. Yet every morning, what greets us is a harsh, repetitive alarm tone — the most primitive and stressful user experience imaginable.

We asked ourselves: "Why does technology attack us in the morning, instead of nurturing us?"

A mother opening the curtains and softly whispering her child's name — we wanted to recreate that sense of "care" from our memories, powered by Gemini. We're shifting the paradigm from "alarm clock" to "morning partner."

What it does

OKIRO is an AI morning partner that uses Gemini 3 Flash's Function Calling to dynamically collect and analyze morning context, then wakes users up through Gemini Live API's real-time voice conversation.

Context-driven autonomous wake-up strategy: Before the alarm triggers, Gemini 3 Flash autonomously collects calendar events, weather, sleep data, emails, smart home status, and news via Function Calling. "Important meeting at 9 AM + poor sleep last night" → urgency: high, tone: energetic — the wake-up strategy is determined dynamically.

Real-time voice interaction: Through Gemini Live API's bidirectional audio streaming, an AI character calls the user by name and speaks to them. 30 voice options across 3 languages (English, Japanese, Korean) are supported.

Real-time wake status monitoring: During the Live API session, the AI autonomously invokes Function Calling to report the user's wake status (awake / drowsy / unresponsive) as structured data based on their voice responses. The app automatically escalates tone, volume, and call intervals based on these results.
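The escalation loop described above can be sketched as a small policy function. This is a minimal illustration, not OKIRO's actual implementation: the status values match the ones named above, but the tone labels, volume steps, and interval math are assumptions.

```typescript
// Hypothetical escalation policy: maps the wake status the model reports
// via Function Calling to app-side tone / volume / call-interval settings.
type WakeStatus = "awake" | "drowsy" | "unresponsive";

interface EscalationState {
  tone: "gentle" | "firm" | "urgent";
  volume: number;          // 0.0–1.0 playback gain
  callIntervalSec: number; // seconds between wake-up prompts
}

function escalate(status: WakeStatus, current: EscalationState): EscalationState {
  switch (status) {
    case "awake":
      // User confirmed awake: stop escalating and transition to the briefing.
      return { tone: "gentle", volume: current.volume, callIntervalSec: 0 };
    case "drowsy":
      // Nudge up firmness and shorten the interval slightly.
      return {
        tone: "firm",
        volume: Math.min(1, current.volume + 0.15),
        callIntervalSec: Math.max(10, current.callIntervalSec - 10),
      };
    case "unresponsive":
      // Escalate aggressively: louder, more urgent, faster retries.
      return {
        tone: "urgent",
        volume: Math.min(1, current.volume + 0.3),
        callIntervalSec: Math.max(5, current.callIntervalSec - 20),
      };
  }
}
```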

Interactive post-wake briefing: Once the user is confirmed awake, the session transitions to a briefing context prepared in the background, enabling a natural conversational Q&A session — "What's on my schedule today?" "How's the weather?"

How we built it

We built OKIRO with a two-layer architecture combining Gemini 3 Flash and Gemini Live API. Developed as a React + TypeScript web application, it is deployed on Google AI Studio. From API key management to model execution, everything operates within the Google AI Studio ecosystem.

Intelligence Layer: Gemini 3 Flash (Function Calling)

This is OKIRO's brain. A two-phase context engineering pipeline designs the wake-up experience.

  • Phase 1 — Data Collection: Based on user settings, enabled tools (calendar, weather, sleep, email, smart home, news, etc.) are passed to Gemini 3 Flash, which autonomously collects data via Function Calling. Gemini 3 itself decides which tools to call and in what order.
  • Phase 2 — Strategy Decision: Urgency is calculated from the collected data, combined with sleep quality and user preferences to select the appropriate tone and escalation strategy. An optimized system prompt is generated for the Live API.
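The Phase 1 tool setup can be sketched as follows. The declaration uses the OpenAPI-style parameter schema that Gemini function calling accepts; the tool name, fields, and demo payload here are hypothetical, not OKIRO's actual definitions.

```typescript
// Illustrative tool declaration passed to Gemini 3 Flash. The model decides
// whether and when to call it; the schema tells it what arguments to supply.
const getCalendarEventsTool = {
  name: "get_calendar_events",
  description: "Returns today's calendar events so the model can assess urgency.",
  parameters: {
    type: "object",
    properties: {
      date: { type: "string", description: "ISO date, e.g. 2024-06-01" },
      maxResults: { type: "number", description: "Upper bound on events returned" },
    },
    required: ["date"],
  },
};

// App-side dispatcher that runs when the model emits a function call.
// The model picks the tools and the order; the app only executes them
// and returns the results for the next turn of the conversation.
type ToolHandler = (args: Record<string, unknown>) => unknown;

const toolHandlers: Record<string, ToolHandler> = {
  get_calendar_events: (args) => ({
    // Demo data; a real handler would query the calendar service here.
    events: [{ summary: "Team meeting", start: `${String(args.date)}T09:00:00` }],
  }),
};

function dispatchToolCall(name: string, args: Record<string, unknown>): unknown {
  const handler = toolHandlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```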

Emotion Layer: Gemini Live API (Real-time Voice)

The app connects to the Gemini Live API over a WebSocket for bidirectional audio streaming. The Web Audio API handles the microphone input and audio playback pipeline. Function Calling is also utilized within the Live API session, delegating wake status assessment to the AI itself.
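One piece of glue this pipeline typically needs: the Web Audio API delivers microphone samples as Float32 values in [-1, 1], while the Live API's audio input is 16-bit PCM. A minimal conversion helper might look like this (a sketch; the real pipeline also handles resampling and chunked framing):

```typescript
// Convert Web Audio Float32 samples in [-1, 1] to signed 16-bit PCM.
function float32ToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```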

Tech Stack: React, TypeScript, Vite, Gemini 3 Flash, Gemini Live API, Web Audio API, Google AI Studio

Challenges we ran into

Tool schema design for Function Calling: The quality of context collection was heavily influenced by what tools and schemas were provided to Gemini 3 Flash. By designing data structures that mirror actual Google API response formats, we ensured the Function Calling logic can be reused as-is when connecting to real APIs in the future.
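To make that concrete, a mock tool response can mirror the shape of a Google Calendar API v3 `events.list` payload, so parsing code written against the mock works unchanged on the real API later. The event contents below are invented for illustration:

```typescript
// Hypothetical mock response shaped like a Google Calendar API v3
// events.list payload (kind, items[], start.dateTime, end.dateTime).
const mockCalendarResponse = {
  kind: "calendar#events",
  items: [
    {
      kind: "calendar#event",
      summary: "Quarterly planning meeting",
      start: { dateTime: "2024-06-01T09:00:00+09:00" },
      end: { dateTime: "2024-06-01T10:00:00+09:00" },
    },
  ],
};

// Parsing code written against the real schema runs unchanged on the mock.
function firstEventStart(response: typeof mockCalendarResponse): string | undefined {
  return response.items[0]?.start.dateTime;
}
```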

Latency optimization between the two model layers: Initially, we had Gemini 3 Flash generate the entire system prompt, but the delay before Live API could start was degrading the experience. We optimized by generating the wake-up context (lightweight) immediately to start the Live API quickly, while the briefing context (heavier) is generated in parallel in the background.
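The optimization above is essentially a promise-ordering pattern: kick off the slow briefing generation without awaiting it, start the Live session as soon as the fast wake-up context is ready, and consume the briefing later. A minimal sketch, where `generateWakeContext`, `generateBriefing`, and `startLiveSession` are hypothetical stand-ins for the app's actual functions:

```typescript
// Start the Live API session as soon as the lightweight wake-up context is
// ready, while the heavier briefing context resolves in the background.
async function startMorning(
  generateWakeContext: () => Promise<string>,
  generateBriefing: () => Promise<string>,
  startLiveSession: (systemPrompt: string) => void
): Promise<Promise<string>> {
  // Kick off the slow briefing generation first, without awaiting it.
  const briefingPromise = generateBriefing();
  // Await only the fast wake-up context, then start the session immediately.
  const wakePrompt = await generateWakeContext();
  startLiveSession(wakePrompt);
  // The briefing resolves later; it is consumed once the user is awake.
  return briefingPromise;
}
```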

Prompt engineering for "nurturing": Tuning the AI persona to be "caring yet concise" was challenging. We embedded five tone levels based on urgency into the system prompt and constrained the amount of speech per turn to strike the balance between warmth and brevity.
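The urgency-to-tone mapping could be embedded in the system prompt along these lines. The level wordings and the two-sentence cap below are illustrative assumptions, not OKIRO's actual prompt:

```typescript
// Illustrative five-level tone table keyed by urgency score.
const TONE_LEVELS: Record<number, string> = {
  1: "Whisper softly and slowly, like gently opening the curtains.",
  2: "Speak warmly and calmly, with no sense of time pressure.",
  3: "Stay friendly, but add light reminders of today's first event.",
  4: "Be energetic and direct; mention the schedule early.",
  5: "Be urgent and insistent; lead with the deadline.",
};

function toneInstruction(urgency: number): string {
  // Clamp to the valid 1–5 range so out-of-range scores still map to a level.
  const level = Math.min(5, Math.max(1, Math.round(urgency)));
  return `Tone level ${level}: ${TONE_LEVELS[level]} Keep each turn under two sentences.`;
}
```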

Accomplishments that we're proud of

Practical application of Gemini 3 Function Calling: We demonstrated an architecture in which Gemini 3's Function Calling autonomously builds context, addressing the core LLM-application challenge of collecting data and constructing optimal prompts. The design allows seamless migration from mock data to real APIs simply by swapping tool definitions.

Two-layer coordination of intelligence and emotion: Gemini 3 Flash (intelligence layer) understands and analyzes context, while the Live API (emotion layer) speaks to humans in real time. This two-layer structure achieves a morning partner that is both "smart and gentle."

Human state monitoring via Function Calling: By having the AI autonomously invoke Function Calling during a Live API session, we established a pattern for extracting structured state ("wake level") from unstructured voice responses and connecting it to app-level escalation control.

What we learned

Function Calling enables agentic behavior: Gemini 3's Function Calling is more than just a means to call APIs — it enables AI to autonomously strategize information gathering and decide what data to collect. We observed Gemini 3 Flash optimizing its own collection order based on the combination of enabled tools.

Context engineering determines experience quality: Even with the same instruction to "wake me up gently," the AI's behavior changes dramatically depending on whether it has context like "important meeting this morning + sleep-deprived." The pipeline that dynamically collects and analyzes context with Gemini 3 to construct optimal prompts was the core of the experience.

Real-time voice is essential for emotional AI: The emotional communication through "tone of voice" — impossible with text chat — is the only natural interface for someone who just woke up and can't even open their eyes.

What's next for OKIRO

Real API connections: Connect the tools currently running on demo data to actual services like Google Calendar, Gmail, and weather APIs. The Function Calling architecture is designed to be reused as-is.

Multimodal wake analysis: Leverage Gemini's vision capabilities to more accurately detect user wake status from camera feeds.

IoT device integration as tools: Define smart curtain and lighting controls as Function Calling tools, enabling Gemini to autonomously operate devices based on the wake-up scenario.

Health data synchronization: Automatic optimization of escalation strategies based on sleep quality data from wearable devices.
