Inspiration
C-suite executives spend a massive chunk of their week jumping between meetings, often with zero time to switch contexts. They are expected to remember every detail from past discussions, know the backgrounds of everyone in the room, and stay updated on breaking industry news. We realized that while LLMs are great at summarizing text, executives don't have time to read more text. They need a Chief of Staff to brief them. That inspired Kanojo: an automated, voice-native assistant that synthesizes all that disparate data into a quick, conversational briefing before you ever step into the boardroom.
What it does
Kanojo is a proactive, voice-activated AI Chief of Staff. Instead of just managing your calendar, she actively prepares you for it. When asked, Kanojo delivers a personalized auditory briefing that covers your upcoming meeting agendas, summarizes relevant past meeting notes, provides background on the attendees, and pulls in real-time news about their companies.
How we built it
We built Kanojo using a three-part pipeline focused on emotional intelligence and agentic reasoning:
Contextual Auditory Input: We used the Modulate API for speech-to-text. Instead of just transcribing words, it captures the user's feelings and tone, allowing Kanojo to understand the urgency or stress level of the request.
Agentic Orchestration: The core "brain" is an information funnel built with the Airia Agent Creator. This agent framework handles the complex logic of parsing the user's voice request, querying schedules, digging through past meeting transcripts, and fetching real-time news, while keeping the briefing grounded in the retrieved data to limit hallucination.
Expressive Voice Output: To ensure Kanojo feels like a trusted human partner rather than a robotic tool, we utilized Google AI Studio's Text-to-Speech (Gemini TTS). This allowed us to inject natural emotion, pacing, and inflection into her audio responses based on the context of the briefing.
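The three stages above can be sketched as a simple chain of functions. This is an illustrative outline under stated assumptions, not the project's actual code: the `Transcript` and `Briefing` types, `run_pipeline`, and the `fake_*` functions are hypothetical stand-ins, and none of them call the real Modulate, Airia, or Gemini TTS APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str     # what the user said
    emotion: str  # coarse tone label, e.g. "stressed" or "calm"


@dataclass
class Briefing:
    script: str   # text to be spoken
    style: str    # delivery hint handed to the text-to-speech stage


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], Transcript],
    plan_briefing: Callable[[Transcript], Briefing],
    synthesize: Callable[[Briefing], bytes],
) -> bytes:
    """Chain the three stages: speech-to-text, agent, text-to-speech."""
    transcript = transcribe(audio)
    briefing = plan_briefing(transcript)
    return synthesize(briefing)


# Hypothetical stand-ins for the real services (assumptions for illustration).
def fake_transcribe(audio: bytes) -> Transcript:
    return Transcript(text=audio.decode("utf-8"), emotion="stressed")


def fake_agent(transcript: Transcript) -> Briefing:
    # Let the detected tone influence how the briefing is delivered.
    style = "calm and concise" if transcript.emotion == "stressed" else "upbeat"
    return Briefing(script=f"Briefing for: {transcript.text}", style=style)


def fake_tts(briefing: Briefing) -> bytes:
    return f"[{briefing.style}] {briefing.script}".encode("utf-8")
```

Passing the stages in as callables keeps each service swappable, which makes it easier to time and tune each hop separately.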
Challenges we ran into
Building a seamless voice experience is tough. Our biggest hurdle was latency. We had to chain together emotion-detecting speech-to-text, complex multi-step reasoning through the Airia agent, and expressive text-to-speech generation. Optimizing this pipeline so that Kanojo responds naturally, without awkward robotic pauses, took a lot of trial and error. Additionally, ensuring the agent pulled exactly the right context (like distinguishing between two clients with similar names) required tight prompt engineering and data structuring.
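To make the similar-names problem concrete, here is a minimal sketch of one way structured data can help: prefer the contact who is actually on the meeting invite, and refuse to guess when the match is still ambiguous. The `Contact` type, the `resolve_contact` function, and the sample data are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Contact:
    name: str
    company: str
    email: str


def resolve_contact(
    spoken_name: str, attendee_emails: Set[str], contacts: List[Contact]
) -> Optional[Contact]:
    """Resolve a spoken name, preferring someone on the meeting invite."""
    needle = spoken_name.strip().lower()
    candidates = [c for c in contacts if needle in c.name.lower()]
    invited = [c for c in candidates if c.email in attendee_emails]
    if len(invited) == 1:
        return invited[0]       # the invite list settles it
    if len(candidates) == 1:
        return candidates[0]    # only one possible match overall
    return None                 # ambiguous or unknown: ask instead of guessing


# Hypothetical sample data: two clients with nearly identical names.
CONTACTS = [
    Contact("Alex Chen", "Acme", "alex.chen@acme.example"),
    Contact("Alex Cheng", "Globex", "alex.cheng@globex.example"),
]
```

Returning `None` for an ambiguous match lets the assistant ask a follow-up question rather than brief the user on the wrong client.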
Accomplishments that we're proud of
We are incredibly proud of how "human" Kanojo feels. By combining Modulate's emotion capture with Google's expressive TTS, we created an AI that doesn't just recite data; it speaks to you with the appropriate tone. We're also proud of successfully orchestrating the backend data funnel, showing that an AI agent can synthesize multiple messy data streams (calendars, past notes, news) into one coherent summary.
What we learned
We learned just how powerful agent orchestration frameworks are. Using the Airia Agent Creator taught us how to break down a massive, ambiguous request (like "Prep me for the afternoon") into specialized, executable tool calls. We also realized that in voice AI, latency and emotional tone matter just as much as the accuracy of the underlying language model.
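As a rough illustration of that decomposition, the sketch below expands a request like "Prep me for the afternoon" into a calendar lookup followed by one notes lookup per meeting and one news lookup per company. It does not use the agent framework itself: `ToolCall`, `plan`, `prep`, and the tool names `calendar`, `past_notes`, and `company_news` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


def plan(meetings: List[Dict[str, Any]]) -> List[ToolCall]:
    """Expand a list of meetings into the lookups needed to brief on them."""
    calls: List[ToolCall] = []
    for meeting in meetings:
        calls.append(ToolCall("past_notes", {"meeting_id": meeting["id"]}))
        for company in meeting["companies"]:
            calls.append(ToolCall("company_news", {"company": company}))
    return calls


def prep(
    window: str, tools: Dict[str, Callable[..., Any]]
) -> List[Tuple[ToolCall, Any]]:
    """Fetch the schedule, plan the follow-up lookups, then run each one."""
    meetings = tools["calendar"](window=window)
    calls = plan(meetings)
    return [(call, tools[call.tool](**call.args)) for call in calls]


# Hypothetical stub tools standing in for real calendar, notes, and news sources.
STUB_TOOLS: Dict[str, Callable[..., Any]] = {
    "calendar": lambda window: [{"id": "m1", "companies": ["Acme", "Globex"]}],
    "past_notes": lambda meeting_id: f"notes for {meeting_id}",
    "company_news": lambda company: f"news about {company}",
}
```

Calling `prep("afternoon", STUB_TOOLS)` yields each planned call paired with its result, which is the raw material a summarization step would turn into the spoken briefing.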
What's next for Kanojo - She handles your schedule
Our immediate next step is to integrate a live X (Twitter) feed API. This will allow Kanojo to actively monitor and alert the CEO to any real-time, breaking news about the specific managers or companies they are meeting with that day. Further down the line, we want to add CRM integrations (like Salesforce) so Kanojo can automatically log meeting notes and update deal stages after the meeting ends.
Built With
- airia
- brave
- flora
- google-cloud
- modulate
- python
- vercel