Inspiration

We noticed that most AI assistants and chatbots are reactive: they wait for users to ask something. But busy people such as managers, founders, and students don’t need “another app to talk to.” They need something that anticipates their day, handles routine tasks, and proactively keeps them on track, like a real executive assistant.

We were inspired by Jarvis from Iron Man, not as a voice chatbot but as a system that observes context, decides what matters, and takes action automatically. Echo AI is our step toward that reality.

What it does

Echo AI is a voice-first, proactive executive assistant built with the Google Agent Development Kit (ADK) and deployed on Google Cloud Run.

Echo can:

Listen to natural voice commands and understand intent

Schedule or reschedule meetings using Google Calendar

Summarize unread emails and draft replies

Fetch and summarize relevant daily news

Calculate commute time and tell you when to leave based on live traffic

Generate a Morning Briefing that automatically summarizes your schedule, travel time, priority emails, and news, without you asking.

The goal: you start your day already prepared.
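The "when to leave" feature comes down to simple time arithmetic: departure time = event start − travel time − safety buffer. Below is a minimal sketch, assuming the travel time is the traffic-aware duration returned by the Maps Routes API. The `leave_by` helper and the 10-minute default buffer are hypothetical illustrations, not Echo's actual code.

```python
from datetime import datetime, timedelta


def leave_by(event_start: datetime, travel_time: timedelta,
             buffer: timedelta = timedelta(minutes=10)) -> datetime:
    """Latest departure time that still reaches the event with `buffer` to spare.

    travel_time is assumed to come from a traffic-aware route lookup;
    the default buffer is a hypothetical choice.
    """
    return event_start - travel_time - buffer


if __name__ == "__main__":
    start = datetime(2024, 5, 6, 9, 30)
    # 9:30 meeting, 35 min drive, 10 min buffer -> leave by 08:45
    print(leave_by(start, timedelta(minutes=35)).strftime("%H:%M"))
```

Because the Routes API duration reflects live traffic, re-running this calculation closer to the event lets the reminder shift earlier or later as conditions change.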

How we built it

Google Cloud Run for deploying all backend services

Google ADK multi-agent architecture:

Calendar & Commute Agent → Google Calendar + Maps Routes API

News Agent → RSS + Gemini summarization via AI Studio

Email Agent → Gmail API

Orchestrator Agent → decides what information is relevant

Google Cloud Speech-to-Text for real-time transcription from the browser microphone

Google Cloud Text-to-Speech for natural spoken responses

WebSockets for low-latency audio streaming and conversational feedback

UI built in React + Vite with a clean, minimal “ripple” voice interface

Cloud Run Job + Cloud Scheduler to trigger the Morning Briefing every day

The entire system runs serverless, scales automatically, and requires no manual infrastructure management.
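One way to picture the orchestrator's role in the Morning Briefing is as a fan-out followed by a filter: query every sub-agent at once, then keep only the sections that returned something worth saying. The sketch below uses plain `asyncio` rather than ADK's own agent abstractions, and the three section functions are hypothetical stubs standing in for the Calendar & Commute, Email, and News agents. In a scheduled setup, a Cloud Run Job's entry point could call something like `morning_briefing` once per run and then exit.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


# Hypothetical stand-ins for the sub-agents. Each returns a short text
# section, or None when it has nothing relevant to report.
async def calendar_section() -> Optional[str]:
    return "3 meetings today; first at 9:30."


async def email_section() -> Optional[str]:
    return "2 priority emails need replies."


async def news_section() -> Optional[str]:
    return None  # e.g. no news matched the user's interests today


Section = Callable[[], Awaitable[Optional[str]]]


async def morning_briefing(sections: List[Section]) -> str:
    # Query all sub-agents concurrently so total latency is roughly
    # that of the slowest agent, not the sum of all of them.
    results = await asyncio.gather(*(s() for s in sections), return_exceptions=True)
    # Drop sections that raised or had nothing to say instead of padding the brief.
    parts = [r for r in results if isinstance(r, str) and r.strip()]
    return " ".join(parts) if parts else "Nothing urgent this morning."


if __name__ == "__main__":
    print(asyncio.run(morning_briefing([calendar_section, email_section, news_section])))
```

Passing `return_exceptions=True` means one failing agent (for example, a Gmail timeout) degrades the briefing instead of cancelling it.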

Challenges we ran into

Low-latency audio streaming was challenging: combining WebM/Opus encoding in the browser with server-side STT streaming required careful buffer handling.
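A core piece of that buffer handling is re-chunking. The browser delivers audio over the WebSocket in irregular frame sizes, while a streaming STT session is easier to feed with bounded, predictable requests. Below is a minimal sketch of that step, assuming the STT session consumes the WebM/Opus bytes as one continuous stream, so the pieces only need to stay in order. The `rechunk` name and the 16,000-byte default are hypothetical choices, not an official API limit.

```python
from typing import Iterable, Iterator


def rechunk(chunks: Iterable[bytes], max_bytes: int = 16_000) -> Iterator[bytes]:
    """Coalesce irregular WebSocket audio frames into pieces of at most max_bytes.

    Byte order is preserved exactly, so the container header in the first
    browser frame still reaches the recognizer first.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        # Emit full-size pieces as soon as enough audio has accumulated.
        while len(buf) >= max_bytes:
            yield bytes(buf[:max_bytes])
            del buf[:max_bytes]
    if buf:  # flush whatever remains when the stream ends
        yield bytes(buf)


if __name__ == "__main__":
    frames = [b"a" * 5, b"b" * 7, b"c" * 2]
    print(list(rechunk(frames, max_bytes=4)))
```

In a live server, the same generator logic would typically run over an async queue fed by the WebSocket handler, with each yielded piece wrapped in one streaming recognition request.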

Making Echo proactive without being interruptive meant designing the right triggers and thresholds for when the agent should speak.
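A trigger policy of this kind can be expressed as a small gating function. Ordinary signals must clear a normal urgency threshold, while meetings and quiet hours raise the bar so that only critical alerts get through. The sketch below is illustrative only: the `Signal` type, the upstream urgency scores, and every threshold and quiet-hour value are hypothetical assumptions, not Echo's tuned parameters.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Signal:
    kind: str        # hypothetical categories, e.g. "leave_now", "email", "news"
    urgency: float   # 0.0-1.0, assumed to be scored upstream by an agent


def should_speak(sig: Signal, now: time, in_meeting: bool,
                 threshold: float = 0.7,
                 interrupt_threshold: float = 0.95,
                 quiet_start: time = time(21, 0),
                 quiet_end: time = time(7, 0)) -> bool:
    """Decide whether a proactive signal is worth interrupting the user for."""
    # The quiet window wraps past midnight (21:00 -> 07:00).
    quiet = now >= quiet_start or now < quiet_end
    if in_meeting or quiet:
        # Only the most urgent signals may break into a meeting or quiet hours.
        return sig.urgency >= interrupt_threshold
    return sig.urgency >= threshold


if __name__ == "__main__":
    print(should_speak(Signal("leave_now", 0.8), time(8, 30), in_meeting=False))
    print(should_speak(Signal("news", 0.8), time(8, 30), in_meeting=True))
```

Keeping the thresholds as parameters makes them easy to tune per user, which fits the observation that proactivity is mostly a UX problem.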

We also had to fine-tune the summarization prompts so news briefings remained factual, concise, and contextual instead of generic.
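Much of that prompt tuning comes down to constraining the model explicitly. It should stay grounded in the supplied articles, cite them, skip irrelevant items, and respect a length budget. Below is a minimal sketch of such a prompt builder, assuming each RSS item has already been parsed into a dict with `title`, `source`, and `summary` keys. The function name, keys, and wording are hypothetical, and the call that sends the prompt to Gemini is omitted.

```python
from typing import Dict, List


def news_brief_prompt(articles: List[Dict[str, str]], interests: List[str],
                      max_words: int = 120) -> str:
    """Build a summarization prompt that keeps the model grounded in the given articles."""
    # Number each article so the model can cite it, e.g. [2].
    listing = "\n".join(
        f"[{i}] {a['title']} ({a['source']}): {a['summary']}"
        for i, a in enumerate(articles, start=1)
    )
    return (
        f"Summarize today's news for a busy professional interested in {', '.join(interests)}.\n"
        "Use only facts stated in the articles below; do not add outside information.\n"
        "Cite each claim with its article number, e.g. [2].\n"
        "Skip articles unrelated to the listed interests.\n"
        f"Keep the whole briefing under {max_words} words.\n\n"
        f"Articles:\n{listing}"
    )


if __name__ == "__main__":
    items = [{"title": "Rates held", "source": "Example Wire",
              "summary": "The central bank kept rates unchanged."}]
    print(news_brief_prompt(items, ["finance"], max_words=80))
```

Numbered citations make it easy to spot-check a briefing against its sources, which helps with keeping it factual rather than generic.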

Accomplishments that we're proud of

We built a system that feels genuinely agentic: Echo does things on its own rather than waiting for input.

The Morning Briefing feature turned out to be both powerful and surprisingly natural to use.

We deployed a multi-agent architecture entirely on Cloud Run, with smooth communication between agents and real-time voice interaction.

What we learned

Building an agent that acts intelligently is less about model complexity and more about context modeling and trigger design.

Real-time voice applications depend heavily on streaming architecture, not just LLM quality.

Proactivity is a UX problem first: the agent must help without being intrusive.

Google Cloud Run + ADK makes multi-agent orchestration surprisingly clean compared to traditional server stacks.

What's next for Echo AI

Long-term personal memory (preferences, habits, communication style, recurring patterns)

Meeting intelligence (auto notes + action item extraction)

Deep workspace integrations: Slack, Notion, Jira, Teams

Adaptive-tone voice synthesis that changes with the time of day: calm in the morning, focused during work sessions, and an evening tone for end-of-day (EOD) wrap-ups

Mobile-first app with a continuous, lightweight listening mode
