Architecture Diagram

Oprina: Conversational AI Avatar Assistant

Inspiration

We've always dreamed of developing a voice assistant—not just another chatbot, but something that felt like talking to a real person who could genuinely help get things done. The vision was to create an assistant with a voice and presence that made interactions feel natural and human, even though you knew it wasn't real.

When we discovered Google's Agent Development Kit (ADK), we realized this was the perfect opportunity to finally build that dream. Here was a framework powerful enough to create sophisticated AI agents that could work together seamlessly.

The inspiration for Oprina came from a simple yet profound realization: productivity tools should adapt to humans, not the other way around. In our daily lives, we're constantly switching between apps—checking emails, managing calendars, responding to messages—all through traditional interfaces that fragment our attention and workflow.

We envisioned a future where managing your digital life could be as natural as having a conversation with a trusted assistant. What if you could simply say "Schedule a meeting with the marketing team for next Tuesday" or "Summarize my important emails from this morning" and have an intelligent avatar handle everything seamlessly?

This personal passion, combined with the capabilities of ADK, led us to create Oprina: a multi-agent conversational AI avatar assistant, which we believe is the first of its kind, that transforms productivity through natural voice interactions and lifelike avatar technology.

What it does

Oprina is a conversational AI avatar assistant that changes how you manage your digital productivity through natural voice interactions. Instead of clicking through multiple apps and interfaces, you simply talk to your AI avatar assistant, which handles the request for you.

Core Capabilities:

Email Management: Read emails aloud to you, send emails on your behalf, organize your inbox, and provide AI-powered email analysis and summaries
Calendar Operations: Create and schedule meetings, check your availability, manage appointments, and resolve scheduling conflicts automatically
Voice-First Experience: Complete hands-free operation through natural speech conversations with real-time voice responses
Avatar Interaction: Lifelike AI avatars with synchronized lip-sync technology that make conversations feel natural and engaging
Cross-Platform Integration: Seamless connection to Gmail and Google Calendar with secure OAuth authentication
Intelligent Workflows: Convert email requests into calendar events, extract action items from conversations, and manage complex multi-step tasks

What makes Oprina unique is the combination of sophisticated multi-agent AI architecture with immersive avatar technology, creating the first truly conversational productivity assistant that feels like talking to a real person.

How we built it

Multi-Agent Architecture

Our platform leverages Google's Agent Development Kit to create a sophisticated multi-agent ecosystem:

Root Agent Orchestration: Central coordinator that intelligently routes user requests to specialized sub-agents based on context and intent
Email Agent Specialization: Dedicated agent handling Gmail operations including reading, sending, organizing, and AI-powered email analysis
Calendar Agent Management: Specialized agent for comprehensive Google Calendar operations, event creation, scheduling, and conflict resolution
Cross-Agent Workflows: Seamless data passing between agents enabling complex scenarios like converting email requests into calendar events
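
The routing and cross-agent data passing described above can be sketched in plain Python. This is a minimal illustration, not Oprina's actual code and not the ADK API: the Session class, the agent functions, and the keyword table are all hypothetical, and simple keyword matching stands in for the intent detection that an LLM-driven root agent would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Session:
    """Shared state that every agent can read and write (hypothetical)."""
    state: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def email_agent(request: str, session: Session) -> str:
    # Stand-in for Gmail operations: record a meeting request found in an email.
    session.state["pending_meeting"] = "Marketing sync, next Tuesday"
    return "email: found a meeting request"


def calendar_agent(request: str, session: Session) -> str:
    # Stand-in for Calendar operations: consume what the email agent left behind.
    meeting = session.state.pop("pending_meeting", None)
    if meeting is None:
        return "calendar: nothing to schedule"
    return f"calendar: created event '{meeting}'"


# Keyword table used here in place of LLM-based intent detection (assumption).
ROUTES: Dict[str, Callable[[str, Session], str]] = {
    "email": email_agent,
    "inbox": email_agent,
    "meeting": calendar_agent,
    "schedule": calendar_agent,
    "calendar": calendar_agent,
}


def root_agent(request: str, session: Session) -> str:
    """Route the request to the first sub-agent whose keyword matches."""
    lowered = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            reply = agent(request, session)
            session.history.append(reply)
            return reply
    return "root: sorry, I can't help with that yet"
```

Calling root_agent("Check my inbox", session) and then root_agent("Schedule it", session) with the same Session shows the email-to-calendar handoff: the second call picks up the value the first one stored in the shared state.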

Technology Stack

Frontend: React 18 with TypeScript, Vite build system, Tailwind CSS for responsive design, and custom avatar integration components

Backend: FastAPI with Python for high-performance API development, plus Supabase integration for authentication and data management

Google Cloud Stack:

Vertex AI for agent deployment and management
Google Cloud Speech-to-Text and Text-to-Speech APIs for voice processing
Cloud Run for scalable containerized deployment

Avatar Integration:

HeyGen API for streaming avatar generation and real-time voice synthesis
Custom static avatar system for fallback scenarios

Authentication: Supabase Auth with Google OAuth integration for secure user management

Core Features Implementation

Voice-First Interface: Complete speech-to-text and text-to-speech integration allowing natural conversation with AI avatars

Real-Time Avatar Integration: HeyGen streaming avatars provide lifelike visual representation with synchronized voice responses and lip-sync technology

Gmail and Calendar OAuth Integration: Seamless connection to Google services enabling comprehensive email management and calendar operations

Session Management: Persistent conversation history with context awareness across multiple agent interactions

Challenges we ran into

OAuth Verification Hurdles

Due to Google's OAuth verification requirements, our application is currently in testing mode, which restricts access to whitelisted test user accounts only. Production-level OAuth access requires Google's brand verification process, which takes 3–5 business days for an initial response and 4–6 weeks for full approval. That timeline is incompatible with hackathon submission deadlines, so we worked around it by providing test account credentials for evaluation.

Voice Processing Latency

Balancing real-time voice interaction with high-quality avatar responses proved challenging. The delay between speech input and avatar response needed to feel natural. With more time, we could have implemented advanced caching strategies and optimized our speech processing pipeline to reduce latency significantly.
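
One form the caching idea could take is reusing synthesized audio for replies the assistant has already spoken. The sketch below is an assumption about what such a cache might look like, not something we built: SpeechCache and its synthesize callback are hypothetical names, and the callback stands in for whatever text-to-speech call the pipeline makes.

```python
from collections import OrderedDict
from typing import Callable


class SpeechCache:
    """Small least-recently-used cache for synthesized speech (hypothetical)."""

    def __init__(self, synthesize: Callable[[str], bytes], max_entries: int = 128) -> None:
        self._synthesize = synthesize      # stand-in for the real TTS call
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get_audio(self, text: str) -> bytes:
        # Collapse whitespace so trivially different strings share one entry.
        key = " ".join(text.split())
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        audio = self._synthesize(text)
        self._entries[key] = audio
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return audio
```

A cache like this only helps with repeated phrases such as greetings and confirmations; replies that are unique to a conversation still pay the full synthesis cost.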

Dependency Conflicts

We encountered an unexpected and time-consuming issue where importing Supabase client libraries caused HTTP response parsing conflicts with Vertex AI ADK session creation. The Supabase client's HTTP handling interfered with ADK's expected response format, causing critical errors. We had not anticipated this, and it took considerable time to resolve; in the end we developed a custom authentication approach.

HeyGen AI Streaming Sessions and Cost Management

HeyGen's streaming avatar sessions come with significant costs, which forced us to implement user quotas limiting each user to 15 minutes of streaming avatar time. After the quota is reached, users switch to a static avatar with voice to prevent burning through our budget while still demonstrating the full concept. Additionally, switching between streaming and static avatars can break the session, so we implemented a 10-second lock period to ensure smooth transitions.
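
The quota and lock logic can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than our production code: the AvatarSession class and its method names are hypothetical, the clock is injectable so the behavior can be checked without waiting, and only the 15-minute quota and 10-second lock come from the description above.

```python
import time
from typing import Callable, Optional

STREAMING_QUOTA_SECONDS = 15 * 60  # 15 minutes of streaming avatar time per user
SWITCH_LOCK_SECONDS = 10           # lock period between avatar mode switches


class AvatarSession:
    """Tracks streaming usage and guards mode switches (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.mode = "static"
        self.used_seconds = 0.0
        self._stream_started_at: Optional[float] = None
        self._last_switch_at: Optional[float] = None

    def _settle_usage(self) -> None:
        # Add the streaming time elapsed since the last check to the total.
        if self._stream_started_at is not None:
            now = self._clock()
            self.used_seconds += now - self._stream_started_at
            self._stream_started_at = now

    def remaining_seconds(self) -> float:
        self._settle_usage()
        return max(0.0, STREAMING_QUOTA_SECONDS - self.used_seconds)

    def switch(self, target: str) -> bool:
        """Try to switch mode; return False when locked or out of quota."""
        if target not in ("static", "streaming"):
            raise ValueError(f"unknown avatar mode: {target}")
        now = self._clock()
        locked = (self._last_switch_at is not None
                  and now - self._last_switch_at < SWITCH_LOCK_SECONDS)
        if locked:
            return False
        if target == "streaming" and self.remaining_seconds() <= 0:
            return False
        self._settle_usage()
        self._stream_started_at = now if target == "streaming" else None
        self.mode = target
        self._last_switch_at = now
        return True

    def tick(self) -> str:
        """Fall back to the static avatar once the quota is used up."""
        if self.mode == "streaming" and self.remaining_seconds() <= 0:
            self._stream_started_at = None
            self.mode = "static"
        return self.mode
```

In this sketch the forced fallback in tick() bypasses the lock, on the assumption that stopping spend matters more than a smooth transition once the quota is exhausted.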

Accomplishments that we're proud of

Technical Achievements

We successfully built and deployed a full-scale, production-ready website that demonstrates sophisticated multi-agent AI coordination in real-world scenarios. This isn't just a proof of concept—it's a functional application that users can interact with immediately.

Learning Multi-Agent Development

As first-time ADK users, we truly learned how to build multi-agent systems for real-world applications and discovered how crucial evaluations (evals) are for ensuring agent reliability and performance. The journey from concept to deployment taught us invaluable lessons about agent orchestration and production AI systems.

Unique Innovation

We're particularly proud of our unique idea of giving digital assistants a face—creating conversational AI avatar assistants that combine the intelligence of multi-agent systems with the human connection of avatar interaction. This represents a new category of productivity tools that prioritizes natural human-computer interaction.

End-to-End Implementation

We built everything from the multi-agent backend to the responsive frontend, implemented secure authentication, managed external API integrations, and created a seamless user experience across web and voice interfaces, all within the hackathon timeframe. We consider that a significant technical accomplishment.

What we learned

Our Personal ADK Journey

As first-time users of the Agent Development Kit, our learning curve was steep but incredibly rewarding:

Initial Challenges: Understanding ADK's session management system proved tricky at first. The concept of tool context and how state persists across agent interactions required careful study. However, once we grasped these fundamentals, we discovered how remarkably easy ADK makes implementing complex agent workflows and multi-agent coordination.

Documentation Goldmine: The ADK documentation became our invaluable reference, providing clear guidance on everything from basic tool development to advanced deployment strategies. The official sample repository proved to be an absolute goldmine, offering practical examples that accelerated our development process significantly.

Game-Changing Development Tools: ADK web emerged as our most valuable development tool, enabling rapid local testing and real-time agent interaction debugging. The seamless integration with Vertex AI for production deployment created an incredibly smooth development-to-production pipeline.

Technical Insights

Authentication Strategy: After encountering conflicts between different auth approaches, we learned the importance of choosing one authentication method and sticking with it throughout development.

Dependency Management: The critical lesson about library compatibility—especially how client libraries can interfere with framework expectations—will influence our future architecture decisions.

Cost Management: Implementing intelligent quota systems and fallback mechanisms taught us valuable lessons about building sustainable AI applications with external service dependencies.

What's next for Oprina: Conversational AI Avatar Assistant

Enhanced Agent Intelligence

We plan to make our agents significantly smarter by implementing advanced reasoning capabilities, better context understanding, and more sophisticated cross-agent collaboration. This includes expanding beyond email and calendar to integrate with additional productivity tools and services.

Improved Avatar Experience

Extended Streaming Conversations: Developing cost-effective solutions to provide longer avatar streaming sessions, potentially through optimized session management and tiered user plans.

Avatar Customization: Allowing users to personalize their AI assistant's appearance, voice characteristics, and personality traits to create truly personalized digital assistants.

Desktop Application

Building a native desktop application that provides seamless integration with local productivity workflows, offline capabilities, and deeper system-level integrations that aren't possible through web browsers.

Enterprise Features

Expanding into enterprise markets with team collaboration features, advanced security controls, integration with business tools like Slack and Microsoft Teams, and administrative dashboards for organization-wide AI assistant management.

Agent Marketplace

Creating an ecosystem where users can discover and integrate specialized agents for different domains—from sales and marketing to development and research—truly realizing the vision of personalized AI assistant teams.