Inspiration
We were inspired by the countless immigrants who struggle to navigate essential systems in their new country - from understanding medical bills to opening bank accounts. These challenges are amplified by language barriers and unfamiliarity with local processes.
Our team is made up entirely of immigrants who came to the United States and experienced these struggles firsthand. One of our founders was very young when his family migrated. His parents didn’t know how to navigate the healthcare system, and the language barrier kept them from understanding it. When a medical emergency struck, they had no idea where to go or what to do. They ended up in an ER, facing an overwhelming bill they didn’t understand, a moment that underscored how critical accessible information can be.
Those personal experiences drive us. We wanted to create a compassionate AI companion available 24/7, offering guidance in multiple languages and making critical information understandable and accessible to everyone, regardless of background or circumstances.
What it does
Guru is an AI assistant that helps immigrants navigate the US healthcare, financial, and legal systems. It offers:
- Specialized AI Agents for healthcare, finance, and legal guidance
- Smart Routing that sends questions to the right specialist
- Multiple Ways to Connect — web chat + WhatsApp
- Voice Support via WhatsApp with automatic transcription (Whisper)
- Real-time Voice Conversations using OpenAI’s Realtime API
- Multilingual Assistance
- Conversation History for contextual, personalized help
How we built it
Tech Stack:
- Frontend: Next.js 16 with TypeScript for a modern, responsive chat interface
- Backend: Node.js/Express server using OpenAI Agents SDK
- AI Models: GPT-4o for agent responses, GPT-4o-mini for cost-efficient routing
- Voice Processing: OpenAI Whisper for transcription, OpenAI TTS for speech synthesis
- WhatsApp Integration: Twilio API for SMS and voice message handling
- Voice Agent: OpenAI Realtime API with WebRTC for low-latency voice conversations
Architecture: We built a simple architecture in which the frontend communicates with a backend API that orchestrates three specialized agents (healthcare, finance, and legal). Each agent has custom instructions and prompts tailored to its domain. The routing system uses GPT-4o-mini to classify each user query and direct it to the appropriate specialist, so that users get guidance from the agent best suited to their question.
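The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `buildRouterPrompt`, `parseRoute`, and `routeQuery`, the prompt wording, and the fallback to the healthcare agent are all assumptions. The classifier is passed in as a function so that the model call (GPT-4o-mini in this project) stays outside the routing logic.

```typescript
// Sketch of LLM-backed routing with a validated label. All names here are
// hypothetical; the real project may structure this differently.

type Domain = "healthcare" | "finance" | "legal";

const DOMAINS: readonly Domain[] = ["healthcare", "finance", "legal"];

// Assumed fallback when the classifier returns something unexpected.
const DEFAULT_DOMAIN: Domain = "healthcare";

// Build the prompt sent to the small routing model.
function buildRouterPrompt(query: string): string {
  return [
    "Classify the user's question into exactly one category.",
    `Allowed categories: ${DOMAINS.join(", ")}.`,
    "Reply with the category name only.",
    `Question: ${query}`,
  ].join("\n");
}

// Map the model's free-text reply onto a known domain instead of trusting it
// blindly; the first allowed category found in the reply wins.
function parseRoute(reply: string): Domain {
  const cleaned = reply.trim().toLowerCase();
  const match = DOMAINS.find((d) => cleaned.includes(d));
  return match ?? DEFAULT_DOMAIN;
}

// The classifier is injected so the routing logic can be exercised without a
// network call; in production it would wrap a call to the routing model.
type Classifier = (prompt: string) => Promise<string>;

async function routeQuery(query: string, classify: Classifier): Promise<Domain> {
  try {
    return parseRoute(await classify(buildRouterPrompt(query)));
  } catch {
    return DEFAULT_DOMAIN; // degrade gracefully if the model call fails
  }
}
```

Validating the reply against a fixed list means a malformed or verbose model answer still resolves to one of the three agents rather than causing an error.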
Challenges we ran into
- Agent Migration: We initially built with the Claude Agent SDK but needed to migrate to the OpenAI Agents SDK mid-development. We successfully refactored the entire backend while maintaining backward compatibility.
- Conversation Context: Implementing conversation history across different communication channels (web chat, WhatsApp, voice) while keeping the context coherent was challenging. We solved this by creating a unified session management system.
- WhatsApp Voice Messages: Processing voice messages from WhatsApp required careful handling of audio formats and Twilio authentication. We built a robust pipeline that downloads, transcribes, and processes voice messages seamlessly.
- Real-time Voice Quality: Achieving low-latency voice conversations required implementing WebRTC properly with the OpenAI Realtime API, including handling network issues and audio feedback.
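As an illustration of the unified session management mentioned under Conversation Context, here is a minimal in-memory sketch. It assumes that every channel can supply a stable user identifier such as a phone number, which is normalized into a single key. The `SessionStore` class and its methods are hypothetical names, and a real deployment would likely persist sessions rather than keep them in memory.

```typescript
// Sketch of a channel-agnostic session store. Names are hypothetical and the
// store is in-memory only.

type Channel = "web" | "whatsapp" | "voice";
type Role = "user" | "assistant";

interface Turn {
  channel: Channel;
  role: Role;
  text: string;
  at: number; // epoch milliseconds
}

class SessionStore {
  private sessions = new Map<string, Turn[]>();

  // Assumption: identifiers may arrive as "whatsapp:+1..." (the form Twilio
  // uses for WhatsApp senders) or as a loosely formatted phone number.
  static keyFor(rawId: string): string {
    return rawId.replace(/^whatsapp:/i, "").replace(/[\s()-]/g, "").toLowerCase();
  }

  // Record one turn under the user's normalized key.
  append(rawId: string, channel: Channel, role: Role, text: string, at: number = Date.now()): void {
    const key = SessionStore.keyFor(rawId);
    const turns = this.sessions.get(key) ?? [];
    turns.push({ channel, role, text, at });
    this.sessions.set(key, turns);
  }

  // Most recent `limit` turns, oldest first, regardless of channel.
  recent(rawId: string, limit: number = 20): Turn[] {
    const turns = this.sessions.get(SessionStore.keyFor(rawId)) ?? [];
    return limit > 0 ? turns.slice(-limit) : [];
  }
}
```

Because the key ignores the channel, a question asked over WhatsApp and a follow-up typed in the web chat end up in the same history, which is what lets an agent answer with context from either one.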
Accomplishments that we're proud of
- Accessibility First: Built a system that works across multiple channels (web, WhatsApp, voice) to meet users where they are
- WhatsApp Integration: We separated the backend from the frontend so that users can talk to the agent over any channel: WhatsApp, WeChat, or the web app. WhatsApp will soon support voice as well.
- Intelligent Agent Routing: Our AI-powered routing system ensures users always talk to the right specialist
- Comprehensive Solution: Created a complete ecosystem from medical bill negotiation to legal document requirements to banking guidance
- Real-world Testing: Successfully tested with actual immigrant use cases and refined based on feedback
What we learned
- AI Agent Architecture: Learned how to design and implement multi-agent systems with specialized roles and intelligent routing
- Voice AI Integration: Gained understanding of WebRTC, real-time audio processing, and OpenAI's Realtime API. Also learned how to pass context from past chat history into the Realtime API for voice.
- Twilio WhatsApp API: Mastered webhook handling, media processing, and asynchronous message delivery
- Migration Strategy: Learned how to migrate between AI SDKs while maintaining system stability
- User-Centered Design: Discovered the importance of multilingual support and multiple interaction modalities for immigrant communities
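One way to pass past chat history into a voice session, as mentioned under Voice AI Integration, is to fold recent turns into the session instructions. The sketch below is an assumption about how that could look, not the project's implementation. The helper names, the prompt wording, and the turn and character limits are hypothetical. The event uses the Realtime API's `session.update` client event with an `instructions` field, and the exact session fields should be checked against the current API reference.

```typescript
// Sketch: fold recent chat history into the instructions of a Realtime session.
// Helper names and the prompt wording are hypothetical.

interface ChatTurn {
  role: "user" | "assistant";
  text: string;
}

// Render the last `maxTurns` turns as plain text, keeping only the newest
// `maxChars` characters so the instructions stay small.
function summarizeHistory(history: ChatTurn[], maxTurns: number = 10, maxChars: number = 2000): string {
  const lines = history
    .slice(-Math.max(1, maxTurns))
    .map((t) => `${t.role === "user" ? "User" : "Assistant"}: ${t.text}`);
  const joined = lines.join("\n");
  return joined.length > maxChars ? joined.slice(joined.length - maxChars) : joined;
}

// Build a `session.update` client event carrying the history as instructions.
function buildSessionUpdate(
  baseInstructions: string,
  history: ChatTurn[],
): { type: string; session: { instructions: string } } {
  const context = summarizeHistory(history);
  const instructions = context
    ? `${baseInstructions}\n\nEarlier conversation with this user:\n${context}`
    : baseInstructions;
  return { type: "session.update", session: { instructions } };
}

// Usage (assumed): send the event over the already-open WebRTC data channel,
// e.g. dataChannel.send(JSON.stringify(buildSessionUpdate(prompt, history)));
```

Capping the history by turns and characters keeps the instructions short, at the cost of dropping older context.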
What's next for Guru
- Messaging App support: Build native support for the WhatsApp and WeChat ecosystems, so that users can chat with and call the agent from their native interface
- Group Chat support: Over WhatsApp and WeChat, it should be possible to add the agent to group chats to help it spread in communities
- Hyper-localized Community Resources: Integrate real-time database of local community health centers, legal aid societies, and immigrant support organizations
- Document Processing: Enable users to upload photos of documents (medical bills, legal papers) for automatic analysis and guidance
- Appointment Scheduling: Add ability to help users schedule appointments with healthcare providers, immigration attorneys, and financial advisors
- Legal Application and Form Filling: Agent should be able to fill out legal applications and forms on behalf of the user
- Follow-up System: Implement proactive reminders for important deadlines (visa renewals, payment due dates, appointments)
- Partnership Integration: Partner with hospitals, legal aid organizations, and banks to provide verified, institution-specific guidance
- Financial Assistance Database: Integrate real-time information about available grants, charity care programs, and financial assistance