Inspiration
Every day, we face situations where we need to make time-consuming phone calls - booking a haircut appointment, ordering takeout from a restaurant that doesn't have online ordering, scheduling car maintenance, or following up on a repair. These calls interrupt our workflow, can involve long hold times, and often require navigating complex phone menus. We realized that while we have AI assistants for many digital tasks, nobody had truly solved the problem of making real-world phone calls on our behalf.
What it does
HERMES is an AI-powered personal assistant that makes phone calls for you. Simply tell HERMES what you need in natural language - "Book me a haircut for tomorrow afternoon" or "Order a large pepperoni pizza for delivery" - and it will:
- Search for relevant businesses in your area
- Evaluate options based on ratings and reviews
- Place an actual phone call using an AI voice agent
- Handle the entire conversation professionally
- Complete your request (booking, ordering, inquiring)
- Add events to and read from your Google Calendar
- Provide you with a detailed summary and confirmation
Users can monitor calls in real time with live transcripts, listen in via audio streaming, and even use "YOLO mode" for fully automated execution without any manual selection.
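As a rough illustration of the selection step and of "YOLO mode", here is a minimal sketch in Python. The `Business` fields, the `score` heuristic, and the `pick_business` function are hypothetical names chosen for this example, not the actual HERMES code. The idea is that normal mode returns nothing and leaves the choice to the user, while YOLO mode picks the top-ranked callable business automatically.

```python
from dataclasses import dataclass
from math import log10
from typing import Optional


@dataclass
class Business:
    name: str
    phone: str          # empty string when no phone number was found
    rating: float       # average star rating, 0.0 to 5.0
    review_count: int


def score(b: Business) -> float:
    # Hypothetical heuristic: scale the rating by the order of magnitude of
    # the review count, so a 5.0 from 2 reviews does not beat a 4.7 from 400.
    return b.rating * log10(1 + b.review_count)


def rank(candidates: list[Business]) -> list[Business]:
    # Only businesses with a phone number can be called at all.
    callable_businesses = [b for b in candidates if b.phone]
    return sorted(callable_businesses, key=score, reverse=True)


def pick_business(candidates: list[Business], yolo: bool) -> Optional[Business]:
    """Return the business to call, or None when the user should choose."""
    ranked = rank(candidates)
    if not ranked or not yolo:
        return None  # normal mode: show the ranked list and wait for a selection
    return ranked[0]  # YOLO mode: call the top-ranked option right away
```

In normal mode the frontend would display the output of `rank` and wait for a click; in YOLO mode the result of `pick_business` would go straight to the call step.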
How we built it
We built HERMES using a modern tech stack combining multiple AI services:
- Frontend: Next.js 15 with TypeScript and Tailwind CSS for a responsive interface
- Search Intelligence: OpenAI GPT-4o with web search capabilities to find real businesses and contact information
- Voice Agent: VAPI.ai integration with Claude Sonnet 4 for natural phone conversations
- Backend Services: Flask (Python) for business search/summarization and Express.js for call management
- Real-time Features: WebSocket connections for live call transcripts and PCM audio streaming
- Calendar Integration: Google Calendar API to check availability and automatically add appointments
- State Management: React hooks for seamless transitions between search, active call, and results views
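To make the calendar availability check more concrete, here is a minimal sketch in Python. It assumes the busy periods have already been fetched from the calendar and parsed into `(start, end)` datetime pairs; the function names `is_free` and `first_free_slot` and the 30-minute step are illustrative assumptions, not the project's actual implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

Interval = tuple[datetime, datetime]  # (start, end) of a busy period


def is_free(busy: list[Interval], start: datetime, duration: timedelta) -> bool:
    """True when [start, start + duration) overlaps no busy interval."""
    end = start + duration
    return all(end <= b_start or start >= b_end for b_start, b_end in busy)


def first_free_slot(
    busy: list[Interval],
    window_start: datetime,
    window_end: datetime,
    duration: timedelta,
    step: timedelta = timedelta(minutes=30),  # assumed slot granularity
) -> Optional[datetime]:
    """Scan the window in fixed steps and return the first slot that fits."""
    t = window_start
    while t + duration <= window_end:
        if is_free(busy, t, duration):
            return t
        t += step
    return None  # nothing fits in the requested window
```

For a request like "tomorrow afternoon", the window would cover the afternoon hours, and the returned slot could be offered to the business during the call before the confirmed appointment is written back to the calendar.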
Challenges we ran into
- Audio Streaming: Implementing real-time PCM audio streaming from VAPI calls required complex WebSocket handling and audio buffer management
- AI Prompt Engineering: Crafting prompts that made the AI agent sound natural while staying focused on the user's specific request
- Business Data Accuracy: Ensuring we found real, current business information with accurate phone numbers
- Call State Management: Coordinating between multiple services (VAPI, our backend, and frontend) to maintain consistent call state
- Error Handling: Managing edge cases like businesses being closed, wrong numbers, or unexpected call scenarios
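One concrete piece of the audio buffer management mentioned above is that WebSocket messages can split a sample across two chunks. The sketch below, in Python, shows one way to handle that. It assumes 16-bit signed little-endian mono PCM, which is an assumption for illustration (the actual stream format may differ), and `PcmChunkAssembler` is a hypothetical name.

```python
import struct


class PcmChunkAssembler:
    """Buffers raw PCM bytes and emits float samples in [-1.0, 1.0).

    Assumes 16-bit signed little-endian mono audio (an assumption for this
    sketch). A trailing odd byte is held back until the next chunk arrives.
    """

    def __init__(self) -> None:
        self._leftover = b""

    def feed(self, chunk: bytes) -> list[float]:
        data = self._leftover + chunk
        usable = len(data) - (len(data) % 2)  # whole 2-byte samples only
        self._leftover = data[usable:]        # 0 or 1 byte carried over
        count = usable // 2
        ints = struct.unpack(f"<{count}h", data[:usable])
        # Scale int16 values to floats, the range audio players usually expect.
        return [s / 32768.0 for s in ints]
```

Each incoming WebSocket message would be passed to `feed`, and the returned samples appended to the playback buffer. Without the leftover handling, a chunk with an odd byte count would shift every following sample by one byte and turn the audio into noise.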
Accomplishments that we're proud of
- Successfully integrated multiple AI models (GPT-4o for search, Claude Sonnet 4 for calls) to work together seamlessly
- Built a fully functional system that can make real phone calls and complete real-world tasks
- Implemented live call monitoring with both text transcripts and audio streaming
- Created an intuitive UI that makes complex AI orchestration feel simple
- Developed "YOLO mode" - true one-click automation from request to completion
- Achieved natural-sounding conversations that businesses couldn't distinguish from human callers
What we learned
- The importance of prompt engineering for voice AI - small changes dramatically affect conversation quality
- WebSocket programming for real-time features is complex but enables magical user experiences
- Combining multiple AI models with different strengths creates more capable systems
- Real-world integration (like phone calls) introduces unique challenges not present in purely digital products
What's next for HERMES
- Expanded Capabilities: Add support for more complex multi-step tasks and follow-up calls
- Voice Customization: Let users choose different AI voice personas for different situations
- SMS Integration: Handle businesses that require text confirmation
- Multi-language Support: Enable calls in different languages for diverse communities
- Business API: Allow businesses to optimize their phone systems for AI agents
Built With
- claude
- nextjs
- openai
- python
- typescript
