Inspiration
Gathering quotes for a service, reserving a specific table at a restaurant, finding a niche item in a mom-and-pop store: these all require spending time on the phone with businesses. Many people find this stressful or simply don't have the time; 90% of Gen Z is anxious about speaking on the phone*. For small businesses that don't post their full offerings online, this means lost business.
More and more consumers will use assistants like Gemini to complete the entire buying process, from research to checkout. I built the ServiceScout A2A server to make it easy for legacy businesses to participate in the massive market that AI is creating.
What it does
ServiceScout automates the calling process for the customer. From a single instruction, it can get quotes from dozens of candidate businesses and then call back to book an appointment with the best one, all autonomously and with no further user input. It adapts its game plan based on what it learns on each call. This saves the customer valuable time and drives more leads to small businesses. It's a win-win.
The gathered information about businesses' offerings is stored and can be referenced in future queries through RAG (retrieval-augmented generation), saving even more time and helping future customers discover small businesses.
How we built it
There are two ADK agents hosted on Cloud Run: Scout (research) and Phone (outreach). Once Scout identifies candidate businesses using the Google Places API, it invokes the Phone agent, which holds a natural, adaptive phone conversation with each business as though it were the customer. We use the Gemini Live Audio API to create a conversational, personable experience throughout, from the customer's chat with Scout to the Phone agent's calls with businesses.
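The Scout-to-Phone hand-off is essentially a fan-out/fan-in: call every candidate to collect quotes, pick the best one, then call that business back to book. The minimal asyncio sketch below illustrates only that control flow and is not ServiceScout's code. `Quote`, `call_for_quote`, and `book_appointment` are hypothetical stand-ins for the Phone agent (which in the real system places Twilio calls driven by Gemini Live), the prices are fake, and "best" is assumed to mean cheapest.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    business: str
    price: Optional[float]  # None if the business couldn't give a quote


# Hypothetical stand-in for the Phone agent's outreach call.
# A real version would place a phone call and extract the quote from the conversation.
async def call_for_quote(business: str, request: str) -> Quote:
    await asyncio.sleep(0)  # simulate network/telephony I/O
    fake_prices = {"Ace Plumbing": 180.0, "Bob's Pipes": 150.0, "City Drains": None}
    return Quote(business, fake_prices.get(business))


# Hypothetical stand-in for the follow-up booking call.
async def book_appointment(business: str, request: str) -> str:
    await asyncio.sleep(0)
    return f"Booked {request!r} with {business}"


async def scout_and_book(candidates: list[str], request: str, max_concurrent: int = 5) -> Optional[str]:
    # Cap simultaneous calls (an assumed limit, e.g. for telephony rate limits).
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(business: str) -> Quote:
        async with sem:
            return await call_for_quote(business, request)

    # Fan out: collect quotes from every candidate concurrently.
    quotes = await asyncio.gather(*(bounded(b) for b in candidates))
    # Fan in: keep businesses that actually quoted and pick the cheapest.
    priced = [q for q in quotes if q.price is not None]
    if not priced:
        return None
    best = min(priced, key=lambda q: q.price)
    # Call the winner back to book.
    return await book_appointment(best.business, request)


if __name__ == "__main__":
    candidates = ["Ace Plumbing", "Bob's Pipes", "City Drains"]
    print(asyncio.run(scout_and_book(candidates, "fix a leaking sink")))
```

Running it prints `Booked 'fix a leaking sink' with Bob's Pipes`. The semaphore keeps "dozens" of candidates from all being dialed at once; the cap of 5 is an arbitrary assumption.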
The frontend, Scout agent, Phone agent, and A2A server are all Cloud Run services.
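An A2A server advertises itself to client agents through an Agent Card, a JSON document describing its endpoint and skills. A hypothetical card for ServiceScout might look like the sketch below. The field names follow the A2A `AgentCard`/`AgentSkill` schema as we understand it, while the URL, version, and skill text are invented placeholders; check the field set and the well-known discovery path against the A2A spec version you target, since both have evolved.

```python
import json

# Hypothetical Agent Card for ServiceScout. Field names follow the A2A
# AgentCard/AgentSkill schema (camelCase JSON); all values are placeholders.
AGENT_CARD = {
    "name": "ServiceScout",
    "description": "Finds local businesses, phones them for quotes, and books the best one.",
    "url": "https://servicescout.example.com/a2a",  # placeholder endpoint
    "version": "0.1.0",
    "capabilities": {"streaming": True},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain"],
    "skills": [
        {
            "id": "get_quotes_and_book",
            "name": "Get quotes and book",
            "description": "Calls candidate businesses for quotes, then books with the best one.",
            "tags": ["phone", "quotes", "booking", "local-business"],
            "examples": ["Get quotes for fixing a leaking sink near me and book the cheapest."],
        }
    ],
}

REQUIRED_FIELDS = {
    "name", "description", "url", "version", "capabilities",
    "defaultInputModes", "defaultOutputModes", "skills",
}


def missing_fields(card: dict) -> set[str]:
    """Top-level fields the card lacks (a quick sanity check, not full schema validation)."""
    return REQUIRED_FIELDS - set(card)


if __name__ == "__main__":
    assert not missing_fields(AGENT_CARD)
    # Client agents fetch this JSON from a well-known URL on the server to discover its skills.
    print(json.dumps(AGENT_CARD, indent=2))
```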
The Phone agent integrates with Twilio over WebSocket, and the frontend also communicates with the Scout agent over WebSocket to enable live bidirectional audio.
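Twilio Media Streams sends JSON text frames over that WebSocket, each with an `event` field: `start` carries the `streamSid`, `media` carries base64-encoded 8 kHz μ-law caller audio in `media.payload`, and `stop` ends the stream. Sending a `media` frame with the same `streamSid` plays audio to the caller. The framework-free sketch below handles just those frames; `TwilioStreamState`, `handle_twilio_message`, and `outbound_media` are hypothetical names, and the WebSocket server and Gemini Live session are left out.

```python
import base64
import json
from typing import Optional


class TwilioStreamState:
    """Per-connection state for one Twilio Media Stream (hypothetical helper)."""

    def __init__(self) -> None:
        self.stream_sid: Optional[str] = None
        self.call_sid: Optional[str] = None
        self.closed = False


def handle_twilio_message(raw: str, state: TwilioStreamState) -> Optional[bytes]:
    """Parse one inbound text frame; return caller audio (mu-law bytes) for "media" events."""
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "start":
        # streamSid is needed to send audio back on this stream.
        state.stream_sid = msg["start"]["streamSid"]
        state.call_sid = msg["start"].get("callSid")
    elif event == "media":
        # Base64-encoded 8 kHz mu-law audio from the caller.
        return base64.b64decode(msg["media"]["payload"])
    elif event == "stop":
        state.closed = True
    # Other events ("connected", "mark", ...) need no handling in this sketch.
    return None


def outbound_media(state: TwilioStreamState, mulaw_audio: bytes) -> str:
    """Build the JSON frame that plays mu-law audio to the caller."""
    if state.stream_sid is None:
        raise RuntimeError("no 'start' event received yet")
    return json.dumps({
        "event": "media",
        "streamSid": state.stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })
```

In a real service these functions would sit inside the WebSocket receive loop, with the decoded audio converted and forwarded to the Gemini Live session.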
Firebase is used for authentication and domain mapping.
Challenges we ran into
Prompt engineering for Gemini Live Audio is a lot harder than for text models; it took hours of iterating on prompts to get the desired behavior with some degree of consistency.
I originally tried using Vertex AI RAG Engine for the knowledge base, but had to give up after extensive troubleshooting due to an inexplicable GCP error. I quickly pivoted to a DIY RAG solution using Firestore vector search and the Vertex AI Text Embeddings endpoint, which works great.
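The retrieval half of that DIY setup is easy to see in a brute-force, in-memory sketch: embed the query, rank stored documents by cosine similarity, and prepend the top hits to the prompt. This is not the project's code. Firestore vector search performs the nearest-neighbour step server-side (the Python client exposes it as a `find_nearest` query), and the vectors would come from the Vertex AI Text Embeddings endpoint; here toy 3-dimensional vectors stand in for both, and `Doc`, `nearest`, and `build_prompt` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    embedding: list[float]  # in practice produced by an embeddings model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: list[float], docs: list[Doc], k: int = 3) -> list[Doc]:
    """Brute-force top-k by cosine similarity (what a vector index does at scale)."""
    return sorted(docs, key=lambda d: cosine_similarity(query, d.embedding), reverse=True)[:k]


def build_prompt(question: str, context: list[Doc]) -> str:
    """Augment the question with retrieved facts before it goes to the model."""
    facts = "\n".join(f"- {d.text}" for d in context)
    return f"Known business info:\n{facts}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-D vectors; real text embeddings have hundreds of dimensions.
    docs = [
        Doc("Bob's Pipes quoted $150 to fix a leaking sink", [0.9, 0.1, 0.0]),
        Doc("Luigi's has a private room for 12", [0.0, 0.2, 0.9]),
        Doc("Ace Plumbing quoted $180 for sink repair", [0.8, 0.3, 0.1]),
    ]
    query = [1.0, 0.2, 0.0]  # pretend embedding of "Who can fix my sink?"
    print(build_prompt("Who can fix my sink?", nearest(query, docs, k=2)))
```

With these toy vectors the two plumbing quotes are retrieved and the restaurant fact is left out of the prompt.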
The audio formats of Twilio's media stream and Gemini Live differ, so I had to convert and resample audio in both directions, which took a while to get right.
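Concretely, Twilio Media Streams carry 8 kHz, 8-bit μ-law mono audio, while the Gemini Live API takes 16 kHz 16-bit little-endian PCM as input and returns 24 kHz PCM. The sketch below shows one way to bridge them in pure Python: a G.711 μ-law decoder/encoder plus a naive linear-interpolation resampler. It is an illustration, not the project's implementation, and the function names are hypothetical. Downsampling from 24 kHz to 8 kHz without a low-pass filter will alias, so a production bridge would likely use a proper resampler. The codec is hand-written because the standard-library `audioop` module was removed in Python 3.13.

```python
import base64
import struct

BIAS = 0x84   # G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude before biasing


def ulaw_decode(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF
    t = ((u & 0x0F) << 3) + BIAS   # mantissa plus bias
    t <<= (u & 0x70) >> 4          # scale by the segment (exponent)
    return BIAS - t if u & 0x80 else t - BIAS


def ulaw_encode(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    # Segment = position of the highest set bit above bit 7.
    while exponent > 0 and not magnitude & mask:
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(round(samples[j] + (nxt - samples[j]) * frac))
    return out


def twilio_to_gemini(payload_b64: str) -> bytes:
    """Twilio media payload (base64, 8 kHz mu-law) -> 16 kHz 16-bit little-endian PCM."""
    pcm8k = [ulaw_decode(b) for b in base64.b64decode(payload_b64)]
    pcm16k = resample_linear(pcm8k, 8000, 16000)
    return struct.pack(f"<{len(pcm16k)}h", *pcm16k)


def gemini_to_twilio(pcm24k: bytes) -> str:
    """24 kHz 16-bit little-endian PCM -> base64 8 kHz mu-law for a Twilio media frame."""
    n = len(pcm24k) // 2  # ignore a trailing odd byte, if any
    samples = list(struct.unpack(f"<{n}h", pcm24k[: n * 2]))
    pcm8k = resample_linear(samples, 24000, 8000)
    return base64.b64encode(bytes(ulaw_encode(s) for s in pcm8k)).decode("ascii")
```

A 20 ms Twilio frame (160 μ-law bytes) becomes 640 bytes of 16 kHz PCM, and 20 ms of Gemini output (960 bytes at 24 kHz) becomes 160 μ-law bytes again.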
Accomplishments that we're proud of
Finally getting the Gemini Live API and the Twilio phone call to "click" was one of my favorite moments in the project. The latency is so low, and the intonation and voice so natural, that speaking on the phone with Gemini feels almost like talking to a real person. We loved it so much that we made the user-facing dashboard voice-native, too.
Integrating pre-release and novel technologies like Gemini Live and some Vertex AI offerings was challenging because there is little online discussion or example code for them. Solving unexpected bugs by diving into the ADK source code and debugging it was like a treasure hunt, and it was rewarding to gain a deep understanding of the ADK this way.
What we learned
I'm now more familiar with the ADK library code than I ever planned to be, and that has encouraged me to contribute possible improvements I identified back to the project on GitHub.
The serverless, ephemeral nature of Cloud Run instances complicated status updates for long-running tasks like phone calls. Locally, we could simply keep an in-memory mapping of session_id -> ADK live stream and use it to forward status updates arriving on a webhook (which is how Twilio reports them). On Cloud Run, however, there is no guarantee that the instance holding that mapping is still running, or even that it is the instance that receives the webhook. This pushed me to learn and adopt best practices for microservices and distributed systems.
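The write-up doesn't say exactly which pattern ServiceScout adopted, but a common fix is to move the session mapping out of process memory into a shared store. Whichever instance receives Twilio's status webhook records the update there, and the instance holding the live stream picks it up. The sketch below shows that pattern with an in-memory `SharedStore` standing in for a real shared database such as Firestore; `register_call`, `handle_status_webhook`, and `wait_for_call_end` are hypothetical names. Twilio status callbacks do post `CallSid` and `CallStatus` form fields. Polling is used here for simplicity, whereas Firestore real-time listeners (`on_snapshot`) could push updates instead.

```python
import asyncio
from typing import Optional


class SharedStore:
    """In-memory stand-in for an external shared store (e.g. Firestore).

    On Cloud Run every instance would talk to the same external store,
    so it doesn't matter which instance handles a given request.
    """

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def set(self, key: str, doc: dict) -> None:
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[dict]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


# Twilio call statuses after which no more updates are expected.
TERMINAL = {"completed", "busy", "failed", "no-answer", "canceled"}


def register_call(store: SharedStore, call_sid: str, session_id: str) -> None:
    """When placing a call, persist call_sid -> session_id instead of keeping it in memory."""
    store.set(f"calls/{call_sid}", {"session_id": session_id, "status": "queued"})


def handle_status_webhook(store: SharedStore, form: dict[str, str]) -> None:
    """Twilio status-callback handler; may run on ANY instance."""
    key = f"calls/{form['CallSid']}"
    doc = store.get(key)
    if doc is None:
        return  # unknown call: ignore
    doc["status"] = form["CallStatus"]
    store.set(key, doc)


async def wait_for_call_end(store: SharedStore, call_sid: str,
                            poll_s: float = 0.01, timeout_s: float = 5.0) -> str:
    """Runs on the instance holding the live stream: poll for a terminal status."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        doc = store.get(f"calls/{call_sid}")
        if doc and doc["status"] in TERMINAL:
            return doc["status"]
        await asyncio.sleep(poll_s)
    raise TimeoutError(call_sid)


async def demo() -> str:
    store = SharedStore()
    register_call(store, "CA123", "session-1")
    # "Instance A" waits on the live call...
    waiter = asyncio.create_task(wait_for_call_end(store, "CA123"))
    await asyncio.sleep(0.05)
    # ...while "instance B" receives Twilio's webhooks.
    handle_status_webhook(store, {"CallSid": "CA123", "CallStatus": "in-progress"})
    handle_status_webhook(store, {"CallSid": "CA123", "CallStatus": "completed"})
    return await waiter


if __name__ == "__main__":
    print(asyncio.run(demo()))
```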
What's next for ServiceScout
Phone calling is just the start. Many businesses don't fully publish their offerings online, and in other parts of the world, many operate solely on WhatsApp. ServiceScout is adding more outreach methods, like email and WhatsApp, to help customers find the right business faster and bring more leads to small businesses worldwide.
Built With
- adk
- agents
- firebase
- firestore
- gemini
- google-places
- nextjs
- phone
- python
- rag
- twilio
- websockets

