Inspiration
Production incident alerts and response consume a lot of engineering bandwidth and lead to financial loss as well as loss of trust. I have faced these issues while running an education app with 15M MAUs. I always wanted something that could give me distilled information when I need it, even if I'm not connected to my system.
What it does
Engineers can call the 911 Production Incident Response service, ask questions about ongoing incidents, and ask it to debug an incident further. It can also answer questions about your AWS architecture or about your monitoring data from New Relic.
How we built it
We used the ElevenLabs Agent framework to define the call agent's tasks and how it should solve problems. We gave it tools to fetch the information it needs, such as get_throughput_by_newrelic, get_cpu_data_for_ec2_from_aws, get_autoscaling_policy_from_aws, get_throughput_by_transactions_by_newrelic, and do_rca. The agent picks which tools to use based on the ongoing conversation with the user.
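To illustrate the tool setup, here is a minimal sketch of how tool calls requested by the agent could be routed to handlers by name. Only the tool names come from our project; the handler bodies, the argument names (appName, instanceId, groupName, incidentId), and the dispatchToolCall helper are hypothetical stand-ins, and this is not the ElevenLabs API itself. Real handlers would call New Relic and AWS and would most likely be async; the sketch stays synchronous to keep it short.

```typescript
// Hypothetical sketch: only the tool names come from the project write-up.
// Handlers are stubs that echo their arguments instead of calling New Relic or AWS.
type ToolArgs = Record<string, string>;
type ToolHandler = (args: ToolArgs) => unknown;
type ToolResult =
  | { ok: true; tool: string; data: unknown }
  | { ok: false; tool: string; error: string };

// A Map avoids accidental hits on inherited object keys such as "toString".
const tools = new Map<string, ToolHandler>([
  ["get_throughput_by_newrelic", (args) => ({ stub: true, appName: args.appName || null })],
  ["get_throughput_by_transactions_by_newrelic", (args) => ({ stub: true, appName: args.appName || null })],
  ["get_cpu_data_for_ec2_from_aws", (args) => ({ stub: true, instanceId: args.instanceId || null })],
  ["get_autoscaling_policy_from_aws", (args) => ({ stub: true, groupName: args.groupName || null })],
  ["do_rca", (args) => ({ stub: true, incidentId: args.incidentId || null })],
]);

// Route a tool call (name plus arguments) to its handler and wrap the outcome,
// so the caller always gets a structured result it can pass back to the agent.
function dispatchToolCall(name: string, args: ToolArgs = {}): ToolResult {
  const handler = tools.get(name);
  if (!handler) {
    return { ok: false, tool: name, error: `Unknown tool: ${name}` };
  }
  try {
    return { ok: true, tool: name, data: handler(args) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, tool: name, error: message };
  }
}

// Example: the agent decides it needs CPU data for an instance (placeholder id).
const result = dispatchToolCall("get_cpu_data_for_ec2_from_aws", { instanceId: "i-example" });
console.log(JSON.stringify(result));
```

Returning a structured error for unknown tools, rather than throwing, lets the voice agent tell the caller that something went wrong instead of dropping the conversation.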
Challenges we ran into
Initially we tried building this as a Google Meet joiner, but getting a bot to join Google Meet without commercial tools is quite hackish and seemed like too big a task for 24 hours. Then we discovered the ElevenLabs voice agents platform, which helped us immensely. Integrating Twilio with ElevenLabs was a bit bug-prone, so we didn't expose it as a phone service.
Accomplishments that we're proud of
The name! The bot actually works really well and is almost production ready. Proud to have pulled it off in 24 hours.
What we learned
How to use the ElevenLabs voice agents platform, how to define tools, and how to prompt the agent correctly so it picks the right tools. We also learned how to integrate with Twilio using websockets, though that didn't work as seamlessly.
What's next for 911 Production Incident Response
Integrate with more services and tools, and turn it into a full RCA voice service.
Built With
- amazon-web-services
- elevenlabs
- new-relic
- node.js