Inspiration

Production incident alerts and response consume a lot of engineering bandwidth and lead to financial loss as well as loss of trust. I faced similar issues while running an education app with 15M MAUs. I always wanted something that could give me distilled information when I needed it, even when I'm not connected to my system.

What it does

Engineers can call up the 911 Production Incident Response service, ask questions about ongoing incidents, and ask it to debug an incident further. It can also answer questions about your AWS architecture or your monitoring data from New Relic.

How we built it

We used the ElevenLabs Agent framework to define the call agent's tasks and how it should solve problems. We gave it tools to fetch the information it needs, such as get_throughput_by_newrelic, get_cpu_data_for_ec2_from_aws, get_autoscaling_policy_from_aws, get_throughput_by_transactions_by_newrelic, and do_rca, and it picks among them based on the ongoing conversation with the user.
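To illustrate the tool setup, here is a minimal sketch of how a backend might route the agent's tool calls to handlers. It is not the ElevenLabs API or our actual code: the `TOOLS` registry, the `tool` decorator, the `dispatch` function, and the parameter names (`instance_id`, `app_name`) are all hypothetical, and the handlers are stubs that return no real metrics. Only the tool names come from the project.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(name: str):
    """Register a handler under the tool name the agent will request."""
    def register(fn: Callable[..., Dict[str, Any]]):
        TOOLS[name] = fn
        return fn
    return register


@tool("get_cpu_data_for_ec2_from_aws")
def get_cpu_data_for_ec2_from_aws(instance_id: str) -> Dict[str, Any]:
    # Stub: a real handler would query AWS for CPU metrics.
    # The instance_id parameter is an assumption for illustration.
    return {"instance_id": instance_id, "cpu_percent": None, "source": "stub"}


@tool("get_throughput_by_newrelic")
def get_throughput_by_newrelic(app_name: str) -> Dict[str, Any]:
    # Stub: a real handler would query New Relic for throughput.
    # The app_name parameter is an assumption for illustration.
    return {"app_name": app_name, "requests_per_minute": None, "source": "stub"}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the requested tool and return a JSON string for the agent."""
    fn = TOOLS.get(tool_name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(fn(**args))
    except (TypeError, ValueError) as exc:
        # Malformed JSON or arguments that do not match the handler.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch("get_cpu_data_for_ec2_from_aws", '{"instance_id": "i-123"}'))
```

Returning errors as JSON rather than raising lets the agent explain the failure to the caller instead of dropping the conversation.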

Challenges we ran into

Initially we tried building this as a Google Meet joiner, but having a bot join Google Meet without commercial tools is quite hacky and seemed too big a task for 24 hours. Then we discovered the ElevenLabs voice agents platform, which helped us immensely. Integrating Twilio with ElevenLabs was a bit bug-prone, so we didn't expose it as a phone service.

Accomplishments that we're proud of

The name! The bot actually works really well and is almost production-ready. We're proud to have pulled it off in 24 hours.

What we learned

How to use the ElevenLabs voice agents platform, define tools, and prompt the agent so it picks the right ones. We also learned how to integrate with Twilio using WebSockets, though that didn't work as seamlessly.

What's next for 911 Production Incident Response

Integrate with more services and tools, and turn it into a full RCA voice service.
