Guardian

💡 Inspiration

Over 38% of NHS 111 callers wait between 1 and 3 hours for a call-back after their initial call, and 23% wait even longer. This is inconvenient at best, and unacceptable when 16.9% of these calls are "emergencies". Emergency infrastructure is also plagued by "noise": calls that never required immediate expert attention. The problem extends across emergency services, particularly 999, where only ~20% of calls were considered genuine. These calls increase wait times for higher-priority callers and take up operators' time that could be spent saving lives.

To address this, we developed Guardian, a multi-agent autonomous operator for the NHS, powered by ElevenLabs and Koog.

✨ Key Features

Two ElevenLabs operators: one specialised in 111 calls and the other in 999 calls.
111 Operator: The operator follows official protocol for its initial questions, then dynamically generates follow-up questions that narrow down towards a single diagnosis. The possible final states are: 1. diagnosing a common issue, 2. flagging the caller as high-priority and requesting a 999 call, 3. being unable to diagnose the problem. In the last two cases, a report is sent to a medical professional.
999 Operator: The 999 operator categorises an emergency as one for a fire/hazmat team, the police, or an ambulance. Its responses are generated dynamically and tailored to the situation. Finally, it sends a report to the relevant department.
Tone shifting: The 999 operator can shift its tone based on the caller's panic level, switching between a "calming, reassuring" mode and a "normal, efficient" mode.
Report: Reports are concise, display the priority clearly, and include only essential context.
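The 111 end states and the rule for when a report is sent can be sketched in Kotlin. This is a minimal illustration, not the project's code: the type names (`CallOutcome`, `Report`, `reportFor`), the priority given to undiagnosed calls, and the three-line context limit are all assumptions.

```kotlin
// Hypothetical model of the three terminal states of the 111 operator.
sealed interface CallOutcome {
    data class CommonIssue(val diagnosis: String) : CallOutcome
    data class HighPriority(val reason: String) : CallOutcome
    data class Undiagnosed(val symptoms: List<String>) : CallOutcome
}

// Assumption: undiagnosed calls are ROUTINE; the text does not specify this.
enum class Priority { ROUTINE, HIGH }

// Hypothetical report shape: priority on the first line, then essential context only.
data class Report(val priority: Priority, val summary: String, val context: List<String>) {
    fun render(maxContextLines: Int = 3): String = buildString {
        appendLine("PRIORITY: $priority")
        appendLine(summary)
        context.take(maxContextLines).forEach { appendLine("- $it") }
    }
}

// A report goes to a medical professional only in the last two cases.
fun reportFor(outcome: CallOutcome): Report? = when (outcome) {
    is CallOutcome.CommonIssue -> null
    is CallOutcome.HighPriority ->
        Report(Priority.HIGH, "Caller asked to dial 999: ${outcome.reason}", emptyList())
    is CallOutcome.Undiagnosed ->
        Report(Priority.ROUTINE, "Unable to diagnose", outcome.symptoms)
}
```

With this sketch, `reportFor(CallOutcome.CommonIssue("hay fever"))` returns `null`, so no report is sent, while a high-priority outcome produces a report whose first line is `PRIORITY: HIGH`.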

🛠️ How We Built It

ElevenLabs Voice Agents:

We created two 999 voice operatives using ElevenLabs' web interface (same voice, different personalities and slider settings). A highly detailed system prompt forms the foundation, instructing each voice agent to act as an experienced, trained operative. We built on this with retrieval-augmented generation (RAG): we went through hundreds of 999 roleplay scripts, picked the most relevant ones, and converted them into vector embeddings. Using these embeddings, Gemini 2.5 Flash, our model of choice, generates responses that draw directly on the scripts. Webhooks connect ElevenLabs to the Koog AI agents for additional context.

We further improve the responses by adapting to the caller's tone. If anger or panic is detected (using LLM-based detection on ElevenLabs) while the current agent is "Normal", a transfer_agent() command switches the call to the "Calm Agent". Conversely, if the caller returns to a calm state while the current agent is "Calm", a transfer_agent() command switches back to the "Normal Agent". To make the calm agent sound calmer, we used the identical voice (same person) with the "Sweet Option", decreased Speed and increased Stability.
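The transfer rule above only fires when the caller's state and the active persona disagree. A minimal Kotlin sketch of that decision, assuming an upstream classifier has already labelled each caller turn; the enum names, `transferTarget` and `simulate` are hypothetical, and nothing here calls the ElevenLabs API.

```kotlin
// Hypothetical names for the two personas (same voice, different settings).
enum class Persona { NORMAL, CALM }

// Assumption: an upstream LLM-based classifier labels each caller turn.
// AGITATED stands for detected anger or panic.
enum class CallerState { CALM, AGITATED }

/**
 * Returns the persona to transfer to, or null when no transfer_agent() call
 * is needed. Transfers only happen on a mismatch, so consecutive turns in the
 * same state do not cause repeated transfers.
 */
fun transferTarget(current: Persona, state: CallerState): Persona? = when {
    state == CallerState.AGITATED && current == Persona.NORMAL -> Persona.CALM
    state == CallerState.CALM && current == Persona.CALM -> Persona.NORMAL
    else -> null
}

// Replays a sequence of classified turns and records the active persona after each one.
fun simulate(states: List<CallerState>, start: Persona = Persona.NORMAL): List<Persona> {
    var current = start
    return states.map { state ->
        transferTarget(current, state)?.let { current = it }
        current
    }
}
```

For the turns calm, agitated, agitated, calm, `simulate` yields NORMAL, CALM, CALM, NORMAL: the second agitated turn does not trigger another transfer.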

Main Koog Agents:

We used Koog to manage a Supervising Agent that directs traffic between specialised AI agents.
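To illustrate the routing idea without relying on Koog's API, here is a plain-Kotlin sketch of a keyword-based supervisor. The keyword lists and names are illustrative assumptions; in the project, routing happens through Koog orchestration rather than a lookup table.

```kotlin
// Hypothetical identifiers for the specialist agents (Koog agents in the project).
enum class Specialist { MEDICAL, LOCATION, POLICE, FIRE }

// Illustrative keyword lists; an assumption, not the project's routing data.
val routingKeywords: Map<Specialist, Set<String>> = mapOf(
    Specialist.MEDICAL to setOf("bleeding", "breathing", "unconscious", "pain"),
    Specialist.LOCATION to setOf("near", "street", "road", "bridge"),
    Specialist.POLICE to setOf("weapon", "attack", "knife", "threat"),
    Specialist.FIRE to setOf("fire", "smoke", "gas", "chemical"),
)

/** Returns every specialist whose keywords appear as whole words in the utterance. */
fun route(utterance: String): Set<Specialist> {
    val words = utterance.lowercase().split(Regex("\\W+")).toSet()
    return routingKeywords.filterValues { keys -> keys.any { it in words } }.keys
}
```

For "There's smoke and someone is not breathing", `route` returns both FIRE and MEDICAL. Returning a set rather than a single specialist leaves room for calls that need more than one agent.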

999 Operator:

The Supervising Agent handles communication with the current ElevenLabs voice agent. Once it gets the user response, it redirects this to the relevant informational agent (Medical, Location, Police, Fire) through Koog orchestration:

Medical Agent: This agent has access to a pre-filled database of common emergency conditions, the appropriate response to each, and related keywords. It works in stages: 1. Query the database using exact-word matching; if a match is found, go to stage 3. 2. Otherwise, restrict the database to a list of candidate conditions based on the caller's context (gathered during the call), then narrow this down to the best candidate using the Claude API; if no match is found, go to stage 5. 3. Format the instructions into a custom MedicalResponse object (ensuring Claude has extracted all relevant information). 4. Generate very concise, informative guidance through the Claude API using a restrictive prompt. 5. If no match is found, inform the caller of this in an appropriate manner.
Location Agent: Using the what3words (W3W) API and OpenStreetMap, this agent resolves vague caller location descriptions to exact coordinates, e.g. London Bridge -> approximately 51.508° N, 0.088° W. This is essential for locating callers who don't know their exact address. Additionally, the location agent will have access to ambulance coordinates in order to provide a real-time estimate of the expected arrival time.
Police Agent: This agent uses an AI-powered emergency classification system to analyse potentially violent emergency calls and produce threat assessments. Claude AI agents use specialised tools to rate the danger level on a numeric scale and output a standardised classification of the form PRIORITY=X | CONFIDENCE=Y | DANGER=Z. ElevenLabs uses this output to respond to the caller with an appropriate emotional tone, removing the delay of a human threat assessment and applying it consistently.
Fire Agent: This agent matches fire-related keywords in the caller's context against a database and maps them to useful information for its response. For example, it can correlate a hazmat code that a caller reads out with a chemical and return that chemical's possible risks. If there is no direct match, it still produces a suitable message, either by using the Claude API for a best-effort answer or, for example, by warning the caller to stay away as a precaution. If nothing useful can be found at all, it returns conservative safety defaults to the caller along with instructions to dispatch specialised teams.
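The Medical Agent's staged lookup can be sketched as follows. Claude is replaced by two hypothetical function parameters (`rank` for stage 2, `summarise` for stage 4), the stage 2 shortlist uses a simple substring match as an assumed stand-in for context-based filtering, and the fields of `MedicalResponse` are assumptions.

```kotlin
// Hypothetical record in the pre-filled conditions database.
data class Condition(val name: String, val keywords: Set<String>, val instructions: String)

// Mirrors the MedicalResponse object named in the text; these fields are assumptions.
data class MedicalResponse(val condition: String, val instructions: String)

/**
 * Staged lookup sketch. `rank` stands in for the Claude call that picks the best
 * candidate (stage 2) and `summarise` for the restrictive-prompt generation (stage 4).
 */
class MedicalAgent(
    private val db: List<Condition>,
    private val rank: (candidates: List<Condition>, context: String) -> Condition?,
    private val summarise: (MedicalResponse) -> String,
) {
    fun respond(context: String): String {
        val text = context.lowercase()
        val words = text.split(Regex("\\W+")).toSet()
        // Stage 1: exact-word match against each condition's keywords.
        val exact = db.firstOrNull { c -> c.keywords.any { it in words } }
        // Stage 2 (assumed heuristic): shortlist by substring match, then let the ranker choose.
        val match = exact ?: db.filter { c -> c.keywords.any { it in text } }
            .takeIf { it.isNotEmpty() }
            ?.let { rank(it, context) }
        // Stage 5: nothing matched, so tell the caller plainly.
        if (match == null) return "I can't identify the condition from that description. Please stay on the line."
        // Stage 3: structure the result before generating any text.
        val structured = MedicalResponse(match.name, match.instructions)
        // Stage 4: produce concise guidance from the structured object.
        return summarise(structured)
    }
}
```

Keeping stage 3 as a typed object means the generation step only ever sees fields that were actually extracted, which is one way to keep the final output short and on-topic.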

111 Operator:

As the flowchart shows, the structure is very similar to the 999 operator, with a different implementation only for the central medical agent:

Medical Agent (111):

The NHS 111 Medical Assessment Agent handles non-emergency medical calls where callers are uncertain whether they need emergency services or just medical advice. It asks focused questions one at a time to understand the caller's symptoms, severity, and duration, then classifies cases into three categories: critical emergencies requiring immediate ambulance dispatch, cases needing doctor consultation within hours or days, or self-care advice for minor issues. The system defaults to escalating to a doctor when uncertain, and saves detailed assessment reports with conversation history for healthcare provider review. After the session is closed, the transcript is obtained and the report generated.
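A minimal sketch of the classification step, showing how missing answers can default to escalating to a doctor. The `Assessment` fields and the self-care thresholds are hypothetical and purely illustrative; they are not clinical guidance or the agent's actual logic, which is driven by the LLM conversation.

```kotlin
// The three outcome categories described above.
enum class Disposition { EMERGENCY, SEE_DOCTOR, SELF_CARE }

// Hypothetical structured view of the caller's answers; null means "not yet known".
data class Assessment(
    val redFlags: Boolean?,   // e.g. chest pain or severe breathing difficulty
    val severity: Int?,       // caller-reported, 1 to 10
    val durationDays: Int?,
)

/**
 * Any unknown answer counts as uncertainty, and uncertainty escalates to
 * SEE_DOCTOR rather than SELF_CARE. Thresholds are illustrative assumptions.
 */
fun triage(a: Assessment): Disposition {
    if (a.redFlags == true) return Disposition.EMERGENCY
    if (a.redFlags == null) return Disposition.SEE_DOCTOR
    val severity = a.severity ?: return Disposition.SEE_DOCTOR
    val duration = a.durationDays ?: return Disposition.SEE_DOCTOR
    return if (severity <= 3 && duration <= 7) Disposition.SELF_CARE else Disposition.SEE_DOCTOR
}
```

Only a fully answered, low-severity, short-duration case reaches SELF_CARE; every gap in the information falls through to SEE_DOCTOR.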

🚀 Challenges We Ran Into

Koog: Understanding the Koog framework for agent integration at first, particularly how to connect agents together in a logical order.
Tone-shifter: Although the final ElevenLabs-based solution was elegant, we initially tried to run a Koog agent on a separate endpoint that either ran in the background or communicated with the front end. It was difficult to determine whether this was possible, as information about what we could access without Enterprise features was limited, and relevant documentation was hard to find.
Tokens: We ran out of tokens on a few devices, as testing real operator calls was expensive.
Testing: Testing was difficult: we had to run through many scenarios as we developed each new agent, and the responses naturally varied between runs.

🧠 What We Learned

Koog: Learned how to orchestrate agents using Koog, routing information between agents.
ElevenLabs: Learned how to use ElevenLabs, most interestingly the concept and implementation of dynamic agent transfer.
Prompt Engineering: Intelligent prompt engineering with the Claude API, specifically for getting concise and relevant data.
Kotlin: Learning Kotlin, as some of us had not used it before.

⏭️ Extension Ideas

Asynchronous background agents: Accessing a WebSocket through ElevenLabs' Enterprise features would allow us to create asynchronous background agents with access to more information (e.g. the tone-shifting functionality could use VAD scores), but this is a paid feature.
Agent extension: Extending the agent transfer to work with more personalities.
Department Integration: Integration with police departments, hospitals, etc., such that reports can be immediately dispatched or integrated with auto-deploy functionality.
Multiple departments: Deploying to multiple departments through the transcript of a single call (e.g. someone may have been shot, so there is a need for both police and medical responses).
Asynchronous tooling: Making tool calls asynchronous to provide faster response times.
Centralised platform: A centralised platform to handle calls and track call history.