This project is an autonomous AI voice agent that handles the entire patient intake process for a dental practice, from check-in to appointment booking, without any human staff involvement. When a patient joins a session, the agent greets them through a lip-synced avatar and walks them through a structured workflow entirely by voice.

It starts by asking the patient to hold up their photo ID to the camera, then uses Gemini Vision to extract their name, date of birth, and address directly from the card image, populating the intake form in real time. It does the same for insurance cards, pulling the provider, member ID, group number, and policy holder automatically. The agent checks a persistent knowledge base powered by Senso to recognize returning patients, referencing their previous visits and personalizing the conversation. After collecting the reason for the visit, it queries the practice's schedule, presents available slots with specific doctors, and books the appointment on the spot.

Every piece of information flows to the frontend instantly through LiveKit RPC calls, so the patient sees the form filling itself out as they speak. At the end of the call, the agent writes a structured summary back to the knowledge base so the next interaction picks up where this one left off. The entire stack runs on LiveKit Agents with Deepgram for speech-to-text, OpenAI for reasoning, ElevenLabs for natural-sounding voice output, and an Anam avatar for visual presence, all orchestrated through a single Python agent with function tools that drive the frontend in real time.
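
To make the "form filling itself out" flow more concrete, here is a minimal sketch of how extracted card fields might be merged into intake state and turned into an update message for the frontend. It is not the project's actual code: the `IntakeForm` class, its field names, the helper functions, and the `form_update` message shape are all assumptions made for illustration, and the sketch uses only the Python standard library. In the real agent, a string like the one built here would presumably be passed to a LiveKit RPC call; that step is left out.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IntakeForm:
    """Hypothetical intake state; the field names are assumptions."""
    name: Optional[str] = None
    date_of_birth: Optional[str] = None
    address: Optional[str] = None
    insurance_provider: Optional[str] = None
    member_id: Optional[str] = None
    group_number: Optional[str] = None
    policy_holder: Optional[str] = None
    reason_for_visit: Optional[str] = None


def apply_extracted_fields(form: IntakeForm, extracted: dict) -> dict:
    """Copy known, non-empty values onto the form and return only what changed."""
    known = {f.name for f in fields(form)}
    changed = {}
    for key, value in extracted.items():
        if key not in known or value in (None, ""):
            continue  # ignore unknown keys and blank extractions
        if getattr(form, key) != value:
            setattr(form, key, value)
            changed[key] = value
    return changed


def build_update_payload(changed: dict) -> str:
    """Serialize changed fields as a JSON string (hypothetical message shape)."""
    return json.dumps({"type": "form_update", "fields": changed}, sort_keys=True)


if __name__ == "__main__":
    form = IntakeForm()
    # Stand-in for what a vision model might return for a photo ID.
    id_fields = {"name": "Jane Doe", "date_of_birth": "1990-01-01", "address": ""}
    changed = apply_extracted_fields(form, id_fields)
    print(build_update_payload(changed))
```

Returning only the changed fields keeps each update small, so the frontend can apply it incrementally instead of re-rendering the whole form on every tool call.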

Built With

  • anam-avatar
  • deepgram
  • elevenlabs
  • gemini-2.5-flash
  • livekit
  • livekit-agents
  • next.js
  • openai-gpt-4o
  • python
  • react
  • senso-ai
  • silero-vad
  • tailwind-css