TrekDesk AI: Inspiration, Implementation, and Learnings

Inspiration

The inspiration for TrekDesk AI came from seeing the friction in the trekking industry. Mountain adventures are complex; guests have endless questions about gear, acclimatization, and route difficulty. Standard chatbots fail because trekking is deeply personal and requires real-time, nuanced conversation. I wanted to build an AI that doesn't just answer questions but acts as a seamless extension of the operator's team, capable of checking real-time availability and handling bookings by voice while guests plan their next summit.

How I Built It

The Voice Core

I used the Gemini Multimodal Live API over WebSockets to provide a human-like voice interface. To detect when a guest starts and stops speaking, I implemented custom Voice Activity Detection (VAD) on the client side using ONNX models.
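As a rough illustration of the step that follows the VAD model, the sketch below turns per-frame speech probabilities into speech segments. It is not the project's code: the function name, the threshold, and the `hangover_frames` parameter are assumptions made for the example, and the probabilities are taken as given rather than produced by an ONNX model. The hangover keeps a segment open across short pauses, which is one common way to avoid cutting a speaker off mid-sentence.

```python
from typing import Iterable, List, Tuple


def speech_segments(
    frame_probs: Iterable[float],
    threshold: float = 0.5,      # assumed cutoff for "this frame is speech"
    hangover_frames: int = 3,    # assumed tolerance for short pauses
) -> List[Tuple[int, int]]:
    """Group per-frame speech probabilities into (start, end) frame ranges.

    `end` is exclusive. A segment stays open through up to
    `hangover_frames` consecutive sub-threshold frames, so a brief pause
    does not split one utterance into two.
    """
    segments: List[Tuple[int, int]] = []
    start = None          # index of the first speech frame in the open segment
    last_speech = -1      # index of the most recent speech frame
    silence_run = 0       # consecutive non-speech frames inside a segment

    for i, prob in enumerate(frame_probs):
        if prob >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run > hangover_frames:
                # Pause is too long: close the segment at the last speech frame.
                segments.append((start, last_speech + 1))
                start = None
                silence_run = 0

    if start is not None:
        segments.append((start, last_speech + 1))
    return segments
```

A larger `hangover_frames` value tolerates longer pauses at the cost of a slower end-of-turn signal, which is the trade-off behind a system feeling either responsive or prone to interrupting.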

The Brain (RAG)

To reduce hallucinations, I built a Retrieval-Augmented Generation (RAG) pipeline using pgvector. I ingest tour itineraries and gear guides and transform them into vector embeddings, which the AI searches through tool calls whenever it needs grounded facts.
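The retrieval step can be sketched in plain Python as a cosine-similarity ranking over stored chunks. This is an in-memory stand-in, not the project's pgvector query: the function names and the tiny three-dimensional vectors are assumptions for the example. In pgvector the same ranking is typically expressed by ordering rows with the cosine distance operator `<=>`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the text of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical chunks with made-up 3-dimensional embeddings.
CHUNKS = [
    ("Gear guide: layering for high altitude", [1.0, 0.0, 0.0]),
    ("Itinerary: day-by-day route plan", [0.0, 1.0, 0.0]),
    ("Gear guide: boot and crampon fit", [0.9, 0.1, 0.0]),
]

results = top_k([1.0, 0.0, 0.0], CHUNKS, k=2)
```

Only the retrieved chunk texts are passed back to the model, so its answer is constrained to what the operator's documents actually say.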

The Integrations

I connected the AI directly to the operator's business operations via the Google Calendar API, allowing the AI to actually perform tasks like booking a trek, not just answer questions.
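Before a booking is written to a calendar, the requested dates have to be checked against existing events. The sketch below shows only that overlap check. It assumes the busy intervals have already been fetched from the calendar, and the function name and the half-open interval convention are choices made for the example rather than details from the project.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), end exclusive


def is_slot_free(start: datetime, end: datetime, busy: List[Interval]) -> bool:
    """Return True if [start, end) overlaps none of the busy intervals."""
    if end <= start:
        raise ValueError("end must be after start")
    # Two half-open intervals are disjoint when one ends before the other begins.
    return all(end <= busy_start or start >= busy_end for busy_start, busy_end in busy)


# Hypothetical busy block: a trek already booked from 10 to 12 April.
busy_blocks = [(datetime(2026, 4, 10, 9), datetime(2026, 4, 12, 17))]

back_to_back = is_slot_free(
    datetime(2026, 4, 12, 17), datetime(2026, 4, 14, 17), busy_blocks
)
overlapping = is_slot_free(
    datetime(2026, 4, 11, 9), datetime(2026, 4, 13, 9), busy_blocks
)
```

Treating intervals as half-open lets one trek start at the exact moment another ends without being reported as a conflict.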

What I Learned

This project was a deep dive into Multimodal AI. I learned that the "UI of the future" isn't just a set of buttons, but a fluid conversation. I also learned to manage complex tool-calling sequences where the AI must decide, in the middle of a sentence, whether to fetch data from a database or verify a calendar slot.
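On the application side, a tool call arrives as a tool name plus arguments, and the application routes it to the matching handler. The sketch below shows that routing in isolation. The tool names, handler bodies, and return shapes are hypothetical placeholders, not the project's actual tools or the Gemini API's message format.

```python
from typing import Any, Callable, Dict


def search_knowledge_base(query: str) -> Dict[str, Any]:
    """Placeholder for the RAG lookup; a real handler would query the vector store."""
    return {"source": "knowledge_base", "query": query}


def check_calendar_slot(date: str) -> Dict[str, Any]:
    """Placeholder for the availability check; a real handler would query the calendar."""
    return {"source": "calendar", "date": date}


# Registry mapping the tool names exposed to the model onto local handlers.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search_knowledge_base": search_knowledge_base,
    "check_calendar_slot": check_calendar_slot,
}


def dispatch_tool_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the handler for a tool call and return its result.

    Unknown tools and mismatched arguments produce an error payload instead
    of an exception, so the conversation can continue.
    """
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning errors as data rather than raising means the model can apologize or retry in the same turn, which matters more in a live voice session than in a text chat.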

Challenges I Faced

The biggest hurdle was real-time synchronization: coordinating audio buffers, WebSocket handshakes, and AI tool-calling latencies across a distributed system. I spent significant time debugging VAD asset loading issues to ensure the system felt responsive and didn't "cut off" users. Getting the RAG pipeline to return high-precision results for technical trekking data (like altitude curves and tiered pricing) also required careful prompt engineering and vector tuning.

Technical Notes

Voice Activity Detection (VAD): Implemented using ONNX models for efficient client-side inference.
RAG Pipeline: Uses pgvector for vector embeddings of tour itineraries and gear guides.
Integrations: Google Calendar API for real-time booking capabilities.
Latency Management: Optimized WebSocket connections and audio buffer handling for near real-time performance.

Updates

Udara Shanuka Senarath started this project — Mar 16, 2026 04:27 PM EDT