TrekDesk AI: Inspiration, Implementation, and Learnings
Inspiration
The inspiration for TrekDesk AI came from seeing the friction in the trekking industry. Mountain adventures are complex; guests have endless questions about gear, acclimatization, and route difficulty. Standard chatbots tend to fall short because trekking is deeply personal and calls for real-time, nuanced conversation. I wanted to build an AI that doesn't just answer questions but acts as a seamless extension of the operator’s team, capable of checking real-time availability and handling bookings by voice while guests plan their next summit.
How I Built It
The Voice Core
I leveraged the Gemini Multimodal Live API over WebSockets to provide a human-like voice interface. To handle audio cleanly, I implemented custom Voice Activity Detection (VAD) on the client side using ONNX models.
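To illustrate the part of VAD that decides when an utterance starts and ends, here is a minimal sketch in Python (not the project's client-side code). It assumes a VAD model has already produced one speech probability per audio frame; the thresholds, the `hangover_frames` value, and the `SpeechSegmenter` name are all hypothetical. The hangover is what keeps a short pause from ending the segment early.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SpeechSegmenter:
    """Turns per-frame speech probabilities into (start, end) frame ranges.

    All defaults are illustrative assumptions, not values from the project.
    """
    start_threshold: float = 0.6  # probability needed to open a segment
    end_threshold: float = 0.4    # below this, a frame counts as silence
    hangover_frames: int = 8      # silent frames tolerated before closing
    segments: List[Tuple[int, int]] = field(default_factory=list)
    _start: Optional[int] = None
    _silence: int = 0
    _index: int = 0

    def push(self, prob: float) -> None:
        """Feed the speech probability of the next frame."""
        i = self._index
        self._index += 1

        if self._start is None:
            # Not in speech yet: open a segment on a confident frame.
            if prob >= self.start_threshold:
                self._start = i
                self._silence = 0
            return

        if prob < self.end_threshold:
            self._silence += 1
            if self._silence > self.hangover_frames:
                # Close the segment; the end index is exclusive and
                # points just past the last voiced frame.
                self.segments.append((self._start, i - self._silence + 1))
                self._start = None
                self._silence = 0
        else:
            # Speech resumed within the hangover window: keep the segment open.
            self._silence = 0
```

Using two thresholds (a higher one to start, a lower one to stop) plus a hangover is a common way to trade a little extra latency for fewer clipped sentences.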
The Brain (RAG)
To prevent hallucinations, I built a Retrieval-Augmented Generation (RAG) pipeline using pgvector. I ingest tour itineraries and gear guides, transforming them into vector embeddings that the AI searches during tool calls.
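The retrieval step can be sketched in plain Python as a similarity ranking over stored embeddings. This is an in-memory stand-in, not the project's query: with pgvector the same ordering would typically be done in SQL using a distance operator such as `<=>` (cosine distance). The chunk texts, the three-dimensional vectors, and the function names below are made up for illustration; real embeddings have hundreds of dimensions or more.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query embedding."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: hypothetical chunks with hand-written "embeddings".
CHUNKS = [
    ("Gear guide: crampons are required above the snow line.", [0.9, 0.1, 0.0]),
    ("Itinerary: day 3 is an acclimatization rest day.", [0.1, 0.9, 0.1]),
    ("Pricing: group discounts start at six guests.", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    # A query vector that points mostly along the "gear" direction.
    for text in top_k([0.8, 0.2, 0.0], CHUNKS, k=2):
        print(text)
```

The retrieved texts would then be passed back to the model as the tool result, so the answer is grounded in the operator's own documents.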
The Integrations
I connected the AI directly to the operator's business operations via the Google Calendar API, allowing the AI to actually perform tasks like booking a trek, not just answer questions.
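Before a booking is written to a calendar, the requested dates have to be checked against what is already scheduled. The sketch below shows that overlap check in isolation, assuming the busy intervals have already been fetched (for example from a calendar free/busy lookup). It makes no API calls, and the function names and sample dates are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), end exclusive


def overlaps(start_a: datetime, end_a: datetime,
             start_b: datetime, end_b: datetime) -> bool:
    """True if two half-open intervals [start, end) share any time."""
    return start_a < end_b and start_b < end_a


def is_slot_free(start: datetime, end: datetime,
                 busy: Iterable[Interval]) -> bool:
    """True if the requested slot does not collide with any busy interval."""
    return not any(overlaps(start, end, b_start, b_end)
                   for b_start, b_end in busy)


if __name__ == "__main__":
    # Hypothetical existing booking: a trek from 10 May to 15 May.
    busy = [(datetime(2025, 5, 10), datetime(2025, 5, 15))]
    print(is_slot_free(datetime(2025, 5, 15), datetime(2025, 5, 20), busy))
    print(is_slot_free(datetime(2025, 5, 12), datetime(2025, 5, 18), busy))
```

Treating intervals as half-open means a trek that starts on the day another one ends is not counted as a conflict, which is a design choice an operator may or may not want.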
What I Learned
This project was a deep dive into Multimodal AI. I learned that the "UI of the future" isn't just a set of buttons, but a fluid conversation. I mastered managing complex tool-calling sequences where the AI must decide, in the middle of a sentence, whether to fetch data from a database or verify a calendar slot.
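One way to keep tool-calling sequences manageable is a small registry that maps tool names to handlers, so the code that receives a tool call from the model only has to look it up and run it. The sketch below is an assumption about structure, not the project's implementation: the tool names, the `{"name": ..., "args": ...}` call shape, and the stub return values are all hypothetical.

```python
from typing import Any, Callable, Dict

# Registry of tool name -> handler function.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("search_knowledge_base")
def search_knowledge_base(query: str) -> dict:
    # Stub: a real handler would run the vector search here.
    return {"matches": [f"stub result for: {query}"]}


@tool("check_availability")
def check_availability(date: str) -> dict:
    # Stub: a real handler would consult the calendar here.
    return {"date": date, "available": True}


def dispatch(call: dict) -> dict:
    """Run one tool call and always return a response the model can read."""
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return {"name": name, "error": f"unknown tool: {name}"}
    try:
        return {"name": name, "result": handler(**call.get("args", {}))}
    except TypeError as exc:
        # Raised when the arguments do not match the handler's signature.
        return {"name": name, "error": str(exc)}
```

Returning an error payload instead of raising lets the conversation continue: the model can apologize or retry rather than the voice session dropping mid-sentence.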
Challenges I Faced
The biggest hurdle was Real-Time Synchronization. Dealing with audio buffers, WebSocket handshakes, and AI tool-calling latencies across a distributed system was challenging. I spent significant time debugging VAD asset loading issues to ensure the system felt responsive and didn't "cut off" users. Ensuring the RAG pipeline returned high-precision results for technical trekking data (like altitude curves and tiered pricing) required careful prompt engineering and vector tuning.
Technical Notes
- Voice Activity Detection (VAD): Implemented using ONNX models for efficient client-side inference.
- RAG Pipeline: Uses pgvector for vector embeddings of tour itineraries and gear guides.
- Integrations: Google Calendar API for real-time booking capabilities.
- Latency Management: Optimized WebSocket connections and audio buffer handling for near real-time performance.