Inspiration

Modern sightseeing can be exhausting. It means constantly switching between Google Maps for navigation, TripAdvisor for ratings, Wikipedia for history, and Instagram for inspiration.

So we asked ourselves: What if we got rid of the screens, the lists, and the planning? We built Travel Daddy to give you the perfect afternoon in a new city.

The Pavement Plod-and-Plop

To understand the simplification, let's look at our team member Jakub on his first day in London.

  • Now: Jakub sees a cool building. He stops walking. He opens Maps. He tries to match the blue dot to a pin. He Googles the name. He reads a Wikipedia page while blocking the pavement.
    • Result: 6 steps, 2 apps, boring.
  • The "Travel Daddy" Way: Jakub walks past the building. Travel Daddy chimes in: "That brick warehouse? It used to be a Victorian tea vault. Want to know who stole from it?" Jakub says, "Yeah, that is super cool."
    • Result: 0 screens, 1 interaction, perfect.

What it does

Travel Daddy acts as a fully autonomous travel guide for a user exploring a city. Users interact with the guide hands-free and tell it which sights they want to see as they walk around. At start-up, the guide asks what the user would like to do for the day, such as exploring the area or visiting a famous landmark. It then creates a path that passes through many points of interest, offers to go into greater detail along the way, and engages the user in in-depth conversation.

How we built it

The app was built with React Native, with a TypeScript frontend and a Python web API backend. GPS services determine the user's location through a sequence of pings, which trigger the agent to give tour-guide-style information about areas of interest around the user. The real magic comes from our integration with Claude models, which generate descriptive and engaging conversation, and with ElevenLabs for state-of-the-art text-to-speech and speech-to-text. We layered these services to create a seamless pipeline from the user's speech to the agent's output audio.

Challenges we ran into

The main technical challenge was responding in real time so that the conversation stayed fluid. The latency of generative models and text-to-speech made it difficult to produce responses quickly enough. We mitigated this by hiding latency with pre-generated responses. Given more time and resources, we could have experimented with other, faster models.
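As an illustration of latency hiding with pre-generated responses, here is a minimal Python sketch. It is not our actual pipeline code: the class name `ResponsePrefetcher`, the `generate` callable (standing in for the slow model and text-to-speech call), and the `poi_id` keys are all assumptions made for the example. The idea is to start generation in the background before the response is needed, so that the user only waits for whatever work is left.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class ResponsePrefetcher:
    """Starts slow generation early so the result is ready when it is needed."""

    def __init__(self, generate: Callable[[str], str], max_workers: int = 2) -> None:
        # `generate` is a hypothetical stand-in for the slow LLM + TTS call.
        self._generate = generate
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Dict[str, Future] = {}

    def prefetch(self, poi_id: str) -> None:
        """Kick off generation in the background; repeat calls are no-ops."""
        if poi_id not in self._pending:
            self._pending[poi_id] = self._pool.submit(self._generate, poi_id)

    def get(self, poi_id: str) -> str:
        """Return the response, waiting only for the work that is still left."""
        self.prefetch(poi_id)  # falls back to on-demand generation if needed
        return self._pending[poi_id].result()

    def shutdown(self) -> None:
        """Wait for outstanding work and release the worker threads."""
        self._pool.shutdown(wait=True)
```

In a set-up like this, `prefetch` would be called while the user is still approaching a point of interest, and `get` once the guide actually needs to speak.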

Accomplishments that we're proud of

  • Creating a fully hands-free conversational pipeline for talking to a Claude AI agent whose context is based on your location and nearby points of interest.
  • Developing algorithms that take a user's plan for the day and generate a tour of the city they are in, visiting multiple landmarks and attractions.
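To give a flavour of tour generation, the sketch below orders a set of landmarks with a greedy nearest-neighbour heuristic. This is an assumption chosen for illustration, not a description of the algorithm we shipped, and the names `approx_distance_m` and `greedy_tour` are hypothetical. Distances use an equirectangular approximation, which is reasonable at city scale.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def approx_distance_m(a: Point, b: Point) -> float:
    """Equirectangular approximation; adequate over city-scale distances."""
    earth_radius_m = 6_371_000
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    dy = math.radians(b[0] - a[0])
    return earth_radius_m * math.hypot(dx, dy)


def greedy_tour(start: Point, stops: List[Point]) -> List[Point]:
    """Order stops by repeatedly walking to the closest unvisited one."""
    remaining = list(stops)
    tour: List[Point] = []
    current = start
    while remaining:
        nearest = min(remaining, key=lambda p: approx_distance_m(current, p))
        remaining.remove(nearest)
        tour.append(nearest)
        current = nearest
    return tour
```

A greedy ordering is simple and fast but does not guarantee the shortest overall walk; a real route would also need to follow streets rather than straight lines.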

What we learned

  • Hands-free UI/UX: We learned that not all UIs need crazy visuals; there is beauty in a simple, human-like interaction. This allowed us to explore how to keep a user engaged and interested while conveying meaningful information.
  • Latency in real-time conversations is hard: We realised that for a conversation to feel real, speed matters most. We learned valuable lessons in optimising the pipeline between the AI agent and the text-to-speech engine to minimise the "thinking time."
  • AI does not solve everything: We learned not to force AI into every problem. We used simple geospatial maths for trigger zones and open-source databases for landmark retrieval. This backend work kept the app fast and less tightly coupled to AI, letting the AI focus purely on generating personality and becoming a true tour guide.
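The "simple geospatial maths" for trigger zones can be as small as a distance check. The sketch below uses the haversine great-circle formula and a circular zone; the function names and the 50-metre default radius are assumptions for illustration, not the values used in the app.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    earth_radius_m = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def in_trigger_zone(user: Point, landmark: Point, radius_m: float = 50.0) -> bool:
    """True when the user is within `radius_m` of the landmark (assumed radius)."""
    return haversine_m(*user, *landmark) <= radius_m
```

Each GPS ping can be run through a check like this against nearby landmarks, with no model call needed to decide whether the guide should speak.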

What's next for Travel Daddy

Travel Daddy has the potential to grow into a real-time, conversational AI agent that provides complex routing and intriguing dialogue to suit any traveller's wishes. Future capabilities would allow the agent to learn from the user's preferences, whether that be the types of tourist attractions they like to visit or the manner of speaking they respond to best. It could provide customised routes for any traveller in any given city, taking them, for example, to the best historical landmarks.

Another future ambition would be implementing more ways to interact with the agent. We wanted to keep a "hands-free" approach to sightseeing, but thought that some users may wish to take pictures of buildings and request information, for example.
