✈️ TravelAIAgent: A Multimodal Conversational Travel Planner

Author: Ramis Hasanli

Overview

This project is part of the 5-day Gen AI Intensive Course with Google, held March 31 to April 4, 2025. The course offered a comprehensive dive into the fundamental technologies and techniques behind Generative AI (Gen AI), led by Google’s ML researchers and engineers. Its hands-on format helped developers like me deepen our understanding of Gen AI capabilities and apply them to real-world problems through practical projects.

Problem Statement

Travel planning can be a fragmented and time-consuming process. Travelers often juggle multiple apps and websites for weather updates, event information, translation tools, and itinerary building. This disjointed approach complicates the overall travel experience.

TravelAIAgent solves this problem by combining multiple functionalities into one intuitive and multimodal assistant. This AI-powered agent simplifies the travel planning process by:

Understanding natural language and interpreting photos to gather context from the user.
Providing personalized recommendations based on preferences.
Fetching real-time data for weather and events.
Generating dynamic itineraries tailored to the user's travel style, weather conditions, and available events.
Offering cultural tips and recommendations grounded in real-time information.
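Because weather and event data are fetched live, the same city can be looked up several times in one conversation. As a rough illustration of the "reduce redundant API calls" idea listed under Context Caching below, here is a minimal time-to-live cache in plain Python. The `ttl_cache` decorator, the `fetch_weather` stub, and the 10-minute window are hypothetical, not taken from the project, and this application-level cache is separate from any model-side caching.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(seconds: float) -> Callable:
    """Cache a function's results per positional-argument tuple for `seconds` seconds."""
    def decorator(func: Callable) -> Callable:
        store: Dict[Tuple, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the external call
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper
    return decorator


# Hypothetical stand-in for a real weather API request; CALLS records real fetches.
CALLS = []


@ttl_cache(seconds=600)
def fetch_weather(city: str) -> dict:
    CALLS.append(city)
    return {"city": city, "temp_c": 18}


fetch_weather("Lisbon")
fetch_weather("Lisbon")  # served from the cache
fetch_weather("Porto")
print(len(CALLS))  # 2
```

Keying on positional arguments keeps the sketch simple; keyword arguments are not supported here.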

TravelAIAgent is not just an LLM chatbot. The agent is enhanced with numerous AI capabilities, including:

Image Understanding: Translating and interpreting text from user-uploaded photos to enhance user experience with visual information (e.g., street signs, menus).
Retrieval-Augmented Generation (RAG): Providing cultural tips and recommendations by retrieving relevant information from a database using vector search and generating responses based on this data.
Few-Shot Prompting: Generating dynamic itineraries based on minimal user input, adapting the assistant's responses based on user preferences and requirements.
Function Calling: Executing specific functions based on user commands, such as generating itineraries, fetching weather data, or summarizing YouTube videos.
Long Context Window: Managing and retaining user preferences and conversation history across multiple interactions to provide more personalized recommendations and responses.
Context Caching: Storing relevant data temporarily to improve response speed and reduce redundant API calls, ensuring a smoother user experience.
Gen AI Evaluation: Using an LLM-based evaluation system to assess the quality of generated itineraries, providing a "Travel Score" based on factors like balance, personalization, and weather fit.
Grounding: Ensuring that the assistant's responses are grounded in real-time data (e.g., live weather, events, and local activities) to improve the accuracy and relevance of suggestions.
Embeddings: Utilizing embeddings for effective text understanding and search, such as providing culturally relevant tips by embedding city-specific knowledge.
Video Understanding: Understanding and extracting useful information from YouTube videos.
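The RAG and Embeddings items boil down to: embed each piece of city knowledge, embed the question, pick the closest passages, and put them into the prompt. The project does this with text-embedding-004 and ChromaDB (see Built With); the sketch below swaps both for a toy bag-of-words vector and a linear scan so it runs with only the standard library. The `embed`, `retrieve`, and `build_grounded_prompt` helpers and the sample tips are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_grounded_prompt(query: str, docs: List[str]) -> str:
    """Augment the question with retrieved passages before calling the model."""
    context = "\n".join(d for _, d in retrieve(query, docs, k=2))
    return f"Answer using only these notes:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge-base entries (not the project's actual data).
DOCS = [
    "Tokyo: bow slightly when greeting and do not tip in restaurants.",
    "Paris: greet shopkeepers with bonjour when entering a shop.",
    "Rome: cover shoulders and knees when visiting churches.",
]

best = retrieve("Should I tip at restaurants in Tokyo?", DOCS, k=1)[0][1]
print(best)  # the Tokyo tip
```

A real vector store replaces the linear scan with an index, but the retrieve-then-prompt flow is the same.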

Key Features

🌦 Weather Info: Get current and forecasted weather using the Gemini 2.0 model.
🧳 Travel & Cultural Tips: Personalized tips for 30+ cities, delivered using RAG-style vector search.
🎟 Event Discovery: Find events near you using the Ticketmaster API.
🗺 Itinerary Generator: Build personalized travel plans based on your preferences.
🧠 Itinerary Evaluator: Self-assesses itineraries with a “🌍 Travel Score” based on weather fit and personalization.
🖼 Landmark Descriptions: Upload photos to get tourist-style explanations of landmarks.
📺 YouTube Video Summarizer: Share YouTube video links and receive a summary and insights.
📸 OCR & Translation: Extract text from images (e.g., signs, menus) and translate to English.
❓ Travel Quiz: Answer fun questions to receive destination recommendations tailored to your style and budget.
💾 Memory: The assistant remembers your preferences and past trips to improve future suggestions.
📝 Export Itineraries: Save itineraries in .md, .pdf, or .json formats.
📤 Export Full Chat History: Save chat interactions for reference at any time.
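For the itinerary export feature, Markdown and JSON can be produced with the standard library alone; PDF output needs a third-party library and is left out here. This is a minimal sketch assuming an itinerary shaped as a mapping from day labels to activity lists; `export_itinerary` and that shape are hypothetical, not the project's actual format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def export_itinerary(city: str, days: Dict[str, List[str]], stem: str) -> List[Path]:
    """Write one itinerary as Markdown and JSON; return the written paths."""
    md_lines = [f"# Itinerary: {city}", ""]
    for day, activities in days.items():
        md_lines.append(f"## {day}")
        md_lines.extend(f"- {item}" for item in activities)
        md_lines.append("")
    md_path = Path(f"{stem}.md")
    md_path.write_text("\n".join(md_lines), encoding="utf-8")

    json_path = Path(f"{stem}.json")
    json_path.write_text(
        json.dumps({"city": city, "days": days}, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return [md_path, json_path]


# Usage: write into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    written = export_itinerary(
        "Lisbon", {"Day 1": ["Belém Tower", "Alfama walk"]}, str(Path(tmp) / "lisbon")
    )
    print([p.suffix for p in written])  # ['.md', '.json']
```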

Why This Matters

Travel planning today is fragmented, requiring users to manage multiple apps and websites. TravelAIAgent brings everything together into one intelligent assistant, offering a more seamless, personalized experience. By combining language understanding, image analysis, real-time data, and self-evaluation, this tool showcases the potential of Generative AI in real-world problem-solving.

How It Works

The assistant uses an intent-based routing system that directs user requests to appropriate handlers.
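One common way to implement intent-based routing is a lookup table from intent names to handler functions, with plain chat as the fallback. The sketch below is illustrative only: the handler bodies are stubs, and the `plan_itinerary` intent name and the `REQUIRED_KEYS` check are assumptions; the project's own chat loop checks intents with an if/else chain.

```python
from typing import Callable, Dict, Set


# Hypothetical stub handlers standing in for the project's handle_* functions.
def handle_get_weather(action: dict) -> str:
    return f"Weather for {action['location']}"


def handle_plan_itinerary(action: dict) -> str:
    return f"Itinerary for {action.get('location', 'your trip')}"


def handle_chat(action: dict) -> str:
    return "General chat reply"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "get_weather": handle_get_weather,
    "plan_itinerary": handle_plan_itinerary,
}

# Keys an action must carry before its handler is used.
REQUIRED_KEYS: Dict[str, Set[str]] = {"get_weather": {"location"}}


def route(action: dict) -> str:
    """Send a structured action to its handler, or fall back to chat."""
    intent = action.get("intent")
    handler = HANDLERS.get(intent)
    if handler is None or not REQUIRED_KEYS.get(intent, set()).issubset(action):
        return handle_chat(action)
    return handler(action)


print(route({"intent": "get_weather", "location": "Baku"}))  # Weather for Baku
print(route({"intent": "unknown"}))  # General chat reply
```

With a table, supporting a new intent is a one-line addition to `HANDLERS` rather than another branch.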

Intent Recognition: Interpreting User Requests

One of the core components of TravelAIAgent is its ability to understand user inputs and route them to the correct function. The interpret_user_request function prompts the language model to classify the user's intent into a structured Python dictionary. This lets the assistant dynamically route the conversation to the correct handler, whether that means fetching the weather, planning an itinerary, summarizing a YouTube video, or translating an image. The prompt instructs the model to return only a valid dictionary, without explanations or formatting. Based on the user's input, the model selects an appropriate intent and fills in the required values (e.g., city, filename, timeframe).

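import ast

from google import genai
from google.genai import types

# Assumes the API key is available as the GOOGLE_API_KEY environment variable
client = genai.Client()

def interpret_user_request(user_input):
    """Classify the user's message into an intent dictionary using Gemini."""
    prompt = (
        "You are a function router. Based on the user message, output ONLY a Python dictionary.\n"
        "- If the user is asking for weather info, return: {'intent': 'get_weather', 'location': '<CITY>'}\n"
        # ... (rules for the remaining intents are elided in the original)
        "Respond with ONLY the dictionary. No explanations, no code blocks.\n\n"
        f"User: {user_input}"
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[types.Part.from_text(text=prompt)]
    )
    try:
        # literal_eval parses a Python literal safely (unlike eval)
        return ast.literal_eval(response.text.strip())
    except Exception:
        # Anything that is not a valid literal falls back to plain chat
        return {'intent': 'chat'}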
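The `ast.literal_eval` call above sends any reply that is not a bare literal to plain chat, so a reply wrapped in code fences is silently treated as chat, and an unexpected intent name passes straight through. A slightly more defensive parser is sketched below; `parse_router_output` and the `known_intents` check are assumptions, not part of the project.

```python
import ast
import re
from typing import Iterable


def parse_router_output(text: str, known_intents: Iterable[str]) -> dict:
    """Turn the router model's reply into an action dict, defaulting to chat."""
    # Strip a leading ``` or ```python fence and a trailing ``` fence, if present.
    cleaned = re.sub(r"^```(?:python)?\s*|\s*```$", "", text.strip())
    try:
        action = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError, TypeError):
        return {"intent": "chat"}
    # Reject non-dicts and intents the application has no handler for.
    if not isinstance(action, dict) or action.get("intent") not in tuple(known_intents):
        return {"intent": "chat"}
    return action


reply = "```python\n{'intent': 'get_weather', 'location': 'Baku'}\n```"
print(parse_router_output(reply, ["get_weather"]))
```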

Evaluating Itinerary Quality with Prompted Self-Assessment

One of the most thoughtful touches in TravelAIAgent is its itinerary evaluation feature. It lets the AI critique its own output, giving users confidence in, and transparency into, the itinerary recommendations.

The evaluate_itinerary function uses prompt engineering with examples (few-shot prompting) to guide Gemini toward a detailed self-assessment of a generated plan:

The prompt starts with two curated examples of good and bad itineraries, including scores and comments. These guide the tone and format of the model's evaluation.
The model is also given the user’s travel preferences so it can comment on personalization.
The output is expected to include a 🌍 Travel Score out of 10, followed by Gemini's commentary.

This approach ensures that users receive meaningful, transparent evaluations rather than blindly trusting generated itineraries. It is especially valuable when dealing with uncertain or dynamic factors like weather, events, or the user’s mood, giving the assistant a humanlike ability to "second-guess" itself when needed.

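from google.genai import types  # `client` is the genai.Client created earlier in the notebook

def evaluate_itinerary(itinerary: str, user_prefs: str) -> str:
    """Ask Gemini to critique an itinerary and return a Travel Score with comments."""
    # Few-shot examples of good and bad itineraries, with scores and comments
    examples = (
        "EXAMPLE 1:\n"
        # ... (example contents are elided in the original)
    )
    prompt = (
        "You are an AI travel evaluator who just generated the itinerary below. "
        # ... (remaining instructions, examples, itinerary, and user_prefs are elided in the original)
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[types.Part.from_text(text=prompt)]
    )
    return response.text.strip()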
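Since the evaluator's reply is free text, a caller that wants to act on the score (for example, to flag low-scoring plans) has to parse it. A minimal sketch, assuming the reply contains something like `Travel Score: 8/10`; the exact wording the project's prompt produces is not shown, so `extract_travel_score` and its regular expression are hypothetical.

```python
import re
from typing import Optional


def extract_travel_score(evaluation: str) -> Optional[float]:
    """Pull the number out of text like 'Travel Score: 8.5/10'; None if absent or out of range."""
    # Allow up to five non-digit characters (": ", " - **", ...) between the label and the number.
    match = re.search(r"Travel Score\D{0,5}?(\d+(?:\.\d+)?)\s*/\s*10", evaluation)
    if match is None:
        return None
    score = float(match.group(1))
    return score if 0 <= score <= 10 else None


print(extract_travel_score("🌍 Travel Score: 8.5/10\nGood balance of indoor and outdoor stops."))  # 8.5
```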

Launching the Interactive Travel Chat Loop

The start_travel_chat function powers the entire interactive experience behind TravelAIAgent. It acts as the brain and memory manager, enabling continuous, multimodal travel planning within a dynamic loop. Inside the function:

Session memory: A memory dictionary tracks context such as the last mentioned city, user preferences (vibe, food, pace, etc.), and past outputs like itineraries or YouTube summaries. This allows continuity across the conversation.
Flexible command handling: Users can type commands like !export or !profile for extra control, making the agent feel like a smart CLI assistant.
Intent-based routing: User input is passed through interpret_user_request, which returns a structured action dict. This enables routing to specific handler functions like handle_get_weather, handle_plan_itinerary, etc.
Fallback logic: If the intent isn't matched to a specialized function, the chatbot falls back to standard Gemini-powered chat with full history context.

This loop makes the chatbot feel continuous, state-aware, and genuinely helpful — not just a one-shot bot. It's the central nervous system that connects interpretation, memory, response, and personality.
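The continuity described above can be as simple as remembering the last city and reusing it when a follow-up omits one. A minimal sketch under assumed names: the `new_memory` layout, the `remember` helper, and the `preferences` key on actions are hypothetical, since the project's actual memory dictionary is elided.

```python
from typing import Any, Dict, Optional


def new_memory() -> Dict[str, Any]:
    """Hypothetical session-memory layout, loosely following the description above."""
    return {"last_city": None, "preferences": {}, "itineraries": []}


def remember(memory: Dict[str, Any], action: dict) -> dict:
    """Update memory from an action and fill a missing city from earlier turns."""
    # Merge any preferences the action carries (e.g. {"pace": "slow"}).
    memory["preferences"].update(action.get("preferences", {}))
    city: Optional[str] = action.get("location")
    if city:
        memory["last_city"] = city
    elif memory["last_city"]:
        action = {**action, "location": memory["last_city"]}
    return action


memory = new_memory()
remember(memory, {"intent": "get_weather", "location": "Istanbul"})
follow_up = remember(memory, {"intent": "plan_itinerary"})
print(follow_up["location"])  # Istanbul
```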

import datetime

from google.genai import types

# onboarding_message, show_response, export_chat_history, show_travel_profile,
# interpret_user_request and the handle_* functions are defined elsewhere in the notebook.

def start_travel_chat():
    history = []
    memory = {
        # ... (elided in the original: last mentioned city, user preferences such as
        # vibe, food and pace, and past outputs like itineraries or YouTube summaries)
    }

    onboarding_block = types.ModelContent(parts=[types.Part.from_text(text=onboarding_message)])
    history.append(onboarding_block)
    show_response(onboarding_block)

    # Main loop
    while True:
        user_input = input("👤 You: ").strip()

        # Quit
        if user_input.lower() in ['!q', 'quit', '!quit']:
            print("Thanks for using TravelAIAgent. Goodbye!")
            break

        # Export chat history
        if user_input.lower() == "!export":
            filename = f"travel_chat_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
            export_chat_history(history, filename)
            continue

        # View travel profile
        if user_input.lower() == "!profile":
            show_travel_profile(memory)
            continue

        # Interpret user intent
        action = interpret_user_request(user_input)
        if action.get("intent") == "get_weather" and "location" in action:
            handle_get_weather(action, history, memory)
        # ... (elif branches for the remaining intents are elided in the original)
        else:
            # Default chat fallback
            if user_input:
                try:
                    ...  # elided in the original: send the full history to Gemini and show the reply
                except Exception as e:
                    print("⚠️ TravelAIAgent: I had trouble generating a response. Please try rephrasing.")
                    print("Error:", str(e))

Development Journey & Challenges

During the 5-day Gen AI Intensive Course with Google, I thoroughly enjoyed exploring the course materials, which included hands-on tutorials, insightful whitepapers, and in-depth theoretical explanations. These resources laid a strong foundation for understanding and applying Gen AI capabilities using Google’s GenAI SDK.

To build TravelAIAgent, I referenced the exercise submissions from the course and followed a step-by-step approach. My primary objective was to demonstrate a broad spectrum of Gemini's capabilities—like few-shot prompting, image understanding, RAG, and more—without focusing too heavily on perfecting every implementation detail. I wanted to showcase the versatility and power of Generative AI when applied to real-world problems like travel planning.

Working within the Kaggle environment made development smooth and accessible. However, one challenge I faced was determining the most effective input/output style within the Kaggle Notebook interface to simulate a conversational chatbot experience. Experimenting with different formats helped me find a setup that felt intuitive and user-friendly, and I ended up implementing a simple yet powerful CLI interface.

One of the standout tools I integrated was ChromaDB, which I found incredibly helpful and easy to manage. It enabled me to implement memory and vector search functionality efficiently, greatly enhancing the chatbot’s ability to recall past interactions and provide context-aware suggestions.

This project was a rewarding learning experience that allowed me to apply Gen AI in creative and practical ways, reinforcing my confidence in building with large language models and multimodal systems.

Limitations & Future Work

Current Limitations:

Event Handling: The assistant cannot resolve relative days like "this Sunday" and struggles with multi-day events.
Itinerary Memory: Itinerary data is not saved if the user doesn't export it.
Preferences Overlap: The assistant may sometimes confuse or forget user preferences if too many are shared.
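The "this Sunday" limitation is largely a date-arithmetic problem once the phrase is recognized. A minimal sketch, assuming the phrase has already been isolated and that "this <weekday>" means the next such day on or after today; `resolve_this_weekday` is hypothetical and requires Python 3.9+ for `str.removeprefix`.

```python
import datetime as dt

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_this_weekday(phrase: str, today: dt.date) -> dt.date:
    """Map 'this <weekday>' to the next such date on or after `today`."""
    name = phrase.lower().removeprefix("this ").strip()
    target = WEEKDAYS.index(name)  # raises ValueError for unrecognized names
    days_ahead = (target - today.weekday()) % 7  # date.weekday(): Monday == 0
    return today + dt.timedelta(days=days_ahead)


print(resolve_this_weekday("this Sunday", dt.date(2025, 4, 16)))  # 2025-04-20
```

The result can then be passed as a concrete date range to an events lookup instead of the raw phrase.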

Future Enhancements:

Streamline result printing and memory updates.
Integrate voice input/output.
Fine-tune the assistant with travel datasets for improved contextual understanding.
Add a graphical interface (e.g., built with Gradio) for a more user-friendly experience.

Built With

chromadb
gemini
ipynb
kaggle
openweathermap
python
rag
text-embedding-004
ticketmaster

Updates

Ramis H. posted an update — Apr 20, 2025 06:27 PM EDT

Submitted to Google & Kaggle

Ramis H. started this project — Apr 20, 2025 04:06 AM EDT

Leave feedback in the comments!