B2Travel 🌌✈️

B2Travel changes how you save, organize, and experience your travel inspirations. Instead of losing your favorite destinations in a messy camera roll or scattered browser bookmarks, B2Travel automatically organizes them using advanced semantic AI and projects them into a fully immersive, interactive 3D VR environment.

🚀 Features

  • Chrome Extension: Right-click and save any image or quote straight to your Second Brain.
  • AI-Powered Semantic Clustering: Our backend uses OpenAI's CLIP model to extract 512-dimensional semantic embeddings from your images. Snow mountains group with snow mountains, and sunny beaches automatically group with other beaches (a minimal sketch of this step follows the list).
  • UMAP 3D Projection: We use UMAP (Uniform Manifold Approximation and Projection) to squash the high-dimensional semantic space down into a visual 3D coordinate system (see the second sketch after this list).
  • Immersive WebVR Experience: Built with A-Frame, B2Travel renders your saved destinations in a beautiful, interactive Virtual Reality "Memory Universe." Look around, use gaze-based interactions, and travel through your memories straight from your phone or VR headset.
  • Conversational Voice AI Agent: Integration with ElevenLabs Conversational AI lets you speak directly to a VR voice assistant. Say "Take me to a summer beach," and the AI dynamically calculates the semantic coordinates and visually guides you through the VR space to that exact vibe!
  • Multi-Modal Recommendations: Select photos you like in VR by looking at them, and the frontend instantly beams the images to your Voice Agent, which provides personalized travel recommendations.
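
A minimal sketch of the embedding step, assuming the standard openai/clip-vit-base-patch32 checkpoint (the exact checkpoint and the unit-normalization are assumptions; any CLIP ViT-B/32 variant produces the 512-dimensional vectors described above):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; any CLIP ViT-B/32 variant yields 512-d embeddings.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> list[float]:
    """Return a 512-dimensional semantic embedding for one saved image."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)  # shape: (1, 512)
    # Unit-normalize so cosine similarity reduces to a dot product.
    features = features / features.norm(dim=-1, keepdim=True)
    return features[0].tolist()
```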
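And a sketch of the UMAP projection; the n_neighbors, min_dist, and metric values here are illustrative assumptions, not the project's tuned settings:

```python
import numpy as np
import umap

# Stand-in for the real matrix of stored CLIP embeddings (one row per image).
embeddings = np.random.rand(200, 512).astype(np.float32)

reducer = umap.UMAP(
    n_components=3,   # 3D output for the VR scene
    n_neighbors=15,   # size of the local neighborhood preserved per point
    min_dist=0.1,     # how tightly clusters pack together
    metric="cosine",  # matches unit-normalized CLIP embeddings
)
coords_3d = reducer.fit_transform(embeddings)  # shape: (200, 3)
```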

🏗️ Architecture (End-to-End Pipeline)

  1. Input (Chrome Extension): Right-click to save an image or text snippet while browsing.
  2. Backend (FastAPI): Receives the data and runs it through the CLIP-ViT model to generate embeddings (steps 2-3 are sketched after this list).
  3. Storage (MongoDB Atlas): Stores the images, text, and vector embeddings.
  4. Dimension Reduction (UMAP): UMAP maps all stored vectors into 3D space, as sketched above.
  5. Presentation (A-Frame / CodePen): A frontend WebXR interface fetches coordinates and images, rendering them into a beautiful, explorable VR galaxy.
  6. Multi-Modal AI Interaction (ElevenLabs): A Python agent continuously listens to the user's voice. When the user asks for a vibe, it emits a real-time event that the VR frontend polls and reacts to, actively guiding the user (see the final sketch below).
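
A minimal sketch of steps 2-3; the /save route, document fields, and b2travel.items collection are illustrative assumptions, not the project's actual code:

```python
import io

import torch
from fastapi import FastAPI, UploadFile
from PIL import Image
from pymongo import MongoClient
from transformers import CLIPModel, CLIPProcessor

app = FastAPI()
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Hypothetical Atlas connection string and collection name.
items = MongoClient("mongodb+srv://<atlas-uri>")["b2travel"]["items"]

@app.post("/save")
async def save_image(file: UploadFile):
    """Embed an uploaded image with CLIP and store it with its vector."""
    raw = await file.read()
    image = Image.open(io.BytesIO(raw)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    embedding = (features / features.norm(dim=-1, keepdim=True))[0].tolist()
    result = items.insert_one({"filename": file.filename, "embedding": embedding})
    return {"id": str(result.inserted_id), "dims": len(embedding)}
```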
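And a sketch of the glue behind step 6, assuming hypothetical /vibe and /event routes: the voice agent posts the requested vibe as text, the server embeds it with CLIP's text encoder and projects it through the fitted UMAP reducer (CLIP text and image embeddings share a space, so the transform lands near matching photos), and the VR frontend polls for the resulting 3D target:

```python
import numpy as np
import torch
import umap
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import CLIPModel, CLIPProcessor

app = FastAPI()
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# In the real pipeline this is the reducer fitted on the stored image
# embeddings in step 4; random data here keeps the sketch self-contained.
reducer = umap.UMAP(n_components=3, metric="cosine").fit(np.random.rand(100, 512))
latest_event = None  # most recent navigation target for the frontend

class Vibe(BaseModel):
    query: str  # e.g. "a summer beach"

@app.post("/vibe")
def set_vibe(vibe: Vibe):
    """Called by the voice agent when the user asks for a destination vibe."""
    global latest_event
    inputs = processor(text=[vibe.query], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    features = features / features.norm(dim=-1, keepdim=True)
    x, y, z = reducer.transform(features.numpy())[0]  # text -> same 3D space
    latest_event = {"query": vibe.query, "target": [float(x), float(y), float(z)]}
    return latest_event

@app.get("/event")
def poll_event():
    """Polled by the A-Frame frontend; returns the latest navigation target."""
    return latest_event or {}
```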

🛠️ Tech Stack

  • Backend: Python, FastAPI, PyTorch, HuggingFace Transformers (CLIP), UMAP-learn
  • Voice AI Agent: ElevenLabs Conversational AI API
  • Database: MongoDB (Atlas)
  • Frontend: A-Frame (WebVR/WebXR), HTML5, Vanilla JS, CSS
  • Tools: Chrome Extension APIs, ngrok (for tunneling localhost to mobile VR)
