B2Travel 🌌✈️
B2Travel changes how you save, organize, and experience your travel inspirations. Instead of losing your favorite destinations in a messy camera roll or scattered browser bookmarks, B2Travel automatically organizes them using advanced semantic AI and projects them into a fully immersive, interactive 3D VR environment.
🚀 Features
- Chrome Extension: Right-click to save any image or quote straight to your Second Brain.
- AI-Powered Semantic Clustering: Our backend uses OpenAI's CLIP model to extract 512-dimensional semantic embeddings from your images. Snow mountains group with snow mountains, and sunny beaches automatically group with other beaches (see the embedding sketch after this list).
- UMAP 3D Projection: We use UMAP (Uniform Manifold Approximation and Projection) to squash high-dimensional semantic spaces down into a visual 3D coordinate system.
- Immersive WebVR Experience: Built with A-Frame, B2Travel renders your saved destinations in a beautiful, interactive Virtual Reality "Memory Universe." Look around, use gaze-based interactions, and travel through your memories straight from your phone or VR headset.
- Conversational Voice AI Agent: Integrated with ElevenLabs Conversational AI, you can speak directly to a VR voice assistant. Say "Take me to a summer beach," and the AI dynamically calculates the semantic coordinates and visually guides you through the VR space to that exact vibe (the coordinate lookup is sketched after this list)!
- Multi-Modal Recommendations: Select photos you like in VR by looking at them, and the frontend instantly beams the images to your Voice Agent to provide personalized travel recommendations.
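For the curious, here is a minimal sketch of the embedding step via HuggingFace Transformers. The `openai/clip-vit-base-patch32` checkpoint name is an assumption; the project only specifies a CLIP-ViT model, and any variant with a 512-dim projection works the same way:

```python
# Minimal sketch of the CLIP embedding step. The checkpoint name is an
# assumption; any CLIP-ViT variant with a 512-dim projection works the same way.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    """Return a 512-dimensional semantic embedding for one image."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)  # shape: (1, 512)
    # L2-normalize so cosine similarity becomes a plain dot product.
    return features / features.norm(dim=-1, keepdim=True)
```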
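Voice-to-vibe navigation reduces to the same embedding space: encode the spoken request as a CLIP text vector and pick the saved image whose embedding is most similar. This sketch reuses `model`, `processor`, and `torch` from above; the array names are hypothetical stand-ins for whatever the backend actually stores:

```python
import numpy as np

def find_vibe_target(query: str, image_embeddings: np.ndarray,
                     coords_3d: np.ndarray) -> np.ndarray:
    """Map a spoken request like 'summer beach' to a 3D target coordinate.

    image_embeddings: (N, 512) L2-normalized CLIP vectors of saved images.
    coords_3d: (N, 3) UMAP coordinates of the same items.
    """
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_features = model.get_text_features(**inputs)
    text_vec = (text_features / text_features.norm(dim=-1, keepdim=True))[0].numpy()
    best = int(np.argmax(image_embeddings @ text_vec))  # cosine similarity
    return coords_3d[best]  # point the VR camera should fly toward
```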
🏗️ Architecture (End-to-End Pipeline)
- Input (Chrome Extension): Right-click to save an image or text while browsing.
- Backend (FastAPI): Receives the data and runs it through the CLIP-ViT model to generate embeddings (a minimal endpoint sketch follows this list).
- Storage (MongoDB Atlas): Stores the images, text, and vector embeddings.
- Dimension Reduction (UMAP): Maps every 512-dimensional embedding vector into 3D space (also sketched below).
- Presentation (A-Frame / CodePen): A frontend WebXR interface fetches coordinates and images, rendering them into a beautiful, explorable VR galaxy.
- Multi-Modal AI Interaction (ElevenLabs): A Python agent continuously listens to the user's voice. When the user asks for a vibe, it emits a real-time event that the VR frontend polls and reacts to, actively guiding the user (one possible shape for this event channel is sketched below).
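A minimal ingestion endpoint might look like the following; the route, database, and field names are all hypothetical, and the CLIP `model`/`processor` are the ones loaded in the Features sketch above:

```python
# Hypothetical ingestion endpoint; route, database, and field names
# are assumptions, not the project's actual API.
import io

import torch
from fastapi import FastAPI, UploadFile
from PIL import Image
from pymongo import MongoClient

app = FastAPI()
items = MongoClient("mongodb://localhost:27017")["b2travel"]["items"]

@app.post("/save")
async def save_image(file: UploadFile):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        vec = model.get_image_features(**inputs)
    vec = (vec / vec.norm(dim=-1, keepdim=True))[0].tolist()
    items.insert_one({"filename": file.filename, "embedding": vec})
    return {"status": "ok", "dims": len(vec)}
```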
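The UMAP step is a few lines with umap-learn; the rescaling at the end is an assumption about how coordinates get fit into the A-Frame scene, not part of the project spec:

```python
import numpy as np
import umap

def project_to_3d(embeddings: np.ndarray) -> np.ndarray:
    """Squash (N, 512) CLIP vectors into (N, 3) VR coordinates."""
    reducer = umap.UMAP(n_components=3, metric="cosine", random_state=42)
    coords = reducer.fit_transform(embeddings)
    # Center and rescale into a range comfortable for an A-Frame scene
    # (the exact scaling is an assumption).
    return (coords - coords.mean(axis=0)) / coords.std(axis=0) * 5.0
```

Using `metric="cosine"` matches how the normalized CLIP embeddings are compared, so semantically similar images land near each other in the galaxy.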
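The description does not spell out the event channel, so here is one plausible shape for it: the voice agent POSTs a navigation event to the backend, and the A-Frame frontend polls it. Every name below is hypothetical:

```python
# Hypothetical agent-to-VR event channel (all names are assumptions).
# The voice agent POSTs a navigation target; the frontend polls GET.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
latest_event: Optional[dict] = None

class NavEvent(BaseModel):
    target: list[float]  # 3D coordinate to guide the user toward
    label: str           # e.g. "summer beach"

@app.post("/event")
def push_event(event: NavEvent):
    global latest_event
    latest_event = event.model_dump()  # pydantic v2; use .dict() on v1
    return {"status": "queued"}

@app.get("/event")
def poll_event():
    # The frontend polls every few hundred milliseconds; consume-on-read.
    global latest_event
    event, latest_event = latest_event, None
    return event or {}
```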
🛠️ Tech Stack
- Backend: Python, FastAPI, PyTorch, HuggingFace Transformers (CLIP), UMAP-learn
- Voice AI Agent: ElevenLabs Conversational AI API
- Database: MongoDB (Atlas)
- Frontend: A-Frame (WebVR/WebXR), HTML5, Vanilla JS, CSS
- Tools: Chrome Extension APIs, ngrok (for tunneling localhost to mobile VR)