Inspiration

The idea for WanderLens was inspired by a universal travel frustration: standing in front of something amazing and thinking, "What am I looking at?" While traveling, we found ourselves constantly juggling multiple apps: Google Lens for identification, Google Translate for signs, Google Maps for finding places, and a notes app for remembering where we'd been. It was exhausting. We'd pull out our phones to learn about a landmark, then get lost in a rabbit hole of tabs and apps, missing the actual moment we came to experience.

We noticed how this got in the way of spontaneous exploration. People would walk past incredible historical sites simply because they didn't know what they were looking at, or they'd struggle with menu translations for so long that they'd just point at random items. Tour guide apps existed, but they required planning, following set routes, and constantly staring at your screen instead of experiencing the place.

We wanted something different: an all-in-one solution that makes travel feel effortless. Something that answers "What is this?" instantly, breaks down language barriers in real time, suggests nearby discoveries without overwhelming you, and captures your journey. That's how WanderLens was created: your smart travel companion that turns curiosity into instant knowledge.

What it does

WanderLens is an application that transforms your mobile camera into an intelligent travel guide, offering four main features that work seamlessly together:

  1. Landmark Recognition: Point your camera at any monument, building, or landmark and tap "Identify Landmark." Our AI instantly recognizes it and displays the name with contextual information in a beautiful overlay.
  2. Real-Time Translation: Encounter a sign, menu, or notice in a foreign language? Select your target language and tap "Translate to Target." The app uses OCR (Optical Character Recognition) to read the text in view, detects the source language automatically, and displays the translation instantly.
  3. Nearby Discovery: Curious what's around you? Use the "Discover Near Me" feature to find curated attractions and authentic food spots within walking distance. Each result shows distance, estimated walking time, a description, and one-tap directions. Toggle between the "Attractions" and "Food Spots" tabs to find exactly what you're looking for.
  4. Digital Passport: Every landmark you identify and place you discover gets automatically saved to your Digital Passport, a visual journal of your travels. Each entry includes a photo, timestamp, and description, creating a searchable memory bank of your adventures.

Gesture Control Magic: WanderLens features hands-free gesture controls powered by MediaPipe, letting you navigate without touching your phone:

  ✌️ Peace sign: Start a 3-second countdown to scan a landmark
  👍 Thumbs up: Save the current result to your Digital Passport
  👎 Thumbs down: Cancel actions

How we built it

WanderLens is built with a Python Flask backend and a vanilla JavaScript frontend.

Frontend Architecture (Web Interface)

  HTML/CSS/JavaScript: We chose vanilla JavaScript over frameworks to keep the app lightweight and ensure fast load times on mobile devices. We also used CSS custom properties for theming, creating a native app-like experience with smooth animations and intuitive touch interactions.
  MediaPipe Gesture Recognizer: We integrated Google's MediaPipe library for real-time hand gesture detection.
  LocalStorage for Persistence: We used browser localStorage to save Digital Passport entries locally, ensuring users don't lose their travel memories even without a backend database.

Backend Architecture (Flask Server)

  Flask with CORS: A lightweight Python backend with CORS enabled for seamless cross-origin requests during development.
  Google Cloud Vision API:
    Landmark Detection: Identifies famous landmarks, monuments, and buildings.
    Text Detection (OCR): Extracts text from images, with additional filtering to remove noise and low-confidence results.
  Google Cloud Translation API: Detects source languages automatically and translates extracted text into the user's selected target language.
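To illustrate the landmark detection step, here is a minimal sketch of how a backend might pick the best match out of a Vision API landmark response. It is not the project's actual code. It assumes the response has already been decoded into a dictionary with a `landmarkAnnotations` list whose entries carry `description`, `score`, and `locations` fields, as in the Vision REST response format. The function name `best_landmark`, the `min_score` cutoff of 0.5, and the sample values are hypothetical.

```python
from typing import Optional


def best_landmark(response: dict, min_score: float = 0.5) -> Optional[dict]:
    """Return the highest-scoring landmark in a decoded response, or None.

    `min_score` is an illustrative cutoff, not a value from the project.
    """
    annotations = response.get("landmarkAnnotations", [])
    candidates = [a for a in annotations if a.get("score", 0.0) >= min_score]
    if not candidates:
        return None

    top = max(candidates, key=lambda a: a.get("score", 0.0))

    # Coordinates are optional, so fall back to None when they are absent.
    locations = top.get("locations", [])
    lat_lng = locations[0].get("latLng", {}) if locations else {}

    return {
        "name": top.get("description", ""),
        "score": top.get("score", 0.0),
        "latitude": lat_lng.get("latitude"),
        "longitude": lat_lng.get("longitude"),
    }


# Made-up sample data, for illustration only.
sample = {
    "landmarkAnnotations": [
        {
            "description": "Eiffel Tower",
            "score": 0.92,
            "locations": [{"latLng": {"latitude": 48.858, "longitude": 2.294}}],
        },
        {"description": "Champ de Mars", "score": 0.41},
    ]
}

result = best_landmark(sample)
print(result["name"] if result else "No landmark recognized")
```

Returning `None` when nothing clears the cutoff lets the frontend show a "not recognized" message instead of a low-confidence guess.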

Challenges we ran into

Building WanderLens presented integration challenges, as four developers worked on separate features: landmark recognition, translation, hand gesture controls, and nearby place discovery. One of our biggest hurdles was overlapping UI elements. The landmark overlay, translation results, and attractions popup competed for screen space, causing visibility issues and layout shifts that required careful coordination of z-index values and positioning strategies.

We also struggled with OCR noise. Raw text detection captured everything in view, including prices, watermarks, and small labels, which produced cluttered, unusable results. We spent time developing filtering logic based on bounding box area ratios and confidence thresholds to distinguish primary content (like signs and menus) from background noise.

These challenges taught us that merging complex features requires not just technical skill, but constant communication and careful architectural planning to create a cohesive user experience.
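The OCR filtering idea described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual algorithm. Each detected block is assumed to be a dictionary with hypothetical `text`, `confidence`, and `vertices` keys, and the thresholds (1% of the image area, 0.8 confidence) are placeholder values that would need tuning on real photos.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]


def polygon_area(vertices: List[Vertex]) -> float:
    """Area of a simple polygon, computed with the shoelace formula."""
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def filter_text_blocks(
    blocks: List[Dict],
    image_width: int,
    image_height: int,
    min_area_ratio: float = 0.01,  # hypothetical threshold
    min_confidence: float = 0.8,   # hypothetical threshold
) -> List[str]:
    """Keep text whose box is large enough and whose confidence is high enough."""
    image_area = float(image_width * image_height)
    if image_area <= 0:
        return []

    kept = []
    for block in blocks:
        area_ratio = polygon_area(block["vertices"]) / image_area
        if area_ratio >= min_area_ratio and block["confidence"] >= min_confidence:
            kept.append(block["text"])
    return kept


# Made-up detections on a 1000x800 image, for illustration only.
blocks = [
    {"text": "MENU DEL DIA", "confidence": 0.97,
     "vertices": [(100, 100), (700, 100), (700, 250), (100, 250)]},
    {"text": "4.50", "confidence": 0.95,  # tiny price label
     "vertices": [(900, 700), (950, 700), (950, 720), (900, 720)]},
    {"text": "smudged text", "confidence": 0.40,  # large but unreliable
     "vertices": [(100, 400), (600, 400), (600, 500), (100, 500)]},
]

print(filter_text_blocks(blocks, 1000, 800))  # ['MENU DEL DIA']
```

Requiring both conditions means a large but unreliable detection and a confident but tiny label are both dropped, so only the main sign or menu text goes on to translation.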

Accomplishments that we're proud of

We're incredibly proud of what we built and how it came together.

Seamless Multi-Feature Integration: We successfully integrated our independent features into a single application that feels easy to use. You can flow from scanning a landmark to translating a nearby sign to discovering restaurants, all without switching apps or losing context, which is exactly the experience we envisioned.

Hands-Free Innovation: The gesture control system is something we're particularly proud of. It makes the app more usable when you're traveling with bags or taking photos.

Smart Data Filtering: Our OCR filtering algorithm was a success. We transformed noisy, cluttered text detection results into clean, readable translations that focus on what actually matters.

Collaborative Success: Most importantly, we're proud that team members with different specializations collaborated to build something greater than the sum of its parts. Each person brought unique expertise, and through clear communication and mutual support, we turned individual modules into a unified product.

What we learned

We gained hands-on experience with the Google Cloud Vision API for landmark detection and OCR, learning how to parse complex JSON responses, handle confidence scores, and filter bounding box data. We also learned to use the Google Cloud Translation API for automatic language detection and multi-language support. Most importantly, we learned proper API key management, using environment variables and .env files to keep credentials secure.

Computer Vision & Image Processing: We discovered that raw OCR output is messy and requires intelligent filtering. We learned to implement geometric calculations for bounding boxes, analyze area ratios relative to image dimensions, and apply confidence thresholds to extract meaningful text while eliminating noise. This taught us that computer vision isn't just about calling an API; it's about understanding and processing the results intelligently.

MediaPipe & Gesture Recognition: Integrating MediaPipe for hand gesture detection taught us about real-time video frame processing, confidence thresholds, and the balance between sensitivity and false positives.
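One common way to balance gesture sensitivity against false positives is to require the same gesture on several consecutive frames before acting on it. The sketch below illustrates that idea in Python for consistency with the backend, even though the gesture handling runs in the browser. It is an assumption about how such logic could look, not the project's code. The labels `Victory`, `Thumb_Up`, and `Thumb_Down` assume MediaPipe's default gesture categories. The action names, the 0.7 confidence threshold, and the 5-frame requirement are hypothetical.

```python
from typing import Optional

# Assumed MediaPipe gesture labels mapped to hypothetical app action names.
GESTURE_ACTIONS = {
    "Victory": "start_scan_countdown",  # peace sign
    "Thumb_Up": "save_to_passport",
    "Thumb_Down": "cancel",
}


class GestureDebouncer:
    """Fire an action only after a gesture is held for several frames."""

    def __init__(self, min_confidence: float = 0.7, required_frames: int = 5):
        self.min_confidence = min_confidence
        self.required_frames = required_frames
        self._current: Optional[str] = None
        self._streak = 0

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Process one frame's top gesture; return an action name or None."""
        # Weak or unmapped detections break the streak.
        if confidence < self.min_confidence or label not in GESTURE_ACTIONS:
            self._current = None
            self._streak = 0
            return None

        if label == self._current:
            self._streak += 1
        else:
            self._current = label
            self._streak = 1

        # Fire once, on the frame where the streak reaches the requirement,
        # so holding the gesture longer does not trigger the action again.
        if self._streak == self.required_frames:
            return GESTURE_ACTIONS[label]
        return None


debouncer = GestureDebouncer(required_frames=3)
for frame_label, frame_confidence in [("Victory", 0.9)] * 4:
    action = debouncer.update(frame_label, frame_confidence)
    if action:
        print("Trigger:", action)
```

Raising `required_frames` or `min_confidence` reduces accidental triggers at the cost of slower response, which is the trade-off described above.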

What's next for WanderLens

WanderLens has a working foundation, but we envision taking it much further. Our immediate priorities include enhancing landmark information by adding "Learn More" buttons that link to Wikipedia articles, Google searches, and official websites for deeper exploration, along with rich historical context, opening hours, and visitor information. We want to make the Digital Passport more powerful by allowing users to rename entries with personal memories, add notes and reflections, organize entries into trips with timeline visualizations, and export beautifully formatted PDFs for sharing.

Beyond these core improvements, we plan to implement cloud database integration for cross-device sync, add personalized recommendations that learn user preferences over time, create curated walking routes connecting multiple attractions, and develop an offline mode with cached data for use without internet. We're also exploring a conversation mode for two-way translation with locals, voice commands for truly hands-free operation, and turning WanderLens into a Progressive Web App for app-like installation and offline functionality.

Our ultimate vision is to make WanderLens the go-to travel companion that makes every journey more accessible, understandable, and exciting.
