Inspiration

TOtoTO began with a simple observation: when exploring a large campus, people often take photos of buildings or scenes but have no idea what they’re looking at or what’s nearby. Existing navigation tools don’t help much with “visual discovery.” I wanted to build something that lets users upload a photo and instantly know where they are, find similar scenes, and even get an AI-generated explanation or “tour guide” about the location.

What I Built

TOtoTO is a lightweight system that combines:

  • Image embedding for scene similarity search
  • Vector indexing to find the closest visual matches
  • FastAPI backend for simple deployment and integration
  • A clean interface that returns top-k similar campus scenes along with optional AI commentary

The core idea is: $$\text{query\_img} \xrightarrow{\text{encoder}} \mathbf{v} \xrightarrow{\text{index}} \{\mathbf{v}_1,\mathbf{v}_2,\dots\}$$ …then return the closest matches and generate descriptive context.
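
The retrieval step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes embeddings are plain lists of floats compared by cosine similarity, and the names `cosine`, `top_k`, and the toy scene vectors are made up for the example. A real index would likely use a dedicated vector-search library rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k(query, index, k=3):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy index: hypothetical scene names mapped to made-up 3-d embeddings.
index = {
    "library": [0.9, 0.1, 0.0],
    "clock_tower": [0.1, 0.9, 0.1],
    "stadium": [0.0, 0.2, 0.9],
}
matches = top_k([0.8, 0.2, 0.0], index, k=2)
```

Here `matches` holds the two nearest scenes in descending order of similarity; in the toy data the query vector points mostly along the first axis, so `library` ranks first.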

What I Learned

  • How to build a minimal but efficient image-retrieval pipeline
  • Managing embeddings and vector indexes for real-time search
  • Handling image uploads, preprocessing, and inference in a clean backend
  • Integrating LLM-based descriptions in a controlled and lightweight way
  • Keeping the entire project small, understandable, and easy to extend

How I Built It

  1. Collected campus images and processed them into embeddings
  2. Built a vector index that supports fast similarity search
  3. Implemented the FastAPI backend (upload → encode → search → respond)
  4. Added optional LLM output to generate user-friendly explanations
  5. Packaged everything into a simple repo that anyone can run locally
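
The steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: `encode` is a stand-in that hashes bytes into a unit vector (it is not an image model), `handle_upload` is a hypothetical handler name, and the `describe` callback stands in for the optional LLM call. In the real project this logic would sit behind a FastAPI route.

```python
import hashlib
import math

def encode(image_bytes, dim=8):
    """Stand-in encoder: hashes bytes into a unit vector (NOT a real image model)."""
    digest = hashlib.sha256(image_bytes).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query_vec, index, k):
    """Linear scan by dot product (equal to cosine for unit vectors)."""
    scored = [
        (name, sum(q * v for q, v in zip(query_vec, vec)))
        for name, vec in index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def handle_upload(image_bytes, index, k=3, describe=None):
    """upload -> encode -> search -> respond, with optional commentary."""
    if not image_bytes:
        return {"error": "empty upload"}
    matches = search(encode(image_bytes), index, k)
    response = {
        "matches": [{"scene": n, "score": round(s, 4)} for n, s in matches]
    }
    if describe is not None:
        # Commentary is generated only when a describe callback is supplied.
        response["commentary"] = describe(matches[0][0]) if matches else ""
    return response

# Steps 1-2: build the index from placeholder "images" (raw bytes here).
corpus = {
    "library": b"library-bytes",
    "clock_tower": b"tower-bytes",
    "stadium": b"stadium-bytes",
}
index = [(name, encode(data)) for name, data in corpus.items()]

# Steps 3-4: handle one upload, with a trivial stand-in for the LLM.
result = handle_upload(
    b"library-bytes", index, k=2,
    describe=lambda scene: f"This looks like the {scene}.",
)
```

Keeping `describe` as an optional argument means the search path works unchanged when no language model is configured.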

Challenges

  • Balancing speed vs. accuracy of the image encoder
  • Keeping dependencies slim so deployment wouldn’t become a mess
  • Handling noisy or low-quality photos while still returning reasonable matches
  • Integrating LLM outputs without making the pipeline slow
  • Making the system robust enough to handle different lighting conditions and angles
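
One possible way to keep LLM commentary from slowing the pipeline is to give it a time budget and fall back to matches only when the budget is exceeded. The sketch below is an assumption about how this could be done, not a description of what TOtoTO does; `commentary_with_budget` and both `describe` functions are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def commentary_with_budget(describe, scene, budget_s):
    """Run describe(scene) in a worker thread; return None if it is too slow or fails."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(describe, scene)
    try:
        return future.result(timeout=budget_s)
    except TimeoutError:
        return None  # over budget: respond without commentary
    except Exception:
        return None  # the describe call itself failed
    finally:
        # Do not block the response while the worker finishes.
        pool.shutdown(wait=False)

def fast_describe(scene):
    return f"About {scene}"

def slow_describe(scene):
    time.sleep(0.3)  # simulates a slow model call
    return f"About {scene}"

fast_result = commentary_with_budget(fast_describe, "library", budget_s=1.0)
slow_result = commentary_with_budget(slow_describe, "library", budget_s=0.05)
```

With this approach the similarity results are never held up by commentary; the trade-off is that a slow call still runs to completion in the background.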

What’s Next

  • Add more datasets beyond campus scenes
  • Improve the UI so the experience is smoother
  • Support map-based visualizations and location refinement
  • Build a demo site so users can try it without running anything locally
  • Add 3D campus models to support visualization
