Inspiration
TOtoTO began with a simple observation: when exploring a large campus, people often take photos of buildings or scenes but have no idea what they’re looking at or what’s nearby. Existing navigation tools don’t help much with “visual discovery.” I wanted to build something that lets users upload a photo and instantly know where they are, find similar scenes, and even get an AI-generated explanation or “tour guide” about the location.
What I Built
TOtoTO is a lightweight system that combines:
- Image embedding for scene similarity search
- Vector indexing to find the closest visual matches
- FastAPI backend for simple deployment and integration
- A clean interface that returns top-k similar campus scenes along with optional AI commentary
The core idea is:
$$\text{query\_img} \xrightarrow{\text{encoder}} \mathbf{v} \xrightarrow{\text{index}} \{\mathbf{v}_1, \mathbf{v}_2, \dots\}$$
The system then returns the closest matches and generates descriptive context.
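As a rough illustration of the search step, the sketch below stores unit-normalized embeddings and ranks them by cosine similarity (the dot product of unit vectors), returning the top-k scene IDs. The names `BruteForceIndex`, `add`, and `search` are hypothetical. The exhaustive scan is a simplified stand-in for the FAISS index listed under Built With. With FAISS, a comparable setup would be an inner-product index such as `IndexFlatIP` over L2-normalized vectors. The embeddings would come from whatever image encoder the project uses. Here they are toy 3-D vectors.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class BruteForceIndex:
    """Exact nearest-neighbour search over unit vectors (hypothetical stand-in for FAISS)."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vecs: List[List[float]] = []

    def add(self, scene_id: str, embedding: Sequence[float]) -> None:
        # Store each campus-scene embedding already normalized.
        self._ids.append(scene_id)
        self._vecs.append(normalize(embedding))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        q = normalize(query)
        scored = (
            (sid, sum(a * b for a, b in zip(q, vec)))
            for sid, vec in zip(self._ids, self._vecs)
        )
        # Keep the k highest cosine similarities, best first.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

After adding scene vectors, `index.search(query_vec, k=5)` returns `(scene_id, similarity)` pairs sorted best first. That list is what the backend would hand to the interface as the top-k matches.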
What I Learned
- How to build a minimal but efficient image-retrieval pipeline
- Managing embeddings and vector indexes for real-time search
- Handling image uploads, preprocessing, and inference in a clean backend
- Integrating LLM-based descriptions in a controlled and lightweight way
- Keeping the entire project small, understandable, and easy to extend
How I Built It
- Collected campus images and processed them into embeddings
- Built a vector index that supports fast similarity search
- Implemented the FastAPI backend (upload → encode → search → respond)
- Added optional LLM output to generate user-friendly explanations
- Packaged everything into a simple repo that anyone can run locally
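The upload → encode → search → respond flow can be sketched as a framework-independent function. This keeps the pipeline testable apart from the web layer. Everything here is an assumption for illustration. `handle_query`, `QueryResponse`, and the `encode`, `search`, and `describe` callables are hypothetical placeholders for the project's image encoder, vector index, and optional LLM step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical plug-in types for the encoder, the index lookup, and the LLM step.
Encoder = Callable[[bytes], Sequence[float]]
Searcher = Callable[[Sequence[float], int], List[Tuple[str, float]]]
Describer = Callable[[List[Tuple[str, float]]], str]


@dataclass
class QueryResponse:
    matches: List[Tuple[str, float]]   # (scene_id, similarity), best first
    commentary: Optional[str] = None   # optional AI-generated explanation


def handle_query(
    image_bytes: bytes,
    encode: Encoder,
    search: Searcher,
    k: int = 5,
    describe: Optional[Describer] = None,
) -> QueryResponse:
    """upload -> encode -> search -> respond, with optional LLM commentary."""
    if not image_bytes:
        raise ValueError("empty upload")
    embedding = encode(image_bytes)   # encode step
    matches = search(embedding, k)    # search step
    # Only call the LLM when it is enabled and there is something to describe.
    commentary = describe(matches) if describe is not None and matches else None
    return QueryResponse(matches=matches, commentary=commentary)
```

A FastAPI route could read the uploaded bytes (for example with `await file.read()` on an `UploadFile`) and pass them to `handle_query`. Leaving `describe` as `None` skips the LLM entirely, which matches the "optional" commentary described above.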
Challenges
- Balancing speed vs. accuracy of the image encoder
- Keeping dependencies slim so deployment wouldn’t become a mess
- Handling noisy or low-quality photos while still returning reasonable matches
- Integrating LLM outputs without making the pipeline slow
- Making the system robust enough to handle different lighting conditions and angles
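One common way to keep LLM commentary from slowing the pipeline is to give it a time budget and drop it if it runs late. The similarity results are useful on their own. This sketch shows the idea with Python's standard `concurrent.futures`. `commentary_with_timeout` and the `generate` callable are hypothetical, and the 2-second default is an arbitrary placeholder rather than a measured value.

```python
import concurrent.futures
from typing import Callable, Optional

# A shared pool: the request path never waits for a late LLM call to finish.
_LLM_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def commentary_with_timeout(
    generate: Callable[[], str], timeout_s: float = 2.0
) -> Optional[str]:
    """Return LLM text if it arrives within timeout_s seconds, else None.

    `generate` is a hypothetical zero-argument wrapper around the LLM client.
    """
    future = _LLM_POOL.submit(generate)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only succeeds if the call has not started yet;
        # otherwise the late result is simply ignored.
        future.cancel()
        return None
    except Exception:
        # Treat LLM errors like timeouts: the commentary is optional.
        return None
```

A worker thread that times out keeps running in the background until the LLM call returns. Only its result is discarded. To bound that background work as well, you would also set a timeout on the LLM client itself.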
What’s Next
- Add more datasets beyond campus scenes
- Improve the UI so the experience is smoother
- Support map-based visualizations and location refinement
- Build a demo site so users can try it without running anything locally
- Add 3D campus models to support visualization
Built With
- ai
- api
- css
- database
- faiss
- flask
- html
- javascript
- llm
- main
- mysql
- node.js
- npm
- python
- typescript
- vite
- vue