Inspiration
Traveling to a foreign country is exciting — but also overwhelming. You step off the plane, surrounded by unfamiliar streets, signs in languages you can't read, and no idea where to eat, what to visit, or how to get around. We wanted to build something that turns your phone into an instant travel companion — no googling, no scrolling through outdated reviews. Just point, scan, and know.
What it does
Waypoint lets travelers open their phone camera, capture what's around them, and instantly receive AI-powered context about their surroundings — nearby landmarks, restaurants, transit options, cultural tips, and safety info. Users can then chat with Gemini to refine results based on their preferences: "Find me something vegetarian nearby," "Is this area safe at night?" or "What's the history of that building?" It's like having a local guide in your pocket.
How we built it
We built Waypoint as a progressive web app (PWA) using real-time camera input and GPS location data. The visual analysis is powered by Gemini 2.5 Pro, which processes the camera feed and geolocation to generate contextual recommendations. We implemented WebSocket streaming for low-latency responses and built a conversational interface on top so users can ask follow-up questions and refine suggestions dynamically.
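As a rough sketch of how these pieces connect (the message shape and field names below are hypothetical, not Waypoint's actual wire format), the client packages each camera frame together with a GPS fix before sending it over the WebSocket:

```javascript
// Pure helper: bundle one camera frame plus a geolocation fix into the
// JSON message sent to the analysis backend. Field names are illustrative.
function buildScanMessage(frameBase64, coords) {
  return JSON.stringify({
    type: "scan",
    image: frameBase64,            // JPEG frame, base64-encoded
    location: {
      lat: coords.latitude,
      lon: coords.longitude,
      accuracy: coords.accuracy,   // metres, as reported by the Geolocation API
    },
    timestamp: Date.now(),
  });
}

// In the browser, the surrounding glue would look roughly like:
//   const stream = await navigator.mediaDevices.getUserMedia({
//     video: { facingMode: "environment" },
//   });
//   navigator.geolocation.getCurrentPosition((pos) => {
//     socket.send(buildScanMessage(captureFrame(stream), pos.coords));
//   });
```

Keeping the message builder a pure function makes the pipeline easy to test without a device or camera permissions.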
Challenges we ran into
Getting accurate, real-time visual analysis from a live camera feed was a major hurdle — balancing latency with response quality required careful prompt engineering and streaming optimization. We also struggled with grounding the AI's responses in verifiable local data rather than hallucinated recommendations, and with ensuring the app performed smoothly on mobile browsers with limited bandwidth.
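One tactic that helps with the latency-versus-quality balance is flushing streamed model output to the UI at sentence boundaries instead of one token at a time. A minimal sketch, assuming a plain-text chunk stream (the real Gemini streaming payload is structured, so this is a simplification):

```javascript
// Accumulate streamed text chunks and flush complete sentences to the UI,
// so the display updates in readable units rather than flickering per token.
function createStreamAssembler(onFlush) {
  let buffer = "";
  return {
    push(chunk) {
      buffer += chunk;
      // Greedily match up to the last sentence-ending punctuation mark
      // that is followed by whitespace or the end of the buffer.
      const m = buffer.match(/^[\s\S]*[.!?](?:\s|$)/);
      if (m) {
        onFlush(m[0]);
        buffer = buffer.slice(m[0].length);
      }
    },
    end() {
      // Flush whatever remains when the stream closes.
      if (buffer) onFlush(buffer);
      buffer = "";
    },
  };
}
```

The same idea extends to flushing on clause boundaries or a timeout, whichever comes first, if sentences arrive too slowly.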
Accomplishments that we're proud of
We built a fully functional PWA that works offline-first and delivers sub-second AI responses through streaming. The conversational refinement loop — where users can naturally narrow down results through follow-up questions — felt genuinely intuitive in testing. We're also proud of how seamlessly the camera, GPS, and AI layers work together to create a unified experience.
What we learned
We learned how critical prompt engineering is when combining visual and geospatial inputs — small changes in how we framed the context to Gemini dramatically changed output quality. We also gained experience with WebSocket architecture for real-time AI applications and learned a lot about designing mobile-first UX for high-cognitive-load tasks like navigation in unfamiliar places.
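For illustration, a context frame along the lines below, which explicitly tells the model to ground its answers in the camera frame and coordinates and to admit uncertainty, is the kind of small wording change that mattered. The helper and its wording are hypothetical, not our production prompt:

```javascript
// Hypothetical sketch: combine geospatial context with the traveler's
// question into a single grounded prompt for the model.
function buildContextPrompt({ lat, lon, localTime, userQuery }) {
  return [
    "You are a local guide. Ground every recommendation in the",
    "attached camera frame and the coordinates below; if you are not",
    "confident a place exists, say so instead of guessing.",
    `Location: ${lat.toFixed(5)}, ${lon.toFixed(5)} (local time ${localTime})`,
    `Traveler's question: ${userQuery}`,
  ].join("\n");
}
```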
What's next for Waypoint
We want to add offline caching for previously scanned areas, multi-language voice interaction so users can speak naturally in their own language, and integration with local transit and booking APIs for one-tap actions. Longer term, we envision Waypoint as a platform that crowdsources local knowledge — letting residents contribute tips that make the AI smarter for future travelers.
Built With
- gemini
- google-adk
- google-cloud
- leaflet.js
- web-speech
- webaudio
- websockets