Inspiration
This project was inspired by the AR interfaces in manga and anime, where the protagonist can chat with a "system" to interact with the world. A second inspiration was Pokémon GO, whose AR capabilities and geolocation integrations encourage people to go outside and engage with the world.
What it does
WalkWise-Just4Fun is a mobile-first web application that acts as a real-time, AI-powered walking companion. At its core, it features:
- Continuous Edge-AI Object Detection: It runs YOLOv8 directly in the browser to detect objects in the camera feed, without relying on costly, high-latency cloud inference for basic scene framing.
- Context-Aware Q&A: When the user has a question about their surroundings or wants a more detailed description, they can tap a button to capture a snapshot. Google Vision and Gemini analyze the scene, transcribe the user's voice prompt, and provide conversational answers in a fantasy dungeon-master style.
- Proactive Narration & Guardian Mode: Using ElevenLabs TTS, the AI speaks its answers naturally, providing an immersive D&D-like experience.
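To keep the continuous detection loop cheap enough to run every frame, the raw model output has to be filtered client-side before anything is drawn. Below is a minimal sketch of that postprocessing step (confidence filter plus non-maximum suppression); the `{x, y, w, h}` box format, the helper names, and the 0.5/0.45 thresholds are illustrative assumptions, not the project's exact values.

```javascript
function iou(a, b) {
  // Intersection-over-union of two axis-aligned boxes {x, y, w, h}.
  const x1 = Math.max(a.x, b.x);
  const y1 = Math.max(a.y, b.y);
  const x2 = Math.min(a.x + a.w, b.x + b.w);
  const y2 = Math.min(a.y + a.h, b.y + b.h);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = a.w * a.h + b.w * b.h - inter;
  return union > 0 ? inter / union : 0;
}

function postprocess(detections, confThreshold = 0.5, iouThreshold = 0.45) {
  // Sort confident detections high-to-low, then greedily keep each box
  // unless it heavily overlaps an already-kept box of the same class.
  const kept = [];
  const candidates = detections
    .filter((d) => d.score >= confThreshold)
    .sort((a, b) => b.score - a.score);
  for (const det of candidates) {
    const overlaps = kept.some(
      (k) => k.label === det.label && iou(k.box, det.box) > iouThreshold
    );
    if (!overlaps) kept.push(det);
  }
  return kept;
}
```

Running a filter like this in plain JS after each inference keeps only a handful of boxes per frame, so the overlay stays responsive without any cloud round-trip.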
How we built it
- Computer Vision: We used ONNX Runtime Web to run a lightweight `yolov8n.onnx` model entirely client-side, falling back to WASM when WebGPU wasn't available. For complex queries requiring deep semantic understanding, we integrated the Google Vision API.
- Voice & Intelligence: We utilized Gemini 2.0 Flash for fast audio transcription and multimodal reasoning, and the ElevenLabs API for expressive, low-latency text-to-speech.
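The WebGPU-to-WASM fallback can be sketched as a small capability check that orders the execution providers ONNX Runtime Web should try. The helper name is ours, and the commented-out session creation is an assumption about how the pieces fit together, not the project's exact code.

```javascript
// Prefer the WebGPU execution provider when the browser exposes it,
// otherwise fall back to WASM. ONNX Runtime Web tries providers in
// order, so listing 'wasm' last gives a safe fallback either way.
function pickExecutionProviders(hasWebGPU) {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// In the browser this would feed session creation (not runnable in Node):
//   const session = await ort.InferenceSession.create("yolov8n.onnx", {
//     executionProviders: pickExecutionProviders("gpu" in navigator),
//   });
```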
Challenges we ran into
- API Costs & Latency: We originally faced challenges with the cost and latency of passively polling cloud APIs. We solved this by shifting continuous monitoring to local YOLOv8 and triggering Gemini/Vision only on demand, for more detailed object descriptions.
- Scoping the application: There were a lot of ideas for this app, many of them too ambitious to implement during a short hackathon, so we had to boil it down to a more polished minimal product.
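The cost-control idea above amounts to a gate in front of the paid APIs: continuous frames go to the local model, and the cloud pipeline only fires on an explicit user tap, with a cooldown so rapid taps cannot rack up calls. This is a sketch of that pattern under our own assumptions; the helper name and the 5-second cooldown are illustrative, not the project's actual values.

```javascript
// Returns a gate function that allows a cloud call only on an explicit
// user tap, and at most once per cooldown window. The clock is injectable
// (`now`) so the behavior is easy to test.
function makeCloudGate(cooldownMs = 5000, now = Date.now) {
  let lastCall = -Infinity;
  return function shouldCallCloud(userTapped) {
    if (!userTapped) return false; // never call paid APIs passively
    const t = now();
    if (t - lastCall < cooldownMs) return false; // still cooling down
    lastCall = t;
    return true;
  };
}
```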
Accomplishments that we're proud of
Setting up a smooth pipeline integrating multiple external APIs. In the end we have a product that is fun for us to use and was fun to build and test.
What we learned
We learned how to integrate multiple external services into a single cohesive product, and got a taste of building AR applications, which we will explore more in the future.
What's next for WalkWise-Just4Fun
- Custom Trained Models: Upgrading from standard YOLOv8 to custom models trained specifically on navigation hazards (e.g., uneven sidewalks, specific types of crosswalks, overhanging branches).
- Offline Mode: Transitioning more of the conversational AI capabilities to run locally (via WebNN or similar technologies) to ensure the app works fully in areas with no cell coverage.
- Advanced Gamification: Introducing location-based exploration features, achievements, and interactive storytelling to further blend utility with fun. We had planned Alternate Reality Game features, such as using QR codes and the Vision API to register objects for in-person events like escape rooms and treasure hunts, but due to time constraints these were not implemented.
- Explore iOS platform tools: Apple provides many of the tools we need to make this offline-capable and more fun, such as ARKit, RealityKit, and Core ML. We want to explore building a native client with better platform integration than a web app offers.