Inspiration
WorldLens was born from a simple observation: for visually impaired people, the world doesn't just need to be seen; it needs to be understood in context. AI vision tools exist, but they often act as passive answer machines, describing a single image on request. I wanted to build a true "digital companion" that maintains a persistent memory of the user's surroundings and proactively assists them. Imagine walking down a grocery aisle and having a friend whisper, "Hey, that low-sugar cereal you were looking for is three feet to your left." That level of proactive, situationally aware support is what inspired WorldLens.
What it does
WorldLens is a real-time multimodal assistant that sees the world through a mobile camera and explains it conversationally.
- Real-Time Voice Interaction: Natural, bidirectional conversations powered by Amazon Nova Sonic.
- Persistent World Memory: Unlike traditional vision apps, WorldLens builds a cumulative "world model" of what it has seen, allowing it to remember objects and context across camera frames.
- Proactive Assistance: It alerts users to hazards or relevant items (like a specific grocery product) based on their predefined goals.
- Context-Specific Modes: High-precision pipelines for Grocery Shopping, Document Reading, and Medication Safety.
- Smart Sampling: Captures frames only when motion or speech is detected, which keeps processing load and battery drain low.
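The Smart Sampling idea can be sketched as a simple client-side gate. This is a minimal sketch, not WorldLens's actual code: it assumes frames arrive as RGBA pixel arrays (for example, `ImageData.data` from a canvas drawn from the camera feed), uses plain frame differencing as the motion detector, and takes the speech flag from whatever VAD the app runs. The names `FrameSampler`, `motionScore`, and `toGray`, and both thresholds, are hypothetical.

```typescript
// Hypothetical smart-sampling gate (not WorldLens's actual code).
// A frame is sent upstream only if it is the first frame, the VAD reports
// speech, or enough pixels changed since the last frame that was sent.

/** Grayscale frame: one 0-255 luminance value per pixel. */
type GrayFrame = Uint8ClampedArray;

/** Convert RGBA pixel data (e.g. ImageData.data from a canvas) to luminance. */
function toGray(rgba: Uint8ClampedArray): GrayFrame {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Rec. 601 luma weights; the alpha channel is ignored.
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return gray;
}

/** Fraction of pixels whose luminance changed by more than pixelThreshold. */
function motionScore(prev: GrayFrame, curr: GrayFrame, pixelThreshold = 25): number {
  if (prev.length !== curr.length) {
    throw new Error("frames must have the same size");
  }
  let changed = 0;
  for (let i = 0; i < curr.length; i++) {
    if (Math.abs(curr[i] - prev[i]) > pixelThreshold) changed++;
  }
  return changed / curr.length;
}

class FrameSampler {
  private lastSent: GrayFrame | null = null;

  /** motionThreshold: fraction of changed pixels that counts as "motion". */
  constructor(private readonly motionThreshold = 0.05) {}

  /** Decide whether to send this frame; a sent frame becomes the new reference. */
  shouldSend(frame: GrayFrame, userIsSpeaking: boolean): boolean {
    const prev = this.lastSent;
    const send =
      prev === null ||
      userIsSpeaking ||
      motionScore(prev, frame) >= this.motionThreshold;
    if (send) this.lastSent = frame;
    return send;
  }
}
```

Comparing against the last sent frame, rather than the previous captured frame, means slow drift (such as a gradual pan) still accumulates into a send once it crosses the threshold.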
How I built it
I built WorldLens on the Amazon Nova model suite, accessed through Amazon Bedrock:
- Amazon Nova Sonic: Acts as the central voice orchestrator, handling bidirectional speech-to-speech interaction and native tool use.
- Amazon Nova Lite: Performs the heavy lifting for multimodal scene understanding, OCR, and complex reasoning over historical session data. In the MVP, it also handles visual grounding and fact verification.
- Amazon Nova Act (Simulated): The architecture is designed to integrate Nova Act for deep external grounding; in the current MVP, this is simulated through Nova Lite-powered reasoning.
- Frontend: A Next.js mobile web app utilizing the MediaDevices API and real-time Voice Activity Detection (VAD).
- Backend: AWS Lambda and DynamoDB for session state and memory management.
- Infrastructure: Deployed via AWS CDK with a "Zero-Touch IAM" philosophy for seamless setup.
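The writeup keeps session state and memory in DynamoDB behind Lambda. The merge logic behind a cumulative world model might look roughly like the following in-memory sketch. It assumes objects are identified by their label alone; the field names, the `WorldModel` class, and the keep-latest-position policy are all hypothetical, and persistence (for example, writing `snapshot()` to a session item) is left out.

```typescript
// Hypothetical world-model memory (not WorldLens's actual schema).

/** One object reported by the vision step for a single frame. */
interface Observation {
  label: string;       // e.g. "low-sugar cereal"
  position: string;    // coarse, human-readable location, e.g. "left shelf, eye level"
  confidence: number;  // 0..1, as reported by the vision step
  timestampMs: number; // when the frame was captured
}

/** An observation plus the history needed to recall it later. */
interface MemoryEntry extends Observation {
  firstSeenMs: number;
  timesSeen: number;
}

class WorldModel {
  private entries = new Map<string, MemoryEntry>();

  /** Merge one frame's observations into the cumulative model. */
  ingest(observations: Observation[]): void {
    for (const obs of observations) {
      const key = obs.label.toLowerCase();
      const existing = this.entries.get(key);
      if (existing === undefined) {
        this.entries.set(key, { ...obs, firstSeenMs: obs.timestampMs, timesSeen: 1 });
      } else {
        // Keep the latest sighting's position and confidence, but preserve history.
        this.entries.set(key, {
          ...obs,
          firstSeenMs: existing.firstSeenMs,
          timesSeen: existing.timesSeen + 1,
        });
      }
    }
  }

  /** Look up an object seen at any earlier point in the session. */
  recall(label: string): MemoryEntry | undefined {
    return this.entries.get(label.toLowerCase());
  }

  /** Plain array of entries, suitable for serializing to a session store. */
  snapshot(): MemoryEntry[] {
    return [...this.entries.values()];
  }
}
```

Keying on the label is the simplest possible identity rule: two identical products on different shelves would collapse into one entry, so a fuller version would likely fold position or appearance into the key.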
Challenges I ran into
- The Latency Barrier: Keeping end-to-end latency, from "seeing" a frame to "speaking" about it, under 1.5 seconds required careful optimization of the bidirectional stream.
- Cognitive Overload: Making proactive alerts helpful rather than annoying required careful tuning of the "Proactive Guardrails" and cooldown timers.
- Smart Sampling: Designing a client-side motion detection and VAD system that sends only high-quality, relevant frames to Bedrock, reducing cost and token usage.
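One way to implement the Proactive Guardrails and cooldown timers mentioned above is a small gate that checks relevance first and timing second. This is a hypothetical sketch: the relevance rule (a hazard flag, or a substring match against the user's goals), the two cooldown values, and the `ProactiveGuardrail` name are assumptions rather than WorldLens's actual parameters.

```typescript
// Hypothetical proactive-alert guardrail: speak up only when a detection is
// relevant (a hazard, or a match for a user goal) and it is not on cooldown.

interface Detection {
  label: string;
  isHazard: boolean;
}

class ProactiveGuardrail {
  private lastAlertByLabel = new Map<string, number>(); // per-label cooldown state
  private lastAnyAlertMs = -Infinity;                   // global cooldown state

  constructor(
    private readonly goals: string[], // e.g. ["low-sugar cereal"]
    private readonly perLabelCooldownMs = 60_000,
    private readonly globalCooldownMs = 10_000,
  ) {}

  shouldAlert(d: Detection, nowMs: number): boolean {
    const label = d.label.toLowerCase();
    const relevant =
      d.isHazard || this.goals.some((g) => label.includes(g.toLowerCase()));
    if (!relevant) return false;

    // Do not repeat the same alert too often.
    const lastForLabel = this.lastAlertByLabel.get(label) ?? -Infinity;
    const labelReady = nowMs - lastForLabel >= this.perLabelCooldownMs;
    // Hazards skip the global cooldown so safety alerts are never held back.
    const globalReady =
      d.isHazard || nowMs - this.lastAnyAlertMs >= this.globalCooldownMs;
    if (!labelReady || !globalReady) return false;

    this.lastAlertByLabel.set(label, nowMs);
    this.lastAnyAlertMs = nowMs;
    return true;
  }
}
```

Letting hazards skip the global cooldown is a deliberate choice in this sketch: a safety warning should not be suppressed because the assistant recently mentioned a product, while the per-label cooldown still stops the same hazard from being announced over and over.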
Accomplishments that I'm proud of
- Native Sonic Orchestration: Successfully replacing a traditional STT -> LLM -> TTS pipeline with a single, high-speed Nova Sonic session.
- Grounded Reasoning: Using Nova Lite to verify visual observations against general knowledge and session context as a baseline safety check.
- The "Aha!" Moment: Seeing the AI proactively chime and offer a suggestion based on an object it saw 30 seconds ago in a different part of the shelf.
- Accessibility Integration: Implementing a system of "Earcons" (audio cues) that provide state feedback to visually impaired users without interrupting the conversation.
What I learned
- Native Tool Use is a Game Changer: Nova Sonic’s ability to invoke vision tools mid-conversation drastically simplifies the architecture of real-time AI agents.
- Memory vs. Noise: I learned the importance of "Context Compression": periodically summarizing the world model so the AI maintains situational awareness without being overwhelmed by every single frame.
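Context Compression can be sketched as a function that bounds the world-model text handed to the model. In this hypothetical version the summary is built deterministically (recent items verbatim, older items folded into one capped line); WorldLens may well generate its summary with Nova Lite instead, and `compressContext`, `keepRecent`, and `maxOlderLabels` are illustrative names.

```typescript
// Hypothetical context compression: keep the most recently seen objects
// verbatim and fold everything older into a single capped summary line.
// A deterministic stand-in for a model-written summary.

interface SeenItem {
  label: string;
  position: string;
  lastSeenMs: number;
}

function compressContext(
  items: SeenItem[],
  keepRecent: number,
  maxOlderLabels = 5,
): string {
  // Newest first, without mutating the caller's array.
  const sorted = [...items].sort((a, b) => b.lastSeenMs - a.lastSeenMs);
  const recent = sorted.slice(0, keepRecent);
  const older = sorted.slice(keepRecent);

  const lines = recent.map((i) => `- ${i.label} (${i.position})`);
  if (older.length > 0) {
    // Name only a few older items so this line stays short.
    const shown = older.slice(0, maxOlderLabels).map((i) => i.label);
    const extra = older.length - shown.length;
    const tail = extra > 0 ? ` and ${extra} more` : "";
    lines.push(`- Also seen earlier: ${shown.join(", ")}${tail}`);
  }
  return lines.join("\n");
}
```

The output is at most `keepRecent + 1` lines regardless of session length, which is the property that keeps the model from being overwhelmed by every frame.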
What's next for WorldLens
- Long-Term Spatial Memory: Moving beyond single sessions to "remember" the layout of the user's home or local pharmacy.
- Haptic Feedback: Integrating spatial haptics to guide a user's hand toward a specific item on a shelf.
- Nova Act Expansion: Deeper integration with Nova Act for complex tasks like "Find the best deal for this medicine online and check if my insurance covers it."
Built With
- amazon-bedrock
- amazon-cognito
- amazon-dynamodb
- amazon-lambda
- amazon-nova
- amazon-web-services
- next.js
- nova
- nova-2-lite
- nova-2-sonic