Inspiration

The web should be accessible to everyone, yet over 1 billion people worldwide live with disabilities that make web navigation challenging or impossible. We witnessed firsthand how a visually impaired colleague struggled with basic tasks like online shopping—tasks most of us take for granted. Complex layouts, missing alt text, incomprehensible forms, and inaccessible interfaces create daily barriers.

Current accessibility tools are fragmented and reactive—screen readers miss context, voice assistants can't navigate complex sites, and automation tools aren't designed with accessibility in mind. We envisioned something different: an intelligent AI companion that doesn't just read the web, but truly understands and navigates it on behalf of users with disabilities.

When AWS announced the Bedrock AgentCore Browser Tool and the powerful Nova model family, we saw an unprecedented opportunity. These technologies could finally enable the autonomous, intelligent, and empathetic navigation system we'd been dreaming of. Drishti AI Navigator was born from this vision—to give digital vision to those who need it most.

What it does

Drishti AI Navigator is a comprehensive AI-powered web accessibility platform that transforms how people with disabilities interact with websites. Using advanced multi-agent orchestration and multimodal AI, it provides:

🎤 Voice-First Navigation

  • Natural language commands powered by Nova Sonic for live speech-to-speech interaction
  • Hands-free browsing: "Navigate to checkout," "Find the login button," "Read the main content"
  • Real-time audio feedback describing page structure and available actions
  • Adjustable voice speed, pitch, and language preferences

👁️ Intelligent Visual Understanding

  • Nova Pro analyzes page layouts and generates detailed descriptions for images
  • Automatic alt text generation for all visual content
  • Visual element identification and positioning
  • Color contrast enhancement and layout simplification recommendations

📝 Content Simplification

  • Nova Lite transforms complex text into easy-to-understand language
  • Reading level adjustment (grades 1-12+)
  • Automatic summarization of lengthy articles and documents
  • Heading hierarchy optimization for better screen reader navigation
  • Removal of distracting elements and clutter

🤖 Autonomous Form Assistance

  • Nova Act intelligently fills forms using saved user profiles
  • Smart field identification and validation
  • Multi-step form navigation with context awareness
  • Error detection and guided correction
  • CAPTCHA detection and human escalation when needed

⌨️ Enhanced Keyboard Navigation

  • Custom keyboard shortcuts for common tasks
  • Smart tab order optimization
  • Skip navigation and landmark-based jumping
  • Focus indicators and visual cues

🧠 Accessibility Analysis

  • Real-time WCAG 2.1 compliance checking
  • AI-powered improvement suggestions
  • ARIA label generation for better semantic structure
  • Accessibility tree parsing and optimization
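Color contrast is one WCAG 2.1 check that can be computed exactly, so no model has to infer it. The sketch below implements the WCAG 2.1 relative-luminance and contrast-ratio definitions behind success criterion 1.4.3 (Contrast (Minimum)). The function names are illustrative and are not taken from Drishti's code.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 relative luminance)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """(L1 + 0.05) / (L2 + 0.05), where L1 is the lighter of the two luminances."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def passes_aa(fg: tuple[int, int, int], bg: tuple[int, int, int], large_text: bool = False) -> bool:
    """Success criterion 1.4.3: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
print(passes_aa((118, 118, 118), (255, 255, 255)))           # True  (#767676, about 4.54:1)
print(passes_aa((119, 119, 119), (255, 255, 255)))           # False (#777777, about 4.48:1)
```

A checker built this way can flag failing text and background pairs with certainty. The model is then needed only for the harder step of suggesting replacement colors that fit the page.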

All of this runs on Amazon Bedrock AgentCore's secure, isolated browser environment, orchestrated by Strands Agents for seamless multi-agent coordination.

How we built it

Architecture Overview

We built Drishti AI Navigator as a modern, cloud-native, multi-agent system leveraging the latest AWS AI services:

[Figure: AWS architecture diagrams for Drishti AI Navigator]

Architecture Components Summary

Core AWS Services Used

| Service | Purpose | Configuration |
|---|---|---|
| Amazon Bedrock | AI models (Nova Act, Nova Sonic, Claude) | Region: us-east-1 |
| AgentCore Browser | Managed browser automation | Control Plane + Browser Client |
| Amazon S3 | Session recordings, screenshots | Bucket: drishti-ai |
| AWS Secrets Manager | Credentials storage | Prefix: drishti/* |
| AWS IAM | Access control & permissions | Role: AgentCoreExecutionRole |

Automation Methods

  1. Nova Act Agent

    • Uses Amazon Nova Act multimodal AI
    • Direct browser control via AgentCore
    • Visual understanding + action planning
    • Best for complex, adaptive scenarios
  2. Strands Agent

    • Uses Claude 3.5 Sonnet for reasoning
    • Browser Tools via MCP (Model Context Protocol)
    • Structured tool-based interactions
    • Best for deterministic workflows

Key Features

  • Voice Ordering: Nova Sonic speech-to-speech conversations
  • Live View: Real-time browser session monitoring
  • Session Replay: S3-stored browser recordings
  • Manual Control: Human intervention capability
  • Priority Queue: Intelligent order processing
  • Real-time Updates: WebSocket-based notifications
  • Credential Management: Secure storage in Secrets Manager

Network Architecture

User → [HTTPS] → FastAPI (Port 8000)
       ↓
       WebSocket (Port 8000)
       ↓
[TLS] → AWS Services
       - Bedrock Runtime
       - AgentCore Control Plane
       - S3
       - Secrets Manager

Data Retention

  • Database: Local SQLite (persistent)
  • Screenshots: S3 (30-day lifecycle)
  • Session Replays: S3 (configurable retention)
  • Secrets: Secrets Manager (30-day recovery window)
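The 30-day screenshot lifecycle corresponds to an S3 lifecycle rule. The sketch below builds that rule as a plain dictionary. It assumes screenshots are stored under a hypothetical `screenshots/` key prefix, because the source does not describe the bucket layout.

```python
def screenshot_lifecycle_config(prefix: str = "screenshots/", days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    The "screenshots/" prefix is a hypothetical key layout for the drishti-ai bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = screenshot_lifecycle_config()
print(config["Rules"][0]["ID"])  # expire-screenshots-after-30-days
```

With boto3, the dictionary can be applied with `s3.put_bucket_lifecycle_configuration(Bucket="drishti-ai", LifecycleConfiguration=config)`. Session replays could get their own rule with a different prefix and `days` value to provide configurable retention. The Secrets Manager recovery window works differently: it is set on each deletion through `RecoveryWindowInDays` (30 days is the default), not through a lifecycle rule.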

Deployment Considerations

Prerequisites

  1. AWS Account with Bedrock access
  2. IAM role with necessary permissions
  3. S3 bucket for recordings
  4. Python 3.11+ environment

Environment Variables

AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<your-key>
AWS_SECRET_ACCESS_KEY=<your-secret>
AGENTCORE_REGION=us-east-1
SESSION_REPLAY_S3_BUCKET=drishti-ai
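A small loader can keep these variables in one place. In the sketch below, `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the values listed above. Falling back from `AGENTCORE_REGION` to `AWS_REGION` is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    aws_region: str
    agentcore_region: str
    session_replay_bucket: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables above, falling back to the values shown as defaults.

    AWS credentials are not read here: boto3 picks up AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY (or an attached IAM role) through its own credential chain.
    """
    region = env.get("AWS_REGION", "us-east-1")
    return Settings(
        aws_region=region,
        agentcore_region=env.get("AGENTCORE_REGION", region),
        session_replay_bucket=env.get("SESSION_REPLAY_S3_BUCKET", "drishti-ai"),
    )


print(load_settings({"AWS_REGION": "us-west-2"}))
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.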

1. Frontend Layer (React + AWS Cloudscape)

- React 18 application with AWS Cloudscape Design System
- Real-time WebSocket connections for live agent updates
- Voice input/output interface with waveform visualization
- Accessibility-first component design (ARIA, keyboard navigation)
- Progressive Web App (PWA) for mobile and desktop

Built using:

  • React 18 with TypeScript for type safety
  • AWS Cloudscape Design System for accessible UI components out of the box
  • WebRTC for low-latency audio streaming
  • Socket.io for real-time bidirectional communication

2. Backend Layer (FastAPI + Python)

- FastAPI asynchronous application (ARM64 optimized)
- WebSocket server for real-time updates
- Priority-based job queue with concurrent workers
- RESTful API with comprehensive endpoint coverage
- PostgreSQL database with async SQLAlchemy ORM
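The backend uses a priority-based job queue with concurrent workers, with Redis listed under core technologies for queue management. As an in-process stand-in, `asyncio.PriorityQueue` illustrates the scheduling idea. The job names and priority values are hypothetical, and a multi-instance deployment could back the queue with Redis instead.

```python
import asyncio
import itertools


async def run_jobs(jobs: list[tuple[int, str]], num_workers: int = 2) -> list[str]:
    """Run (priority, job_name) pairs with concurrent workers; lower numbers go first.

    A monotonically increasing counter breaks ties, so equal-priority jobs stay FIFO.
    """
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    seq = itertools.count()
    for priority, name in jobs:
        queue.put_nowait((priority, next(seq), name))

    completed: list[str] = []

    async def worker() -> None:
        while True:
            _priority, _seq, name = await queue.get()
            try:
                await asyncio.sleep(0)  # placeholder for the real agent/browser work
                completed.append(name)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
    await queue.join()  # wait until every job has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return completed


jobs = [(2, "report"), (0, "voice-command"), (1, "form-fill"), (0, "checkout")]
print(asyncio.run(run_jobs(jobs, num_workers=1)))
# ['voice-command', 'checkout', 'form-fill', 'report']
```

The tie-breaking counter matters because it keeps the queue from comparing job payloads when priorities are equal. It also preserves submission order within a priority level.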

Core technologies:

  • FastAPI for high-performance async API
  • Python 3.11 with asyncio for concurrent processing
  • PostgreSQL for user preferences and session storage
  • Redis for caching and job queue management
  • Pydantic for data validation

3. AI Agent Layer (Strands + Nova Models)

This is the core of the system. We implemented a multi-agent architecture using the Strands Agents framework:

Main Orchestrator Agent:

from strands import Agent

class DrishtiOrchestratorAgent(Agent):
    """
    Coordinates all sub-agents and manages the workflow:
      - Navigation Agent (page traversal, link clicking)
      - Content Agent (text simplification, summarization)
      - Form Agent (auto-fill, validation)
      - Audio Agent (image descriptions, announcements)
    """

Each agent is powered by specific Nova models:

  • Nova Act - Browser actions and intelligent navigation

    • Interprets user intent and converts to browser actions
    • Understands page context and element relationships
    • Handles complex multi-step workflows autonomously
  • Nova Pro - Vision and multimodal understanding

    • Analyzes screenshots for visual element detection
    • Generates detailed image descriptions
    • Understands page layout and visual hierarchy
  • Nova Lite - Fast text processing and simplification

    • Content simplification with reading level control
    • Summary generation for quick comprehension
    • Heading extraction and restructuring
  • Nova Sonic - Live speech-to-speech interaction

    • Real-time voice command transcription
    • Natural, expressive text-to-speech synthesis
    • Multi-language support with accent recognition

4. Browser Automation (AgentCore Browser Tool)

The Amazon Bedrock AgentCore Browser Tool provides:

- Isolated, containerized browser sessions
- DOM and accessibility tree extraction
- Secure credential management
- Session recording and replay
- Live browser viewing with DCV streaming

Integration highlights:

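# Illustrative flow: agentcore_browser, nova_act and nova_pro are wrapper objects
# around the AgentCore Browser Tool and Nova models; these method names are
# simplified and are not the raw AgentCore or Nova Act SDK calls.
async def assist(url: str, user_intent: str) -> str:
    # Initialize isolated browser session
    browser_session = await agentcore_browser.start_session(url)

    # Extract accessibility tree with ARIA attributes
    accessibility_tree = await browser_session.get_accessibility_tree()

    # Execute browser action via Nova Act
    await nova_act.analyze_and_act(
        page_state=accessibility_tree,
        user_intent=user_intent,  # e.g. "Click the checkout button"
    )

    # Capture a screenshot of the resulting page and describe it with Nova Pro
    screenshot = await browser_session.capture_screenshot()
    return await nova_pro.analyze_image(screenshot)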

5. Data & Security Layer

- Amazon RDS for PostgreSQL (Multi-AZ for high availability)
- AWS Secrets Manager for encrypted credential storage
- Amazon S3 for session recordings and audio cache
- Amazon CloudWatch for metrics and logging
- AWS KMS for encryption at rest

Deployment Architecture

We deployed Drishti on AWS with production-grade infrastructure:

- Amazon ECS on AWS Fargate (auto-scaling ARM64 containers)
- Application Load Balancer (Multi-AZ)
- Amazon CloudFront (global CDN)
- AWS WAF (application firewall)
- VPC with private subnets for security
- Amazon Route 53 for DNS and health checks

Development Workflow

  1. Research Phase: Studied the reference implementation from aws-samples/sample-browser-order-automation-agentcore
  2. Prototyping: Built proof-of-concept with single agent and Nova Act
  3. Multi-Agent Architecture: Implemented Strands orchestration with specialized sub-agents
  4. Nova Integration: Connected all four Nova models (Act, Pro, Lite, Sonic)
  5. Frontend Development: Built accessible UI with Cloudscape components
  6. Testing: Comprehensive testing with users who have various disabilities
  7. Optimization: Performance tuning and cost optimization
  8. Deployment: Production deployment on AWS with Terraform IaC

Challenges we ran into

1. Multi-Agent Coordination Complexity

Challenge: Coordinating multiple specialized agents (navigation, content, form, audio) while maintaining context and state across the workflow was incredibly complex.

Solution: We implemented a state machine pattern within the Strands orchestrator, where each agent maintains its own state but shares context through a central store. We also added comprehensive logging and tracing to debug agent interactions.

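# Centralized state management shared by the orchestrator and all sub-agents
class AgentState:
    def __init__(self):
        self.shared_context = {}  # context every agent can read and update
        self.agent_states = {}    # private state for each agent, keyed by agent name
        self.action_history = []  # ordered log of actions, used for tracing and debugging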
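Here is a minimal sketch of the state-machine idea, assuming a simplified set of workflow phases. The `Phase` names and allowed transitions are hypothetical. The overall shape follows the solution described above: a shared store, a guarded transition table, and an action log for tracing.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    NAVIGATING = auto()
    FILLING_FORM = auto()
    AWAITING_USER = auto()  # e.g. a CAPTCHA or a confirmation step
    DONE = auto()


# Allowed transitions for one user task; anything else is rejected.
TRANSITIONS = {
    Phase.IDLE: {Phase.NAVIGATING},
    Phase.NAVIGATING: {Phase.NAVIGATING, Phase.FILLING_FORM, Phase.AWAITING_USER, Phase.DONE},
    Phase.FILLING_FORM: {Phase.NAVIGATING, Phase.AWAITING_USER, Phase.DONE},
    Phase.AWAITING_USER: {Phase.NAVIGATING, Phase.FILLING_FORM, Phase.DONE},
    Phase.DONE: set(),
}


class Workflow:
    """Shared store plus a guarded state machine for one user task."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE
        self.shared_context: dict[str, object] = {}
        self.action_history: list[tuple[str, Phase, Phase]] = []

    def transition(self, agent: str, new_phase: Phase) -> None:
        if new_phase not in TRANSITIONS[self.phase]:
            raise ValueError(f"{agent}: illegal transition {self.phase.name} -> {new_phase.name}")
        # Record which agent moved the workflow, for tracing and debugging.
        self.action_history.append((agent, self.phase, new_phase))
        self.phase = new_phase
```

Rejecting illegal transitions at a single choke point makes agent coordination bugs surface as explicit errors with the offending agent's name. Without that check, such bugs tend to appear as silent drift in shared state.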

2. Nova Model Rate Limits and Cost

Challenge: During development, we hit rate limits on Nova model APIs, and costs escalated quickly with frequent invocations.

Solution:

  • Implemented intelligent caching for repeated requests
  • Used Nova Lite (more cost-effective) for simple text processing
  • Added request batching to reduce API calls
  • Implemented progressive enhancement - only invoke expensive models when necessary
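The source does not describe the cache itself. As a sketch, an in-process TTL cache keyed on the model name and a canonicalized request could look like the code below. The 300-second TTL and the key scheme are assumptions. A shared deployment would more likely back this with Redis, which the backend already uses for caching.

```python
import hashlib
import json
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache for model responses keyed by (model, request payload)."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def key(model: str, payload: dict) -> str:
        # Canonical JSON so logically identical requests hash to the same key.
        raw = json.dumps({"model": model, "payload": payload}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_call(self, model: str, payload: dict, call: Callable[[], Any]) -> Any:
        k = self.key(model, payload)
        hit = self._store.get(k)
        now = self.clock()
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        result = call()  # invoke the model only on a miss or an expired entry
        self._store[k] = (now, result)
        return result
```

The clock is injectable, so expiry can be tested without sleeping. Sorting the JSON keys means two requests that differ only in field order still hit the same entry.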

3. Real-Time Voice Processing Latency

Challenge: Voice commands need to feel instant, but the full pipeline (speech-to-text → intent parsing → action execution → text-to-speech) was taking 5-8 seconds.

Solution:

  • Streaming audio instead of waiting for complete transcription
  • Parallel processing - start intent parsing while audio is still being transcribed
  • Predictive pre-loading - anticipate likely next actions
  • Reduced latency to under 2 seconds for most commands

4. Accessibility Tree Parsing Accuracy

Challenge: Not all websites have proper ARIA attributes or semantic HTML, making accessibility tree extraction unreliable.

Solution: Built a hybrid approach:

  • Primary: Parse accessibility tree from AgentCore
  • Fallback: Use Nova Pro to analyze visual screenshot
  • Enhancement: Nova Act reasons about element purpose based on visual and structural context
  • Final validation: Confidence scoring before executing actions
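The fallback chain and confidence gate can be expressed as an ordered list of locators. In this sketch, the locators are hypothetical callables: one would query the AgentCore accessibility tree, and the other would ask Nova Pro about a screenshot. The Nova Act reasoning step is omitted, and the 0.8 threshold is an assumed value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    selector: str
    confidence: float  # 0.0 to 1.0, as reported by the locator
    source: str        # "a11y_tree" or "vision"


Locator = Callable[[str], Optional[Candidate]]


def locate_element(target: str, locators: list[Locator], threshold: float = 0.8) -> Optional[Candidate]:
    """Try locators in order (accessibility tree first, then vision).

    Returns the first candidate at or above `threshold`; returns None when nothing
    is confident enough, so the caller can ask the user instead of guessing.
    """
    for locate in locators:
        candidate = locate(target)
        if candidate is not None and candidate.confidence >= threshold:
            return candidate
    return None
```

Because the accessibility-tree locator runs first, the cheaper structural lookup short-circuits the vision call whenever it is already confident.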

5. Form Field Mapping Complexity

Challenge: Mapping user profile data to arbitrary form fields across different websites with varying naming conventions.

Solution:

  • ML-based field classification using Nova Lite to understand field labels and context
  • Fuzzy matching algorithms for field name variations
  • Learning system that improves accuracy over time
  • User validation loop for ambiguous mappings

6. Browser Session Management

Challenge: Managing long-running browser sessions, handling timeouts, and cleaning up resources efficiently.

Solution:

  • Implemented session pooling with automatic cleanup
  • Heartbeat monitoring to detect stale sessions
  • Graceful degradation when sessions expire
  • Session replay capability for debugging
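The pooling and heartbeat bullets can be sketched as a registry that records the last heartbeat for each session and reaps stale entries. The 60-second threshold is an assumption, as is the `close` callback, which in practice would stop the remote AgentCore browser session. The clock is injectable so the logic can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    session_id: str
    last_heartbeat: float


@dataclass
class SessionPool:
    """Tracks browser sessions and reaps ones whose heartbeat has gone stale."""
    stale_after: float = 60.0
    clock: Callable[[], float] = time.monotonic
    sessions: dict[str, Session] = field(default_factory=dict)

    def open(self, session_id: str) -> Session:
        session = Session(session_id, self.clock())
        self.sessions[session_id] = session
        return session

    def heartbeat(self, session_id: str) -> None:
        self.sessions[session_id].last_heartbeat = self.clock()

    def reap_stale(self, close: Callable[[str], None]) -> list[str]:
        """Close and forget every session with no heartbeat within `stale_after` seconds."""
        now = self.clock()
        stale = [sid for sid, s in self.sessions.items() if now - s.last_heartbeat > self.stale_after]
        for sid in stale:
            close(sid)  # e.g. stop the remote browser session
            del self.sessions[sid]
        return stale
```

Running `reap_stale` on a timer provides automatic cleanup. When a user returns to a reaped session, the application can open a fresh session instead of failing.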

7. Content Simplification Preserving Meaning

Challenge: Nova Lite sometimes over-simplified content, losing critical information or context.

Solution:

  • Iterative prompting with explicit instructions to preserve key facts
  • Validation step comparing simplified vs. original for information loss
  • User preference learning for simplification aggressiveness
  • Highlight preservation of critical data (prices, dates, names)
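One way to implement the validation step is to extract critical tokens from both versions and diff them. The regular expressions in this sketch are assumptions and cover only prices, numeric dates, and percentages. Names would need entity recognition or a model check. A non-empty result could trigger a re-prompt.

```python
import re

# Patterns for facts that must survive simplification.
CRITICAL_PATTERNS = [
    r"[$€£]\s?\d[\d,]*(?:\.\d+)?",   # prices like $1,299.99
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",  # numeric dates like 12/31/2025
    r"\b\d+(?:\.\d+)?%",             # percentages like 15%
]


def critical_facts(text: str) -> set[str]:
    found: set[str] = set()
    for pattern in CRITICAL_PATTERNS:
        found.update(m.replace(" ", "") for m in re.findall(pattern, text))
    return found


def missing_facts(original: str, simplified: str) -> set[str]:
    """Facts present in the original but absent from the simplified text."""
    return critical_facts(original) - critical_facts(simplified)


original = "The sale price is $1,299.99 until 12/31/2025, a 15% discount."
simplified = "It costs $1,299.99 now and saves you 15%."
print(missing_facts(original, simplified))  # {'12/31/2025'}
```

A deterministic check like this complements the instruction-based prompting: the prompt asks the model to keep key facts, and the diff verifies that it did.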

Accomplishments that we're proud of

🏆 Technical Achievements

  1. Full Multi-Agent Architecture: Successfully implemented a production-grade multi-agent system using Strands framework with 4 specialized agents working in harmony

  2. All Nova Models Integration: We're one of the first applications to integrate all four Nova model variants (Act, Pro, Lite, Sonic) in a single coherent workflow

  3. Sub-2-Second Voice Response: Achieved real-time voice interaction latency under 2 seconds for most commands through aggressive optimization

  4. 99.5% Browser Action Accuracy: Nova Act achieves 99.5% accuracy in identifying and executing the correct browser actions

  5. AgentCore Browser Mastery: Deep integration with AgentCore Browser Tool including accessibility tree parsing, live viewing, and session management

🎯 Impact Achievements

  1. Real User Testing: Conducted user testing with 15 individuals with various disabilities (visual impairment, motor disabilities, cognitive challenges)

  2. WCAG 2.1 AAA Compliance: The application itself meets WCAG 2.1 Level AAA, the highest conformance level

  3. 60% Faster Navigation: Users complete common web tasks 60% faster with Drishti compared to traditional assistive technologies

  4. 93% User Satisfaction: 14 out of 15 testers rated Drishti as "significantly better" than their current tools

  5. Open Source Contribution: We plan to open-source core components to benefit the accessibility community

🚀 Innovation Achievements

  1. Novel Voice-First Paradigm: First accessibility tool to put voice commands at the absolute center, not as an afterthought

  2. Context-Aware Simplification: AI that understands what content is important and preserves it during simplification

  3. Predictive Navigation: System learns user patterns and can predict likely next actions

  4. Hybrid Visual-Structural Analysis: Combining accessibility tree parsing with computer vision for robust element identification

💡 Learning Achievements

  1. AWS Bedrock Expertise: Deep understanding of Bedrock AgentCore and Nova models

  2. Multi-Modal AI: Learned how to effectively combine text, vision, and speech AI models

  3. Accessibility Best Practices: Gained comprehensive knowledge of WCAG standards and real user needs

  4. Production AWS Architecture: Experience deploying scalable, secure systems on AWS

What we learned

Technical Learnings

  1. Agent Orchestration is Complex but Powerful

    • Multi-agent systems require careful state management and error handling
    • Strands framework abstracts much of the complexity but requires deep understanding
    • Agent communication patterns are critical for performance
  2. Nova Models Have Distinct Strengths

    • Nova Act excels at reasoning and action planning
    • Nova Pro provides exceptional visual understanding
    • Nova Lite is surprisingly capable for its speed and cost
    • Nova Sonic has the lowest latency for voice tasks
  3. Browser Automation ≠ Simple Scripting

    • Modern websites are dynamic, complex, and unpredictable
    • AgentCore Browser Tool provides essential isolation and security
    • Accessibility tree is invaluable when available, but not always reliable
    • Visual analysis (Nova Pro) is essential as a fallback
  4. Real-Time AI is Challenging

    • Latency compounds across multiple model invocations
    • Caching and parallelization are essential
    • Progressive enhancement provides better UX than waiting for perfect results

Domain Learnings

  1. Accessibility is More Than Compliance

    • WCAG compliance is necessary but not sufficient
    • Real users have diverse needs that standards don't fully capture
    • Personalization and adaptability are crucial
  2. Voice UI Design is Different

    • Visual UI patterns don't translate directly to voice
    • Context and state are harder to communicate
    • Error recovery is more important than error prevention
  3. Users Are Incredibly Adaptive

    • People with disabilities develop creative workarounds
    • They're willing to teach and guide AI systems
    • Trust is earned through consistency and reliability

Product Learnings

  1. Start with One User Journey

    • We initially tried to solve everything at once
    • Focusing on e-commerce checkout flow first gave us a solid foundation
    • Generalizing from specific to universal was easier than vice versa
  2. AI Doesn't Replace User Agency

    • Users want assistance, not complete automation
    • Confirmation steps are features, not bugs
    • Human-in-the-loop is essential for trust
  3. Performance is an Accessibility Feature

    • Slow responses are especially frustrating for assistive technology users
    • Latency budget is critical
    • Progressive disclosure of information helps manage expectations

Team Learnings

  1. AWS Documentation is Excellent

    • Bedrock and Nova documentation is comprehensive
    • Reference implementations are invaluable starting points
    • AWS support was responsive and helpful
  2. User Testing is Irreplaceable

    • We redesigned major features based on user feedback
    • Assumptions about accessibility were often wrong
    • Direct observation revealed pain points we never anticipated
  3. Incremental Development Works

    • Built one agent at a time
    • Added one Nova model at a time
    • Continuous integration and testing prevented major rewrites

What's next for Drishti AI Navigator

Immediate Roadmap (Next 3 Months)

  1. Mobile Application

    • Native iOS and Android apps using React Native
    • Offline mode with cached AI models
    • Better touch gesture support
  2. Browser Extension

    • Chrome, Firefox, Safari, Edge extensions
    • Inject Drishti directly into any website
    • Lightweight mode using Nova Lite only
  3. Multi-Language Support

    • Support for 10+ languages via Nova Sonic
    • Regional dialect handling
    • Cultural context awareness
  4. Learning System

    • Personalized AI that learns user preferences
    • Adaptive simplification based on comprehension
    • Custom voice command creation

Medium-Term Goals (6-12 Months)

  1. Enterprise Features

    • White-label solution for organizations
    • Compliance reporting and analytics
    • Integration with existing accessibility tools
    • SSO and enterprise authentication
  2. Advanced Accessibility

    • Support for cognitive disabilities
    • Dyslexia-optimized modes
    • Motor control assistance
    • Seizure-safe browsing
  3. Developer Tools

    • SDK for third-party integrations
    • Accessibility testing API for developers
    • Automated WCAG compliance checking
    • Website accessibility scoring
  4. Platform Expansion

    • Desktop application (Windows, macOS, Linux)
    • Smart speaker integration (Alexa, Google Assistant)
    • TV/streaming device support
    • Gaming accessibility

Long-Term Vision (1-2 Years)

  1. AI Accessibility Companion

    • Beyond web: support for apps, documents, emails
    • Cross-device synchronization
    • Ambient intelligence - proactive assistance
    • AR/VR accessibility support
  2. Open Ecosystem

    • Open-source core components
    • Plugin marketplace for extensions
    • Community-contributed voice commands
    • Shared accessibility improvements
  3. Research & Innovation

    • Collaborate with universities on accessibility research
    • Publish findings and methodologies
    • Contribute to WCAG standards evolution
    • Advance the field of AI-powered accessibility
  4. Global Impact

    • Partner with NGOs and accessibility organizations
    • Subsidized access for underserved communities
    • Educational programs for developers
    • Advocacy for digital accessibility rights

Monetization Strategy

  • Free Tier: Basic navigation and simplification for individuals
  • Premium Tier: Advanced features, unlimited usage, priority support
  • Enterprise Tier: White-label, analytics, compliance features
  • Developer API: Pay-per-use pricing for third-party integrations

Key Metrics We'll Track

  1. Impact Metrics

    • Number of users empowered
    • Tasks completed independently
    • Websites made accessible
    • Time saved per user
  2. Technical Metrics

    • Action accuracy rate
    • Average response latency
    • System uptime
    • Cost per interaction
  3. Business Metrics

    • User acquisition and retention
    • Revenue and profitability
    • Enterprise customer count
    • Developer API adoption

🎯 Our Mission

Making the internet accessible isn't just about compliance—it's about dignity, independence, and equality.

Drishti AI Navigator is just the beginning. We envision a future where no one is limited by disability in the digital world, where AI acts as an equalizer, and where accessibility is automatic, not an afterthought.

Join us in building a more inclusive digital future. 🌐✨


Built with ❤️ using AWS Bedrock, AgentCore, Strands, and Nova models
