Noor — نور — "Light"

Inspiration

Screen readers have existed for decades, yet the web remains hostile to blind users. Most sites lack proper ARIA labels, semantic HTML is an afterthought, and every new JavaScript framework introduces fresh accessibility failures.

But the real pain isn't reading; it's doing. Booking a flight, filling out a government form, or completing a multi-step checkout: these transactional workflows require sighted interaction at every step.

The Question: What if an AI agent could just look at the screen and do it for you?

A sighted friend doesn't need your website to have perfect markup; they see the page, understand the layout, and click the right button. Noor is that friend.


How We Built It

Noor is a single-agent orchestrator built on Google ADK with Gemini 3.1 Pro Preview for vision and Gemini Live API for real-time bidirectional voice, deployed on Cloud Run.

The core loop is simple: the user speaks a goal ("Book me a flight to Cairo"), and Noor plans, sees, acts, and narrates, executing multi-step browser workflows autonomously.
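The plan, see, act, narrate cycle can be sketched as a plain Python loop. This is a minimal sketch under stated assumptions, not Noor's ADK code. `run_goal`, `plan_next_step`, `observe`, `act`, `narrate`, the `Step` type, and the `max_steps` budget are hypothetical names. They stand in for the Gemini planner, the perception tools, the browser tools, and the voice channel.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One planned action; done=True plays the role of a task_complete call."""
    action: str
    done: bool = False


def run_goal(
    goal: str,
    plan_next_step: Callable[[str, str], Step],  # (goal, observation) -> next step
    observe: Callable[[], str],                  # "see": describe the current page
    act: Callable[[str], str],                   # "act": run a browser action, return its result
    narrate: Callable[[str], None],              # "narrate": speak to the user
    max_steps: int = 20,                         # hypothetical safety budget
) -> bool:
    """Loop until the planner signals completion or the step budget runs out."""
    narrate(f"Starting: {goal}")
    for _ in range(max_steps):
        observation = observe()
        step = plan_next_step(goal, observation)
        if step.done:
            narrate("Done.")
            return True
        narrate(f"Next: {step.action}")
        narrate(act(step.action))  # report every result so the user is never left in silence
    narrate("Stopping: step budget exhausted.")
    return False
```

Narrating before and after every action, not only at the end, is what keeps a blind user oriented during a long workflow.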

Architecture

A NoorOrchestrator (ADK LlmAgent) with a BuiltInPlanner (2048-token thinking budget) drives 15 tools across four domains:

Domain  | Tools included
------- | --------------
Browser | navigate_to_url, click_element_by_text, find_and_click, type_into_field, select_dropdown_option, fill_form, scroll_down/up, go_back
Vision  | analyze_current_page (sends Playwright screenshots to Gemini, which returns structured JSON: page type, bounding boxes, modals, and recommended actions)
Page    | extract_page_text, get_accessibility_tree, read_page_aloud
State   | explain_what_happened, task_complete
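analyze_current_page asks Gemini for structured JSON. The agent should validate that reply before acting on it. The sketch below shows one way to do this. The field names (`page_type`, `elements`, `label`, `box`, `modals`, `recommended_actions`) and the `[x, y, width, height]` pixel box format are assumptions, not Noor's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Element:
    label: str
    box: Tuple[int, int, int, int]  # assumed (x, y, width, height) in screenshot pixels

    def center(self) -> Tuple[int, int]:
        """Point a coordinate click or an overlay marker would target."""
        x, y, w, h = self.box
        return (x + w // 2, y + h // 2)


@dataclass
class PageAnalysis:
    page_type: str
    elements: List[Element] = field(default_factory=list)
    modals: List[str] = field(default_factory=list)
    recommended_actions: List[str] = field(default_factory=list)


def parse_page_analysis(raw: str) -> PageAnalysis:
    """Validate the model's JSON reply; raise ValueError on malformed output."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or "page_type" not in data:
        raise ValueError("missing page_type")
    elements = []
    for item in data.get("elements", []):
        box = item.get("box") if isinstance(item, dict) else None
        if not (isinstance(box, list) and len(box) == 4):
            raise ValueError(f"bad element entry: {item!r}")
        elements.append(Element(str(item.get("label", "")), tuple(int(v) for v in box)))
    return PageAnalysis(
        page_type=str(data["page_type"]),
        elements=elements,
        modals=[str(m) for m in data.get("modals", [])],
        recommended_actions=[str(a) for a in data.get("recommended_actions", [])],
    )
```

Rejecting malformed replies early lets the agent re-ask or fall back to the accessibility tree rather than click at a bogus coordinate.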

The orchestrator uses two-layer perception: the accessibility tree first (fast, structured), then vision analysis (slower, but understands visual hierarchy and layout).
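In Noor, the model itself decides when to switch layers, guided by prompts and tool docstrings. The deterministic sketch below only illustrates the cost ordering: try the cheap, structured lookup first and pay for vision only on a miss. `locate`, `find_in_tree`, `find_with_vision`, and `Target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Target:
    description: str
    source: str  # "accessibility_tree" or "vision"


def locate(
    query: str,
    find_in_tree: Callable[[str], Optional[str]],      # fast, structured lookup
    find_with_vision: Callable[[str], Optional[str]],  # slower screenshot analysis
) -> Optional[Target]:
    """Try the accessibility tree first; fall back to vision only if it finds nothing."""
    hit = find_in_tree(query)
    if hit is not None:
        return Target(hit, "accessibility_tree")
    hit = find_with_vision(query)
    if hit is not None:
        return Target(hit, "vision")
    return None
```

Vision is the fallback for pages whose controls have no accessible names, such as canvas widgets or unlabeled icon buttons.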


Technical Deep Dive

Browser Automation

Playwright uses a three-strategy launch system: connect over CDP to an externally running browser, launch the system Edge or Chrome during Windows development, or fall back to bundled Chromium in Docker/Cloud Run.
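The strategy choice can be kept as a pure function, separate from the Playwright calls that act on it. This is a minimal sketch. `choose_launch_plan`, `LaunchPlan`, and the `BROWSER_CDP_URL` / `BROWSER_CHANNEL` environment variables are hypothetical. `connect_over_cdp` and `launch(channel=...)` are Playwright Python async APIs.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class LaunchPlan:
    strategy: str                   # "cdp", "system", or "bundled"
    cdp_url: Optional[str] = None
    channel: Optional[str] = None   # Playwright channel name, e.g. "msedge" or "chrome"


def choose_launch_plan(env: Mapping[str, str], platform: str) -> LaunchPlan:
    """Priority order: external browser, then system browser, then bundled Chromium."""
    if env.get("BROWSER_CDP_URL"):  # hypothetical env var pointing at a running browser
        return LaunchPlan("cdp", cdp_url=env["BROWSER_CDP_URL"])
    if platform == "win32":         # local Windows development machine
        return LaunchPlan("system", channel=env.get("BROWSER_CHANNEL", "msedge"))
    return LaunchPlan("bundled")    # Docker / Cloud Run image


async def launch(playwright, plan: LaunchPlan):
    """Return a Browser; `playwright` is the object yielded by async_playwright()."""
    chromium = playwright.chromium
    if plan.strategy == "cdp":
        return await chromium.connect_over_cdp(plan.cdp_url)
    if plan.strategy == "system":
        return await chromium.launch(channel=plan.channel)
    return await chromium.launch()  # bundled Chromium, headless by default
```

Usage, assuming `pip install playwright` and `from playwright.async_api import async_playwright`: inside `async with async_playwright() as p:`, call `browser = await launch(p, choose_launch_plan(os.environ, sys.platform))`. Keeping the decision pure makes it testable without a browser installed.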

  • Stealth Layer: Defeats Cloudflare and DataDome via addInitScript() patches:
      • navigator.webdriver spoofing
      • window.chrome injection
      • WebGL fingerprint masking
      • Automatic cookie banner dismissal
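Noor describes its cookie handling as two layers: CSS hiding, with a button-click fallback. The sketch below shows that idea using Playwright's async Python API (`page.add_style_tag`, `page.get_by_role`, and `Locator.first` / `count` / `is_visible` / `click`). The container selectors, the English-only button labels, and the `dismiss_cookie_banner` name are illustrative assumptions, and Noor's real implementation may differ.

```python
import re

# Example container selectors for common consent managers (illustrative, not exhaustive).
HIDE_CSS = """
#onetrust-banner-sdk, #CybotCookiebotDialog, .cc-window, [id*="cookie-banner"] {
  display: none !important;
}
"""

# Button labels that usually dismiss a banner (English-only example list).
CONSENT_LABEL = re.compile(r"^\s*(accept( all)?|agree|i agree|allow all|got it)\s*$", re.IGNORECASE)


async def dismiss_cookie_banner(page, click_timeout_ms: int = 2000) -> str:
    """Layer 1: hide known banner containers with CSS.
    Layer 2: if a consent button is still visible (an unknown banner), click it.
    Returns "clicked" if a button was pressed, else "css_only".
    """
    await page.add_style_tag(content=HIDE_CSS)
    button = page.get_by_role("button", name=CONSENT_LABEL).first
    if await button.count() and await button.is_visible():
        await button.click(timeout=click_timeout_ms)
        return "clicked"
    return "css_only"
```

The CSS layer is cheap and works even when the banner has no accessible button. The click layer catches banners that the selector list does not know about.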

Frontend

A zero-dependency client built to meet WCAG 2.1 AA:

  • Vanilla JS with ARIA live regions.
  • Keyboard shortcuts (Space for mic, Tab navigation).
  • Live screenshot panel with bounding-box overlays.
  • Waveform visualization and exponential-backoff reconnect.
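The client itself is vanilla JS. The Python sketch below shows only the shape of an exponential-backoff reconnect schedule. The base delay, cap, attempt count, and the optional "full jitter" are hypothetical choices that the project does not state.

```python
import random
from typing import Iterator, Optional


def reconnect_delays(
    base_s: float = 0.5,
    cap_s: float = 30.0,
    attempts: int = 8,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield the wait before each reconnect attempt: base_s * 2**n, capped at cap_s.

    Passing an rng applies "full jitter" (uniform between 0 and the capped delay),
    so many clients that dropped at once do not reconnect in lockstep.
    """
    for n in range(attempts):
        delay = min(cap_s, base_s * (2 ** n))
        yield rng.uniform(0, delay) if rng is not None else delay
```

With the defaults, the waits are 0.5, 1, 2, 4, 8, 16, 30, and 30 seconds. The attempt counter should reset after a successful connection.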

Challenges

  • Stealth vs. The Modern Web: Real websites fight automation. We built a stealth initialization pipeline and a two-layer cookie dismissal system (CSS hiding, with a button-click fallback) so Noor works on the everyday sites users depend on.
  • Dual Perception Modes: Teaching the agent when to use the fast Accessibility Tree vs. the deep Vision Analysis required precise prompt engineering and tool docstrings.
  • Asyncio Conflicts: Playwright spawns the browser as a subprocess, which on Windows requires asyncio's ProactorEventLoop; the Windows SelectorEventLoop cannot run subprocesses, so any library that switches to it breaks Playwright. We patched the event loop policy at every entry point.
  • Regional Constraints: The Gemini Live API (native audio) requires specific regional Vertex AI endpoints. We authored a _RegionalLiveGemini subclass to route traffic transparently.
  • Narration Pacing: Blind users lose trust if they don't know what's happening. We paced tool-execution events (start and end) alongside live screenshots so the user is never left in silence.
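The event-loop patch mentioned under Asyncio Conflicts can be sketched as a small guard that each entry point calls before any loop starts. `use_proactor_loop_on_windows` is a hypothetical name. On Python 3.12+, passing `loop_factory=asyncio.ProactorEventLoop` to `asyncio.run()` is an alternative, and Python 3.14 deprecates the policy API.

```python
import asyncio
import sys


def use_proactor_loop_on_windows() -> bool:
    """Call at each entry point (CLI, server, tests) before an event loop is created.

    Playwright launches the browser as a subprocess, which Windows' SelectorEventLoop
    cannot do, so a library that installs the selector policy breaks it.
    Returns True on Windows (after ensuring the proactor policy), False elsewhere.
    """
    if sys.platform != "win32":
        return False  # default loops on other platforms already support subprocesses
    if not isinstance(asyncio.get_event_loop_policy(), asyncio.WindowsProactorEventLoopPolicy):
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True
```

Typical use is `use_proactor_loop_on_windows()` followed by `asyncio.run(main())` under `if __name__ == "__main__":`. Calling it again after another library has changed the policy restores the proactor loop.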

What We Learned

  • Vision-First > DOM-First: Gemini’s understanding of a screenshot is remarkably close to human perception. It handles cookie banners and dynamic content that usually break screen readers.
  • One Smart Agent > Three Mediocre Ones: We consolidated from a multi-agent setup to a single orchestrator. On a hackathon timeline, prompt iteration speed beats debugging inter-agent communication.
  • The Demo is the Product: Every architectural decision, from stealth evasion to narration cadence, was optimized to make the mission visceral and the experience functional.
