Inspiration

Ever watched someone struggle with a new app — clicking the wrong buttons, getting lost in menus, wishing someone could just show them where to go? That frustration sparked Nixoraa.

We were inspired by the idea that interfaces shouldn't require a manual. Just like a GPS guides you through unfamiliar roads, Nixoraa acts as a real-time co-pilot for any digital interface — understanding your intent and navigating the UI on your behalf. The name Nixoraa blends "navigate" and "explore" — a nod to its core purpose of intelligent UI exploration.

What it does

Nixoraa is an AI-powered UI navigation agent that understands natural language commands and autonomously interacts with web interfaces to complete tasks. You describe what you want — it figures out how to do it.

Key capabilities:

  • Natural language control — tell it what you want in plain English
  • Visual element detection — identifies buttons, forms, and menus on any page
  • Step-by-step guidance — highlights the next action and explains each step
  • Cross-platform support — works across web apps, dashboards, and multi-page flows

How we built it

Nixoraa is built on a multi-layer architecture combining vision, language understanding, and action execution:

  • LLM core — Claude/GPT-4 for intent parsing and step reasoning
  • DOM parser — extracts semantic structure from page elements
  • Vision module — screenshot analysis using multimodal AI
  • Action executor — Playwright/Puppeteer for browser automation (a small executor sketch follows this list)
  • React frontend — overlay UI showing live navigation state
  • FastAPI backend — orchestrates agent, memory, and task queue
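
As a concrete illustration of the executor layer, here is a minimal sketch of how a Playwright-based executor could apply a single model-chosen step. The `Action` dataclass and `run_action` helper are our own illustration for this writeup, not Nixoraa's exact internal API:

```python
# Minimal sketch of the action-executor layer; the Action dataclass and
# run_action helper are illustrative assumptions, not Nixoraa's exact API.
from dataclasses import dataclass
from playwright.sync_api import sync_playwright, Page

@dataclass
class Action:
    kind: str             # "goto", "click", or "fill"
    selector: str = ""    # CSS selector chosen by the reasoning step
    value: str = ""       # text to type for "fill", URL for "goto"

def run_action(page: Page, action: Action) -> None:
    """Execute one model-chosen action against the live page."""
    if action.kind == "goto":
        page.goto(action.value)
    elif action.kind == "click":
        page.click(action.selector, timeout=5000)
    elif action.kind == "fill":
        page.fill(action.selector, action.value, timeout=5000)
    else:
        raise ValueError(f"Unsupported action kind: {action.kind}")

if __name__ == "__main__":
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        run_action(page, Action(kind="goto", value="https://example.com"))
        run_action(page, Action(kind="click", selector="a"))
        browser.close()
```

Keeping the executor behind a narrow interface like this is also what lets the browser engine stay swappable, as noted under our accomplishments.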

The agent loop runs as:

observe → reason → act → verify → repeat

At each step, the model computes the most likely action using a confidence score:

$$\text{action}^* = \underset{a \in \mathcal{A}}{\arg\max} \; P(a \mid \text{intent},\, \text{DOM\_state},\, \text{history})$$
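
A stripped-down version of that loop, with the LLM scoring call stubbed out, might look like the sketch below. The function boundaries and names are illustrative assumptions rather than the actual implementation:

```python
# Illustrative skeleton of the observe → reason → act → verify loop.
# score_action stands in for the LLM core; all names here are assumptions.
from typing import Callable

Action = dict      # e.g. {"kind": "click", "selector": "#submit"}
DomState = dict    # semantic snapshot produced by the DOM parser

def agent_loop(intent: str,
               observe: Callable[[], DomState],
               propose: Callable[[DomState], list[Action]],
               score_action: Callable[[Action, str, DomState, list], float],
               act: Callable[[Action], None],
               task_done: Callable[[DomState, str], bool],
               max_steps: int = 20) -> bool:
    """Run observe → reason → act → verify until the intent is satisfied."""
    history: list[Action] = []
    for _ in range(max_steps):
        state = observe()                 # observe: DOM snapshot + screenshot summary
        if task_done(state, intent):      # verify: has the goal state been reached?
            return True
        candidates = propose(state)       # candidate actions from the DOM parser
        if not candidates:
            return False
        # reason: action* = argmax_a P(a | intent, DOM_state, history)
        best = max(candidates,
                   key=lambda a: score_action(a, intent, state, history))
        act(best)                         # act: hand off to the browser executor
        history.append(best)
    return False
```

In the real system, the scoring step is where the Claude/GPT-4 call happens, and the candidate actions come from the DOM parser and vision module.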

Challenges we ran into

  • Ambiguous intent — "Go to settings" can mean 5 different things in different apps
  • Dynamic DOM changes — SPAs update content without page reloads, breaking element targeting
  • Latency — chaining LLM calls with browser actions added noticeable delays
  • Hallucinated actions — the model occasionally tried to click elements that didn't exist (see the guard sketch after this list)
  • Cross-origin restrictions — iframes and CORS policies limited DOM access on some sites
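
For stale targets and hallucinated actions, one simple guard is to re-verify that the proposed element actually exists right before acting, and hand control back to the planner when it doesn't. A minimal sketch with Playwright; the `safe_click` helper and its retry policy are illustrative assumptions, not the project's exact code:

```python
# Guard against model-proposed actions on elements that don't (or no longer) exist.
# safe_click and its retry policy are illustrative, not the project's exact code.
from playwright.sync_api import Page

def safe_click(page: Page, selector: str, retries: int = 2) -> bool:
    """Click only if the proposed element exists, is unique, and is visible."""
    for _ in range(retries + 1):
        locator = page.locator(selector)
        if locator.count() == 1 and locator.is_visible():
            locator.click(timeout=5000)
            return True
        # Element missing, duplicated, or hidden: let the SPA settle, then re-check.
        page.wait_for_timeout(500)
    return False  # tells the agent loop to re-observe and re-plan instead of acting blindly
```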

Accomplishments that we're proud of

  • Built a working end-to-end agent loop within the hackathon timeframe
  • Achieved ~78% task completion rate on benchmark UI flows
  • Designed an overlay UI that doesn't obstruct the page being navigated
  • Successfully handled multi-step flows: login → search → form fill → submit
  • Kept the system modular — swappable LLM backend and browser engine

What we learned

  • Grounding LLM reasoning in structured DOM data dramatically reduces hallucinations
  • Streaming partial results keeps UX feeling fast even with multi-step LLM chains
  • Accessibility attributes (aria-labels, roles) are goldmines for semantic element identification (a small extraction sketch follows this list)
  • Agentic systems need explicit failure recovery — not just "what to do" but "what to do when it fails"
  • Multimodal AI + DOM parsing is far more powerful than either alone
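
To make the accessibility point concrete: the DOM parser can surface role and aria-label pairs as semantic handles for the LLM to ground its reasoning in. A rough sketch with Playwright; the candidate dict shape is our own illustrative choice:

```python
# Sketch of mining accessibility attributes as semantic handles for elements.
# The candidate dict shape is our own illustrative choice.
from playwright.sync_api import Page

def extract_candidates(page: Page) -> list[dict]:
    """Collect interactive elements with their roles and accessible names."""
    handles = page.query_selector_all("button, a, input, select, textarea, [role]")
    candidates = []
    for el in handles:
        role = el.get_attribute("role") or el.evaluate("e => e.tagName.toLowerCase()")
        label = el.get_attribute("aria-label") or (el.inner_text() or "").strip()
        candidates.append({"role": role, "label": label[:80], "id": el.get_attribute("id")})
    # Elements with a non-empty accessible name are the most useful grounding targets.
    return [c for c in candidates if c["label"]]
```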

What's next for Nixoraa — next generation exploration agent

  • Browser extension — inject Nixoraa into any site with one click
  • Memory & personalization — remember user preferences and frequent workflows
  • Voice interface — hands-free navigation via speech commands
  • Accessibility mode — assist users with disabilities navigating complex UIs
  • Enterprise API — let companies embed Nixoraa into their own onboarding flows
  • Offline mode — lightweight local model for privacy-sensitive environments