Inspiration
Navigating the internet as a blind user is overwhelmingly difficult. Existing screen readers often feel rigid, slow, or detached from modern AI advancements. Aurora was inspired by the idea of a web that listens, understands, and acts — a browser where users describe what they need, and AI interprets, reads, and interacts on their behalf naturally.
What it does
Aurora is an AI-driven browser designed specifically for blind and visually impaired people. It listens to spoken commands through LiveKit’s real-time speech interface, transcribes and understands intent with Whisper large-v3 running on Groq, and automates browsing actions with Playwright MCP. Users can ask Aurora to “open my email,” “summarize this page,” or “read only the important links,” and it handles everything in real time: no mouse, no menus, no visual clutter.
How we built it
Aurora’s core pipeline combines:
Playwright MCP (Model Context Protocol) to drive browser automation, DOM traversal, and interaction with web elements.
LiveKit for ultra-low-latency voice streaming and bidirectional speech support.
Whisper large-v3 served through Groq’s inference API, the heart of Aurora’s speech-to-text layer, enabling fast and accurate command recognition.
Mem0 to persist user memory across sessions.
Custom prompt orchestration to interpret user intent, trigger Playwright scripts, and return readable or spoken summaries using integrated generative models.
The frontend was built in a minimal Electron environment to keep it lightweight and cross-platform, while backend operations run asynchronously to maintain responsiveness.
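The flow through this pipeline can be sketched as a small intent router: a transcript comes in from the speech layer, a parser turns it into a structured action, and an async handler dispatches it to the browser layer. The rule table and function names below are illustrative, not Aurora's actual grammar, and a stub stands in for the Playwright MCP call.

```python
import asyncio
import re

# Hypothetical intent table: regex patterns mapped to structured browser
# actions. Patterns and action names are illustrative only.
INTENT_RULES = [
    (re.compile(r"open (?:my )?(?P<target>\w+)", re.I), "navigate"),
    (re.compile(r"summarize this page", re.I), "summarize"),
    (re.compile(r"read (?:only )?the important links", re.I), "read_links"),
]

def parse_intent(transcript: str) -> dict:
    """Map a freeform voice transcript to a structured action dict."""
    for pattern, action in INTENT_RULES:
        match = pattern.search(transcript)
        if match:
            return {"action": action, **match.groupdict()}
    return {"action": "unknown", "raw": transcript}

async def handle_command(transcript: str) -> dict:
    # In Aurora this step would invoke Playwright MCP; here we just
    # return the parsed intent so the end-to-end flow is visible.
    intent = parse_intent(transcript)
    await asyncio.sleep(0)  # stand-in for the async browser call
    return intent

print(asyncio.run(handle_command("open my email")))
# {'action': 'navigate', 'target': 'email'}
```

In practice a generative model replaces the regex table for open-ended phrasing, but routing through one structured intent schema keeps the browser layer deterministic and testable.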
Challenges we ran into
Voice intent parsing: Translating freeform, natural-language voice input into structured browser actions was harder than expected.
Accessibility testing: Simulating blind-user workflows revealed subtle issues — for example, ARIA tags and page segmentation matter far more for spoken navigation than for visual browsers.
Latency constraints: Ensuring that speech streaming through LiveKit stayed real-time even under network variation required careful threading and buffering control.
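The ARIA lesson above can be made concrete with a simplified sketch of the kind of filtering spoken navigation needs: prefer an element's `aria-label` over its visible text, and skip anything marked `aria-hidden`. This stdlib-only parser is an assumption-laden stand-in; Aurora's real traversal runs through Playwright MCP against the live DOM.

```python
from html.parser import HTMLParser

class SpokenLinkExtractor(HTMLParser):
    """Collect links worth reading aloud: prefer aria-label text,
    skip aria-hidden decoration, drop links with no spoken name."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # (href, aria_label) while inside an <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("aria-hidden") == "true":
            return  # decorative; spoken navigation should skip it
        self._current = (attrs.get("href", ""), attrs.get("aria-label"))
        self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, label = self._current
            spoken = label or " ".join(t for t in self._text if t)
            if spoken:  # no spoken name means nothing to announce
                self.links.append({"text": spoken, "href": href})
            self._current = None

page = """
<a href="/inbox" aria-label="Open inbox">📧</a>
<a href="/promo" aria-hidden="true">✨</a>
<a href="/docs">Documentation</a>
"""
parser = SpokenLinkExtractor()
parser.feed(page)
print(parser.links)
# [{'text': 'Open inbox', 'href': '/inbox'},
#  {'text': 'Documentation', 'href': '/docs'}]
```

Visually, all three links look fine; acoustically, only two have anything to say, which is exactly the gap our accessibility testing kept surfacing.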
Accomplishments that we're proud of
Built a fully voice-driven browser from scratch using Playwright automation and real-time STT (speech-to-text).
Achieved sub-150ms latency speech recognition with Groq Whisper 3 — fast enough for natural conversation.
Successfully ran user tests with visually impaired participants who were able to independently browse and summarize live web content for the first time without external help.
What we learned
Building for accessibility isn’t just about adding “voice support.” It’s about rethinking the interface entirely. Text, layout, and interaction flow matter differently for someone who experiences the web acoustically. We also learned that low-latency speech inference (like Groq’s) completely changes what “real-time” feels like.
What's next for Aurora
We’re expanding Aurora into a plugin-driven accessibility platform:
Adding contextual web memory so users don't have to re-explain their work every session.
Building a community-driven dataset for blind-user web experiences to improve accuracy and personalization.
Ultimately, Aurora aims to redefine what it means to browse the web — not by sight, but by sound and understanding.