Inspiration: Seeing Through Sound

Mira comes from the Latin mirare: to look, to wonder. It also echoes mirror: a surface that reflects, clarifies, and helps us truly see.

We chose this name because accessibility is not only about visibility — it is about understanding, connection, and dignity.

Every day, billions of people browse the web visually, but those who rely on assistive technologies experience the web through fragmented voices: isolated captions, monotone text-to-speech, disconnected descriptions. In this noise, the story disappears.

Mira was born from a simple but radical belief: accessibility should illuminate meaning, not merely pronounce words.

By combining AI understanding with inclusive design, Mira transforms webpages into coherent narratives. It speaks not only what appears on the page but why it matters — helping listeners follow ideas, not just sentences.

We don’t just read the web aloud; we make the web, as a web, heard.

In a world increasingly mediated by sound, Mira is our attempt to make that connection possible, one story, one voice, one page at a time.

Because Accessibility Means More Than Visibility

Despite an explosion of “read-aloud” extensions, the web still isn’t truly audible.

Most tools recite text linearly, inserting raw image descriptions mid-sentence. This breaks the narrative and overloads the listener’s working memory.

Psychology research has long shown that fragmented multimodal input impairs comprehension:

  • According to the Split-Attention Effect, separating related information across sources increases extraneous cognitive load and reduces understanding (Ayres & Sweller, 2014; Fenesi, Kramer & Kim, 2016).
  • In real-world use, the WebAIM Screen Reader User Survey #10 (2024) reports that over 60% of blind or low-vision users find current web audio “disjointed” and “mentally tiring.”
  • A study in ACM Transactions on Accessible Computing (Fan et al., 2023) on chart accessibility shows that when visual data lacks contextual narration, comprehension accuracy drops by more than 40%.

Existing read-aloud tools therefore achieve audibility without achieving intelligibility.

Mira addresses this gap by integrating the image’s purpose directly into the surrounding text flow — ensuring that listening becomes as meaningful and coherent as reading.

Target Users: From Those Who Need It to Those Who Prefer It

Mira is designed for a spectrum of listeners: from those who depend on accessibility tools to those who simply prefer listening.

1. Functional Accessibility – Those Who Need It

More than 2.2 billion people worldwide live with some form of vision impairment (WHO, 2023). Screen readers grant access to text but often fail to convey why an image appears. For blind and low-vision users, comprehension collapses whenever a figure, chart, or photograph is described out of context. Mira restores that missing connection — explaining images as evidence, contrast, or conclusion, rather than isolated objects.

2. Social Listening – Those Who Prefer It

Beyond accessibility, we now live in an audio-first culture.

Podcast audiences grow every year; Edison Research’s The Infinite Dial 2025 reports that 47% of U.S. adults listen monthly, while Pew Research 2025 finds 86% consume news primarily through mobile audio or summaries. In the UK, the National Literacy Trust (2024) found that “listening” has surpassed “reading” as young people’s favorite way to engage with stories.

Mira therefore serves two converging worlds:

  • those who require inclusive design to access knowledge, and
  • those who choose listening as their natural mode of understanding.

By uniting both, Mira reframes accessibility as a universal experience — proving that what helps some can, and should, enhance the way everyone experiences the web.

The Mira Difference: Design Intelligence over Complexity

Mira doesn’t add complexity. It rearranges simplicity.

The web can already “speak,” but it rarely makes sense when it does.

Most accessibility extensions recite every word and describe every image, but few understand how those pieces connect.

The problem isn’t a lack of AI; it’s a lack of design intelligence: the ability to orchestrate information so that listeners experience meaning, not mechanics.

Mira runs entirely within the browser using Chrome’s built-in AI stack. By leveraging Chrome's Built-in Gemini Nano (Prompt API and Summarizer API), we've created an intelligent system that differs from traditional tools at every stage:

| Aspect | Traditional Read-Aloud Tools | Mira’s Approach | Underlying Technology |
| --- | --- | --- | --- |
| Processing logic | Linear DOM traversal: reads every element top to bottom | Hybrid parsing: Gemini 2.5 Flash identifies key visual elements → Readability.js extracts the main article body → hierarchical parsing structures content semantically | Gemini 2.5 Flash, Readability.js |
| Image handling | Reads each image as isolated alt text | Prompt API generates a caption → Summarizer API integrates it into the paragraph’s meaning (evidence, contrast, example) | Chrome Built-in AI (Prompt + Summarizer) |
| Comprehension outcome | “Audible but fragmented” | “Audible and coherent”: contextual, fluent, human-like | |
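The hierarchical parsing step can be sketched as a pure function. This is our illustration only (the block shapes and names are hypothetical, not Mira's actual internals): each image is attached to its nearest preceding paragraph, so its cue can be spoken after that paragraph instead of interrupting it mid-sentence the way linear DOM traversal does.

```javascript
// Sketch (hypothetical names): group extracted content blocks into
// narration units, attaching each image to the paragraph that gives
// it context rather than reading it as an isolated interruption.
function buildNarrationUnits(blocks) {
  const units = [];
  for (const block of blocks) {
    if (block.type === "paragraph") {
      units.push({ text: block.text, imageCues: [] });
    } else if (block.type === "image") {
      if (units.length > 0) {
        // Attach to the nearest preceding paragraph.
        units[units.length - 1].imageCues.push(block.alt || "");
      } else {
        // Leading image with no context yet: narrate it standalone.
        units.push({ text: "", imageCues: [block.alt || ""] });
      }
    }
  }
  return units;
}

// Example: a paragraph followed by its supporting chart becomes one
// unit, so the chart is narrated as part of the paragraph's argument.
const example = buildNarrationUnits([
  { type: "paragraph", text: "Emissions rose steadily after 2010." },
  { type: "image", alt: "Line chart of CO2 emissions, 2000-2020" },
]);
```

Attaching cues at the unit level is what lets the later stages decide *how* an image relates to its paragraph, instead of merely *that* it exists.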

Innovation & Value

Mira introduces a new paradigm in accessible AI design — connecting meaning, not just media.

🧠 1. Understands context before it speaks

Unlike linear readers, Mira orchestrates existing on-device APIs to turn images into context-aware cues.

Prompt API first produces a concise image caption (what the image depicts). Summarizer API then analyzes the surrounding paragraph(s) together with that caption to generate a short, integrated cue about the image’s role in the local argument. We speak this cue at the end of the paragraph and gate it by confidence.
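The caption-to-cue pipeline can be sketched as follows. The `LanguageModel.create()` / `Summarizer.create()` calls follow the shape of Chrome's built-in AI documentation at the time of writing, but that surface is still evolving, so treat the options as assumptions; the confidence threshold is a tunable of our own, and how a confidence score is estimated is outside this sketch.

```javascript
// Assumed tunable of ours, not a Chrome API value.
const MIN_CUE_CONFIDENCE = 0.6;

// Sketch of the two-stage pipeline described above. API shapes are
// assumptions based on Chrome's built-in AI docs (Prompt + Summarizer).
async function imageCueFor(imageAlt, paragraphText) {
  // Stage 1 — Prompt API: a concise caption of what the image depicts.
  const session = await LanguageModel.create();
  const caption = await session.prompt(
    `Describe this image in one short sentence: ${imageAlt}`
  );

  // Stage 2 — Summarizer API: fold the caption into the paragraph's
  // argument, so the cue states the image's role (evidence, contrast...).
  const summarizer = await Summarizer.create({ type: "tldr", length: "short" });
  return summarizer.summarize(caption, {
    context: `This image appears alongside: ${paragraphText}`,
  });
}

// Confidence gate (pure, testable): cues the system is unsure about
// are dropped rather than risk misleading the listener.
function shouldSpeakCue(confidence) {
  return confidence >= MIN_CUE_CONFIDENCE;
}
```

Gating on confidence means a listener hears either a useful cue or nothing, never a wrong guess spoken with authority.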

🔊 2. Speaks meaning, not just words

Mira’s audio design follows the Coherence Principle from cognitive psychology: it delivers images’ roles after their textual context, preserving the listener’s mental flow.

By using asynchronous TTS buffering (chrome.tts), Mira creates a pseudo-streaming narration that feels natural, fluent, and unbroken.
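A minimal sketch of that pseudo-streaming buffer, assuming a Chrome extension context: narration is split into sentence-sized chunks (the naive splitter here is our illustration), and each chunk is enqueued with `chrome.tts.speak`, so later chunks are queued while the current one plays instead of one long utterance blocking until fully synthesized.

```javascript
// Split narration into sentence-sized chunks, keeping punctuation with
// each sentence. Naive splitter for illustration (e.g. "Dr." would
// split early); pure and testable.
function chunkNarration(text) {
  const matches = text.match(/[^.!?]+[.!?]+(\s|$)/g);
  return matches ? matches.map((s) => s.trim()) : [text];
}

// Enqueue each chunk; requires an extension with the "tts" permission.
function speakBuffered(text) {
  for (const chunk of chunkNarration(text)) {
    // enqueue: true appends to the TTS queue instead of interrupting,
    // which is what makes the narration feel continuous.
    chrome.tts.speak(chunk, { enqueue: true, rate: 1.0 });
  }
}
```

Because each chunk is an independent utterance, pausing, skipping, or interleaving an image cue between sentences becomes a queue operation rather than a restart of the whole narration.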

🌍 3. Bridges inclusive and mainstream design

What began as accessibility assistance becomes a universal listening interface.

Mira benefits blind users who need comprehension and busy multitaskers who prefer listening — reframing accessibility as a shared cognitive improvement, not a niche feature.

In doing so, Mira redefines accessibility as a human-centered augmentation — intelligent, inclusive, and emotionally resonant.

Impact & Future

Mira is more than a feature—it’s a way to make the web understandable for everyone. By orchestrating Chrome’s Built-in AI on-device, Mira delivers context-aware narration with minimal code and maximum clarity, improving comprehension.

What’s next:

  • More languages & voices (auto-detect + consistent narration)
  • Richer contexts (infographics, slide decks, long-form real-time streaming)
  • User-controlled modes (concise / explanatory / educational)
  • Stronger accessibility integration (screen reader co-design, low-vision pilots)
  • Partnerships (disability advocacy groups, news/education publishers)

References

  1. Ayres, P., & Sweller, J. (2014). The split-attention principle in multimedia learning. In R. E. Mayer (Ed.), The Cambridge Handbook of Multimedia Learning (pp. 206–226). Cambridge University Press.
  2. Fenesi, B., Kramer, E., & Kim, J. A. (2016). Split-attention and coherence principles in multimedia instruction can rescue performance for learners with lower working memory capacity. Applied Cognitive Psychology, 30(5), 691–699.
  3. World Health Organization. (2023, August 10). Blindness and vision impairment. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment
  4. WebAIM. (2024). Screen Reader User Survey #10 results. https://webaim.org/projects/screenreadersurvey10/
  5. Fan, D., Siu, A. F., Rao, H., Kim, G. S. H., Vazquez, X., Greco, L., ... & Follmer, S. (2023). The accessibility of data visualizations on the web for screen reader users: Practices and experiences during COVID-19. ACM Transactions on Accessible Computing, 16(1), 1–29.
  6. Edison Research. (2024, April 3). Podcast listening hits record highs. https://www.edisonresearch.com/podcast-listening-hits-record-highs/
  7. National Literacy Trust. (2025, February 25). Children and young people’s listening in 2024. https://literacytrust.org.uk/research-services/research-reports/children-and-young-peoples-listening-in-2024/
  8. Ritchie, H. (2024). What share of global CO2 emissions come from aviation? Our World in Data. https://ourworldindata.org/global-aviation-emissions
