INCLUSIGHT

Making the invisible understandable.


Inspiration

This project started with something personal.

We recently got to know a friend who is blind. Through them, we began to notice something we had never questioned before:

A world that feels effortless to us can be extremely challenging for someone without sight.

Simple actions, like receiving an image in a chat, become complex, multi-step processes.

As we explored further, we saw that current digital infrastructure still has many gaps in accessibility.
One of the most overlooked gaps lies in visual communication:

  • images
  • memes
  • stickers
  • emojis

These are not just visuals; they carry nuance, emotion, humor, and cultural meaning.

Yet today, blind users are largely excluded from this layer of communication.


The Problem

When a blind user receives an image in a conversation, the current process is fragmented and disruptive:

  • Download the image
  • Open another app (ChatGPT, Gemini, etc.)
  • Upload the image
  • Ask for a description
  • Wait for a response
  • Return to the original conversation

This breaks the natural flow of communication. More importantly, existing tools focus on describing what is in the image, not explaining what it means in context.


Solution

INCLUSIGHT is a real-time visual message interpreter designed for blind users.

We don’t just describe images. We interpret them.

Images → Meaning
Visuals → Understanding

Our system explains:

  • what the image shows
  • what it means in the conversation
  • the tone, emotion, or intention behind it

All delivered through fast, natural audio.


How Users Interact with INCLUSIGHT

Designed with accessibility-first interaction:

  • Trigger instantly from the chat interface
  • No typing required
  • Minimal steps
  • Immediate audio feedback

The system:

  1. Detects the image in conversation
  2. Interprets content and context
  3. Generates a concise explanation
  4. Reads it aloud instantly

The Technology Behind INCLUSIGHT

Our system integrates:

  • Vision-language models for context-aware interpretation
  • Prompt engineering to prioritize meaning over literal description
  • Text-to-speech (Blaze.vn) for natural Vietnamese audio output

We focus on reliability and real-world usability, not just accuracy.
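
"Meaning over literal description" is largely a prompting decision. A minimal sketch of what such a prompt might look like is below; the exact wording and structure are our assumptions, not the prompt the project actually ships with.

```python
# Illustrative meaning-first prompt builder. The instruction text and the
# five-message context window are assumptions for this sketch.

def build_prompt(conversation: list[str]) -> str:
    """Compose a VLM prompt that asks for interpretation, not inventory."""
    context = "\n".join(conversation[-5:])  # keep only recent context
    return (
        "You are describing an image to a blind chat user.\n"
        "Prioritize MEANING over literal description:\n"
        "1. One sentence on what the image shows.\n"
        "2. One sentence on what it means in this conversation.\n"
        "3. One sentence on the tone or intent (humor, sarcasm, affection).\n"
        "Keep the whole answer under 50 words.\n\n"
        f"Recent conversation:\n{context}"
    )
```

Structuring the prompt as a fixed three-part answer with a hard word budget is also what keeps responses short enough for fast audio playback.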


Challenges We Faced

1. Speed vs. Level of Detail

There is a fundamental trade-off:

$$ \text{Speed} \uparrow \Rightarrow \text{Detail} \downarrow $$

$$ \text{Detail} \uparrow \Rightarrow \text{Latency} \uparrow $$

We solved this by:

  • Using lighter models
  • Designing more structured and precise prompts

This allowed us to maintain:

  • Fast response
  • High-quality, meaningful output
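
Beyond lighter models and tighter prompts, one complementary guard (our suggestion, not something the writeup describes) is a post-processing cap on verbosity: trim the model's reply to its first few sentences before text-to-speech, trading detail for faster playback.

```python
import re

def cap_detail(reply: str, max_sentences: int = 3) -> str:
    """Keep only the first max_sentences sentences of a model reply,
    so audio output starts and finishes quickly. Sentence boundaries
    are approximated by ., !, or ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return " ".join(sentences[:max_sentences])
```

For instance, `cap_detail("A cat. It is a joke. About Mondays. Extra detail here.", 2)` keeps only the first two sentences.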


2. Meaning vs. Description

Most systems optimize for:

$$ \text{Accuracy} = f(\text{Objects}) $$

But real communication requires:

$$ \text{Understanding} = f(\text{Context}, \text{Tone}, \text{Intent}) $$

We shifted the system toward interpretation, not just recognition.


Opportunities & Gaps

Through building INCLUSIGHT, we identified broader accessibility gaps:

  • Poor Vietnamese speech-to-text (missing commas and other punctuation)
  • Limited support for Vietnamese context and culture
  • Robotic, unnatural Vietnamese text-to-speech
  • Translation gaps from English → Vietnamese

These are not edge cases—they affect millions of users.


Vision

This is just the beginning.

We envision:

A digital world that is fully inclusive and accessible for blind users—especially in Vietnamese contexts.

INCLUSIGHT can evolve into:

  • An accessibility API for messaging platforms
  • A real-time interpretation layer across apps
  • A standard for inclusive visual communication

Closing

INCLUSIGHT — Making the invisible understandable.

Because connection today lives in:

  • images
  • memes
  • stickers
  • emojis

And:

Everyone deserves to understand it.

Built With

  • blaze
  • codex
  • openai
  • trae