INCLUSIGHT
Making the invisible understandable.
Inspiration
This project started with something personal.
We recently became friends with someone who is blind. Through them, we began to realize something we had never questioned before:
A world that feels effortless to us can be extremely challenging for someone without sight.
Simple actions, like receiving an image in a chat, become complex, multi-step processes.
As we explored further, we saw that current digital infrastructure still has many gaps in accessibility.
One of the most overlooked gaps lies in visual communication:
- images
- memes
- stickers
- emojis
These are not just visuals; they carry nuance, emotion, humor, and cultural meaning.
Yet today, blind users are largely excluded from this layer of communication.
The Problem
When a blind user receives an image in a conversation, the current process is fragmented and disruptive:
- Download the image
- Open another app (ChatGPT, Gemini, etc.)
- Upload the image
- Ask for a description
- Wait for a response
- Return to the original conversation
This breaks the natural flow of communication. More importantly, existing tools focus on describing what is in the image, not explaining what it means in context.
Solution
INCLUSIGHT is a real-time visual message interpreter designed for blind users.
We don’t just describe images. We interpret them.
Images → Meaning
Visuals → Understanding
Our system explains:
- what the image shows
- what it means in the conversation
- the tone, emotion, or intention behind it
All delivered through fast, natural audio.
How Users Interact with INCLUSIGHT
Designed with accessibility-first interaction:
- Trigger instantly from the chat interface
- No typing required
- Minimal steps
- Immediate audio feedback
The system:
- Detects the image in conversation
- Interprets content and context
- Generates a concise explanation
- Reads it aloud instantly
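The flow above can be sketched as a minimal pipeline. The function names, the injected model interfaces, and the five-message context window are illustrative assumptions for this sketch, not our actual implementation:

```python
def build_prompt(recent_messages):
    """Build a context-aware prompt that asks for meaning, not a caption."""
    # Assumption: the last few messages are enough context for interpretation.
    context = "\n".join(recent_messages[-5:])
    return (
        "You are assisting a blind user in a chat.\n"
        f"Conversation so far:\n{context}\n"
        "Explain what the attached image shows, what it means in this "
        "conversation, and the tone or intent behind it. Keep it concise."
    )

def interpret_image_message(image_bytes, recent_messages, vlm, tts):
    """Detect -> interpret -> explain -> speak.

    `vlm` and `tts` are placeholders for a vision-language model and a
    text-to-speech service (e.g. Blaze.vn); real signatures will differ.
    """
    prompt = build_prompt(recent_messages)
    explanation = vlm(prompt, image_bytes)   # context-aware interpretation
    audio = tts(explanation)                 # natural audio output
    return explanation, audio
```

Injecting `vlm` and `tts` as callables keeps the chat-side logic independent of any one model provider.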
The Technology Behind INCLUSIGHT
Our system integrates:
- Vision-language models for context-aware interpretation
- Prompt engineering to prioritize meaning over literal description
- Text-to-speech (Blaze.vn) for natural Vietnamese audio output
We focus on reliability and real-world usability, not just accuracy.
Challenges We Faced
1. Speed vs. Level of Detail
There is a fundamental trade-off:
$$ \text{Speed} \uparrow \Rightarrow \text{Detail} \downarrow $$
$$ \text{Detail} \uparrow \Rightarrow \text{Latency} \uparrow $$
We solved this by:
- Using lighter models
- Designing more structured and precise prompts
This allowed us to maintain:
- Fast responses
- High-quality, meaningful output
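To illustrate what "structured and precise" means here, a prompt can pin down the output shape so even a lighter model stays fast and on task. The exact wording and the three-line format below are a hypothetical sketch, not our production prompt:

```python
# A structured prompt constrains the reply format, which keeps lighter
# models concise and makes their output cheap to validate.
STRUCTURED_PROMPT = """\
You are describing a chat image for a blind user. Answer in exactly
three short lines, nothing else:
Shows: <what is visually in the image, one sentence>
Means: <what it means in this conversation, one sentence>
Tone: <the emotion or intent, a few words>
"""

def is_well_formed(reply):
    """Cheap validity check: retry the model call if structure is broken."""
    lines = reply.strip().splitlines()
    return (
        len(lines) == 3
        and lines[0].startswith("Shows:")
        and lines[1].startswith("Means:")
        and lines[2].startswith("Tone:")
    )
```

A fixed format also shortens replies, which directly reduces text-to-speech latency.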
2. Meaning vs. Description
Most systems optimize for:
$$ \text{Accuracy} = f(\text{Objects}) $$
But real communication requires:
$$ \text{Understanding} = f(\text{Context}, \text{Tone}, \text{Intent}) $$
We shifted the system toward interpretation, not just recognition.
Opportunities & Gaps
Through building INCLUSIGHT, we identified broader accessibility gaps:
- Poor Vietnamese speech-to-text (missing commas and punctuation)
- Limited support for Vietnamese context and culture
- Robotic, unnatural Vietnamese text-to-speech
- Translation gaps from English → Vietnamese
These are not edge cases—they affect millions of users.
Vision
This is just the beginning.
We envision:
A digital world that is fully inclusive and accessible for blind users—especially in Vietnamese contexts.
INCLUSIGHT can evolve into:
- An accessibility API for messaging platforms
- A real-time interpretation layer across apps
- A standard for inclusive visual communication
Closing
INCLUSIGHT — Making the invisible understandable.
Because connection today lives in:
- images
- memes
- stickers
- emojis
And:
Everyone deserves to understand it.
Built With
- blaze
- codex
- openai
- trae