✨ AuraSync: Bridging the Sensory Gap

💡 Inspiration

The inspiration for AuraSync came from a fundamental realization: accessibility tools often treat video as a series of labels rather than a cohesive experience. For the visually and hearing impaired, "what" is happening is only half the story; the "when" and "how" provide the emotional context. We wanted to build a bridge—a Universal Temporal Reference System—that gives users a cinematic yet precise understanding of video content, ensuring they never lose their place in the narrative timeline.

🚀 What it does

AuraSync is a dual-stream accessibility engine that transforms video into immersive, context-aware narratives:

- **For the Visually Impaired:** It provides vivid descriptions of micro-expressions, lighting, and action, using audible timestamps as "spatial anchors" to keep the user perfectly oriented.

- **For the Hearing Impaired:** It pivots from literal subtitles to Environmental Context. Instead of just "[Noise]", it identifies the source—e.g., "[The vibration shakes the floorboards]"—to create a visual representation of sound.

- **Reasoning Control:** Users can adjust the Thinking Depth (Low→High), allowing the AI to perform deeper logical traces for complex scenes.
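A minimal sketch of how a Thinking Depth setting might map to a reasoning-token budget for the model. The depth names and budget values here are illustrative assumptions, not AuraSync's actual configuration:

```python
# Hypothetical mapping from the UI's "Thinking Depth" setting to a token
# budget for the model's reasoning trace. Values are illustrative only.
THINKING_BUDGETS = {"Low": 512, "Medium": 2048, "High": 8192}

def thinking_budget(depth: str) -> int:
    """Return the reasoning-token budget for a given Thinking Depth."""
    try:
        return THINKING_BUDGETS[depth]
    except KeyError:
        raise ValueError(f"Unknown Thinking Depth: {depth!r}")

print(thinking_budget("High"))  # → 8192
```

A budget like this could then be passed into the model request's generation config, so deeper scenes get longer reasoning traces.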

🛠️ How we built it

The project is built on a high-performance Python stack designed for multimodal efficiency:

- **Core Intelligence:** Gemini 3 Flash handles the native video reasoning, allowing for direct frame-by-frame analysis without pre-processing.

- **Frontend:** A responsive Streamlit interface that handles real-time data rendering.

- **Audio Pipeline:** gTTS (Google Text-to-Speech) converts the AI's reasoned output into a rhythmic, synchronized audio stream.

- **Logic Architecture:** We utilized the google-genai SDK and custom Python logic to manage the flow between the AI's reasoning traces and the final accessibility assets.
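The glue between the model's output and the TTS stage can be sketched as a small parser. This assumes the model emits one description per line in the form `[MM:SS] text`; the line format and function name are illustrative, not AuraSync's exact implementation:

```python
import re

# Hypothetical parser for the model's timestamped narration. Assumes one
# description per line, prefixed with a "[MM:SS]" timecode.
LINE_RE = re.compile(r"\[(\d{1,2}):(\d{2})\]\s*(.+)")

def parse_descriptions(raw: str) -> list[tuple[int, str]]:
    """Split model output into (seconds, description) pairs."""
    segments = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            minutes, seconds, text = m.groups()
            segments.append((int(minutes) * 60 + int(seconds), text))
    return segments

raw = "[00:05] A door creaks open.\n[01:12] The vibration shakes the floorboards."
print(parse_descriptions(raw))
# → [(5, 'A door creaks open.'), (72, 'The vibration shakes the floorboards.')]
```

Each `(seconds, text)` pair could then be handed to gTTS individually, which is what makes the audio stream stay in sync with the video timeline.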

🚧 Challenges we ran into

The primary hurdle was the Metadata Dilemma. We initially struggled with how to present timestamps. If we removed them, the user lost their place; if we kept them, the audio felt robotic. We decided to pivot our design philosophy to a **Unified Frame-Reference Architecture**, treating the timecodes as a rhythmic "metronome" that provides a reliable, reference-grade stream for professional and power-user accessibility.
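The "metronome" idea can be illustrated by rendering each timecode as a short spoken anchor before its description. The exact phrasing is an assumption, not AuraSync's actual wording:

```python
# Illustrative sketch: render a timecode as a brief spoken anchor that
# the TTS engine reads aloud before each description.
def spoken_anchor(total_seconds: int) -> str:
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} minute{'s' if minutes != 1 else ''}, {seconds} seconds."

print(spoken_anchor(75))  # → "1 minute, 15 seconds."
```

Spoken at a steady cadence, anchors like this give listeners a reliable rhythm without the robotic feel of raw "[MM:SS]" readouts.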

🏆 Accomplishments that we're proud of

- **Contextual Sound Translation:** Successfully moving beyond transcription to describe the impact and vibration of sounds for deaf users.

- **Seamless Multimodal Integration:** Building a system where a single video upload triggers complex reasoning, text generation, and audio synthesis in one fluid motion.

- **Human-Centric UI:** Creating a clean interface that manages the complexity of AI "thinking" logs while remaining accessible.

🧠 What we learned

We gained deep insights into the Literacy Gap within the deaf community. We learned that for many, text is a second language compared to visual-spatial sign languages. This taught us the importance of Visual Grammar—using concrete, active verbs to describe the world rather than abstract metaphors.

🔮 What's next for AuraSync-Accessibility-AI

AuraSync is a foundation. Our next steps include:

- **Sign Language Avatars:** Integrating SLT (Sign Language Translation) models to convert our AI-generated descriptions directly into spatial sign language.

- **Real-Time Camera Sync:** Porting the logic to mobile via Gemini Live, allowing users to get "AuraSync" descriptions of the world around them in real-time.

- **Multi-Language Localization:** Expanding the narrative engine to support diverse languages and cultural contexts for global accessibility.
