✨ AuraSync: Bridging the Sensory Gap
💡 Inspiration
The inspiration for AuraSync came from a fundamental realization: accessibility tools often treat video as a series of labels rather than a cohesive experience. For the visually and hearing impaired, "what" is happening is only half the story; the "when" and "how" provide the emotional context. We wanted to build a bridge—a Universal Temporal Reference System—that gives users a cinematic yet precise understanding of video content, ensuring they never lose their place in the narrative timeline.
🚀 What it does
AuraSync is a dual-stream accessibility engine that transforms video into immersive, context-aware narratives:
- **For the Visually Impaired:** It provides vivid descriptions of micro-expressions, lighting, and action, using audible timestamps as "spatial anchors" to keep the user perfectly oriented.
- **For the Hearing Impaired:** It pivots from literal subtitles to Environmental Context. Instead of just "[Noise]", it identifies the source—e.g., "[The vibration shakes the floorboards]"—to create a visual representation of sound.
- **Reasoning Control:** Users can adjust the Thinking Depth (Low→High), allowing the AI to perform deeper logical traces for complex scenes.
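Under the hood, the Thinking Depth control maps to how much reasoning budget the model is given per scene. Here is a minimal sketch of that mapping using the google-genai SDK; the model name, budget tiers, and prompt are illustrative assumptions rather than our exact production values.

```python
# Hypothetical sketch: translating the UI "Thinking Depth" slider into a
# Gemini thinking budget. Model name, tiers, and prompt are assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

THINKING_BUDGETS = {"Low": 512, "Medium": 2048, "High": 8192}  # assumed tiers

def describe_scene(video_file, depth: str = "Medium") -> str:
    """Request an accessibility description at the chosen reasoning depth."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # placeholder; substitute the deployed model
        contents=[video_file, "Describe this scene for a visually impaired viewer."],
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                thinking_budget=THINKING_BUDGETS[depth]
            )
        ),
    )
    return response.text
```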
🛠️ How we built it
The project is built on a high-performance Python stack designed for multimodal efficiency:
- **Core Intelligence:** Gemini 3 Flash handles the native video reasoning, allowing for direct frame-by-frame analysis without pre-processing.
- **Frontend:** A responsive Streamlit interface that handles real-time data rendering.
- **Audio Pipeline:** gTTS (Google Text-to-Speech) converts the AI’s reasoned output into a rhythmic, synchronized audio stream.
- **Logic Architecture:** We utilized the google-genai SDK and custom Python logic to manage the flow between the AI's reasoning traces and the final accessibility assets.
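To make the flow concrete, here is a condensed sketch of how those pieces fit together, with a single upload driving reasoning, text, and narration. It assumes the streamlit, google-genai, and gTTS packages; the model name, prompt, and file paths are placeholders, not our exact production code.

```python
# Condensed sketch of the upload -> reason -> narrate pipeline.
# Model name, prompt, and file paths are illustrative placeholders.
import time
import streamlit as st
from google import genai
from gtts import gTTS

client = genai.Client()

st.title("AuraSync")
uploaded = st.file_uploader("Upload a video", type=["mp4", "mov"])

if uploaded:
    # Persist the upload, then hand it to the Files API for native video reasoning.
    with open("clip.mp4", "wb") as f:
        f.write(uploaded.getbuffer())
    video = client.files.upload(file="clip.mp4")
    while video.state.name == "PROCESSING":  # wait until the file is ready
        time.sleep(2)
        video = client.files.get(name=video.name)

    result = client.models.generate_content(
        model="gemini-2.5-flash",  # placeholder model name
        contents=[video, "Narrate this video with audible timestamps as spatial anchors."],
    )
    st.write(result.text)

    # Turn the reasoned narration into a synchronized audio stream.
    gTTS(result.text, lang="en").save("narration.mp3")
    st.audio("narration.mp3")
```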
🚧 Challenges we ran into
The primary hurdle was the Metadata Dilemma. We initially struggled with how to present timestamps. If we removed them, the user lost their place; if we kept them, the audio felt robotic. We decided to pivot our design philosophy to a **Unified Frame-Reference Architecture**, treating the timecodes as a rhythmic "metronome" that provides a reliable, reference-grade stream for professional and power-user accessibility.
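As a rough illustration of the idea (not our exact implementation), each described beat keeps its timecode and is spoken as part of the narration, so the synthesized audio doubles as a timeline reference:

```python
# Illustrative sketch of the timestamp "metronome"; the data shape and
# phrasing are hypothetical, not the production format.
def anchor_narration(beats: list[tuple[float, str]]) -> str:
    """Render (seconds, description) pairs as a timestamp-anchored script."""
    lines = []
    for seconds, description in beats:
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"At {minutes:02d}:{secs:02d}. {description}")
    return " ".join(lines)

# Example: the spoken script keeps the listener oriented on the timeline.
script = anchor_narration([
    (12, "She glances toward the door, jaw tightening."),
    (47, "The vibration shakes the floorboards as the train passes."),
])
```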
🏆 Accomplishments that we're proud of
- **Contextual Sound Translation:** Successfully moving beyond transcription to describe the impact and vibration of sounds for deaf users.
- **Seamless Multimodal Integration:** Building a system where a single video upload triggers complex reasoning, text generation, and audio synthesis in one fluid motion.
- **Human-Centric UI:** Creating a clean interface that manages the complexity of AI "thinking" logs while remaining accessible.
🧠 What we learned
We gained deep insights into the Literacy Gap within the deaf community. We learned that for many, text is a second language compared to visual-spatial sign languages. This taught us the importance of Visual Grammar—using concrete, active verbs to describe the world rather than abstract metaphors.
🔮 What's next for AuraSync-Accessibility-AI
AuraSync is a foundation. Our next steps include:
- **Sign Language Avatars:** Integrating SLT (Sign Language Translation) models to convert our AI-generated descriptions directly into spatial sign language.
- **Real-Time Camera Sync:** Porting the logic to mobile via Gemini Live, allowing users to get "AuraSync" descriptions of the world around them in real time.
- **Multi-Language Localization:** Expanding the narrative engine to support diverse languages and cultural contexts for global accessibility.