Automated Audio Description

Inspiration

The Automated Audio Description extension was inspired by combining two things that I am passionate about—Twitch & making the world more accessible to people who are DeafBlind, blind, or low-vision.

Automated closed captioning extensions are readily available to help those who are Deaf, hard of hearing, or just prefer to have captions displayed. However, it's much harder to provide Audio Description (narrated or textual descriptions of visual elements) for community members who are DeafBlind, blind, or low-vision.

My goal with the Automated Audio Descriptions extension is to ensure that all community members can enjoy a similar stream experience, allowing them to engage and interact in ways they couldn’t before. It elevates the professionalism of the stream by setting a new gold standard for what Enhancing Stream Appearances means: Not only focusing on visual appeal, but also on providing the necessary resources for everyone in the community to fully experience the stream.

What it does

This extension provides an accessible interface for community members to fetch a description of what is currently happening on stream visually, and provides that description in text and audio format.

How we built it

We built the Automated Audio Description extension using the Twitch API, the OpenAI API with the GPT-4o model, AWS Polly, server side processing with PHP, and a custom extension configuration page. The Twitch API allows us to fetch a thumbnail of the stream, which we pass to the GPT-4o API with customized instructions that help it generate a great description of the visual content. Once we have the description, we pass the text along to AWS Polly to generate an audio file for the user. Finally, the text & audio file is displayed to the user in a panel, component, or video overlay, depending on the streamer's preference.

Challenges we ran into

One of the biggest challenges was ensuring the descriptions generated by AI were both accurate and meaningful to DeafBlind, blind, and low-vision users. We had to test with many different stream thumbnails to refine the prompts we send to the ChatGPT API to make sure that it would focus on relevant visual elements and strike a good balance balance between thorough and brief.

Accomplishments that we're proud of

We’re proud of creating a tool that bridges an accessibility gap in the streaming world. This extension gives streamers an easy, automated way to enhance their community's inclusivity with minimal effort.

We’re also proud of the fact that the descriptions generated by AI can capture nuanced visual details, bringing all users closer to the real-time experience of the stream.

We're proud to bring the findings back to an Audio Description focused project called UniDescription, of which I am the Chief Technology Officer.

What we learned

This was the first Twitch extension I built, so it was a great learning experience all around. I especially liked learning about and using the Extension Configuration Service as an easy way to store streamer preferences.

What's next for Automated Audio Description

We're committed to continually improving the accuracy of our audio descriptions. We plan to explore features that allow for more streamer customization, such as providing context-specific information or integrating with other popular streaming tools. Our ultimate goal is to make Automated Audio Descriptions a no-brainer extension used by streamers that want to create truly professional, inclusive, and accessible streams.

Built With

amazon-web-services
chatbot
laravel
octobercms
openai
php
polly
twitchapi

Updates

Joe Oppegaard started this project — Oct 20, 2024 12:41 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.