I was first inspired to create a Twitch Extension like this while attending a panel by Suz Hinton (@noopkat) at TwitchCon 2017. She mentioned that she was an advocate for accessibility on the web and wanted to try doing Closed Captioning on Twitch. That planted the idea, and it stayed in the back of my head for a long time. Before the hackathon I started to notice broadcasters using a Closed Captioning browser source in their OBS setups: a simple section of their stream overlay dedicated to displaying speech-to-text closed captioning for their viewers. Seeing this rekindled the idea, and when the call went out for this hackathon I thought the time was right to try the same thing as a Twitch extension video overlay that is configurable and increases user engagement.
What it does
Stream Closed Captioner is a Twitch extension video overlay that displays Closed Captioning to the viewer. The only requirement is that the broadcaster uses a companion site to do the speech-to-text conversion. All the broadcaster has to do is install the extension, enable it, log in to stream-cc.gooseman.codes, and click "On" when they go live. Once it's on, viewers will see Closed Captioning displayed on the stream, which they can hide or move around, with more features coming soon.
How I built it
The extension front end was built using standard web UI technologies with ReactJS. The companion site is a standard Ruby on Rails application hosted on Elastic Beanstalk. When a broadcaster signs in using a Chrome browser (Firefox support for Speech To Text is on the Mozilla roadmap) and clicks "On", the browser starts listening to the broadcaster's mic and converts speech to text at near real-time speeds. From there I send that text to the server over a websocket, where it passes through a simple profanity filter, and the server then relays the payload to the broadcaster's channel using the Twitch PubSub API endpoint. All viewers of the channel then receive the Closed Captioning text to display. Since each viewer has a different latency to the broadcaster, I check the viewer's HLS video latency before displaying the text and delay the display by that amount of time. This helps keep the text as in sync as possible with the broadcaster's voice.
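The latency-based delay described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `captionDelayMs` and `scheduleCaption` are hypothetical names, and the sketch assumes the latency is available as a number of seconds (in a Twitch extension it would presumably come from the `hlsLatencyBroadcaster` field of the context passed to `Twitch.ext.onContext`, with caption text arriving through a `Twitch.ext.listen("broadcast", ...)` callback).

```typescript
// Convert the viewer's HLS latency (seconds) into a display delay (milliseconds).
// Missing, non-finite, or negative latency values fall back to no delay.
function captionDelayMs(latencySeconds: number): number {
  if (!isFinite(latencySeconds) || latencySeconds <= 0) {
    return 0;
  }
  return Math.round(latencySeconds * 1000);
}

// Hold a caption for the viewer's latency, then hand it to a display callback.
// `show` is a hypothetical callback that would render the text in the overlay.
function scheduleCaption(
  text: string,
  latencySeconds: number,
  show: (text: string) => void,
): ReturnType<typeof setTimeout> {
  return setTimeout(() => show(text), captionDelayMs(latencySeconds));
}

// Usage: a viewer 4.5 seconds behind the broadcaster sees the caption 4500 ms later.
// scheduleCaption("hello chat", 4.5, (t) => console.log(t));
```

Keeping the delay calculation in its own pure function makes it easy to check independently of the timer and of the Twitch helper library.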
Challenges I ran into
One of the biggest challenges I encountered was solving the issue of the Closed Captioning being positioned where it blocked a viewer's experience. I grappled with a couple of ideas around preset locations that a broadcaster could choose to position the CC where they would like it. But not all streams are created equal, and presets will not solve everyone's issues. I then realized, why not put it in the power of the user to move the position of the CC text? So I added the ability to drag the text around on the viewer side so viewers can tailor the experience for themselves. This simple solution made me happy, since now it's in the power of the viewer to decide how they want to see the CC text.
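The position update behind a draggable caption can be sketched as below. This is an illustrative sketch only: the names `Point`, `Size`, and `dragCaption` are hypothetical, and clamping the caption box to the video bounds is an assumption on my part rather than something the extension is stated to do.

```typescript
interface Point { x: number; y: number }
interface Size { width: number; height: number }

// Restrict a value to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Compute the caption box's new top-left corner after a drag.
// `start` is the corner before the drag, `delta` is how far the pointer moved,
// and the result is kept inside the video so the box cannot be dragged off screen.
function dragCaption(start: Point, delta: Point, box: Size, video: Size): Point {
  return {
    x: clamp(start.x + delta.x, 0, Math.max(0, video.width - box.width)),
    y: clamp(start.y + delta.y, 0, Math.max(0, video.height - box.height)),
  };
}

// Usage: a 400x60 caption box on a 1280x720 video, dragged 50 px right and 20 px up.
const moved = dragCaption(
  { x: 100, y: 500 },
  { x: 50, y: -20 },
  { width: 400, height: 60 },
  { width: 1280, height: 720 },
);
// moved is { x: 150, y: 480 }
```

In a React component, a function like this would typically be called from pointer move handlers, with the result stored in state and applied as the caption's position.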
Accomplishments that I'm proud of
I am just proud I was able to build this extension using the APIs Twitch has offered. It works and it was fun building it.
What I learned
I learned a lot about AWS and deploying a Ruby on Rails application; I had never actually built and deployed anything on my own. I also learned a lot more about the Twitch Extension API and the data available through it.
What's next for Stream Closed Captioner
There are a lot of features I want to prototype and build for the extension.
A couple of ideas I'd like to prototype for broadcasters:
- Downloadable transcripts of their CC session
- Enable text translation into another language, e.g. a Korean-language streamer enables translating text from Korean to English. A viewer will then be able to toggle between
- Broadcaster can customize the font used for the closed captioning text
- When a broadcaster says the name of an emote, it could display the emote on screen or replace the text with the emote
- Voice activated overlay events
Ideas for the viewer experience side of the extension:
- Bit enabled actions/events with the text
- Pay bits to cause the spoken text to rain down from the top of the screen and pile up for a period of time
- Still thinking about other ideas that users can trigger with bits but not at the cost of ruining the closed captioning experience for others.