MediaTextScribe

Gallery: final high-fidelity wireframe; user flow; initial sketches; video analysis and indexing; video analysis and live transcript; raw output from image analysis

Inspiration

The internet has become such a fundamental pillar of daily life and of society at large that most of us consider it ubiquitous. That is far from the case: of the 59% of the global population that has internet access, a large number of people cannot use the web to its fullest capacity. These include (but are not limited to) people with vision impairments and people with low bandwidth, the latter a pressing issue in the age of virtual learning during the COVID pandemic.

People with vision impairments often rely on screen readers to convert digital text into synthesized speech. However, with an ever-growing share of web content being image- and video-based, these users can only access important context and content if HTML alt text is present, which is often not the case. The W3C's Web Content Accessibility Guidelines, which have been updated twice since 1999, are only mandatory for federal agencies, leaving a tremendous proportion of the web’s accessibility as a non-essential practice in the hands of designers and developers.

For individuals and households with low bandwidth, one accessibility issue is the inability to stream video efficiently, a feature that’s become pivotal to virtual learning during the pandemic. From getting logged out of their learning platforms to lagging video, many students are being left behind with no viable interim solution to the technical limits on their online learning.

What it does

We aimed to change that. Our technology turns any URL the user enters into a webpage with image and graphic descriptions, whether or not alt text is present. The optimized page presents text in a logical order, resolving the issue of screen readers reading out unformatted text nonsensically. We’ve also included voiceover, translation, and low-resolution video playback features for further accessibility. We also address video streaming and quality issues by indexing, transcribing, and labelling videos and presenting them as scrolling text rather than as a streaming video, which gives both users with low bandwidth and visually impaired users more equitable access to the web. The transcriptions, indexing, and labels are also translated into multiple languages using Azure Translator.

How we built it

Using extensive quantitative research and user interviews to uncover the root of these issues coupled with our team’s design and development skills, we’ve built a tool that offers a viable solution to these problems, making the web more accessible to groups of people who greatly need it.

We have a Node.js front-end deployed to an Azure web app and an Azure API back-end for image-to-text, translation, and video rendering. The front-end handles the website parsing and sends a POST request to the back-end, which returns the desired output.

The back-end is very Azure-heavy, with multiple services providing the functionality: Cognitive Services, Vision, Video Indexer, Translator, Cosmos DB, and serverless Azure Functions are all key components of the back-end.
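As a rough illustration of the front-end-to-back-end hand-off described above, the sketch below builds the kind of POST request the front-end might send after parsing a page. The endpoint URL and the field names (`pageUrl`, `imageUrls`, `language`) are assumptions made for this example and are not taken from the project.

```javascript
// Hypothetical back-end endpoint; the real URL is not given in this write-up.
const BACKEND_URL = 'https://example.invalid/api/describe';

// Build the POST request the front-end would send after parsing a page.
// `pageUrl`, `imageUrls`, and `language` are assumed field names.
function buildDescribeRequest(pageUrl, imageUrls, language = 'en') {
  return {
    url: BACKEND_URL,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ pageUrl, imageUrls, language }),
    },
  };
}

const request = buildDescribeRequest(
  'https://example.com/article',
  ['https://example.com/a.png', 'https://example.com/b.jpg'],
  'es'
);
console.log(request.options.method, request.url);
console.log(request.options.body);
// With Node 18+ this could be sent as: fetch(request.url, request.options)
```

Keeping the request construction separate from the network call makes the payload easy to inspect without a running back-end.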

Challenges we ran into

It was the first hackathon for three of us, so we had a lot of challenges figuring out how to make the app fit together and how to divide up tasks in a way that took advantage of our skills.

For the front-end, the biggest challenge was working around the intricacies of JavaScript. Because its network requests are asynchronous, the responses from the back-end would not line up with what was displayed to the user. We spent a lot of time debugging these issues and learning more about how JavaScript works.
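A mismatch between back-end responses and what the page displays is commonly handled by waiting for all requests to finish before rendering. The sketch below shows that pattern with `async`/`await` and `Promise.all`. It is not the project's actual code: `fakeDescribeImage` is a hypothetical stand-in for the real back-end call, and `renderDescriptions` is an illustrative name.

```javascript
// Stand-in for a back-end call: resolves with a description after a delay.
// (Hypothetical; the real call would be an HTTP request.)
function fakeDescribeImage(imageUrl) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`Description of ${imageUrl}`), 50);
  });
}

// Wait for every description before building the markup, so the page
// never renders while responses are still in flight.
async function renderDescriptions(imageUrls, describeImage) {
  const descriptions = await Promise.all(imageUrls.map((u) => describeImage(u)));
  return descriptions.map((text) => `<p>${text}</p>`).join('\n');
}

renderDescriptions(['a.png', 'b.png'], fakeDescribeImage).then((html) => {
  console.log(html);
});
```

`Promise.all` returns results in the same order as the input array, so the descriptions stay matched to their images even if the requests complete out of order.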

As far as technical challenges go, Azure Cognitive Services and the other services all authenticate in different ways, so combining them cohesively was a challenge. The Azure Video Indexer is a fairly new service, and there was a learning curve involved. The API integration with Video Indexer has a lag, so it is difficult to demo in real time. Nailing down effective UX/UI and CSS styling in such a short amount of time, on top of these technical considerations, also proved to be a challenge for us.
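One common way to cope with a slow indexing job like the one mentioned above is to poll for completion. The sketch below is a generic polling helper and does not call the Video Indexer API. `checkStatus` is a hypothetical function supplied by the caller, and the `state` values and `transcript` field are assumptions for illustration.

```javascript
// Poll `checkStatus` until it reports completion or attempts run out.
// `checkStatus` is a hypothetical async function returning an object such as
// { state: 'Processing' } or { state: 'Processed', transcript: '...' };
// the field names are assumptions for illustration only.
async function waitForIndexing(checkStatus, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status.state === 'Processed') return status;
    if (status.state === 'Failed') throw new Error('Indexing failed');
    // Pause before the next check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for indexing');
}

// Usage with a fake status source that finishes on the third check.
let calls = 0;
const fakeStatus = async () =>
  ++calls < 3 ? { state: 'Processing' } : { state: 'Processed', transcript: 'hello' };

waitForIndexing(fakeStatus, { intervalMs: 10 }).then((result) => {
  console.log(result.transcript);
});
```

Capping the number of attempts keeps a stalled job from blocking the page indefinitely.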

Accomplishments that we're proud of

Despite these challenges, we were able to build a functioning tool that helps serve vulnerable communities that are often left behind in the tech space, and that can continue to serve these people in the long term. For users with impaired vision, this can reduce their dependence on alt text across the web, giving them more control over the accessibility in their lives and access to a massive amount of the internet they’re missing out on. For those with low bandwidth, this can remove the barrier to video-based information, giving people access regardless of whether a native transcription is available.

What we learned

Beyond the value of teamwork, we learned how to effectively leverage each one of our skills to create a viable product. Amy, as a UX designer, learned what it takes to effectively communicate with developers and engineers, develop a timeline, and pivot our direction in a user-centric way.

Meghna, having worked primarily on back-end before this, learned a lot about how Node.js bridges the gap between client and server, and how to use JavaScript.

Submitted to

DubHacks 2020
- Winner: Best Use of Azure for Social Good

Created by

I worked on the Node.js front-end and on connecting it to the back-end. It was my first time with front-end development and Node, and I learned a lot.

Meghna Shankar
Worked on the Node.js front-end and on connecting it to the back-end.
Pug was new and fun to learn!

Vincent Yan
Muntaser Syed
i like to read good food reviews, then eat said good food but im usually broke :(
Amy Lima