Inspiration
Crystalrohr is actively developed by a group of passionate individuals at Dreampiper who share a desire to make the internet a more inclusive space for everyone, regardless of ability. While none of us has a serious visual impairment, we are no strangers to blurry vision and the need for glasses, so we have a small window into the struggles people with vision disabilities face when accessing video content.
What it does
Crystalrohr generates automatic captions that make it easier for people with visual impairments to follow what's happening in video content. Anyone with a visual impairment can now ask what is happening in a video scene without needing a human guide. Additionally, an in-video search feature (not yet in production) will let users find specific moments within a video, making long videos easier to navigate. With Crystalrohr, people with vision disabilities can enjoy the diverse content available on YouTube and experience the power of connection and creativity that the platform offers.
How we built it
We use the Canvas API to capture a frame from the playing video and subdivide it into a 4-by-8 grid of blocks. We compute the dominant color of each block and store the results as an array in memory, then repeat the process for the next sampled frame. Comparing the two arrays block by block, we compute the average color change using the CIEDE2000 color-difference formula; when that average crosses a set threshold, we treat it as a scene change and request an inference from BLIP-2, an advanced vision-language model from the LAVIS repository, to get details about the new scene. That description is then read back to the user with Eleven Labs' text-to-speech.
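To make the pipeline concrete, here is a minimal TypeScript sketch of the sampling step, under a few stated simplifications: each block's dominant color is approximated by its average color, the simpler CIE76 ΔE (plain Euclidean distance in Lab space) stands in for the CIEDE2000 formula we actually use, and the grid size and threshold constants (`COLS`, `ROWS`, `SCENE_CHANGE_THRESHOLD`) are illustrative rather than our production values.

```typescript
const COLS = 8;
const ROWS = 4;
const SCENE_CHANGE_THRESHOLD = 12; // hypothetical delta-E cutoff, tuned by eye

type Lab = [number, number, number];

// sRGB -> linear RGB -> XYZ (D65) -> CIELAB
function rgbToLab(r: number, g: number, b: number): Lab {
  const lin = (c: number) => {
    c /= 255;
    return c > 0.04045 ? ((c + 0.055) / 1.055) ** 2.4 : c / 12.92;
  };
  const [lr, lg, lb] = [lin(r), lin(g), lin(b)];
  const x = (0.4124 * lr + 0.3576 * lg + 0.1805 * lb) / 0.95047;
  const y = 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
  const z = (0.0193 * lr + 0.1192 * lg + 0.9505 * lb) / 1.08883;
  const f = (t: number) => (t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116);
  return [116 * f(y) - 16, 500 * (f(x) - f(y)), 200 * (f(y) - f(z))];
}

// Draw the current video frame onto a canvas and average each grid block.
function blockColors(video: HTMLVideoElement, canvas: HTMLCanvasElement): Lab[] {
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const bw = Math.floor(canvas.width / COLS);
  const bh = Math.floor(canvas.height / ROWS);
  const blocks: Lab[] = [];
  for (let row = 0; row < ROWS; row++) {
    for (let col = 0; col < COLS; col++) {
      const { data } = ctx.getImageData(col * bw, row * bh, bw, bh);
      let r = 0, g = 0, b = 0;
      const n = data.length / 4; // RGBA pixels
      for (let i = 0; i < data.length; i += 4) {
        r += data[i]; g += data[i + 1]; b += data[i + 2];
      }
      blocks.push(rgbToLab(r / n, g / n, b / n));
    }
  }
  return blocks;
}

// CIE76 delta-E (Euclidean distance in Lab) stands in for CIEDE2000 here.
const deltaE = (a: Lab, b: Lab) =>
  Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);

// Average the per-block color change between two sampled frames.
function sceneChanged(prev: Lab[], next: Lab[]): boolean {
  const avg = prev.reduce((s, p, i) => s + deltaE(p, next[i]), 0) / prev.length;
  return avg > SCENE_CHANGE_THRESHOLD;
}
```

On each sampled frame, the output of `blockColors` is compared against the previous frame's array with `sceneChanged`; a `true` result is the point at which we'd fire off a BLIP-2 request.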
Challenges we ran into
We spent a lot of time trying to synchronize video playback with the auto-caption feature because of inference time and lag, and we're still testing other methods to improve this; one candidate approach is sketched below. Second, we haven't yet figured out the best way to make in-video search efficient on the client side without brute-forcing it, so we're still working on that.
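For the synchronization problem, one candidate approach (a sketch of an idea, not our shipped behavior) is to pause playback while inference and narration run, then resume afterwards. `CAPTION_ENDPOINT` below is a hypothetical self-hosted BLIP-2 captioning server, and the response shape is assumed; the Eleven Labs call follows their public text-to-speech REST endpoint, and in production the API key should be proxied through a server rather than shipped to the client.

```typescript
const CAPTION_ENDPOINT = "https://example.com/api/caption"; // hypothetical BLIP-2 server
const ELEVEN_VOICE_ID = "<voice-id>"; // placeholder
const ELEVEN_API_KEY = "<api-key>";   // placeholder; keep server-side in production

async function describeScene(video: HTMLVideoElement, frame: Blob): Promise<void> {
  video.pause(); // hold playback so narration stays in sync with the scene

  // 1. Ask the BLIP-2 server for a caption of the captured frame.
  const form = new FormData();
  form.append("image", frame, "frame.png");
  const res = await fetch(CAPTION_ENDPOINT, { method: "POST", body: form });
  const { caption } = (await res.json()) as { caption: string };

  // 2. Turn the caption into speech with Eleven Labs' text-to-speech API.
  const ttsRes = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${ELEVEN_VOICE_ID}`,
    {
      method: "POST",
      headers: {
        "xi-api-key": ELEVEN_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text: caption }),
    },
  );
  const audio = new Audio(URL.createObjectURL(await ttsRes.blob()));

  // 3. Resume playback only after the narration finishes.
  audio.onended = () => void video.play();
  await audio.play();
}
```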
Accomplishments that we're proud of
In the end, we went from ideas and contemplations to a working prototype in less time than it took to come up with the topic of the hackathon in the first place! We're also proud that we can now ask context-based questions about a scene in a video.
What we learned
Crazy ideas aren't really crazy once you take the time to look around for solutions you can stack up like Lego bricks! Secondly, the Canvas API can do a lot more than we had previously explored. Lastly, computers see color in a very different way than we do!
What's next for Crystalrohr
We want to be the go-to platform for in-video searches, so we are working hard on the theory at the moment.
Built With
- ceramic
- express.js
- lavis-blip2
- lit-protocol
- nextjs
- node.js
- railway