Inspiration
Homework occasionally requires that we rewatch excruciating, multi-hour documentaries; video editing demands that we find precisely the point at which a sibling disrupted a delicate skit; preventing bank robberies imposes the need to spot suspicious activity days in advance.
In the past we had no alternative to manual, human labour: we just had to wait and watch. But that's no longer the case.
Having worked in the neural search space for the last few months, we only recently began putting together the disparate pieces necessary to build something like Scry. This was just the perfect opportunity to save fellow humans from what we ourselves faced (okay, maybe not the bank robbing ;) ).
What it does
Scry lets you "google" (i.e., natural language search) your way through both the auditory and visual components of videos.
If you need to track every film shot taken through a window in Hitchcock's Rear Window, or the last words of Steve Jobs in his biopic, Scry's your friend.
If you're a video editor and can't recall exactly when or where a frame or audio clip needs to be edited out, or you're a visually impaired or hard-of-hearing person who wants to understand what's happening in a video, Scry's your friend.
If you're an intelligence analyst, and you need to parse through millions of hours of security footage for a vaguely described suspect ("man wearing red jacket and sunglasses"), or identify who said what in a poorly recorded wiretap, Scry's your friend.
How we built it
We built the frontend using React, Tailwind, and Mantine UI.
Challenges we ran into
Building and tuning the hyperparameters on the ML model.
Accomplishments that we're proud of
Successfully figuring out how to cache the ML model such that it isn't constantly reloaded.
What we learned
We found that under a great deal of pressure we can accomplish far more than we expected, and that even daunting tasks, when chunked (as the videos are!), are accomplishable.
What's next for Scry
There are a couple of ways we could move forward: we could adapt Scry into a full-fledged video editor, an intelligence analysis platform, or a lecture/keynote/conversation idea analyzer (leap through the ideas from thought to thought, in graphical/map form).
Built With
- assemblyai
- cohere
- machine-learning
- python
- typescript
