Inspiration
We're tired of plain old video players.
Too much usable screen space goes to waste, and the ads that do appear pull viewers out of the experience. We built a better way for video players and companies to achieve both goals: present everything in a pleasant, concise format while promoting a user experience centered on the commerce that drives the open video industry.
What it does
Our pipeline converts YouTube videos to audio, asynchronously transcribes that audio with Google Cloud's Speech-to-Text service (staging the files in Cloud Storage), and produces a transcript searchable for keywords associated with the video. We also apply machine learning models to video frames to extract visual data, such as the faces of individuals or environmental cues (e.g. changes in environment during a flood to forewarn individuals).
From there, we use the Kensho API to obtain information about the products shown or mentioned and their associated organizations or corporations.
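The searchable-transcript idea can be sketched as a small keyword index mapping each spoken word to the timestamps where it occurs. This is a minimal illustration, not the project's actual code; the function names and the `(word, start_seconds)` input shape are hypothetical stand-ins for what a speech-to-text service returns with word-level time offsets.

```python
from collections import defaultdict

def build_keyword_index(transcript_words):
    """Map each lowercased word to the timestamps (seconds) it is spoken at.

    transcript_words: list of (word, start_seconds) pairs, as produced by a
    speech-to-text service configured to emit word-level time offsets.
    """
    index = defaultdict(list)
    for word, start in transcript_words:
        index[word.lower()].append(start)
    return index

def search(index, keyword):
    """Return every timestamp at which the keyword is spoken (case-insensitive)."""
    return index.get(keyword.lower(), [])

# Tiny worked example with made-up timings:
words = [("Lensflare", 0.5), ("detects", 1.2), ("products", 1.8), ("products", 9.4)]
idx = build_keyword_index(words)
```

Searching `idx` for "Products" would then jump the viewer to 1.8 s and 9.4 s in the video.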
How we built it
We began by wireframing the presentation problems we aimed to solve: marketing individual products, displaying company trends, and identifying a video's environment, specifically its people and natural context. We then followed a modular software design pattern, developing each aspect of the project independently before merging the individual modules into our final product.
For our APIs to function as expected, we first needed to acquire the video's metadata, including its associated text and images. To make the best use of our time, we split the work in parallel: designing the frontend components, writing the API calls, and acquiring the necessary data.
After downloading the video from YouTube and converting it to WAV audio, we stored the file in Google Cloud Storage and transcribed it to text with Google Cloud's Speech API. Selected video frames were also analyzed with a cascade classifier and edge-detection filters.
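The download-convert-upload leg of the pipeline could look roughly like the sketch below. It is an assumption-laden illustration, not our exact code: the bucket name is hypothetical, `youtube_dl`'s `FFmpegExtractAudio` postprocessor requires ffmpeg on the PATH, and the Cloud Storage client needs application-default credentials.

```python
BUCKET = "lensflare-audio"  # hypothetical bucket name -- substitute your own

def download_audio(url, out_base="audio"):
    """Fetch a YouTube video's audio track and have ffmpeg convert it to WAV."""
    import youtube_dl  # pip install youtube_dl (ffmpeg must be on PATH)
    opts = {
        "format": "bestaudio/best",
        "outtmpl": out_base + ".%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
    }
    with youtube_dl.YoutubeDL(opts) as ydl:
        ydl.download([url])
    return out_base + ".wav"

def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI the Speech API accepts for bucket-resident audio."""
    return "gs://{}/{}".format(bucket_name, blob_name)

def upload_audio(local_path, blob_name, bucket_name=BUCKET):
    """Push the WAV file into Cloud Storage and return its gs:// URI."""
    from google.cloud import storage  # pip install google-cloud-storage
    storage.Client().bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```

Staging the audio in a bucket, rather than sending bytes inline, is what lets the Speech API process long recordings asynchronously.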
We then wrote functions and display formats for the outputs of the API calls, ranging from text to graphs, and built a front-end web app in React.js to serve those results to the user interface.
Challenges we ran into
The GCP integration took time to set up correctly, as documentation on reading files directly from GCS buckets was out of date (written for Python 2 rather than 3). Instead, we passed bucket URIs straight to GCP's speech-to-text functions, which read the data asynchronously on cloud machines.
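The asynchronous bucket-side transcription described above can be sketched with the `google-cloud-speech` v1 client. This is a hedged sketch under assumptions (LINEAR16 WAV audio, English speech, default credentials configured); `join_transcript` is a hypothetical helper for flattening the response.

```python
def transcribe_gcs(gcs_uri, timeout=600):
    """Start an asynchronous Speech-to-Text job on bucket-resident WAV audio
    and block until the cloud-side long-running operation finishes."""
    from google.cloud import speech  # pip install google-cloud-speech
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="en-US",
        enable_word_time_offsets=True,  # word timestamps for keyword search
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)  # e.g. gs://bucket/clip.wav
    operation = client.long_running_recognize(config=config, audio=audio)
    return operation.result(timeout=timeout)

def join_transcript(results):
    """Concatenate the top alternative of each result chunk into one transcript."""
    return " ".join(r.alternatives[0].transcript.strip() for r in results)
```

Because the recognition runs server-side, the client only polls for completion; no audio bytes are streamed back and forth during processing.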
YouTube's video player is not designed for playback in external players, and much of the information needed for "playability" is not exposed in a simple format. We worked around this by building a signal-processing unit that collects information from asynchronous events and handles it appropriately.
Accomplishments that we're proud of
We are proud of integrating new technologies, having had little prior familiarity with any of the APIs involved, into an elegant, manageable platform. We are also pleased with the platform's clean design and the structural foundation it gave us to build on.
What we learned
- Using Python libraries and mathematical models for computer vision and file conversion (imageai, youtube-dl, cv2)
- Using a cloud platform for process management, namely Google Cloud Platform (google-cloud-{storage, speech})
- Playing YouTube videos through our own player handlers
- Triggering embedded events from keyword or image-analysis results
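The frame-analysis piece (cv2 cascade classification plus edge detection) can be sketched as follows. This is illustrative only: the Haar cascade file ships with OpenCV, the thresholds are typical defaults rather than our tuned values, and `sample_indices` is a hypothetical helper for choosing which frames to analyze.

```python
def sample_indices(total_frames, fps, every_seconds=2.0):
    """Pick evenly spaced frame indices, one every `every_seconds` of video."""
    step = max(1, int(round(fps * every_seconds)))
    return list(range(0, total_frames, step))

def analyze_frame(frame_bgr):
    """Run Haar-cascade face detection and Canny edge detection on one frame."""
    import cv2  # pip install opencv-python
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    edges = cv2.Canny(gray, 100, 200)  # binary edge map for scene/environment cues
    return faces, edges
```

Sampling frames instead of analyzing every one keeps the vision step cheap enough to run alongside transcription.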
What's next for Lensflare
We hope to expand Lensflare by integrating more APIs, increasing both the quantity and quality of information available to the user.
We are encouraged by how well the system scales. Apart from fetching the YouTube video itself and preprocessing it, all required data is stored once on GCP and queried asynchronously, giving us a simple way to request and transfer information with multithreading for better efficiency.
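The multithreaded querying idea amounts to overlapping independent, network-bound API calls. A minimal sketch with the standard library, where `fetch` is a placeholder for any blocking call (Speech, Storage, Kensho):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(queries, fetch, max_workers=8):
    """Issue independent API queries concurrently, preserving input order.

    Threads overlap the network waits instead of serializing them, which is
    where the efficiency gain comes from for I/O-bound calls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, queries))
```

Because each query is independent and the data lives in one place on GCP, adding workers scales throughput until the upstream APIs' rate limits become the bottleneck.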
With the appropriate foundational and financial support, we hope to develop Lensflare into a full-fledged solution for video viewing.
Built With
- amazon-web-services
- cv2
- google-cloud
- google-cloud-speech-v1
- imageai
- kensho
- material-ui
- natural-language-processing
- python
- react
- wave
- youtube-dl