Have you ever watched the "Transformers" movies? We took inspiration from the character Bumblebee, who speaks by playing back sound clips from recorded video. From that basic concept, we developed a similar mechanism that stitches video clips into a single video for sharing. Visitors to our website select a person and enter a short phrase. Within seconds, we return a newly stitched video in which that person appears to say the phrase, a kind of text-to-speech in video form like those seen in movies and popular YouTube videos (such as Obama singing "Call Me Maybe").
When you click the button to generate a video, we send a request to our backend, which finds the timestamps of the entered words in YouTube videos. The lookup script draws on linguistic theory and on technologies such as natural language processing. The clips for the entered words, taken from videos of the selected speaker, are then stitched together in order. On the client, we use a queuing system for iframes that resembles a server-side load balancer.
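The lookup step can be pictured with a minimal sketch. It assumes a prebuilt index that maps each word to the clips where the selected speaker says it. The names `Clip`, `WordIndex`, and `plan_clips` are hypothetical and chosen only for illustration, and the sketch simply takes the first matching clip for each word rather than reproducing the linguistic analysis our script performs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Clip:
    """One spoken word inside a video (hypothetical structure)."""
    video_id: str   # identifier of the source video
    start: float    # start of the word, in seconds
    end: float      # end of the word, in seconds


# Hypothetical index for a single speaker: word -> clips containing that word.
WordIndex = Dict[str, List[Clip]]


def plan_clips(phrase: str, index: WordIndex) -> List[Clip]:
    """Return one clip per word of the phrase, in spoken order."""
    clips: List[Clip] = []
    for raw in phrase.lower().split():
        # Drop surrounding punctuation so "Maybe!" matches "maybe".
        word = raw.strip(".,!?;:\"'")
        if not word:
            continue
        candidates = index.get(word)
        if not candidates:
            # Assumption: a missing word is reported rather than skipped.
            raise KeyError(f"no clip found for word: {word!r}")
        # Assumption: take the first candidate; a real system would rank them.
        clips.append(candidates[0])
    return clips
```

The returned list is the play order: the client would queue each clip and play it from `start` to `end` before moving on to the next.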
Our primary challenges stemmed from design flaws we found in the APIs provided by YouTube. Several times, we had to completely redesign the way we serve the video to make playback as smooth as possible.
We imagine that this method of stitching words together can be extended to several other uses. For example, it could be used for dialect mapping, or combined with a machine learning algorithm to produce a realistic voice. It could also be a powerful tool for language learning by providing example pronunciations.