Mr. President lets us write the President's script. Given a video of President Obama (or, in the future, any video of a person speaking) and a sentence you want him or her to say, we synthesize a video of that person saying that phrase.
Jokes aside, Mr. President demonstrates several cool concepts.
The process is inspired by a technique called Video Textures, co-developed by Dr. Irfan Essa of the Georgia Institute of Technology. A video texture is a matrix of transition probabilities between the frames of a video; by sampling from this matrix, you can generate arbitrarily long video, because at any frame you can transition to a new, similar-looking segment. In the same fashion, we build a complete graph of transition probabilities between phonemes.
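To make the idea concrete, here is a minimal sketch of a Video Textures-style transition matrix. It is an illustration rather than the project's actual code: the frame features, the distance metric, and the `sigma` sharpness parameter are all assumptions for the example.

```python
import numpy as np

def transition_matrix(frames, sigma):
    """Video Textures-style transition probabilities.

    frames: array of shape (n, d), one flattened feature vector per frame.
    A jump from frame i to frame j is likely when frame j resembles the
    frame that naturally follows i (frame i+1).
    """
    n = len(frames)
    # Pairwise L2 distances between all frames: dist[a, b].
    dist = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=-1)
    # P[i, j] ~ exp(-dist[i+1, j] / sigma). The last frame has no natural
    # successor, so both axes run over frames 0..n-2.
    prob = np.exp(-dist[1:, :n - 1] / sigma)
    prob /= prob.sum(axis=1, keepdims=True)  # normalize each row
    return prob

# Sample an "infinite" video by repeatedly drawing the next frame.
rng = np.random.default_rng(0)
frames = rng.random((10, 4))          # stand-in for real frame features
P = transition_matrix(frames, sigma=0.5)
cur = 0
walk = []
for _ in range(20):
    cur = rng.choice(len(P), p=P[cur])
    walk.append(cur)
```

Frames that look alike get high mutual transition probability, so the sampled walk tends to loop through visually smooth segments instead of playing the clip linearly.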
We begin by extracting the audio signal from the video. We use CMU's Sphinx library to slice the audio into segments of one phoneme or one word each. The timestamps of these segments are stored in a complete directed graph, and we then apply a dynamic programming algorithm to find the optimal path through the graph that synthesizes the input sentence. Approximation algorithms and dynamically cached data speed up this search and make it possible to generate videos quickly, on the fly. Once the appropriate phoneme or word segments are selected, the corresponding video clips are stitched together. To polish the result, Gaussian smoothing and facial localization are performed to keep the head stable across cuts.
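The path search described above can be sketched as a Viterbi-style dynamic program: for each phoneme in the target sentence there are several candidate clips, and we pick one clip per phoneme so that the total transition cost between consecutive clips is minimal. This is a simplified stand-in for the project's algorithm; the clip representation (start, end) timestamps and the time-gap cost function are assumptions for the example, and the real system layers approximation and caching on top.

```python
def best_path(candidates, transition_cost):
    """Pick one clip per target phoneme, minimizing total transition cost.

    candidates: list over target phonemes; candidates[k] is the list of
                clips (e.g. (start, end) timestamps) realizing phoneme k.
    transition_cost: function(clip_a, clip_b) -> float.
    Returns (total_cost, chosen_clips).
    """
    # best holds (cost_so_far, path_so_far) for each candidate of the
    # current phoneme; we extend it one phoneme at a time.
    best = [(0.0, [c]) for c in candidates[0]]
    for options in candidates[1:]:
        new_best = []
        for clip in options:
            cost, path = min(
                (prev_cost + transition_cost(prev_path[-1], clip), prev_path)
                for prev_cost, prev_path in best
            )
            new_best.append((cost, path + [clip]))
        best = new_best
    return min(best)

# Toy usage: prefer clips that are close together in the source timeline,
# which tends to produce smoother-looking cuts.
cost = lambda a, b: abs(b[0] - a[1])
total, clips = best_path(
    [[(0.0, 0.2), (5.0, 5.2)], [(0.3, 0.5), (9.0, 9.3)]],
    cost,
)
```

Each step only needs the best cost of reaching every candidate of the previous phoneme, so the search is quadratic in candidates per phoneme rather than exponential in sentence length.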
We also normalize the loudness of the generated slices in dB space.
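A minimal sketch of such loudness normalization, assuming RMS-based leveling (the actual project may use a different measure, and the -20 dBFS target here is an arbitrary choice for illustration):

```python
import numpy as np

def normalize_db(signal, target_db=-20.0, eps=1e-12):
    """Scale an audio slice so its RMS level lands at target_db (dBFS).

    signal: float samples in [-1, 1]. target_db is an assumed target level.
    """
    rms = np.sqrt(np.mean(signal ** 2)) + eps   # eps guards against silence
    current_db = 20.0 * np.log10(rms)
    gain = 10.0 ** ((target_db - current_db) / 20.0)
    return signal * gain

# Two slices recorded at very different loudness end up at the same level,
# so stitched phonemes don't jump in volume.
t = np.linspace(0, 100, 8000)
loud = normalize_db(0.9 * np.sin(t))
quiet = normalize_db(0.1 * np.sin(t))
```

Working in dB matches perceived loudness better than raw amplitude, which is why the gain is computed in log space before being applied linearly.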
Lastly, Mr. President highlights a potential security flaw in 21st-century video. A proof of concept for video synthesis that can impersonate famous, widely publicized figures has serious implications for our increasingly connected society.