The God Emperor himself
What it does
Takes in a user's speech and Trumpifies it, so that it sounds like Trump is saying what you said.
How we built it
The voice conversion backend is built on top of SPTK. Speech analysis/synthesis is done using the WORLD speech analysis/synthesis library (Morise et al., 2016). A tool was written in C to convert a GMM into a differential GMM (Kobayashi et al., 2014). The dereverberation algorithm of Lebart et al. (2001) is implemented in GNU Octave to pre-process the noisy training data. All these tools are glued together in a makefile.
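As a rough illustration of the GMM to differential GMM step, here is a minimal Python sketch. It is not the C tool mentioned above. It assumes a joint GMM over stacked source and target features [x; y] and re-expresses one mixture component over [x; d], where d = y - x is the spectrum differential. The function name and the list-of-lists matrix layout are hypothetical; mixture weights and the source-side statistics stay unchanged.

```python
def to_differential_component(mean_x, mean_y, cov_xx, cov_xy, cov_yx, cov_yy):
    """Re-express one joint-GMM component over [x; y] as one over [x; d], d = y - x.

    Hypothetical helper for illustration; matrices are square lists of lists.
    """
    n = len(mean_x)
    # E[d] = E[y] - E[x]
    mean_d = [my - mx for mx, my in zip(mean_x, mean_y)]
    # Cov(x, d) = Cov(x, y) - Cov(x, x)
    cov_xd = [[cov_xy[i][j] - cov_xx[i][j] for j in range(n)] for i in range(n)]
    # Cov(d, x) = Cov(y, x) - Cov(x, x)
    cov_dx = [[cov_yx[i][j] - cov_xx[i][j] for j in range(n)] for i in range(n)]
    # Cov(d, d) = Cov(x, x) + Cov(y, y) - Cov(x, y) - Cov(y, x)
    cov_dd = [
        [cov_xx[i][j] + cov_yy[i][j] - cov_xy[i][j] - cov_yx[i][j] for j in range(n)]
        for i in range(n)
    ]
    return mean_d, cov_xd, cov_dx, cov_dd


# One-dimensional toy component (made-up numbers, not trained values).
mean_d, cov_xd, cov_dx, cov_dd = to_differential_component(
    [1.0], [3.0], [[2.0]], [[0.5]], [[0.5]], [[1.0]]
)
```

Applying this to every mixture component would give a model that predicts the differential directly, which is what allows the source waveform to be filtered instead of fully resynthesized.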
Training
Features:
24th-order mel-cepstral coefficients
5th-order band aperiodicity
Static features + first-order differentials
GMM: 8 mixture components, 10 iterations of EM training
Pitch transformation: mean + std of log F0
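The mean + std pitch transformation listed above is commonly done as a linear shift and scale in the log-F0 domain. The following Python sketch shows that idea under stated assumptions: unvoiced frames are marked with F0 = 0 (the convention WORLD uses), statistics are taken over voiced frames only, and the function names are hypothetical, not part of the project's code.

```python
import math


def log_f0_stats(f0):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    logs = [math.log(v) for v in f0 if v > 0]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)
    return mean, math.sqrt(var)


def convert_f0(f0, src_stats, tgt_stats):
    """Map source F0 to the target speaker's log-F0 mean and std.

    Unvoiced frames (assumed to be F0 == 0) are passed through unchanged.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    out = []
    for v in f0:
        if v > 0:
            z = (math.log(v) - src_mean) / src_std  # normalize in the log domain
            out.append(math.exp(z * tgt_std + tgt_mean))
        else:
            out.append(0.0)
    return out


# Toy contour in Hz (made-up values); 0.0 marks an unvoiced frame.
src_f0 = [100.0, 0.0, 200.0]
src_stats = log_f0_stats(src_f0)
# Hypothetical target: same spread, one octave higher.
tgt_stats = (src_stats[0] + math.log(2.0), src_stats[1])
converted = convert_f0(src_f0, src_stats, tgt_stats)
```

In practice the source and target statistics would be estimated once from the training recordings and reused for every utterance to be converted.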
Data: 40 parallel sentences with a total duration of ≈ 2 min 40 s, including silence
Trump speech taken from the Republican National Convention (they’ve got some tremendous reverberation)
Express
An Express server hosts a GUI that lets users upload their own speech files. The server processes the speech with the SPTK-based backend described above and plays back the Trumpified speech using the Howler.js library. Styling was added through Bootstrap and interactivity through jQuery.
References
Y. Stylianou, O. Cappé, and E. Moulines. "Continuous probabilistic transform for voice conversion." IEEE Transactions on Speech and Audio Processing 6, no. 2 (1998): 131-142.
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura. "Statistical singing voice conversion with direct waveform modification based on the spectrum differential." In INTERSPEECH, pp. 2514-2518. 2014.
K. Lebart, J. M. Boucher, and P. N. Denbigh. "A new method based on spectral subtraction for speech dereverberation." Acta Acustica united with Acustica 87, no. 3 (2001): 359-366.
M. Morise, F. Yokomori, and K. Ozawa. "WORLD: a vocoder-based high-quality speech synthesis system for real-time applications." IEICE Transactions on Information and Systems E99-D, no. 7 (2016): 1877-1884.
Challenges we ran into
Merging the ML libraries so they could process files from the Express server
Time needed to train the model
Reverberation in interviews and test recordings
Accomplishments that we're proud of
Express server is able to accept uploaded files
Converted speech was able to sound similar to Trump
What we learned
What's next for TalkTrumpToMe
Natural Language Processing Trump Speech Synthesizer