A day ago, I got an email from the EMNLP conference telling me to submit my paper slides and a presentation recording by Friday, which meant I had only one day to do it while I was sick and had lost my voice! How could I possibly make this happen? What we built: As a multimodal AI researcher, I immediately thought: why not let AI do this for me? So I designed an agent system that reads your paper, automatically generates the slides and a script for each slide, and automatically records the presentation in your own voice!
What are the challenges:
- The first challenge is that the initial AI-generated slides were too generic. To solve this, we prompt the model to follow the structure of the paper and to reuse its figures and tables.
- The second challenge is that when the model generates the slides, it does not know how to insert figures. To solve this, we give it the raw LaTeX folder containing the figures and the source of the paper PDF. With that context, the model can locate each figure and place it on the right slide.
- The third challenge is generating a voice that matches the user's own; otherwise, the presentation recording would not feel authentic. We use the ElevenLabs platform to clone the user's voice from a short sample and synthesize speech that sounds just like them.
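The figure-insertion step above boils down to scanning the raw LaTeX sources for `\includegraphics` commands so the agent knows which image files exist and where they are referenced. Here is a minimal sketch of that lookup; the function name `find_figures` is our own illustration, not part of any library:

```python
import re

def find_figures(tex_source: str) -> list[str]:
    """Extract figure file paths referenced via \\includegraphics
    from raw LaTeX source, so the agent can reuse them in slides."""
    # Matches \includegraphics[width=...]{figures/arch.pdf}
    # and captures the path inside the braces.
    pattern = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
    return pattern.findall(tex_source)

tex = r"""
\begin{figure}
  \includegraphics[width=\linewidth]{figures/architecture.pdf}
  \caption{System overview.}
\end{figure}
"""
print(find_figures(tex))  # ['figures/architecture.pdf']
```

The extracted paths, together with the surrounding `\caption` text, can then be passed to the model as context when it drafts each slide.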
Finally, we built a platform that automatically generates a presentation recording for your paper. I used it to create and submit my own EMNLP presentation recording. It is amazing, and I hope you find it useful too.
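The recording step calls ElevenLabs' text-to-speech endpoint with the cloned voice. Below is a minimal sketch that only builds the HTTP request; the endpoint path and JSON fields follow the public ElevenLabs REST API as documented at the time of writing, while `VOICE_ID`, `API_KEY`, and the helper `build_tts_request` are placeholders for illustration (check the current ElevenLabs docs for the exact model names and parameters):

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, script: str, api_key: str) -> urllib.request.Request:
    """Build the HTTP request that synthesizes one slide's script
    in the cloned voice identified by voice_id."""
    payload = json.dumps({
        "text": script,
        "model_id": "eleven_multilingual_v2",  # model name: verify against current docs
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=payload,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_tts_request("VOICE_ID", "Hello, welcome to my EMNLP talk.", "API_KEY")
print(req.full_url)  # https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID
```

Sending the request (e.g. with `urllib.request.urlopen`) returns audio bytes that can be saved per slide and stitched together with the slide images into the final recording.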
Built With
- elevenlabs
- python