What it does

LearnIt generates educational videos from user prompts. It uses Google's Gemini 1.5 to generate both the narration script and the animation code for each video, and DeepGram to synthesize audio narration that clearly explains the concepts being taught on screen.
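At a high level, the stages above chain together into one pipeline. Below is a minimal, stubbed-out sketch of that flow; all function names and return values here are hypothetical placeholders, not the actual implementation (the real stages call Gemini, Manim, DeepGram, Pydub, and MoviePy):

```python
# Hypothetical sketch of the LearnIt pipeline; every stage is a stub.

def generate_script(prompt: str) -> str:
    # Gemini 1.5 turns the user's prompt into a narration script.
    return f"script for: {prompt}"

def generate_manim_code(script: str) -> str:
    # Gemini 1.5 also emits Manim scene code matching the script.
    return f"manim scene for: {script}"

def render_video(manim_code: str) -> str:
    # Manim renders the scene code into a video file.
    return "video.mp4"

def synthesize_audio(script: str) -> str:
    # DeepGram converts the script into narration audio.
    return "narration.wav"

def splice(video_path: str, audio_path: str) -> str:
    # Pydub syncs the narration; MoviePy muxes audio and video together.
    return "final.mp4"

def learnit(prompt: str) -> str:
    script = generate_script(prompt)
    video = render_video(generate_manim_code(script))
    audio = synthesize_audio(script)
    return splice(video, audio)
```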

How it was built

Manim - Powerful animation engine created by 3Blue1Brown, used for video generation

Google Gemini 1.5 - LLM used to generate the narration script and the Manim code for each video

DeepGram - Speech AI platform used for text-to-speech narration of the script

React - JavaScript frontend library for the web app

FastAPI - Python backend framework for the server-side API

Pydub - Python library for audio manipulation, used to sync the text-to-speech audio with the generated video

MoviePy - Python library for video editing, used to splice the audio and video files together

Challenges I ran into

One challenge was getting Gemini to produce reliable Manim code. Gemini's training data isn't always up to date with library APIs, Manim included, so when prompted to generate Manim code it produced non-functional code almost every time. To fix this, I fed up-to-date Manim documentation into the prompt so the model could ground its responses in the current API. This significantly increased the likelihood of generating usable code.
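The documentation-grounding fix amounts to prepending the current Manim docs to the prompt before asking for code. A minimal sketch of that idea follows; the prompt wording and function name are illustrative, not the exact prompt the app uses:

```python
# Sketch of grounding Gemini with current Manim documentation so it
# doesn't fall back on stale, training-time knowledge of the library.

def build_manim_prompt(docs: str, topic: str) -> str:
    """Prepend up-to-date Manim docs to the request for scene code."""
    return (
        "You are a Manim Community expert. Use ONLY the API described "
        "in the documentation below.\n\n"
        f"--- MANIM DOCUMENTATION ---\n{docs}\n--- END DOCUMENTATION ---\n\n"
        f"Write a complete Manim scene that explains: {topic}"
    )

# Sending the prompt (requires the google-generativeai package and an API key):
# import google.generativeai as genai
# genai.configure(api_key="...")
# model = genai.GenerativeModel("gemini-1.5-pro")
# response = model.generate_content(build_manim_prompt(manim_docs, "derivatives"))
```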

A second challenge was generating a script that synced with the video. Initially, the narration would run more than twice as long as the video. I tried trimming unnecessary dialogue, but that only slightly alleviated the issue. Since the narration was almost always longer than the video whenever the two didn't already match, I sped up the text-to-speech audio to align with the video duration. This helped further, but narration that ran far longer than the video had to be sped up drastically, warping the audio quality. Given the time constraints, I didn't find a more robust solution.
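The core of the sync logic is just computing a playback-speed multiplier from the two durations. A sketch of that calculation is below, with a cap added to limit the quality warping described above; the cap value and function names are hypothetical, not tuned values from the app:

```python
# Sketch of the audio/video sync logic: speed the narration up to fit
# the video, but cap the factor so long narrations over short videos
# aren't warped beyond recognition. MAX_SPEEDUP is a guessed threshold.

MAX_SPEEDUP = 1.5

def speed_factor(audio_ms: int, video_ms: int, cap: float = MAX_SPEEDUP) -> float:
    """Playback-speed multiplier that makes the audio fit the video."""
    if audio_ms <= video_ms:
        return 1.0  # narration already fits; leave it unchanged
    return min(audio_ms / video_ms, cap)

# Applying the factor with Pydub (requires pydub and ffmpeg):
# from pydub import AudioSegment, effects
# narration = AudioSegment.from_wav("narration.wav")
# narration = effects.speedup(
#     narration, playback_speed=speed_factor(len(narration), video_ms))
```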

One last challenge was supporting non-physics/math videos. Manim is designed for math-based animations, so it wasn't viable to let users generate videos on other subjects (history, economics, etc.). In the future, I'd like to incorporate other video/image-generation engines to support a wider variety of topics.

Accomplishments I'm proud of

I'd never used an LLM API before, so I'm proud of what I was able to implement in such a short time frame. I'm also proud that I branched out to a wider variety of APIs and frameworks. I'd never considered using speech AI APIs prior to this, but now that I've used DeepGram, I hope to keep using it in future projects. Likewise, this was my first time using FastAPI, so I was glad to learn another web framework.

What I learned

As mentioned above, I learned to use LLM APIs, specifically Google's Gemini 1.5. I also learned DeepGram's text-to-speech API and FastAPI for backend development.

What's next for LearnIt

LearnIt currently relies solely on Manim for video generation, so I'd like to incorporate other video/image-generation engines to support a wider variety of subjects. I'd also like to let users ask live follow-up questions about a generated video and receive video responses (e.g., clarifying a misconception from the initial video). Finally, I hope to host LearnIt publicly so people across the world can use it to optimize their learning.

Built With

  • deepgram
  • fastapi
  • google-gemini
  • google-gemini-1.5
  • manim
  • manim-community
  • materialui
  • moviepy
  • pydub
  • react