Inspiration

Watching videos for note-taking and understanding takes up a significant amount of time in a student's or researcher's life.

What it does

We present a simple real-time video transcription application that summarizes a video and generates LaTeX code for any formulae and equations mentioned in it.

How we built it

Our approach uses Google Gemini AI to analyze math videos and transcribe their content into a single LaTeX document. This helps students get a clear understanding of a video without having to watch it in full. Our model can also translate the generated text into the user's language of choice.
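As a rough illustration of the final assembly step, the sketch below combines summary paragraphs and extracted equations into one compilable LaTeX source string. It assumes the Gemini output has already been split into LaTeX-safe summary paragraphs and raw math-mode equation bodies; the function `build_latex_document` and its inputs are hypothetical, not the project's actual code.

```python
from typing import List


def build_latex_document(title: str, summary: List[str], equations: List[str]) -> str:
    """Combine summary paragraphs and equation bodies into one LaTeX source string.

    Assumes (hypothetically) that `summary` paragraphs are already LaTeX-safe and
    that `equations` hold raw math-mode bodies such as r"E = mc^2", with no
    surrounding $...$ or environment.
    """
    lines = [
        r"\documentclass{article}",
        r"\usepackage{amsmath}",  # amsmath adds environments such as align that transcribed math often uses
        r"\title{" + title + "}",
        r"\begin{document}",
        r"\maketitle",
        r"\section*{Summary}",
    ]
    # A blank line between paragraphs is what LaTeX treats as a paragraph break.
    for paragraph in summary:
        lines.extend([paragraph, ""])
    if equations:
        lines.append(r"\section*{Equations}")
        for body in equations:
            # Each formula gets its own numbered equation environment.
            lines.extend([r"\begin{equation}", body.strip(), r"\end{equation}"])
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_latex_document(
        "Lecture notes",
        ["The lecture derives the quadratic formula."],
        [r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}"],
    ))
```

Keeping prose and math in separate inputs means only the prose needs escaping, while equation bodies are passed through to LaTeX untouched.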

Challenges we ran into

Integrating the Python script with Google Gemini came with many technical difficulties: processing our frames took hours, which made inference slow. Furthermore, formatting the transcribed text and the video summary into a LaTeX document required intensive debugging.
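A common source of compile errors when placing transcribed prose into a LaTeX document is unescaped special characters such as `%`, `&`, or `_`. The sketch below escapes them, assuming prose and math are kept separate so that only non-math text is escaped; `escape_latex` is a hypothetical helper, not the project's code.

```python
# Characters that LaTeX treats specially in running text, mapped to safe forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    r"""Escape LaTeX special characters in plain (non-math) transcript text.

    Works character by character, so the braces introduced by one replacement
    (e.g. \textbackslash{}) are never escaped a second time.
    """
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_latex("50% of R&D_costs"))
```

Escaping in a single pass avoids the classic bug of replacing `\` after `{` and `}` have already been escaped (or vice versa), which corrupts the output.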

Accomplishments that we're proud of

- Figuring out a way to automatically transcribe and summarize a video through the Gemini API
- Translating the text from one language to another
- Optimizing the model to reduce the time taken to process longer videos while keeping a high frame rate (frames per second, fps)
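One common way to bound processing time on long videos is to send the model frames at a chosen sampling rate rather than every frame, keeping that rate as high as the time budget allows. This is not necessarily the optimization used here; the sketch below, with the hypothetical function `sample_frame_indices`, only computes which frame indices to keep.

```python
import math
from typing import List


def sample_frame_indices(total_frames: int, video_fps: float, sample_fps: float) -> List[int]:
    """Return indices of frames to keep when sampling a video at `sample_fps`.

    One frame is kept every video_fps / sample_fps frames, so a 30 fps video
    sampled at 2 fps keeps one frame in fifteen.
    """
    if total_frames <= 0 or video_fps <= 0 or sample_fps <= 0:
        return []
    if sample_fps >= video_fps:
        return list(range(total_frames))  # cannot sample faster than the source
    step = video_fps / sample_fps  # may be fractional, e.g. 29.97 / 2
    count = math.ceil(total_frames / step)
    # Computing each index from i * step (rather than accumulating) avoids float drift.
    return [int(i * step) for i in range(count)]


if __name__ == "__main__":
    print(sample_frame_indices(300, 30, 2))
```

With OpenCV, `total_frames` and `video_fps` can be read from a `cv2.VideoCapture` via `get(cv2.CAP_PROP_FRAME_COUNT)` and `get(cv2.CAP_PROP_FPS)`, and a given index can be fetched with `set(cv2.CAP_PROP_POS_FRAMES, index)` followed by `read()`.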

What we learned

We learned to use OpenCV for computer vision, the Gemini API for calling the summarizer model, and Figma for the interactive design of our prototype.

What's next for Ma.TeX

While this is just our first step into the game, we have several ideas for enhancing our application in the future:
- Adding the option to generate Python, SQL, or Java code about the topic being discussed in a video
- Allowing multimodal input, such as audio recordings and handwritten notes
- Making the application more inclusive and accessible by converting text to speech

Libraries

google-generativeai, pytube, ffmpeg, youtube-transcript-api, opencv-python, numpy, matplotlib, pillow
