Inspiration
Honestly, lectures are boring to listen to. I realized I spend so much time staring at the clock instead of actually retaining anything. I wanted a tool that could listen to audio for me, transcribe it, and give me a quick summary so I could get the key points without sitting through the whole thing.
What I Learned
While building this, I picked up a lot:
Using Whisper for transcribing audio into text.
How to work with Gemini AI to summarize long texts clearly.
Setting up a Flask backend to handle uploads and processing.
Making a Streamlit frontend that talks to the backend.
Some practical lessons on prompt design, debugging, and handling edge cases.
How I Built It
Backend
Flask app with an /api/upload route.
Saved uploaded MP3s locally.
Used Whisper to transcribe the audio into timestamped text segments.
Sent those segments to Gemini AI for summarization.
Returned JSON containing both segment summaries and an overall summary.
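The backend steps above can be sketched as a small pipeline. The function names here (transcribe_segments, summarize) are hypothetical stand-ins for the real Whisper and Gemini calls, so the shape of the JSON response is the part to focus on:

```python
# Hypothetical sketch of the backend flow: transcribe, summarize per
# segment, then build the JSON payload the /api/upload route returns.

def transcribe_segments(audio_path):
    # Stand-in for whisper.load_model(...).transcribe(audio_path)["segments"]
    return [
        {"start": 0.0, "end": 30.0, "text": "Intro to the topic."},
        {"start": 30.0, "end": 60.0, "text": "Main argument and examples."},
    ]

def summarize(text):
    # Stand-in for a Gemini API call with a summarization prompt.
    return f"Summary: {text}"

def process_upload(audio_path):
    """Return per-segment summaries plus an overall summary."""
    segments = transcribe_segments(audio_path)
    segment_summaries = [
        {"start": s["start"], "end": s["end"], "summary": summarize(s["text"])}
        for s in segments
    ]
    overall = summarize(" ".join(s["text"] for s in segments))
    return {"segments": segment_summaries, "overall_summary": overall}
```

In the real app, process_upload runs inside the Flask route and the dict is returned with flask.jsonify.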
Frontend
Streamlit interface lets users upload MP3s.
Shows file info, plays audio, and sends it to the backend.
Displays segment summaries with timestamps and the overall summary.
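For the timestamped display, a small formatting helper (illustrative, not the project's exact code) turns Whisper's float seconds into the labels shown next to each segment summary:

```python
# Convert Whisper's float seconds into MM:SS labels for the Streamlit UI.

def fmt_timestamp(seconds: float) -> str:
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def label_segment(start: float, end: float, summary: str) -> str:
    # e.g. shown via st.markdown(label_segment(...)) in the frontend
    return f"[{fmt_timestamp(start)}-{fmt_timestamp(end)}] {summary}"
```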
AI Summarization
Used prompts that push Gemini AI to pull out the main ideas, not just reword the transcript.
Tried different prompt styles until summaries actually captured the gist.
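One prompt style along these lines worked better than plain "summarize this": ask explicitly for a bounded number of key points and forbid restating the transcript. The wording below is illustrative, not the project's exact prompt:

```python
# Build a summarization prompt that asks for main ideas, not paraphrase.

def build_summary_prompt(transcript: str, max_points: int = 3) -> str:
    return (
        f"Summarize the lecture transcript below in at most {max_points} "
        "key points. Extract the main ideas and conclusions; do not simply "
        "rephrase sentences from the transcript.\n\n"
        f"Transcript:\n{transcript}"
    )
```

The resulting string is what gets sent to Gemini for each segment and for the overall summary.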
Challenges
Transcription issues: Whisper sometimes got words wrong or messed up timestamps.
Summaries felt repetitive: Initially, AI just rephrased sentences. I had to tweak prompts to get actual concise insights.
Setup headaches: Installing Whisper, PyTorch, and the Gemini AI SDK was messy; virtual environments fixed it.
File handling: Making sure MP3 uploads were read and saved correctly in Streamlit took some care.
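The upload handling reduces to something like the sketch below: sanitize the uploaded filename and write the raw bytes to an uploads directory. The names are illustrative; Streamlit's file_uploader yields an object whose .name and .getvalue() map onto these arguments:

```python
import os
import re

def save_upload(filename: str, data: bytes, upload_dir: str = "uploads") -> str:
    """Write uploaded MP3 bytes to disk and return the saved path."""
    os.makedirs(upload_dir, exist_ok=True)
    # Keep only safe characters so the filename can't escape the uploads dir.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", os.path.basename(filename))
    path = os.path.join(upload_dir, safe_name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

In the app, the saved path is what gets handed to the transcription step.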
Future Improvements
Add speaker recognition.
Include multilingual summarization.
Let users download summaries as PDF or Markdown.
Make bullet-point or structured summaries.