Inspiration

Cracking the "Black Box" of Public Speaking

Presentation mastery isn't just about beautiful slides or perfect grammar; it's about the Multimodal Alignment between visual content and verbal delivery. I realized that most rehearsal tools provide "hollow" feedback: a generic score that doesn't tell you where you failed.

I built PrepMaster AI to provide a granular, slide-level diagnostic that answers the "hard" questions: "Did I actually explain the complex chart on Slide 5, or did I just read the title?"

What it does

PrepMaster AI is a high-precision rehearsal engine that deconstructs a presentation session into actionable data.

  • Slide-Level Granularity: Instead of one overall score, users receive a diagnostic report for every single slide based on precise entry/exit timestamps.
  • Weighted Scoring (3:4:3): A proprietary algorithm that evaluates performance based on Content (30%), Fluency (40%), and Tone (30%).
  • Semantic Verification: Uses Sentence Embeddings to determine if the speaker's ideas match the slide content, even if they paraphrase.
  • Acoustic Tone Audit: Detects if the delivery is "Monotone" or "Dynamic" by analyzing pitch variability in the audio signal.
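The 3:4:3 weighting above can be sketched as a one-line combiner. This is a minimal illustration, assuming each sub-score is already normalized to a 0-100 scale; the function name is illustrative, not the app's actual code:

```python
def overall_score(content: float, fluency: float, tone: float) -> float:
    """Combine per-slide sub-scores (each 0-100) with the 3:4:3 weighting:
    Content 30%, Fluency 40%, Tone 30%."""
    return 0.3 * content + 0.4 * fluency + 0.3 * tone

# Example: strong fluency partially offsets weak content coverage.
# overall_score(60, 90, 75) ≈ 76.5
```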

How I built it

The system follows a Three-Pillar Architecture designed to balance deterministic reliability with semantic flexibility.

1. The Deterministic Core (Python & Librosa)

To maintain user trust, core metrics are calculated using hard-coded mathematical logic:

  • Acoustics: I used librosa and the pyin algorithm to extract the Fundamental Frequency (F0), then calculated its Standard Deviation (SD) to measure pitch variability.
  • Fluency: Deterministic code calculates WPM (Words Per Minute), detects filler words (uh, um, like), and flags mumbles using confidence scores from the STT engine.
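A minimal sketch of these deterministic metrics, assuming the F0 array comes from `librosa.pyin` (which returns NaN for unvoiced frames) and the transcript from the STT engine. The 12.0 Hz threshold is the one described in the Challenges section; the function names are illustrative:

```python
import re
import numpy as np

FILLERS = {"uh", "um", "like"}
MONOTONE_SD_HZ = 12.0  # empirically chosen threshold (see Challenges)

def pitch_label(f0: np.ndarray) -> tuple[float, str]:
    """f0: per-frame pitch estimates as produced by librosa.pyin
    (NaN on unvoiced frames). Returns (SD in Hz, tone label)."""
    sd = float(np.nanstd(f0))  # ignore unvoiced frames
    return sd, ("Monotone" if sd < MONOTONE_SD_HZ else "Dynamic")

def fluency_metrics(transcript: str, duration_s: float) -> dict:
    """Words-per-minute and filler-word count from a raw transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(w in FILLERS for w in words)
    return {"wpm": len(words) / (duration_s / 60.0), "filler_count": fillers}
```

In the real pipeline the F0 array would come from `librosa.pyin(y, fmin=..., fmax=...)` on the loaded waveform; here the input is taken as given so the logic stays self-contained.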

2. The Semantic Logic (Sentence-Transformers)

To allow for natural speaking, I integrated the all-MiniLM-L6-v2 model.

  • Vector Embeddings: Slide text and spoken transcripts are converted into 384-dimensional vectors.
  • Cosine Similarity: The system measures the "distance" between ideas. If the slide says "revenue" and you say "income," the AI recognizes the successful coverage through semantic similarity.
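The similarity check above reduces to a cosine between two vectors. A minimal sketch, assuming the embeddings come from the model's `encode` call; the `THRESHOLD` named in the comment is a hypothetical tuning parameter, not a value from the app:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, e.g. the
    384-dimensional outputs of all-MiniLM-L6-v2."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the real pipeline (sketch):
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   slide_vec, speech_vec = model.encode([slide_text, transcript])
#   covered = cosine_similarity(slide_vec, speech_vec) >= THRESHOLD
```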

3. The Synthesis Layer (GPT-4o)

Finally, all "Hard Metrics" are fed into GPT-4o. Given structured data context (scores, filler rates, missing concepts), the LLM generates a professional Executive Coaching Report that is actionable and objective.
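One way such a structured-context prompt might be assembled before the API call; the field names here are illustrative, not the app's exact schema:

```python
import json

def build_coaching_prompt(slide_reports: list[dict]) -> list[dict]:
    """Pack the hard metrics into a chat-style message list so the LLM
    synthesizes rather than invents. slide_reports is a list of per-slide
    metric dicts (scores, filler rates, missing concepts)."""
    context = json.dumps({"slides": slide_reports}, indent=2)
    return [
        {"role": "system",
         "content": "You are an executive presentation coach. "
                    "Base every remark on the metrics provided; do not invent data."},
        {"role": "user",
         "content": f"Write an Executive Coaching Report from these hard metrics:\n{context}"},
    ]
```

The resulting message list would then be passed to the OpenAI chat completions API; grounding the LLM in pre-computed numbers is what keeps the report "objective" rather than a free-form guess.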

Challenges I ran into

  • Multimodal Synchronization: The biggest hurdle was aligning faster-whisper segments with manually recorded slide-transition timestamps. I built custom overlap-calculation logic (using a 0.15s threshold) to ensure speech was mapped to the correct slide even if the user switched slides mid-sentence.
  • Tone Thresholding: The hardest part was defining a mathematical value for "boredom." After testing various voice samples, I settled on a Pitch SD below 12.0 Hz as a reliable threshold for flagging monotone delivery.
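The overlap logic from the first bullet might look roughly like this. This is a hedged sketch under the 0.15s threshold mentioned above; the app's exact rule may differ (e.g. attributing each segment only to the slide with the largest overlap):

```python
def assign_segments(segments, slide_spans, min_overlap=0.15):
    """Map STT segments to slides via temporal overlap.

    segments:    list of (start_s, end_s, text) from the STT engine
    slide_spans: list of (enter_s, exit_s) per slide
    A segment is attributed to every slide it overlaps by at least
    min_overlap seconds, so speech that straddles a slide change is
    counted on both sides instead of being lost.
    """
    per_slide = [[] for _ in slide_spans]
    for seg_start, seg_end, text in segments:
        for i, (s_start, s_end) in enumerate(slide_spans):
            overlap = min(seg_end, s_end) - max(seg_start, s_start)
            if overlap >= min_overlap:
                per_slide[i].append(text)
    return per_slide
```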

Accomplishments that I'm proud of

  • Human-in-the-Loop Calibration: I implemented a feature where users can "Ignore" specific keywords. The system then instantly recalculates the score, giving users control over the AI's judgment.
  • Full-Stack Integration: Successfully connecting a Streamlit frontend with a Firebase backend (Firestore & Cloud Storage) and a heavy local AI processing pipeline.
  • Data Transparency: Every piece of feedback is traceable back to a specific metric, moving beyond "black-box" AI evaluations.
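The "Ignore keyword" recalculation in the first bullet could be sketched as follows; the scoring rule (covered keywords over active keywords) and all names here are illustrative assumptions, not the app's actual formula:

```python
def content_score(slide_keywords, covered, ignored=frozenset()):
    """Recompute a slide's Content score after the user marks some
    keywords as 'Ignore'. covered is the set of keywords the semantic
    check found in the speech; ignored keywords are excluded entirely."""
    active = [k for k in slide_keywords if k not in ignored]
    if not active:
        return 100.0  # nothing left to check
    hit = sum(1 for k in active if k in covered)
    return 100.0 * hit / len(active)
```

Because the score is pure arithmetic over sets, toggling "Ignore" can rerun instantly without touching the audio or embedding pipeline.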

What I learned

  • AI Orchestration: I learned that effective AI systems come from placing AI in the right role: code for math and determinism, AI for meaning and synthesis.
  • Privacy & Security: Managing audio files in Firebase using Signed URLs taught me how to handle sensitive user recordings securely with time-limited access.

What's next for PrepMaster AI

  • Vision-AI Integration: Using GPT-4o-vision to analyze the visual elements (charts/diagrams) of a slide to see if the speaker is explaining the data visuals correctly.
  • Real-time Pacing Feedback: A live on-screen alert during rehearsal when the speaker's pacing deviates significantly from the target.

Tech Stack

  • Language: Python 3.10+
  • Frontend: Streamlit
  • Speech Intelligence: faster-whisper (int8 quantization)
  • Audio Engineering: Librosa, Pydub, Soundfile
  • NLP & Embeddings: Sentence-Transformers (all-MiniLM-L6-v2), NLTK
  • Cloud & DB: Firebase (Firestore & Cloud Storage)
  • PDF Engine: PyMuPDF (fitz)
  • AI Logic: OpenAI GPT-4o
