Inspiration
Public speaking and debate are skills that can make or break opportunities — whether it's acing a presentation, winning a debate competition, or simply communicating ideas confidently. But most people don't have access to a personal coach who can give them real-time, objective feedback. We wanted to build an AI-powered debate coach that's available 24/7, analyzing not just what you say, but how you say it — tone, emotions, and body language. Polly AI was born from the idea that everyone deserves personalized coaching to become a better communicator.
What it does
Polly AI is a real-time debate coaching platform. When you connect, you're greeted and assigned a random debate topic. You can then practice your argument by typing or recording your voice. As you speak, Polly AI:
- Tracks your facial expressions frame-by-frame using your webcam to detect emotions (happy, sad, confident, nervous, etc.)
- Analyzes your voice for pitch, energy, confidence score, and tone characteristics
- Transcribes your speech to text
- Evaluates your argument structure, persuasiveness, and delivery using Google Gemini AI
You receive instant, personalized feedback on your performance — including what you did well, where you can improve, and actionable tips for your next practice session. It's like having a debate coach who never sleeps.
How we built it
Frontend: React.js with WebSocket integration for real-time communication, React Webcam for live video streaming, and React Markdown for formatted AI responses.
Backend: FastAPI handles WebSocket connections and concurrent processing. We built a modular service architecture:
- Emotion Service: Uses DeepFace and OpenCV to analyze facial expressions from video frames
- Voice Analysis Service: Leverages librosa for pitch detection, energy measurement, and confidence scoring
- Speech Service: Converts audio to text (currently a mock, designed for Google Cloud Speech-to-Text integration)
- Chat Service: Integrates Google Gemini AI for intelligent coaching and feedback
- Topic Service: Generates random debate topics from a database
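For a flavor of how the Voice Analysis Service approaches confidence scoring, here is a minimal sketch of energy-based analysis. The real service uses librosa; this version uses only NumPy, and the confidence heuristic (louder, steadier speech scores higher) is a simplified stand-in for our actual scoring:

```python
import numpy as np

def analyze_energy(samples: np.ndarray, frame_len: int = 2048) -> dict:
    """Compute per-frame RMS energy and a naive confidence heuristic.

    `samples` is a mono float waveform in [-1, 1]. The confidence score
    here is a hypothetical placeholder, not our production formula.
    """
    # Split the waveform into non-overlapping frames.
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    # RMS energy per frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    mean_energy = float(rms.mean())
    # Steadiness: low relative variation in energy -> higher confidence.
    steadiness = 1.0 - min(1.0, float(rms.std() / (mean_energy + 1e-9)))
    confidence = round(0.5 * min(1.0, mean_energy * 10) + 0.5 * steadiness, 2)
    return {"mean_energy": mean_energy, "confidence_score": confidence}

# Example: a steady 440 Hz tone should score as fairly "confident".
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = 0.3 * np.sin(2 * np.pi * 440 * t)
result = analyze_energy(tone)
```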
Challenges we ran into
WebSocket synchronization: Managing real-time video frame processing while keeping the chat interface responsive was tricky. We had to carefully balance frame processing intervals to avoid overwhelming the backend.
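The balancing idea boils down to dropping incoming frames while a previous one is still being analyzed, so slow processing never backs up the WebSocket loop. A simplified sketch (the class and names are illustrative, not our exact code):

```python
import asyncio

class FrameThrottler:
    """Skip webcam frames that arrive while analysis is still in flight."""

    def __init__(self):
        self._busy = False
        self.processed = 0
        self.dropped = 0

    async def submit(self, frame, analyze):
        if self._busy:
            self.dropped += 1          # drop the frame instead of queueing it
            return
        self._busy = True
        try:
            await analyze(frame)
            self.processed += 1
        finally:
            self._busy = False

async def demo():
    throttler = FrameThrottler()

    async def slow_analyze(frame):
        await asyncio.sleep(0.05)      # pretend emotion detection takes 50 ms

    # Simulate ~30 fps arrival: a new frame roughly every 10 ms.
    tasks = []
    for i in range(20):
        tasks.append(asyncio.create_task(throttler.submit(i, slow_analyze)))
        await asyncio.sleep(0.01)
    await asyncio.gather(*tasks)
    return throttler.processed, throttler.dropped

processed, dropped = asyncio.run(demo())
```

Because analysis takes several frame intervals, most frames are dropped and only a steady trickle reaches the backend, which is exactly the behavior we wanted.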
Audio encoding issues: Converting browser-recorded audio (WebM) to a format suitable for speech analysis required handling multiple codec formats and base64 encoding correctly.
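A minimal sketch of the decoding step, assuming the browser sends base64-encoded audio that may be wrapped in a data URL (the exact payload shape here is illustrative):

```python
import base64

def decode_browser_audio(payload: str) -> bytes:
    """Decode audio sent from the browser over the WebSocket.

    Handles both raw base64 and a full data URL like
    'data:audio/webm;codecs=opus;base64,GkXf...'.
    """
    if payload.startswith("data:"):
        # Split off the 'data:<mime>;base64,' header.
        _, payload = payload.split(",", 1)
    return base64.b64decode(payload)

# The decoded bytes are still WebM/Opus; converting them to WAV/PCM for
# analysis typically requires ffmpeg (e.g. via pydub's
# AudioSegment.from_file(io.BytesIO(webm_bytes), format="webm")).

raw = decode_browser_audio(
    "data:audio/webm;base64," + base64.b64encode(b"\x1aE\xdf\xa3").decode()
)
```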
Emotion detection accuracy: DeepFace sometimes struggled with varying lighting conditions and camera angles. We had to add robust error handling and fallback mechanisms when faces weren't detected.
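The fallback logic can be sketched like this. The detector is injected as a callable (in our backend it wraps DeepFace.analyze) so the error handling can be shown without the heavy dependency:

```python
def analyze_emotion_safely(frame, detector, fallback="neutral"):
    """Run an emotion detector over one video frame, degrading gracefully
    when no face is found or the detector raises (poor lighting, odd angle).

    `detector` is any callable returning a dict of emotion scores.
    """
    try:
        scores = detector(frame)
        if not scores:                 # no face detected in this frame
            return {"dominant": fallback, "scores": {}, "detected": False}
        dominant = max(scores, key=scores.get)
        return {"dominant": dominant, "scores": scores, "detected": True}
    except Exception:
        # Never let a detection failure crash the WebSocket loop.
        return {"dominant": fallback, "scores": {}, "detected": False}

# Fake detectors standing in for DeepFace in this demo:
ok = analyze_emotion_safely(None, lambda f: {"happy": 0.9, "sad": 0.1})
missing = analyze_emotion_safely(None, lambda f: {})
crashed = analyze_emotion_safely(None, lambda f: 1 / 0)
```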
Context management: Making sure the AI understood the full context of a debate session (topic, previous messages, emotion state) while generating relevant feedback required careful prompt engineering.
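Roughly, each feedback turn assembles the session context into a single prompt before calling Gemini. A simplified sketch (field names and wording are illustrative, not our production prompt):

```python
def build_coaching_prompt(topic, transcript, emotion, history, max_turns=4):
    """Assemble the context the LLM needs for one feedback turn.

    Keeps only the last few exchanges so the prompt stays compact.
    """
    recent = history[-max_turns:]
    lines = [
        "You are a supportive debate coach.",
        f"Debate topic: {topic}",
        f"Speaker's dominant emotion right now: {emotion}",
        "Recent conversation:",
    ]
    lines += [f"  {role}: {text}" for role, text in recent]
    lines += [
        f"Speaker's latest argument: {transcript}",
        "Give feedback on structure, persuasiveness, and delivery, "
        "then one actionable tip.",
    ]
    return "\n".join(lines)

prompt = build_coaching_prompt(
    topic="Social media does more harm than good",
    transcript="Studies show screen time correlates with anxiety...",
    emotion="nervous",
    history=[("coach", "Welcome! Here's your topic."),
             ("user", "I'll argue the affirmative.")],
)
```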
CORS and WebSocket configuration: Getting the frontend and backend to communicate smoothly across different ports during development took significant debugging.
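For the HTTP side, FastAPI's CORSMiddleware covers the cross-port setup. A typical development configuration (port 3000 is the create-react-app default; adjust to your ports):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# During development the React dev server runs on a different port than
# FastAPI, so the browser blocks requests unless CORS allows that origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Note: native WebSocket handshakes are not subject to CORS preflight,
# so the server should still validate the Origin header on /ws routes.
```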
Accomplishments that we're proud of
- Built a fully functional real-time coaching system with live emotion detection running at 1 frame per second
- Successfully integrated multiple AI services (computer vision, audio analysis, and LLM) into a cohesive user experience
- Created an intuitive chat interface with markdown support that makes AI feedback easy to read and actionable
- Implemented a complete speech-to-text pipeline ready for production API integration
- Designed a modular backend architecture that's scalable and easy to extend with new features
- Got the entire tech stack working together seamlessly — from webcam capture to AI-generated feedback in under 5 seconds
What we learned
Technical skills: We deepened our understanding of WebSocket architecture, asynchronous Python programming, real-time video processing, audio signal analysis, and LLM prompt engineering.
AI integration: We learned how to combine multiple AI models (computer vision, audio analysis, NLP) into a single application and handle their different latency requirements.
User experience: Real-time feedback is powerful, but it needs to be presented in a way that's encouraging rather than overwhelming. We learned to balance detailed metrics with actionable insights.
System design: Building a system that processes video, audio, and text simultaneously taught us about resource management, concurrent processing, and graceful error handling.
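The "simultaneously" part maps naturally onto asyncio: independent analyses run concurrently, so per-turn latency is the slowest stage rather than the sum of all three. A toy sketch with stand-in coroutines (the real pipeline calls DeepFace, librosa, and speech-to-text):

```python
import asyncio

async def analyze_frame(frame):
    await asyncio.sleep(0.03)          # stand-in for the emotion pass
    return {"emotion": "happy"}

async def analyze_audio(chunk):
    await asyncio.sleep(0.02)          # stand-in for voice metrics
    return {"confidence_score": 0.8}

async def transcribe(chunk):
    await asyncio.sleep(0.04)          # stand-in for speech-to-text
    return "I believe that..."

async def process_turn(frame, chunk):
    # Run the three independent analyses concurrently instead of serially.
    emotion, voice, text = await asyncio.gather(
        analyze_frame(frame), analyze_audio(chunk), transcribe(chunk)
    )
    return {**emotion, **voice, "transcript": text}

result = asyncio.run(process_turn(b"frame", b"audio"))
```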
What's next for Polly-AI
- Real Speech-to-Text: Integrate Google Cloud Speech-to-Text API to replace mock transcription with production-grade accuracy.
- Performance tracking: Build a dashboard showing progress over time — tracking improvements in confidence, reductions in filler words, and emotion consistency — with a simple UI.
- Advanced metrics: Add gesture recognition, body language analysis, and speaking pace visualization.
- Practice modes: Different coaching modes for debates, presentations, mock interviews, and casual conversation practice.
- Social features: Peer comparison, leaderboards, and the ability to share practice sessions with friends or coaches.
- Mobile app: iOS/Android apps for practicing on the go.
- Custom topics: Allow users to create and practice with their own debate topics or upload presentation scripts that can be graded.