Inspiration
Every presenter knows the awkward moment — fumbling with a clicker, passing it between teammates, or struggling to hold it while demoing something hands-on. Keynote and Google Slides offer scripted recording, but that's rigid and time-restricted. Real presentations drift: Q&A sessions interrupt flow, a client asks you to jump back three slides, or you want to freestyle beyond your script. Turner was born from that frustration — the gap between how presentations should feel and how they actually go.
What it does
Turner is a fully touchless browser plugin for live presentations. It lets you:
- Navigate slides with hand gestures captured by your webcam
- Jump semantically to any slide by speaking naturally — say a topic, and Turner matches it to the right slide using LLM embeddings, no exact keyword required
- Control flow with voice commands via real-time speech transcription
- Monitor everything on a live dashboard showing current slide state, transcript feed, and gesture input
No clicker, no remote, no hands occupied. Just you and your audience.
How we built it
Turner is a multi-pipeline system running locally:
- Gesture layer: MediaPipe hand and pose landmarkers detect swipe, point, and twist gestures from a webcam feed in real time
- Voice layer: AWS Transcribe streams live microphone input, producing finalized transcript segments
- Semantic layer: Google Gemini processes your slide deck (PDF) and speaker script, building per-slide semantic context. As you speak, finalized transcript chunks are matched against this context to decide whether to advance, hold, or jump to a specific slide
- Slide control: PyAutoGUI sends keystrokes to control Keynote or PowerPoint running natively on your machine
- Dashboard: A Next.js frontend connects to a FastAPI + SSE backend, providing live visibility into slide state, camera/microphone selection, and debug controls
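To make the semantic layer's advance / hold / jump decision concrete, here is a minimal sketch, not Turner's actual code. It assumes each slide and each finalized transcript chunk has already been turned into a numeric vector (in Turner that context comes from Gemini; the toy vectors here are made up). The function name `decide` and the parameters `jump_threshold` and `margin` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decide(chunk_vec, slide_vecs, current, jump_threshold=0.8, margin=0.1):
    """Return (action, target_slide_index) for one finalized transcript chunk.

    action is "hold", "advance", or "jump". Thresholds are illustrative
    assumptions, not tuned values.
    """
    scores = [cosine(chunk_vec, v) for v in slide_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)

    # Stay put unless another slide is a confident match AND clearly
    # better than the slide already on screen.
    if (
        best == current
        or scores[best] < jump_threshold
        or scores[best] - scores[current] < margin
    ):
        return ("hold", current)
    if best == current + 1:
        return ("advance", best)
    return ("jump", best)


# Toy deck: three slides with orthogonal "topics".
slides = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(decide([0.0, 0.1, 1.0], slides, current=0))
```

Requiring both an absolute threshold and a margin over the current slide is one way to keep wandering speech from triggering a jump; ambiguous chunks simply hold.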
Challenges we ran into
- Latency in the semantic loop: AWS Transcribe only finalizes segments after a pause, creating a natural lag. We decoupled transcript intake from the Gemini retry path so slow LLM responses don't block transcript capture.
- Gesture false positives: Distinguishing intentional navigation gestures from natural presenter movement required careful tuning of the swipe and twist detectors, and extensive unit testing against edge cases.
- Semantic context caching: Building per-slide embeddings from a full PDF + script on first run is slow. We added pickle caching so subsequent runs on the same deck load the saved context instead of rebuilding it.
- Cross-process slide control: Reliably sending keystrokes to Keynote or PowerPoint from a Python background process required careful handling of macOS focus and permissions.
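The pickle caching mentioned above could look roughly like the sketch below. It assumes the cache is keyed by a hash of the deck bytes plus the script text, so editing either one forces a rebuild; that keying scheme, the `load_or_build` and `cache_key` names, and the `build` callback are all hypothetical and stand in for whatever Turner actually does.

```python
import hashlib
import pickle
from pathlib import Path


def cache_key(pdf_bytes: bytes, script_text: str) -> str:
    """Hash the deck and script together so any change invalidates the cache."""
    h = hashlib.sha256()
    h.update(pdf_bytes)
    h.update(b"\0")  # separator so (deck, script) boundaries are unambiguous
    h.update(script_text.encode("utf-8"))
    return h.hexdigest()


def load_or_build(pdf_bytes, script_text, build, cache_dir):
    """Return cached semantic context if present, else build and store it.

    `build(pdf_bytes, script_text)` is the slow step (hypothetical callback).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(pdf_bytes, script_text)}.pkl"

    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)

    context = build(pdf_bytes, script_text)
    # Write to a temp file first so a crash never leaves a half-written cache.
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(context, f)
    tmp.replace(path)
    return context
```

Because unpickling can execute arbitrary code, a cache like this should only ever load files the tool wrote itself.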
Accomplishments that we're proud of
- A fully working end-to-end pipeline: speak a topic mid-presentation, and Turner jumps to the right slide in seconds — no clicking, no searching
- Hot-swappable camera and microphone selection through the live dashboard, without restarting the system
- Modular architecture that cleanly separates gesture, voice, semantic, and control layers — each independently testable
- A robust unit test suite covering swipe detection, body twist, hand geometry, and the gesture controller
What we learned
- Real-time AI pipelines require aggressive decoupling — any synchronous dependency between layers creates visible stalls for the user
- Semantic slide matching is surprisingly robust even with informal, wandering speech — the LLM handles paraphrasing and topic drift well
- Gesture recognition in presenter environments (variable lighting, background movement, expressive body language) is a much harder problem than controlled demos suggest
- Building for a real use case forced us to think about reliability and graceful degradation over raw demo polish
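The decoupling lesson can be illustrated with a bounded queue and a worker thread: transcript intake hands each finalized chunk off and returns immediately, while the slow matching call runs elsewhere. This is a sketch under assumptions, not Turner's implementation. `SemanticWorker`, `match`, and `on_result` are hypothetical names, and retry logic is left out.

```python
import queue
import threading


class SemanticWorker:
    """Runs a slow `match(chunk)` call off the transcript-intake thread."""

    def __init__(self, match, on_result, maxsize=32):
        self._q = queue.Queue(maxsize=maxsize)
        self._match = match
        self._on_result = on_result
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, chunk):
        """Queue a chunk without blocking; returns False if the queue is full."""
        try:
            self._q.put_nowait(chunk)
            return True
        except queue.Full:
            return False  # drop the chunk rather than stall transcript capture

    def _run(self):
        while True:
            chunk = self._q.get()
            if chunk is None:  # sentinel from close()
                break
            try:
                self._on_result(chunk, self._match(chunk))
            except Exception as exc:
                # Report the failure instead of killing the worker thread.
                self._on_result(chunk, exc)

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._q.put(None)
        self._thread.join()
```

Dropping chunks when the queue is full is a deliberate trade-off in this sketch: a missed match degrades gracefully, whereas a blocked intake thread would stall the whole transcript feed.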
What's next for Turner
- Virtual laser pointer: highlight and annotate slides in real time using finger-tracking, no hardware required — partial implementation already in progress
- Presenter profiles: learn each presenter's personal gesture vocabulary and speaking style, reducing false positives and eliminating interruptions
- Enterprise customization: configurable gesture sets and semantic sensitivity for large pitch events, boardrooms, and conference stages
- Expanded platform support: beyond Keynote and PowerPoint to Google Slides and browser-based presentation tools