Inspiration

Controlling presentation slides can be tedious, especially without wireless tools at the presenter's disposal. So we created SlideFlow to provide AI assistance during presentations. Starting from a rough idea of a voice-recognition system for controlling slides, we moved on to integrating a pre-trained AI model and the Google Cloud Vision API for further functionality.

What it does

Our web app listens to the presenter’s voice and uses natural language plus computer vision to control and search slide content. Key features include:

  • Voice Commands for Navigation
    Move between slides by simply saying “Next slide” or “Previous slide.”

  • Voice-Powered Search for Text & Images
    Jump to specific slides by describing their content (e.g., “Go to the slide with Japan in the title” or “Go to the slide with a desert”).

  • Live Transcript
    Everything the presenter says is transcribed in real time.

  • Audience-Friendly Search
    Audiences can also search slides in natural language (e.g., “Show me slides with mountains”), and the AI automatically displays the relevant slides.
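The first step behind all of these features is deciding whether a finished utterance is a navigation command or a content search. A minimal sketch of that routing logic (the function and pattern names are ours, not SlideFlow's actual API):

```python
import re

# Hypothetical router: maps a final speech transcript to either a
# navigation action or a natural-language search query.
NAV_PATTERNS = {
    r"\bnext slide\b": "NEXT",
    r"\bprevious slide\b": "PREV",
}

def route_command(transcript: str):
    """Return ("nav", action) for navigation phrases, else ("search", query)."""
    text = transcript.lower().strip()
    for pattern, action in NAV_PATTERNS.items():
        if re.search(pattern, text):
            return ("nav", action)
    # Anything else is treated as a search query, e.g.
    # "go to the slide with Japan in the title".
    return ("search", text)
```

Keeping the navigation phrases as a fixed table makes them cheap to match locally, while open-ended queries fall through to the semantic search path.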

How we built it

We combined a React + Tailwind front end with a Flask-based back end to create a seamless voice-controlled slide experience. Users can upload PDF or PPT files; the system then extracts text and images for analysis. We used the Web Speech API for voice recognition, Google Cloud Vision API for image detection, and a Sentence-BERT model to handle semantic similarity. This allows us to navigate slides by voice commands (“Next slide”), search for specific topics (“Show me slides about Japan”), and match user queries with relevant content.
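The matching step above boils down to ranking slides by similarity between a query vector and each slide's text vector. In the app, a Sentence-BERT model produces those embeddings; in this self-contained sketch a simple bag-of-words vector stands in so it runs anywhere, and the function names are illustrative:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a Sentence-BERT embedding: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_slide(query: str, slides: list[str]) -> int:
    """Return the index of the slide whose text best matches the query."""
    q = embed(query)
    scores = [cosine(q, embed(s)) for s in slides]
    return max(range(len(slides)), key=scores.__getitem__)

slides = [
    "Agenda and introductions",
    "Travel highlights: Japan, Tokyo, Kyoto",
    "Desert landscapes of the Sahara",
]
print(best_slide("slides about Japan", slides))  # → 1
```

Swapping the toy `embed` for real sentence embeddings keeps the rest of the pipeline unchanged, which is the point of isolating similarity behind one function.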

Challenges we ran into

  • Embedding PDFs in React
    We encountered version mismatches and configuration hurdles while integrating PDF.js into our React app.

  • Navigating PDF.js Documentation
    The docs were extensive but sometimes confusing, especially for advanced features like worker scripts and partial loading.

  • Choosing the Right AI Models
    Balancing performance, cost, and accuracy led us to experiment with different approaches before selecting the most suitable ones.

Accomplishments that we're proud of

  • Polished UI/UX
    We created a clean, intuitive interface that makes voice-based navigation and slide searches feel natural—even for non-technical users.

  • Seamless AI Integration
    By blending Google Cloud Vision (and other AI tools) with our front end, the process of detecting and searching slide content is smooth and unobtrusive.

  • Robust Error Handling
    We built in fallbacks and clear user prompts (e.g., handling misheard commands or slides without recognized content) to ensure reliability.

  • Extensibility
    Our architecture makes it easy to add more advanced voice commands (like “zoom in on slide 5”) or integrate new AI models without extensive rewrites.
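One way the kind of extensibility described above can look in practice is a small command registry: each new voice command contributes a pattern and a handler, so adding "zoom in on slide 5" is one function rather than a rewrite. This is a hedged sketch with illustrative names, not SlideFlow's actual internals:

```python
import re

# Illustrative registry of (pattern, handler) pairs.
COMMANDS = []

def command(pattern: str):
    """Decorator that registers a handler for transcripts matching `pattern`."""
    def wrap(fn):
        COMMANDS.append((re.compile(pattern, re.I), fn))
        return fn
    return wrap

@command(r"next slide")
def next_slide(match, state):
    state["page"] += 1

@command(r"zoom in on slide (\d+)")
def zoom(match, state):
    state["zoomed"] = int(match.group(1))

def dispatch(transcript: str, state: dict) -> bool:
    """Run the first matching handler; return False if nothing matched."""
    for pattern, fn in COMMANDS:
        m = pattern.search(transcript)
        if m:
            fn(m, state)
            return True
    return False
```

New AI-backed commands slot in the same way: a handler can call out to a model instead of mutating local state.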

What we learned

We got better at building a robust Flask back end while integrating multiple AI services, including Google Cloud Vision for image analysis and a pre-trained Sentence-BERT model for language understanding. Alongside this, we focused on creating a user-friendly UI and orchestrating the various APIs smoothly to deliver a cohesive, end-to-end solution.

What's next for SlideFlow

We plan to enhance SlideFlow by adding more advanced voice commands (e.g., “Highlight text about cybersecurity”) and potential multi-user collaboration so presenters and audiences can interact simultaneously. We’re also exploring integration with additional AI models for improved text summarization and slide recommendations, making presentations more interactive and insightful.
