Inspiration

Controlling presentation slides can be tedious, especially without wireless tools at the presenter's disposal. So we created SlideFlow to provide AI assistance during presentations. Starting from a rough idea of a voice-recognition system for controlling slides, we moved on to integrating a pre-trained AI model and the Google Cloud Vision API for further functionality.

What it does

Our web app listens to the presenter’s voice and uses natural language plus computer vision to control and search slide content. Key features include:

  • Voice Commands for Navigation
    Move between slides by simply saying “Next slide” or “Previous slide.”

  • Voice-Powered Search for Text & Images
    Jump to specific slides by describing their content (e.g., “Go to the slide with Japan in the title” or “Go to the slide with a desert”).

  • Live Transcript
    Everything the presenter says is transcribed in real time.

  • Audience-Friendly Search
    Audiences can also search slides in natural language (e.g., “Show me slides with mountains”), and the AI automatically displays the relevant slides.
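The first step behind all of these features is deciding whether a finished utterance is a navigation command or a content search. A minimal sketch of that routing logic (the function and pattern names are ours, not SlideFlow's actual API):

```python
import re

# Hypothetical router: maps a final speech transcript to either a
# navigation action or a natural-language search query.
NAV_PATTERNS = {
    r"\bnext slide\b": "NEXT",
    r"\bprevious slide\b": "PREV",
}

def route_command(transcript: str):
    """Return ("nav", action) for navigation phrases, else ("search", query)."""
    text = transcript.lower().strip()
    for pattern, action in NAV_PATTERNS.items():
        if re.search(pattern, text):
            return ("nav", action)
    # Anything else is treated as a search query, e.g.
    # "go to the slide with Japan in the title".
    return ("search", text)
```

Keeping the navigation phrases as a fixed table makes them cheap to match locally, while open-ended queries fall through to the semantic search path.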

How we built it

We combined a React + Tailwind front end with a Flask-based back end to create a seamless voice-controlled slide experience. Users can upload PDF or PPT files; the system then extracts text and images for analysis. We used the Web Speech API for voice recognition, Google Cloud Vision API for image detection, and a Sentence-BERT model to handle semantic similarity. This allows us to navigate slides by voice commands (“Next slide”), search for specific topics (“Show me slides about Japan”), and match user queries with relevant content.
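The matching step above boils down to ranking slides by similarity between a query vector and each slide's text vector. In the app, a Sentence-BERT model produces those embeddings; in this self-contained sketch a simple bag-of-words vector stands in so it runs anywhere, and the function names are illustrative:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a Sentence-BERT embedding: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_slide(query: str, slides: list[str]) -> int:
    """Return the index of the slide whose text best matches the query."""
    q = embed(query)
    scores = [cosine(q, embed(s)) for s in slides]
    return max(range(len(slides)), key=scores.__getitem__)

slides = [
    "Agenda and introductions",
    "Travel highlights: Japan, Tokyo, Kyoto",
    "Desert landscapes of the Sahara",
]
print(best_slide("slides about Japan", slides))  # → 1
```

Swapping the toy `embed` for real sentence embeddings keeps the rest of the pipeline unchanged, which is the point of isolating similarity behind one function.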

Challenges we ran into

  • Embedding PDFs in React
    We encountered version mismatches and configuration hurdles while integrating PDF.js into our React app.

  • Navigating PDF.js Documentation
    The docs were extensive but sometimes confusing, especially for advanced features like worker scripts and partial loading.

  • Choosing the Right AI Models
    Balancing performance, cost, and accuracy led us to experiment with different approaches before selecting the most suitable ones.

Accomplishments that we're proud of

  • Polished UI/UX
    We created a clean, intuitive interface that makes voice-based navigation and slide searches feel natural—even for non-technical users.

  • Seamless AI Integration
    By blending Google Cloud Vision (and other AI tools) with our front end, the process of detecting and searching slide content is smooth and unobtrusive.

  • Robust Error Handling
    We built in fallbacks and clear user prompts (e.g., handling misheard commands or slides without recognized content) to ensure reliability.

  • Extensibility
    Our architecture makes it easy to add more advanced voice commands (like “zoom in on slide 5”) or integrate new AI models without extensive rewrites.
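One way the kind of extensibility described above can look in practice is a small command registry: each new voice command contributes a pattern and a handler, so adding "zoom in on slide 5" is one function rather than a rewrite. This is a hedged sketch with illustrative names, not SlideFlow's actual internals:

```python
import re

# Illustrative registry of (pattern, handler) pairs.
COMMANDS = []

def command(pattern: str):
    """Decorator that registers a handler for transcripts matching `pattern`."""
    def wrap(fn):
        COMMANDS.append((re.compile(pattern, re.I), fn))
        return fn
    return wrap

@command(r"next slide")
def next_slide(match, state):
    state["page"] += 1

@command(r"zoom in on slide (\d+)")
def zoom(match, state):
    state["zoomed"] = int(match.group(1))

def dispatch(transcript: str, state: dict) -> bool:
    """Run the first matching handler; return False if nothing matched."""
    for pattern, fn in COMMANDS:
        m = pattern.search(transcript)
        if m:
            fn(m, state)
            return True
    return False
```

New AI-backed commands slot in the same way: a handler can call out to a model instead of mutating local state.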

What we learned

We got better at building a robust Flask back end while integrating multiple AI services, including Google Cloud Vision for image analysis and a pre-trained Sentence-BERT model for language understanding. Alongside this, we focused on creating a user-friendly UI and orchestrating the various APIs smoothly to deliver a cohesive, end-to-end solution.

What's next for SlideFlow

We plan to enhance SlideFlow by adding more advanced voice commands (e.g., “Highlight text about cybersecurity”) and potential multi-user collaboration so presenters and audiences can interact simultaneously. We’re also exploring integration with additional AI models for improved text summarization and slide recommendations, making presentations more interactive and insightful.
