Inspiration
The inspiration behind Meeting Insights Generator stemmed from the challenges faced during virtual meetings, where key points often get lost or forgotten. As more people work remotely, it's become increasingly difficult to track discussions, ensure clarity, and retain important insights. This project was developed to help streamline meeting workflows and enhance collaboration by automatically transcribing, summarizing, and visualizing key highlights from recorded meetings.
What it does
Meeting Insights Generator is a multimodal application designed to:
- Transcribe meeting audio/video files using AssemblyAI's speech-to-text technology.
- Summarize the transcriptions using the Facebook BART large CNN model for efficient, concise meeting highlights.
- Fetch relevant images by searching the Pexels API with keywords from the summary.
- Convert these summaries into audio for easy review using the Deepgram API.
The app stores meeting data in MongoDB and keeps media files in Supabase storage, so both the generated content and the media files are easy to retrieve.
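The four steps above can be sketched as one small pipeline. This is a minimal illustration, not the project's actual code: the `Stages` interface, `generateInsights`, and every stage name are hypothetical, and each stage is assumed to be a thin wrapper around one of the external services.

```typescript
// Hypothetical stage signatures; each one would wrap a single external service.
interface Stages {
  transcribe: (mediaUrl: string) => Promise<string>;  // e.g. speech-to-text
  summarize: (transcript: string) => Promise<string>; // e.g. a BART summarizer
  findImages: (summary: string) => Promise<string[]>; // e.g. stock-photo search
  synthesize: (summary: string) => Promise<string>;   // e.g. text-to-speech, returns an audio URL
}

interface MeetingInsights {
  transcript: string;
  summary: string;
  imageUrls: string[];
  audioUrl: string;
}

async function generateInsights(
  mediaUrl: string,
  stages: Stages,
): Promise<MeetingInsights> {
  // Transcription and summarization are strictly sequential.
  const transcript = await stages.transcribe(mediaUrl);
  const summary = await stages.summarize(transcript);

  // Images and audio both depend only on the summary, so they can run concurrently.
  const [imageUrls, audioUrl] = await Promise.all([
    stages.findImages(summary),
    stages.synthesize(summary),
  ]);

  return { transcript, summary, imageUrls, audioUrl };
}
```

Passing the stages in as functions keeps the pipeline testable with stubs and makes it straightforward to swap a provider later.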
How we built it
The project is built with a combination of technologies:
- Frontend: React with Vite for a responsive and fast user interface.
- Backend: Node.js with Express, utilizing MongoDB to store meeting data and Supabase for file storage.
- AI APIs:
  - AssemblyAI for audio transcription.
  - Hugging Face’s Facebook BART large CNN model for summarization.
  - Pexels API for retrieving images that match the summary.
  - Deepgram for converting summaries into speech.
React and Node.js form the application’s structure, MongoDB handles data persistence, and Supabase stores files and hosts images.
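As an illustration of the image step, the sketch below picks the most frequent non-trivial words from a summary and builds a Pexels search URL from them. The keyword heuristic, the stopword list, and both function names are assumptions made for this example; the project may select keywords differently. The URL follows the Pexels photo search endpoint, which also expects the API key in an `Authorization` header.

```typescript
// Small illustrative stopword list (an assumption, not the project's list).
const STOPWORDS = new Set([
  "the", "and", "for", "with", "that", "this", "was", "were", "are",
  "will", "has", "have", "from", "into", "about",
]);

// Return up to `limit` keywords, ranked by how often they occur in the summary.
function extractKeywords(summary: string, limit = 3): string[] {
  const counts = new Map<string, number>();
  for (const word of summary.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (word.length < 3 || STOPWORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // stable sort: ties keep first-seen order
    .slice(0, limit)
    .map(([word]) => word);
}

// Build the search URL; the request itself must also send the API key
// in an Authorization header.
function buildPexelsSearchUrl(keywords: string[], perPage = 3): string {
  const query = encodeURIComponent(keywords.join(" "));
  return `https://api.pexels.com/v1/search?query=${query}&per_page=${perPage}`;
}
```

A frequency count is deliberately simple; a summary that is only a few sentences long rarely needs more than this to produce a usable search query.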
Challenges we ran into
Some of the key challenges encountered during the development of Meeting Insights Generator included:
- API Integration: Integrating multiple APIs, each with its own rate limits and asynchronous request flow, was a complex task. Ensuring smooth communication between the APIs without overwhelming the system was critical.
- Multimodal Data Handling: Combining audio, text, and images into a seamless user experience required thoughtful data management and organization. This involved careful consideration of how each modality would work together and how the user would interact with the generated content.
- Real-time Processing: Achieving quick and accurate results, especially for the summarization and image retrieval components, was a significant challenge. Optimizing the backend processes for speed and efficiency was key to making the tool usable.
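One common way to cope with rate limits like those mentioned above is to retry failed calls with exponential backoff. The sketch below is an assumption about how this could be done, not a description of the project's code; `backoffDelay`, `withRetry`, and the default values are all hypothetical.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
// capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run `call`, retrying on any thrown error up to `maxAttempts` attempts in total.
// `sleep` is injectable so tests can avoid real waiting.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

In practice a wrapper like this would retry only on errors that are worth retrying, such as an HTTP 429 response, and fail fast on everything else.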
Accomplishments that we're proud of
- End-to-End Workflow: Successfully developed a comprehensive, end-to-end workflow that allows users to upload meeting files, get accurate transcriptions, and view summarized content with visual insights in just a few steps.
- AI-powered Summarization: The integration of advanced models like Facebook BART for summarization is a key accomplishment. The model produces concise, high-quality summaries, making it easier for users to grasp meeting highlights quickly.
- Image Retrieval: Fetching relevant images based on summary keywords was an exciting feature that enhanced the meeting insights and provided a visual representation of key points.
- Seamless Integration with APIs: The integration of multiple APIs (AssemblyAI, Deepgram, Pexels, etc.) into a single user experience demonstrated strong technical problem-solving skills.
What we learned
- Multimodal AI Integration: This project gave us a deeper understanding of how to combine different AI technologies—speech, text, and images—to create a more interactive and insightful user experience.
- Backend Architecture: We gained hands-on experience in structuring and organizing backend services for a project with multiple dependencies, from transcription to file storage.
- Real-World AI Application: We learned how to leverage powerful AI APIs to solve real-world problems, especially in areas of productivity and team collaboration.
- Handling Asynchronous Operations: Managing asynchronous tasks, particularly when dealing with APIs that return complex responses, was one of the steepest learning curves.
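A typical asynchronous pattern with transcription services is to submit a job and then poll until it finishes. The sketch below shows that pattern in a generic form. The `Job` shape and `pollUntilDone` are hypothetical names, and the status values are modeled on the ones AssemblyAI reports for transcripts; treat the details as assumptions rather than the project's implementation.

```typescript
type JobStatus = "queued" | "processing" | "completed" | "error";

interface Job<T> {
  status: JobStatus;
  result?: T;     // present once status is "completed"
  error?: string; // present once status is "error"
}

const wait = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Call `fetchJob` repeatedly until the job completes, fails, or the poll
// budget runs out. `sleep` is injectable so tests can skip real delays.
async function pollUntilDone<T>(
  fetchJob: () => Promise<Job<T>>,
  intervalMs = 3000,
  maxPolls = 100,
  sleep: (ms: number) => Promise<void> = wait,
): Promise<T> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const job = await fetchJob();
    if (job.status === "completed") return job.result as T;
    if (job.status === "error") throw new Error(job.error ?? "job failed");
    await sleep(intervalMs);
  }
  throw new Error(`job still pending after ${maxPolls} polls`);
}
```

Capping the number of polls keeps a stuck job from holding a request open indefinitely.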
What's next for Meeting Insights Generator
- Real-time Transcription: We plan to implement real-time transcription, so users can generate insights during live meetings, enhancing the app's usefulness.
- Speaker Identification: Adding speaker identification to the transcription process will help clarify who said what, making the summaries even more insightful.
- Collaboration Features: Future updates could include features that allow team members to collaboratively refine meeting summaries or provide feedback.
- Advanced Text-to-Speech: Improving the text-to-speech functionality to provide more natural-sounding audio summaries will enhance user experience.
- Scalability and Performance Improvements: As usage grows, optimizing performance for faster transcription, summarization, and image retrieval will be crucial for scalability.
Built With
- assemblyaiapi
- css3
- deepgramapi
- express.js
- huggingface
- mongodb
- node.js
- pexelsapi
- react
- supabase
- vite
