Inspiration
We were inspired by our own experience with Zoom fatigue, and we thought about all the problems we run into when online meetings leave us tired.
What it does
First, participants can upload preparation documents. An AI summarizes them so that participants can quickly get up to speed on the topic of the meeting. Second, another AI performs real-time speech recognition and transcribes everything, and the speaker gets live subtitles. Finally, the transcript is also summarized, and this summary of the main takeaway points is emailed to the participants.
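GPT-3 accepts only a limited amount of text per request, so a long meeting transcript usually has to be summarized in pieces. The sketch below shows one common way to do that, written in Python for illustration; the project's own summarization runs in its Node.js backend. It splits a transcript into word-bounded chunks and builds a "main takeaways" prompt for each chunk. The names `chunk_words` and `takeaway_prompts`, the `max_words` budget, and the prompt wording are all hypothetical. The actual OpenAI API call is left out.

```python
from __future__ import annotations


def chunk_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    Word count is only a rough stand-in for GPT-3's token limit; max_words is
    a hypothetical budget, not a value taken from the project.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def takeaway_prompts(transcript: str, max_words: int = 500) -> list[str]:
    """Build one summarization prompt per transcript chunk (hypothetical wording)."""
    return [
        "Summarize the main takeaway points of this meeting excerpt "
        f"as a short bullet list:\n\n{chunk}\n\nTakeaways:"
        for chunk in chunk_words(transcript, max_words)
    ]


if __name__ == "__main__":
    prompts = takeaway_prompts("word " * 1200, max_words=500)
    print(len(prompts))  # 3 chunks: 500 + 500 + 200 words
```

The per-chunk summaries could then be concatenated and summarized once more to produce the single list of takeaways that goes into the email.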
How we built it
The whole system is split into three parts:
- The front end for uploading pre-meeting documents for summarization. We built a React user interface where the user uploads documents and receives their AI-generated summary.
- The front end for speech transcription during the meeting. This part is written in Python and uses the VOSK library for speech recognition.
- The Node.js backend, which receives the pre-meeting documents and the meeting transcript from the front ends. It uses OpenAI's GPT-3 API to summarize both the documents and the transcript. It then uses the Nodemailer library to email the transcript summary to the meeting participants.
The second and third parts are hosted on Heroku.
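VOSK's Python recognizer reports two kinds of results, both as JSON strings:

- Partial hypotheses (`PartialResult()`), which change while someone is still speaking.
- Final results (`Result()` / `FinalResult()`), produced once an utterance ends.

The sketch below shows one possible way for the transcription front end to turn those results into a live subtitle and a running transcript for the backend. `TranscriptBuilder` and its methods are hypothetical names. The VOSK calls appear only in comments, so the snippet runs without the library or a model installed.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class TranscriptBuilder:
    """Turns VOSK-style JSON results into a live subtitle and a full transcript.

    Hypothetical wiring with VOSK (requires `pip install vosk` and a model):
        rec = KaldiRecognizer(Model("model"), 16000)
        for chunk in audio_chunks:
            if rec.AcceptWaveform(chunk):
                builder.on_final(rec.Result())
            else:
                builder.on_partial(rec.PartialResult())
        builder.on_final(rec.FinalResult())
    """
    sentences: list[str] = field(default_factory=list)
    subtitle: str = ""

    def on_partial(self, raw: str) -> str:
        # Partial results look like {"partial": "hello every"} and change as the
        # speaker talks; they are shown as the live subtitle but not stored.
        self.subtitle = json.loads(raw).get("partial", "")
        return self.subtitle

    def on_final(self, raw: str) -> str:
        # Final results look like {"text": "hello everyone"}; an empty text
        # means the recognizer finalized silence, so nothing is stored.
        text = json.loads(raw).get("text", "").strip()
        if text:
            self.sentences.append(text)
        self.subtitle = text
        return text

    def transcript(self) -> str:
        # One finalized utterance per line, ready to send to the backend.
        return "\n".join(self.sentences)


if __name__ == "__main__":
    builder = TranscriptBuilder()
    builder.on_partial('{"partial": "hello every"}')
    builder.on_final('{"text": "hello everyone"}')
    builder.on_final('{"text": ""}')
    builder.on_final('{"text": "let us start the meeting"}')
    print(builder.transcript())
```

Partial results are kept out of the stored transcript because VOSK may still revise them; only finalized utterances are worth summarizing.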
Challenges we ran into
Finding a decent speech recognition tool was difficult, and setting up VOSK was not straightforward either.
Accomplishments that we're proud of
We are proud of the real-time speech recognition. The integration of OpenAI into the code is something we are very proud of as well. The automated email with the summary at the end is a small touch we think is cool.
What we learned
Only one member had significant experience with AI, and some of us barely had any coding experience. We all learned a lot about working with AI tools. Because things didn't work out right away, the eventual success was even more rewarding.
What's next for Summarscript
We have a working product, but there is still a lot of unrealized potential. For example, we could extend speech recognition to speaker recognition. VOSK supports speaker identification, so integrating it would improve the summaries of meetings with multiple speakers. We would also like to add real-time translation of the transcript.
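When a VOSK recognizer has a speaker model attached, its final results also carry an `"spk"` x-vector. VOSK's own speaker example compares such vectors with a cosine distance. The sketch below shows one way the planned speaker recognition could label each utterance by comparing its vector to vectors enrolled per participant. `label_speaker`, the enrollment dictionary, and the `threshold` value are hypothetical. Real x-vectors are much longer than the toy three-element vectors used here.

```python
from __future__ import annotations

import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_speaker(raw_result: str, enrolled: dict[str, list[float]],
                  threshold: float = 0.5) -> str:
    """Return the enrolled speaker closest to the result's "spk" vector.

    Assumes raw_result is a final VOSK result from a recognizer with a speaker
    model attached; the threshold and enrollment dict are hypothetical choices.
    """
    vector = json.loads(raw_result).get("spk")
    if not vector or not enrolled:
        return "unknown"
    best_name, best_score = max(
        ((name, cosine_similarity(vector, ref)) for name, ref in enrolled.items()),
        key=lambda pair: pair[1],
    )
    # Below the threshold, no enrolled voice is considered a confident match.
    return best_name if best_score >= threshold else "unknown"


if __name__ == "__main__":
    voices = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(label_speaker('{"text": "hi", "spk": [0.9, 0.1, 0.0]}', voices))
```

Prefixing each transcript line with its speaker label before summarization would let GPT-3 attribute takeaways to the people who raised them.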
Built With
- gpt-3
- heroku
- http
- javascript
- node.js
- openai
- python
- react
- vosk