Inspiration
The inspiration for this project came from one of our group members who would regularly take his phone out and begin recording himself speaking about all the things he had to do in the following days. This led to us thinking of how we could implement a solution with AI such that he wouldn't have to go back and listen to everything and just get the important details.
What it does
This project is a website that records a user speaking about all the tasks they have due in the upcoming days and then returns a bulleted TODO list. This is a summary of all the user's tasks as well as when they should be done.
How we built it
We built this website with a React front-end that records the user's voice as a .flac file and sends it to a Python backend with a Flask API call. We then pass the audio file to the Google Cloud speech-to-text API. That output is then given to the Open AI API which has already been given a fixed prompt to take the rambling input from the user and return a nicely summarised to-do list.
Challenges we ran into
We ran into several challenges along the way. One of the main challenges we ran into was that some of us had previously worked on full-stack websites before, however, none of us had worked very much if at all with React or Flask and we had to learn that on the spot. The workshop that was held earlier during the hackathon for React was very helpful. Another thing that set us back was the billing for Open AI's API since we were not familiar with how exactly it would work. We ended up getting quite comfortable with it once we saw how affordable it was, however.
Accomplishments that we're proud of
We are proud of creating a project that leverages many different technologies into something that is useful in everyday life.
What we learned
We learned a lot about React and Flask as well as using various API technologies, particularly from Open AI and Google Cloud. We also realized that there is an enormous amount of possibilities for applications of the Open AI and Google Cloud APIs when you combine powerful tools like gpt-3.5 with automation.
What's next for Speak Scribe
There are several different directions in which we can take this website. The first is we could connect it to a TODO API like Google Tasks or Microsoft To Do. We could also rewrite a mobile app like this with the same idea to make it more accessible for mobile users. Finally, we could make it so that whatever language the user speaks to the website as input will also be returned as the language of the bulleted list output.
Log in or sign up for Devpost to join the conversation.