Inspiration

The inspiration for this project came from one of our group members who would regularly take his phone out and begin recording himself speaking about all the things he had to do in the following days. This led to us thinking of how we could implement a solution with AI such that he wouldn't have to go back and listen to everything and just get the important details.

What it does

This project is a website that records a user speaking about all the tasks they have due in the upcoming days and then returns a bulleted TODO list. This is a summary of all the user's tasks as well as when they should be done.

How we built it

We built this website with a React front-end that records the user's voice as a .flac file and sends it to a Python backend with a Flask API call. We then pass the audio file to the Google Cloud speech-to-text API. That output is then given to the Open AI API which has already been given a fixed prompt to take the rambling input from the user and return a nicely summarised to-do list.

Challenges we ran into

We ran into several challenges along the way. One of the main challenges we ran into was that some of us had previously worked on full-stack websites before, however, none of us had worked very much if at all with React or Flask and we had to learn that on the spot. The workshop that was held earlier during the hackathon for React was very helpful. Another thing that set us back was the billing for Open AI's API since we were not familiar with how exactly it would work. We ended up getting quite comfortable with it once we saw how affordable it was, however.

Accomplishments that we're proud of

We are proud of creating a project that leverages many different technologies into something that is useful in everyday life.

What we learned

We learned a lot about React and Flask as well as using various API technologies, particularly from Open AI and Google Cloud. We also realized that there is an enormous amount of possibilities for applications of the Open AI and Google Cloud APIs when you combine powerful tools like gpt-3.5 with automation.

What's next for Speak Scribe

There are several different directions in which we can take this website. The first is we could connect it to a TODO API like Google Tasks or Microsoft To Do. We could also rewrite a mobile app like this with the same idea to make it more accessible for mobile users. Finally, we could make it so that whatever language the user speaks to the website as input will also be returned as the language of the bulleted list output.

Built With

Share this project:

Updates