Inspiration

The goal of this project was to experience developing in a completely unknown environment. With little to no knowledge of IBM-Watson, JavaScript, JSON files, or full-stack web development, we were looking to learn on the fly and experience new things. As the president of a club who organizes interviews, it would be much easier for me to come to decisions if I could re-listen to specific phrases or words that candidates said in a timely manner. Even if interviews were recorded, however, having to re-listen to upwards of 40 minutes of audio to locate one phrase would become tiresome and inefficient. This was the problem we planned to tackle.

What it does

This project has two parts: a front-end and a back-end. The back-end is written in Python and transcribes an audio file into a JSON file, which also stores timestamps for each individual word. The front-end reads and parses the JSON file in JavaScript while taking input from the user through HTML. The JavaScript then sends the requested data to our HTML webpage to be displayed.
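A minimal sketch of the back-end transcription step, assuming the current ibm-watson Python SDK (the API key, service URL, and filename are placeholders):

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials; real values come from the IBM Cloud dashboard.
authenticator = IAMAuthenticator('YOUR_API_KEY')
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url('YOUR_SERVICE_URL')

# Ask Watson for per-word timestamps along with the transcript.
with open('interview.wav', 'rb') as audio:
    response = stt.recognize(
        audio=audio,
        content_type='audio/wav',
        timestamps=True,
    ).get_result()  # a plain Python dict mirroring Watson's JSON response
```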

How we built it

It was built using several IDEs, with PyCharm used predominantly. The back-end is Python built around the IBM-Watson speech-to-text API; the front-end is JavaScript and HTML.

Challenges we ran into

We first attempted to run a Python Flask server, but problems arose with multi-way communication, and we decided it was too complex for such a short period of time. We fixed this by switching to a semi-independent front-end/back-end. Another problem we encountered was parsing the JSON files provided by Watson. We fixed this by printing the returned structure and redirecting stdout to a file that could then be renamed to a .json file.
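A rough sketch of that workaround, where response is the dict from the recognize() call above (json.dumps matters here: printing the raw Python dict would emit single-quoted text that JSON parsers reject):

```python
import json
import sys

# Serialize the structure Watson returned and redirect stdout into a
# file that can then be renamed to .json.
sys.stdout = open('transcript.txt', 'w')
print(json.dumps(response, indent=2))
sys.stdout.close()
sys.stdout = sys.__stdout__  # restore normal printing
```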

Accomplishments that we're proud of

We ran into a lot of issues when trying to get the .json file to be read by the JavaScript. After many iterations and some trial and error, we were finally able to parse the .json file correctly, which was a major accomplishment. Another accomplishment we are proud of is the creativity we showed in converting the .txt file we generated into a .json file so it could be read by the HTML page.
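Our front-end does this parsing in JavaScript; the same logic in Python, against Watson's documented result layout (the function name and file path are illustrative):

```python
import json

def find_word(json_path, keyword):
    """Return (start, end) times, in seconds, for every spoken match."""
    with open(json_path) as f:
        data = json.load(f)

    hits = []
    for result in data['results']:
        # Each alternative carries [word, start, end] timestamp triples.
        for word, start, end in result['alternatives'][0]['timestamps']:
            if word.lower() == keyword.lower():
                hits.append((start, end))
    return hits

print(find_word('transcript.json', 'budget'))
```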

What we learned

We learned a lot of new things, as most of the technology we used was very new to us. The IBM-Watson speech-to-text API was definitely brand-new technology for us to work with in this project. After getting familiar with it, we learned a lot about its functionality and the different features it offers, such as timestamps and alternative responses that can be attached to each word. Another major thing we learned as a group was how to read and parse a .json file from our JavaScript front-end code. This was a big learning curve because these terms and languages were foreign territory for us, and it gave us a great challenge and an exciting opportunity to learn.
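For reference, a Watson result with timestamps enabled comes back shaped roughly like this (values illustrative; real responses carry more fields and many result blocks):

```python
watson_result = {
    "results": [
        {
            "final": True,
            "alternatives": [
                {
                    "transcript": "thank you for coming in today ",
                    "confidence": 0.94,
                    # one [word, start_seconds, end_seconds] triple per word
                    "timestamps": [
                        ["thank", 0.0, 0.32],
                        ["you", 0.32, 0.51],
                        ["for", 0.51, 0.65],
                        ["coming", 0.65, 1.02],
                        ["in", 1.02, 1.15],
                        ["today", 1.15, 1.70],
                    ],
                }
            ],
        }
    ]
}
```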

What's next for AudioToTextKeywordSearch

There are a lot of ideas we would like to implement for this project. One would allow the user to submit an audio file through the HTML page to be read by the IBM-Watson API. This would make the project more dynamic and versatile, because the user wouldn't have to put the audio files into the back-end, which is how the project currently operates. Another future improvement would be to create a heat map when the user searches for a word. This heat map would display how frequently the word was spoken, highlighting areas of the speech that used it more often. This was originally part of the scope, but due to challenges and new technology it could not be implemented in time. It would, however, let the user find the "area" of the speech they are looking for more efficiently, based on where the keyword appears most frequently. A rough sketch of the bucketing behind such a heat map follows below.
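This is only a sketch of a possible approach, not implemented code; the bucket width and names are illustrative:

```python
from collections import Counter

def keyword_heatmap(timestamps, keyword, bucket_seconds=60):
    """Count keyword occurrences per fixed-width time bucket.

    timestamps: list of [word, start, end] triples from the transcript.
    Returns {bucket index: occurrences}; denser buckets read as hotter.
    """
    counts = Counter()
    for word, start, _end in timestamps:
        if word.lower() == keyword.lower():
            counts[int(start // bucket_seconds)] += 1
    return counts
```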
