AudioScribe AI

Inspiration

This project was inspired by a desire to improve the workflow and efficiency of frontline customer service workers. The ability to record, transcribe, summarize, and translate conversations in real time gives these workers tools for better communication, understanding, and record-keeping. Additional features, such as extracting reminders and appointments, address the everyday challenges that customer service representatives face.

What we learned

The project takes a multi-faceted approach, integrating several tools and technologies:

Audio Recording & Playback: Used sounddevice and soundfile libraries for recording and playing back audio.
Natural Language Processing (NLP): Used the nltk library for sentence tokenization, which helps separate customer and agent dialog.
Language Detection: Implemented language detection using the langdetect library.
Transcription: Employed the whisper library to transcribe audio recordings into text.
OpenAI API Integration: Integrated OpenAI's API for several functionalities, including text summarization, reminder extraction, and translation.
Web Development: Created a web application using the Flask framework and handled cross-origin requests using the flask_cors library.
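
As an illustration of the sentence tokenization step listed above, here is a minimal sketch of turning a flat transcript into agent and customer lines. It is not the project's code: the regex splitter is a simplified stand-in for nltk's sentence tokenizer, and the names split_sentences, to_dialog, and SPEAKERS, as well as the rule that speakers alternate sentence by sentence starting with the agent, are assumptions made for the example.

```python
import re
from itertools import cycle

# Hypothetical speaker labels; "agent speaks first" is an assumption.
SPEAKERS = ("Agent", "Customer")


def split_sentences(text: str) -> list[str]:
    """Naive stand-in for nltk's sentence tokenizer.

    Splits on whitespace that follows '.', '!' or '?'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def to_dialog(transcript: str) -> str:
    """Label each sentence with an alternating speaker, one line per sentence."""
    lines = []
    for speaker, sentence in zip(cycle(SPEAKERS), split_sentences(transcript)):
        lines.append(f"{speaker}: {sentence}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dialog("Hello, how can I help? My order is late. Let me check that."))
```

Strict alternation is only a rough heuristic, since a speaker often says several sentences in a row; it is used here because it keeps the example short.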

How we built it

The project is structured as a Flask web application that offers several endpoints to control and interact with its functionality:

Recording: Endpoints allow for starting and stopping audio recordings.
Playback: Plays back the latest recorded audio.
Transcription: After stopping a recording, the audio is automatically transcribed into a dialog format, distinguishing between agent and customer.
Summarization: Provides a bullet-point summary of the transcribed conversation.
Reminders: Extracts and formats information about reminders, appointments, or meetings from the conversation.
Translation: Can translate the transcribed conversation to a supported target language.
Load Latest Recording: Loads the last recorded audio.
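
The endpoints above all act on shared state: whether a recording is in progress, and what the latest audio and transcript are. The sketch below models that state in plain Python so it runs without audio hardware or a web server. The RecordingSession class, its method names, and the response dictionaries are hypothetical; in the project, each method would sit behind a Flask route, and the injected transcribe callable stands in for the whisper step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RecordingSession:
    """In-memory state that start/stop/load routes could share (hypothetical)."""

    # Injected so the sketch needs no model; whisper fills this role in the project.
    transcribe: Callable[[bytes], str]
    is_recording: bool = False
    latest_audio: Optional[bytes] = None
    latest_transcript: Optional[str] = None
    _chunks: list = field(default_factory=list)

    def start(self) -> dict:
        if self.is_recording:
            return {"ok": False, "error": "already recording"}
        self._chunks = []
        self.is_recording = True
        return {"ok": True}

    def feed(self, chunk: bytes) -> None:
        """Append captured audio; ignored when no recording is active."""
        if self.is_recording:
            self._chunks.append(chunk)

    def stop(self) -> dict:
        """Stop recording and transcribe the captured audio right away."""
        if not self.is_recording:
            return {"ok": False, "error": "not recording"}
        self.is_recording = False
        self.latest_audio = b"".join(self._chunks)
        self.latest_transcript = self.transcribe(self.latest_audio)
        return {"ok": True, "transcript": self.latest_transcript}

    def load_latest(self) -> dict:
        if self.latest_audio is None:
            return {"ok": False, "error": "no recording available"}
        return {"ok": True, "size_bytes": len(self.latest_audio)}
```

Returning an error dictionary instead of raising keeps each route handler to a single call whose result can be serialized as JSON.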

Challenges we faced

Audio Handling: Managing audio data, especially recording and playing back in real-time, can be challenging.
API Limitations: APIs like OpenAI's may impose rate limits or request limits, so using API calls efficiently is crucial.
Language Support: While the project supports multiple languages for translation, it needs to handle cases where the target language isn't supported or the language detection is inaccurate.
Error Handling: Ensuring a smooth user experience means anticipating and gracefully handling potential errors, such as missing recordings or failed API requests.
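
Two of the challenges above lend themselves to small helpers: falling back to a default when a language is unsupported or undetected, and retrying a failed API call with exponential backoff. The following is a sketch under stated assumptions, not the project's implementation. SUPPORTED_LANGUAGES is a made-up allow-list (the actual supported languages are not listed here), and resolve_language and call_with_retry are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical allow-list; the project's real set of languages is not stated.
SUPPORTED_LANGUAGES = {"en": "English", "es": "Spanish", "fr": "French"}


def resolve_language(code: Optional[str], default: str = "en") -> str:
    """Return a supported language code, or the default if detection failed
    or the detected language is not supported."""
    if code and code.lower() in SUPPORTED_LANGUAGES:
        return code.lower()
    return default


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff.

    The delay doubles after each failure; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In practice the wrapped function would be the call to the summarization or translation API, and it would usually be better to catch only that client's rate-limit and network errors rather than every exception. The sleep parameter exists so the backoff can be tested without waiting.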

Built With

and
built-with-python
cors
dotenv
flask
langdetect
nltk
openai-api
sounddevice
soundfile
whisper

Submitted to

PlutoHacks 2023
- Winner UKG - ✨Sponsor Challenge✨

Created by

By using Flask, I routed the endpoints to allow data processing and information retrieval from customer service calls. I also troubleshot problems in both the React front-end and Flask back-end to increase productivity in the workflow.

Eric Gessa
I spearheaded the backend development of AudioScribe AI, using Python and Flask. I integrated key APIs such as OpenAI's for real-time summarization and translation, along with the whisper library for transcription. Additionally, langdetect let our system adapt to various languages, enhancing the overall user experience.

Kevin Camacho
Student at FIU. Passionate about AI, ML & Back End. Eager to enhance skills & tackle challenges.
My expertise was particularly evident in the integration and use of cutting-edge technologies. I used React, a popular JavaScript library, to build efficient and modular components. Vite, a next-generation frontend build tool, further optimized our development and production workflows, giving us faster build times and better performance.

NassG1214 Nassi
As a front-end developer on the project, I focused primarily on its visual design and styling. I used CSS for styling and JavaScript for creating components. My aim was to create a visually appealing and user-friendly interface that makes a positive first impression on visitors.

Daniel Gonzalez

Updates

Kevin Camacho started this project — Oct 14, 2023 02:25 PM EDT
