AudioScribe AI
Inspiration
This project was inspired by a desire to improve the workflow and efficiency of frontline customer service workers. The ability to record, transcribe, summarize, and translate conversations in real time gives them tools for better communication, understanding, and record-keeping. Additional features, such as extracting reminders and appointments, address the everyday challenges that customer service representatives face.
What we learned
The project takes a multi-faceted approach, integrating various tools and technologies:
- Audio Recording & Playback: Used the `sounddevice` and `soundfile` libraries for recording and playing back audio.
- Natural Language Processing (NLP): Utilized the `nltk` library for sentence tokenization, which helps in separating customer and agent dialogs.
- Language Detection: Implemented language detection using the `langdetect` library.
- Transcription: Employed the `whisper` library to transcribe audio recordings into text.
- OpenAI API Integration: Integrated OpenAI's API for several functionalities, including text summarization, reminder extraction, and translation.
- Web Development: Created a web application using the Flask framework and handled cross-origin requests using the `flask_cors` library.
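To illustrate the sentence-tokenization step, here is a minimal sketch that splits a transcript into sentences and labels them as agent and customer turns. It uses a simple regular expression as a stand-in for `nltk.sent_tokenize` so that it runs without downloading tokenizer data. The strict alternation between speakers and the names `split_sentences` and `label_turns` are assumptions made for illustration, not necessarily how the project assigns speakers.

```python
import re


def split_sentences(text):
    """Naive sentence splitter, a stand-in for nltk.sent_tokenize."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_turns(text, speakers=("Agent", "Customer")):
    """Assign each sentence to a speaker in strict alternation (an assumption)."""
    sentences = split_sentences(text)
    return [(speakers[i % len(speakers)], s) for i, s in enumerate(sentences)]


if __name__ == "__main__":
    transcript = "Hello, how can I help? My order is late. Let me check that for you."
    for speaker, sentence in label_turns(transcript):
        print(f"{speaker}: {sentence}")
```

Real conversations rarely alternate one sentence at a time, so a production version would likely need speaker diarization or a language model to assign turns.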
How we built it
The project is structured as a Flask web application that offers several endpoints to control and interact with its functionality:
- Recording: Endpoints allow for starting and stopping audio recordings.
- Playback: Plays back the latest recorded audio.
- Transcription: After stopping a recording, the audio is automatically transcribed into a dialog format, distinguishing between agent and customer.
- Summarization: Provides a bullet-point summary of the transcribed conversation.
- Reminders: Extracts and formats information about reminders, appointments, or meetings from the conversation.
- Translation: Can translate the transcribed conversation to a supported target language.
- Load Latest Recording: Loads the last recorded audio.
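The start, stop, and automatic transcription flow described above can be sketched without any web framework. The class below is a hypothetical, framework-agnostic model of the state that the recording endpoints would manage. The name `RecordingSession`, its methods, and the returned dictionaries are assumptions for illustration, and the `transcribe` function is injected so that a real transcriber such as Whisper could be swapped in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecordingSession:
    """Tracks recording state; `transcribe` is injected to keep the sketch testable."""

    transcribe: Callable[[List[float]], str]
    is_recording: bool = False
    samples: List[float] = field(default_factory=list)
    latest_transcript: Optional[str] = None

    def start(self) -> dict:
        if self.is_recording:
            return {"status": "error", "message": "already recording"}
        self.samples = []
        self.is_recording = True
        return {"status": "recording"}

    def add_samples(self, chunk: List[float]) -> None:
        # In a real app, an audio input callback would feed samples here.
        if self.is_recording:
            self.samples.extend(chunk)

    def stop(self) -> dict:
        if not self.is_recording:
            return {"status": "error", "message": "not recording"}
        self.is_recording = False
        # Transcription runs automatically as soon as recording stops.
        self.latest_transcript = self.transcribe(self.samples)
        return {"status": "stopped", "transcript": self.latest_transcript}


if __name__ == "__main__":
    # A fake transcriber stands in for the real speech-to-text call.
    session = RecordingSession(transcribe=lambda s: f"{len(s)} samples transcribed")
    print(session.start())
    session.add_samples([0.1, 0.2, 0.3])
    print(session.stop())
```

Each web endpoint would then be a thin wrapper that calls one of these methods and returns the dictionary as JSON.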
Challenges we faced
- Audio Handling: Managing audio data, especially recording and playing back in real time, can be challenging.
- API Limitations: APIs like OpenAI's may impose rate limits or request limits, so making efficient use of API calls is crucial.
- Language Support: While the project supports multiple languages for translation, it needs to handle the case where the language isn't supported or when the language detection isn't accurate.
- Error Handling: Ensuring a smooth user experience means anticipating and gracefully handling potential errors, such as missing recordings or failed API requests.
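Two of these challenges, rate limits and unsupported languages, lend themselves to a short sketch. The code below shows one common way to handle them: retrying a failing call with exponential backoff and rejecting an unsupported target language before any API call is made. The names `call_with_retry`, `check_language`, and `SUPPORTED_LANGUAGES`, as well as the language codes listed, are hypothetical and not taken from the project.

```python
import time

# Hypothetical set of supported target languages, for illustration only.
SUPPORTED_LANGUAGES = {"en", "es", "fr"}


class UnsupportedLanguageError(ValueError):
    """Raised when a translation target is not in SUPPORTED_LANGUAGES."""


def check_language(code):
    """Validate a language code before spending an API call on it."""
    if code not in SUPPORTED_LANGUAGES:
        raise UnsupportedLanguageError(f"Unsupported target language: {code}")
    return code


def call_with_retry(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it raises.

    This sketch catches every Exception; a real app would catch only the
    API client's specific rate-limit or timeout errors.
    """
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: surface the error to the caller.
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    attempts = []

    def flaky_request():
        attempts.append(1)
        if len(attempts) < 3:
            raise RuntimeError("rate limited")
        return "summary text"

    print(call_with_retry(flaky_request, base_delay=0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.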