Insightly Audio Cloud

Example of a speech to text inaccuracy - transcribed incorrectly
Scenario 1 - Customer Returns (Retail)
POST Audio error
POST Audio success

Inspiration

A picture is worth a thousand words.

Picture a picture made only of words. What do you get? A word cloud! ☁︎ It's the perfect bite-sized, visual, graphic representation of words holding the most weight. An ideal medium for summarizing content, including long audio and video transcripts.

The Vision

Issues & Incidents Tracker ★ Sentiment Analysis

Combining trackers, speaker diarization, and bulk audio uploads with custom queries and data inputs can surface interesting and useful insights from aggregate data. Symbl's APIs can be used to answer specific questions like:

What % of 100+ inbound technical support calls are incidents related to troubleshooting printers? On average, how much time is spent resolving these issues?
What % of calls about XYZ product during the December 2020 holiday promotional sale focused on refunds/returns?
What's the accuracy rate of call forwarding to the correct business extensions? How many redirects occur before a customer's ticket/case is closed or resolved?
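As a rough illustration of the first question above, the sketch below computes what share of calls carry a given tracker and how long those calls last on average. It assumes the tracker hits and call durations have already been collected into a hypothetical `CallRecord` shape; the field names and sample data are made up and are not Symbl's response format.

```typescript
// Hypothetical per-call record. In practice these fields would be assembled
// from Symbl tracker results plus your own call metadata.
interface CallRecord {
  id: string;
  trackers: string[]; // names of trackers detected in the call
  durationSec: number; // total call length in seconds
}

interface TrackerStats {
  matchingCalls: number;
  percentOfCalls: number; // 0 to 100
  avgDurationSec: number; // average length of the matching calls only
}

// Share of calls that hit a tracker, and how long those calls last on average.
function trackerStats(calls: CallRecord[], tracker: string): TrackerStats {
  const matching = calls.filter((c) => c.trackers.includes(tracker));
  const totalDuration = matching.reduce((sum, c) => sum + c.durationSec, 0);
  return {
    matchingCalls: matching.length,
    percentOfCalls: calls.length === 0 ? 0 : (matching.length / calls.length) * 100,
    avgDurationSec: matching.length === 0 ? 0 : totalDuration / matching.length,
  };
}

// Made-up sample data for illustration.
const calls: CallRecord[] = [
  { id: "a", trackers: ["printer-troubleshooting"], durationSec: 600 },
  { id: "b", trackers: ["billing"], durationSec: 300 },
  { id: "c", trackers: ["printer-troubleshooting", "billing"], durationSec: 900 },
  { id: "d", trackers: [], durationSec: 120 },
];
console.log(trackerStats(calls, "printer-troubleshooting"));
// → 2 matching calls, 50 percent of all calls, 750 s average duration
```

The same function answers the refunds/returns question if a "returns" tracker is applied to the December 2020 subset of calls.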

An Organizational Tool

Adaptable to the needs of a broad audience (working professionals, students, professors, content creators): anyone looking to filter and organize tens to hundreds of unlabeled or mislabeled, hour-plus audio files (lecture recordings, business meetings, raw video footage, webinars, speeches, interviews, podcasts, etc.) without having to sit through every recording just to extract its agenda.

For Marketing & Customer Retention

Customer segmentation and follow-ups based on the overall sentiment detected in phone calls. A better understanding of purchase activity, promotions, and customer journeys. Categorizing calls into batches by buyer stage: awareness, consideration, intent, purchase, and repurchase.
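A minimal sketch of sentiment-based segmentation, assuming each call's messages carry a polarity score between -1 and 1 (Symbl's Messages API can return per-message sentiment, but the `ScoredMessage` shape here is simplified). The 0.3 / -0.3 thresholds and the segment names are hypothetical; sorting calls by buyer stage would need a separate classifier, such as one tracker per stage.

```typescript
// Simplified, assumed shape: one polarity score in [-1, 1] per message.
interface ScoredMessage {
  text: string;
  polarity: number;
}

type Segment = "promoter" | "neutral" | "at-risk";

// Hypothetical thresholds; tune them against calls you have labeled yourself.
const POSITIVE_THRESHOLD = 0.3;
const NEGATIVE_THRESHOLD = -0.3;

function averagePolarity(messages: ScoredMessage[]): number {
  if (messages.length === 0) return 0;
  return messages.reduce((sum, m) => sum + m.polarity, 0) / messages.length;
}

// Classifies a whole call by the mean polarity of its messages.
function segmentCall(messages: ScoredMessage[]): Segment {
  const avg = averagePolarity(messages);
  if (avg >= POSITIVE_THRESHOLD) return "promoter";
  if (avg <= NEGATIVE_THRESHOLD) return "at-risk";
  return "neutral";
}

// Groups call IDs into batches so each segment can get its own follow-up.
function segmentCalls(callsById: Record<string, ScoredMessage[]>): Record<Segment, string[]> {
  const batches: Record<Segment, string[]> = { promoter: [], neutral: [], "at-risk": [] };
  for (const [callId, messages] of Object.entries(callsById)) {
    batches[segmentCall(messages)].push(callId);
  }
  return batches;
}
```

Averaging is the simplest aggregate; weighting the last few minutes of a call more heavily is one alternative if the outcome of the call matters more than its opening.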

How I built it

APIs

Symbl Async Audio + Conversations APIs
QuickChart Word Cloud + Pie Chart APIs
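The Async Audio + Conversations flow roughly has three steps: submit audio, poll the processing job, then read insights for the resulting conversation ID. The sketch below only builds request descriptions (no network calls), so each one can be passed to `fetch` or Axios. The endpoint paths, the `Bearer` auth header, and the `{ url }` body are based on my reading of Symbl's v1 REST docs and should be checked against the current API reference.

```typescript
// Base URL and paths follow my reading of Symbl's v1 REST docs (assumption).
const SYMBL_BASE = "https://api.symbl.ai/v1";

// A plain description of an HTTP request, to be handed to fetch or Axios.
interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Assumed auth scheme: an access token sent as a Bearer token.
function authHeaders(accessToken: string): Record<string, string> {
  return { Authorization: `Bearer ${accessToken}` };
}

// Step 1: submit a publicly reachable audio URL for async processing.
// The response is expected to include a conversationId and a jobId.
function submitAudioUrl(accessToken: string, audioUrl: string): RequestSpec {
  return {
    method: "POST",
    url: `${SYMBL_BASE}/process/audio/url`,
    headers: { ...authHeaders(accessToken), "Content-Type": "application/json" },
    body: JSON.stringify({ url: audioUrl }),
  };
}

// Step 2: poll the job until it reports "completed" (or "failed").
function jobStatus(accessToken: string, jobId: string): RequestSpec {
  return {
    method: "GET",
    url: `${SYMBL_BASE}/job/${encodeURIComponent(jobId)}`,
    headers: authHeaders(accessToken),
  };
}

// Step 3: once the job is done, read insights for the conversation ID.
function conversationResource(
  accessToken: string,
  conversationId: string,
  resource: "messages" | "topics",
): RequestSpec {
  return {
    method: "GET",
    url: `${SYMBL_BASE}/conversations/${encodeURIComponent(conversationId)}/${resource}`,
    headers: authHeaders(accessToken),
  };
}
```

For example, `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })` sends any one of these. Keeping the conversation ID as a returned value, rather than a constant in server code, is what would let a front-end upload drive the rest of the flow.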

Tech Stack

HTML, CSS, JavaScript (web)
React.js + JSX, Node.js + Express
material-ui, react-icons, react-router, Axios

Challenges and Limitations

Prototype specific

This prototype currently supports only basic insights for a single asynchronous audio file, and the front-end interface has no option to upload audio. This means the conversation ID must be obtained separately and inserted into the server-side code. There is also a known cross-origin blocking issue (Cross-Origin Read Blocking, CORB): certain audio transcripts combined with the QuickChart API trigger this blocking despite the crossOrigin="anonymous" attribute added to <img>. This may be caused by uploading audio above a certain file size limit. Though unadvised, this protection can be temporarily disabled during local development; setting the appropriate Access-Control headers is the more permanent solution.
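One way to set the Access-Control headers on the project's own Express server is a small middleware like the one below (the widely used `cors` npm package does the same job). It is a sketch: the allowed origin `http://localhost:3000` is an assumption (the Create React App dev-server default), and the minimal request/response types stand in for Express's so the snippet compiles on its own. These headers only affect responses served by this server; an image loaded directly from another host, such as QuickChart, depends on that host's headers, so routing the image through the Express server is one option.

```typescript
// Minimal structural types so the sketch compiles without Express installed;
// Express's req/res objects satisfy these shapes.
interface MinimalRequest {
  method: string;
  headers: Record<string, string | string[] | undefined>;
}
interface MinimalResponse {
  statusCode: number;
  setHeader(name: string, value: string): unknown;
  end(): unknown;
}

// Hypothetical allow-list: the React dev server's origin.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(["http://localhost:3000"]);

function corsMiddleware(req: MinimalRequest, res: MinimalResponse, next: () => void): void {
  const requestOrigin = req.headers["origin"]; // Node lowercases header names
  if (typeof requestOrigin === "string" && ALLOWED_ORIGINS.has(requestOrigin)) {
    res.setHeader("Access-Control-Allow-Origin", requestOrigin);
    res.setHeader("Vary", "Origin"); // responses differ per origin, so caches must key on it
    res.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
    res.setHeader("Access-Control-Allow-Headers", "Content-Type, Authorization");
  }
  // Answer CORS preflight requests directly instead of passing them to routes.
  if (req.method === "OPTIONS") {
    res.statusCode = 204;
    res.end();
    return;
  }
  next();
}
```

With Express, `app.use(corsMiddleware)` registered before the routes applies it to every response.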

The word cloud should not include any stop words. I manually defined and merged Sets of stop words for filtering, but a better and more efficient approach would be to fetch a list of common words from a dictionary API, or to scrape a web page that lists nearly 1,000 stop words. The current algorithm in my code ignores several of the most frequently used words, but it does so inefficiently, in a way that will not scale as the Set of stop words grows. Removing several hundred stop words from each transcript with good performance requires a different algorithmic approach.
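A sketch of a single-pass approach, assuming a plain-text transcript: tokenize once, drop stop words with constant-time `Set` lookups, and count the remaining words in a `Map`. Each token costs one lookup, so the work grows with the transcript length rather than with the size of the stop-word list; loading roughly 1,000 stop words into the `Set` would not slow filtering down. The tiny `STOP_WORDS` list and the ASCII-only tokenizer are simplifications.

```typescript
// Small illustrative list; a real one would be loaded once at start-up.
const STOP_WORDS: ReadonlySet<string> = new Set([
  "a", "an", "and", "the", "to", "of", "in", "is", "it", "that",
  "i", "you", "we", "for", "on", "so", "um", "uh",
]);

// One pass over the transcript: lowercase, split on anything that is not an
// ASCII letter, digit, or apostrophe, skip stop words, and count the rest.
function wordFrequencies(
  transcript: string,
  stopWords: ReadonlySet<string> = STOP_WORDS,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of transcript.toLowerCase().split(/[^a-z0-9']+/)) {
    if (token === "" || stopWords.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// The n most frequent words (ties broken alphabetically), e.g. for a word cloud.
function topWords(counts: Map<string, number>, n: number): [string, number][] {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```

The resulting counts can also be sent to a word cloud service already weighted, instead of the raw transcript.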

Symbl + Quickchart API Limitations

Symbl Async Audio API - I passed a vocabulary list to the trackers parameter of POST Audio and found that it decreased accuracy on the sample audio I tested, though I may have been formatting the data incorrectly.
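For reference, a sketch of a POST Audio body with trackers, assuming Symbl's Async API expects `trackers` as an array of `{ name, vocabulary }` objects (verify against the API reference). Trackers are meant to detect phrases in a conversation; if I read Symbl's docs correctly, speech-recognition hints go in a separate `customVocabulary` field, so which one fits depends on whether the goal is detection or transcription accuracy. The de-duplication step is a hypothetical clean-up, not an API requirement.

```typescript
// Assumed shape of one tracker in Symbl's Async API request body.
interface Tracker {
  name: string;
  vocabulary: string[];
}

// Builds a JSON body for the audio-URL endpoint with trackers attached.
function buildAsyncAudioUrlBody(audioUrl: string, trackers: Tracker[]): string {
  const cleaned = trackers.map((t) => ({
    name: t.name.trim(),
    // Hypothetical clean-up: trim phrases, drop empty ones, remove duplicates.
    vocabulary: [...new Set(t.vocabulary.map((v) => v.trim()).filter((v) => v.length > 0))],
  }));
  return JSON.stringify({ url: audioUrl, trackers: cleaned });
}

// Example: one tracker for the retail returns scenario (made-up phrases).
const exampleBody = buildAsyncAudioUrlBody("https://example.com/returns-call.mp3", [
  { name: "Returns", vocabulary: ["refund", "return policy", "exchange"] },
]);
console.log(exampleBody);
```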

Symbl Conversation API's speech-to-text transcription is mostly accurate, but I've spotted a few mistranscribed words that changed the meaning of a sentence or distorted an action item. Handling such data requires having checks in place. Here are two examples:

Speech to Text Error - Example 1 ~ Speech to Text Error - Example 2
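One possible check for errors like these: compare each transcript word against a list of known domain terms (product names, jargon) and flag near misses by Levenshtein edit distance for human review. The term list, the 4-character minimum, and the distance cutoff of 2 are assumptions. This catches misspelled-looking words, but not a wrong word that happens to be spelled correctly.

```typescript
// Dynamic-programming Levenshtein distance using a single rolling row.
// Insertions, deletions, and substitutions each cost 1.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

interface Suggestion {
  word: string;
  suggestion: string;
  distance: number;
}

// Flags words that are close to, but not exactly, a known domain term.
function flagNearMisses(transcript: string, domainTerms: string[], maxDistance = 2): Suggestion[] {
  const terms = domainTerms.map((t) => t.toLowerCase());
  const known = new Set(terms);
  const flagged: Suggestion[] = [];
  for (const word of transcript.toLowerCase().split(/[^a-z0-9']+/)) {
    if (word.length < 4 || known.has(word)) continue; // skip short words and exact matches
    let best: Suggestion | null = null;
    for (const term of terms) {
      const d = levenshtein(word, term);
      if (d > 0 && d <= maxDistance && (best === null || d < best.distance)) {
        best = { word, suggestion: term, distance: d };
      }
    }
    if (best !== null) flagged.push(best);
  }
  return flagged;
}
```

For example, `flagNearMisses("Please reset the sinkjet printer", ["InkJet", "toner"])` flags "sinkjet" with the suggestion "inkjet".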

Occasionally, a POST Audio request to the Async Audio API fails with an error that blocks the workflow:

POST Audio Error
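Since the POST Audio error is intermittent, one common mitigation is to retry with exponential backoff. The generic helper below wraps any async call; the attempt count and delays are arbitrary defaults, and the `sleep` parameter exists so tests can skip real waiting. Retrying a POST can submit the same audio more than once, so it is worth checking whether a failed request actually created a job before resending.

```typescript
// Hypothetical helper: retries a failing async task with exponential backoff.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Waits 500 ms, then 1 s, then 2 s, ... between attempts.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

With Axios, usage could look like `withRetry(() => axios.post(url, body, { headers }))`.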

With the QuickChart Word Cloud API, I set removestopWords to true. However, this had no visible effect: stop words still appeared in the rendered word clouds.
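A sketch of building the word cloud URL with `URLSearchParams`. QuickChart's documentation spells the flag `removeStopwords` (capital `S`, lowercase `w`), while the text above uses `removestopWords`; query parameter names are typically case-sensitive, so the casing may be worth double-checking. The other parameters (`format`, `language`, `width`, `height`, `maxNumWords`) also come from my reading of the docs. Very long transcripts can exceed practical URL lengths; QuickChart's docs describe a POST form as well, or the text can be pre-filtered before it is sent.

```typescript
// Builds a GET URL for QuickChart's word cloud endpoint (parameter names assumed
// from QuickChart's docs; verify before relying on them).
function wordCloudUrl(
  text: string,
  options: { width?: number; height?: number; maxNumWords?: number } = {},
): string {
  const params = new URLSearchParams({
    text,
    format: "png",
    removeStopwords: "true",
    language: "en", // stop-word list language
    width: String(options.width ?? 800),
    height: String(options.height ?? 600),
  });
  if (options.maxNumWords !== undefined) {
    params.set("maxNumWords", String(options.maxNumWords));
  }
  // URLSearchParams handles percent-encoding of the transcript text.
  return `https://quickchart.io/wordcloud?${params.toString()}`;
}
```

The returned string can be used directly as the `src` of an `<img>`.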

Acquiring Raw Audio Recordings

I didn't know where to download free sample audio recorded in real-world settings and released under a Creative Commons license for use in a project like this, and I'm not sure such collections even exist. So I was limited to fewer than a handful of mock calls that I found online. Access to a variety of audio (sales calls, customer service, tech support, business meetings, lectures), or to a group of closely related raw recordings within a specific niche or industry, would have helped immensely.

What I Learned

Passing data between client and server. React + Node. React Router. Project architecture. Symbl APIs.

Project Repo

GitHub Repo


Updates

Ani T started this project — Oct 27, 2021 12:48 PM EDT

Leave feedback in the comments!
