• The original problem Smart Voicenotes solves: many of the busiest and most productive people I know prefer sending and receiving messages as text rather than as WhatsApp voice notes. Whether it's because they aren't in a suitable place or situation to listen, or because they don't want to sit through a 5-minute voice message from a stranger, they still need to get to the important parts of the message to reach a reasonable decision.
  • Unlike some other messaging platforms, WhatsApp doesn't support custom integrations in the traditional sense. The recently released WhatsApp Business API, however, makes possible things we couldn't do before.
  • I had just launched my SaaS startup, EveryWord, which generates accurate subtitle files and transcripts for audio and video, so I wanted to bring transcription to voice notes. That would remove the need to listen to them at all. EveryWord's goal is to be the productivity tool For The Text Generation, so this was a natural fit.
  • But why stop there, I figured? If we could also deliver the most important details of a voice note as a summary, people would save even more time by being pointed straight to the main points. That's how the idea for what I'm calling Smart Voicenotes came about.

What it does

Smart Voicenotes gives you a text version of your WhatsApp voice notes and a detailed summary of their key points, without ever leaving WhatsApp. Each summary highlights the tone of the message, any names that were mentioned, and the most important section of the voice note.

It takes less than 15 seconds to set up and already works on iOS, Android, Mac and PC, because it is built on the world's most widely used messaging platform. More than 2 billion people use WhatsApp today, and any one of them can start using Smart Voicenotes right away.

You can see a demo of how to get started below.

How we built it

We relied on a few key technologies to make the solution easily accessible. It was built entirely in Node.js:

  • WhatsApp for the UI 😅
  • Twilio for the WhatsApp Business API
  • EveryWord's Node.js server REST endpoints as the project's backbone, providing the automatic speech recognition engine
  • and for sentiment analysis, named entity recognition and keyphrase extraction.
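As a rough illustration of how these pieces might connect, here is a minimal sketch of a webhook handler, assuming the form fields Twilio posts for incoming messages (`From`, `NumMedia`, `MediaUrl0`, `MediaContentType0`) have already been parsed into a plain object. `transcribe` and `analyze` are hypothetical stand-ins for EveryWord's speech recognition endpoint and the text-analytics service; they are not the real APIs of either.

```typescript
// Form fields Twilio posts to the webhook, assumed already parsed
// from application/x-www-form-urlencoded into a plain object.
type TwilioParams = Record<string, string | undefined>;

interface VoiceNote {
  from: string;        // e.g. "whatsapp:+18765550123"
  mediaUrl: string;    // where Twilio hosts the audio file
  contentType: string; // WhatsApp voice notes typically arrive as "audio/ogg"
}

// Returns the first audio attachment in the message, or null if the
// message carries no audio (plain text, images, etc.).
function extractVoiceNote(params: TwilioParams): VoiceNote | null {
  const numMedia = Number(params["NumMedia"] ?? "0");
  for (let i = 0; i < numMedia; i++) {
    const url = params[`MediaUrl${i}`];
    const type = params[`MediaContentType${i}`];
    if (url && type && type.startsWith("audio/")) {
      return { from: params["From"] ?? "", mediaUrl: url, contentType: type };
    }
  }
  return null;
}

// Hypothetical service interfaces (assumption, not real APIs).
interface Services {
  transcribe(mediaUrl: string): Promise<string>;
  analyze(text: string): Promise<{ sentiment: string; entities: string[]; keyPhrases: string[] }>;
}

// Pipeline: voice note -> transcript -> analysis -> reply text.
async function handleIncoming(params: TwilioParams, svc: Services): Promise<string | null> {
  const note = extractVoiceNote(params);
  if (!note) return null; // nothing to transcribe
  const transcript = await svc.transcribe(note.mediaUrl);
  const a = await svc.analyze(transcript);
  return [
    `Transcript: ${transcript}`,
    `Tone: ${a.sentiment}`,
    `Names: ${a.entities.join(", ") || "none"}`,
    `Key phrases: ${a.keyPhrases.join(", ")}`,
  ].join("\n");
}
```

Keeping `extractVoiceNote` free of I/O makes it easy to test against captured webhook payloads, while the injected `Services` object lets the speech and analytics providers be swapped or stubbed.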

User Flow

Challenges we ran into

Having so many third-party solutions also brought a few problems during development.

  1. Combining so many powerful tools lets you outsource a lot of the work, but it also exposes you to problems whenever one of your dependencies goes down. This showed up close to the hackathon deadline, when a server error in the speech recognition engine stopped the project from working completely.
  2. Another was making the services interoperate and accounting for the different limits each API enforces. For example, while is able to parse a message of up to about 10,000 characters, Twilio's WhatsApp API can only send a maximum of 1,600 characters at a time. That meant spending quite a bit of coding time making sure sentences were never split across the messages sent to the user, and that the message queue delivered them in the correct (FIFO) order.
  3. Getting the demo video under 3 minutes was no joke!
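The splitting problem above can be sketched as a greedy packer: break the text into sentences, then add sentences to the current message until the next one would exceed 1,600 characters. This is a minimal sketch, not the project's actual code. It assumes a simple sentence heuristic (".", "!" or "?" followed by whitespace), approximates character count with JavaScript's `string.length`, and only falls back to word or character splits when a single sentence is longer than the limit. `send` is a hypothetical wrapper around whatever call actually posts a message to Twilio.

```typescript
const TWILIO_WHATSAPP_LIMIT = 1600; // max characters per outgoing message

// Heuristic sentence split: ".", "!" or "?" followed by whitespace (or end of text).
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const atEnd = i + 1 === text.length || /\s/.test(text[i + 1]);
    if (".!?".includes(text[i]) && atEnd) {
      sentences.push(text.slice(start, i + 1).trim());
      start = i + 1;
    }
  }
  sentences.push(text.slice(start).trim()); // trailing text without punctuation
  return sentences.filter((s) => s.length > 0);
}

// Fallback for a single sentence longer than the limit: split on words,
// and cut any single over-long word into fixed-size slices.
function hardSplit(sentence: string, limit: number): string[] {
  const parts: string[] = [];
  let current = "";
  for (const word of sentence.split(/\s+/)) {
    if (word.length > limit) {
      if (current) { parts.push(current); current = ""; }
      for (let i = 0; i < word.length; i += limit) parts.push(word.slice(i, i + limit));
      continue;
    }
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= limit) current = candidate;
    else { parts.push(current); current = word; }
  }
  if (current) parts.push(current);
  return parts;
}

// Greedily pack whole sentences into messages of at most `limit` characters.
function chunkMessage(text: string, limit = TWILIO_WHATSAPP_LIMIT): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const pieces = sentence.length > limit ? hardSplit(sentence, limit) : [sentence];
    for (const piece of pieces) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= limit) {
        current = candidate;
      } else {
        chunks.push(current); // current is non-empty here because piece <= limit
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Submit chunks strictly one after another, awaiting each request
// before issuing the next, so they are sent in FIFO order.
async function sendInOrder(chunks: string[], send: (body: string) => Promise<void>): Promise<void> {
  for (const chunk of chunks) {
    await send(chunk);
  }
}
```

Awaiting each request keeps the submission order intact; on its own it does not guarantee the order in which WhatsApp displays the messages, so a real queue may still need per-message delivery tracking.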

Accomplishments that we're proud of

When I sent the link out for people to test, they were impressed with the accuracy of the transcript, as well as the fact that it could figure out the most important sentence.

That matters a lot to me: if you've ever heard a Jamaican speak, you'll know it's definitely not the English that speech engines are typically trained on. We have a special kind.

I'm also glad we found a practical way of pushing ML & AI applications into the mainstream.

What we learned

  • The importance of interoperability between multiple 3rd-party vendors.
  • That you can use phone numbers, fairly securely, as unique identifiers for people globally.
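Using the phone number as the user's identity could look like the minimal sketch below. It assumes the sender arrives in Twilio's `whatsapp:+<number>` form and checks it against the E.164 shape ("+" followed by at most 15 digits, no leading zero). The in-memory `Map` and the `recordVoiceNote` counter are hypothetical stand-ins for whatever user store the project actually uses.

```typescript
// Twilio prefixes WhatsApp senders with "whatsapp:", e.g. "whatsapp:+18765550123".
// Returns the bare E.164 number to use as a user key, or null if malformed.
function userKeyFromSender(from: string): string | null {
  const number = from.replace(/^whatsapp:/, "").trim();
  // E.164: "+" then a non-zero digit, up to 15 digits in total.
  return /^\+[1-9]\d{1,14}$/.test(number) ? number : null;
}

// Hypothetical in-memory per-user store keyed by phone number.
const usage = new Map<string, number>();

// Counts voice notes processed for a sender (illustrative only).
function recordVoiceNote(from: string): number {
  const key = userKeyFromSender(from);
  if (key === null) throw new Error(`Unrecognized sender: ${from}`);
  const count = (usage.get(key) ?? 0) + 1;
  usage.set(key, count);
  return count;
}
```

Normalizing to the bare number means the same person maps to the same key regardless of the channel prefix.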

What's next for Smart Voicenotes™ by EveryWord

The main goal is for Smart Voicenotes to become an add-on to EveryWord's core product, as another way to use transcription to make life easier. Next steps for this plugin include video transcription and support for other languages.

Why Enter This Hackathon?

I'm very good at software development, i.e. using technology to solve the problems people have. I'm not that great at marketing, though, so although EveryWord officially launched last November, only about 8 free users and 1 paying customer have used it so far.

Also, working with you and being featured on your blog could be a massive boost.

This is a good opportunity to try it out. I like your product and think it would enhance my main product offering as well.
