monday.com voice interface

Widget status while recording the audio

Inspiration

monday.com is all about productivity (yes guys, we've seen your ads on YouTube. Repeatedly), and to be productive you need a smooth and natural interaction with the tools you use to organize yourself.

monday is already one of the tools with the best experience out there (shoutout to the UX team!). But we wanted to make it even more natural... and what is more intuitive for human beings than speaking? Enter our voice interface.

What it does

It's a widget you can add to any dashboard on your monday.com account and bind to a table. Straight out of the box, you can then create items on that table by saying phrases like "add a task for me to reply that very important email tomorrow".

In this example, the widget would create an item on the table it's connected to with the text "reply that very important email", assign it to you, and set the due date for tomorrow.

How we built it

The widget interface was built with React, following the standard tutorial. We record the audio input with a React component called react-audio-analyser, and then send the blob to our backend for processing.
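As a rough illustration of the "send the blob to our backend" step, here is a minimal sketch. The endpoint URL, the `sendRecording` name and the `FetchLike` type are assumptions made for this example; the write-up does not show the real upload code.

```typescript
// Hypothetical endpoint: the real backend URL is not given in this write-up.
const BACKEND_URL = "https://example.invalid/detect-intent";

// Minimal shape of the fetch function we need, so it can be swapped out in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: Blob }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Posts the recorded audio as the raw request body and returns the parsed JSON reply.
async function sendRecording(
  audio: Blob,
  fetchImpl: FetchLike,
  url: string = BACKEND_URL
): Promise<unknown> {
  const response = await fetchImpl(url, {
    method: "POST",
    // Fall back to audio/wav when the recorder did not set a MIME type.
    headers: { "Content-Type": audio.type || "audio/wav" },
    body: audio,
  });
  if (!response.ok) {
    throw new Error(`Backend responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

In the browser, the global `fetch` can be passed as `fetchImpl`, with `audio` being the Blob handed over by the recorder component once recording stops.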

For the NLP we used Dialogflow. It lets us define our own sentence structures with a token-based syntax. This way we could train the AI model to detect the portions of the input that match the name of the task, the name of the assignee, and the due date.

Dialogflow's API then receives the audio blob, uses Google's impressive Speech-to-Text solution to convert the audio to a string, processes it using our custom rules, and returns a JSON object with all the parameters we need to create the item on the board.
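To make the "JSON object with all the parameters" more concrete, here is a sketch of how the response could be reduced to the three fields we care about. It assumes the REST-style response shape, where the matched values arrive as plain JSON under `queryResult.parameters`, and it assumes the parameters are named `task_text`, `someone` and `due_date` after the tokens in our sentence model. `ParsedTask` and `toParsedTask` are names made up for this example.

```typescript
// Only the part of the detect-intent response this sketch reads.
interface DetectIntentResponse {
  queryResult?: {
    queryText?: string;
    parameters?: Record<string, unknown>;
  };
}

interface ParsedTask {
  name: string;
  assignee: string | null;
  dueDate: string | null; // YYYY-MM-DD
}

// Returns a trimmed non-empty string, or null for anything else.
function asText(value: unknown): string | null {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : null;
}

// Maps the parameters to a task, or null when no task text was detected.
function toParsedTask(response: DetectIntentResponse): ParsedTask | null {
  const params = response.queryResult?.parameters ?? {};
  const name = asText(params["task_text"]);
  if (name === null) {
    return null;
  }
  const due = asText(params["due_date"]);
  return {
    name,
    assignee: asText(params["someone"]),
    // Assumption: dates arrive as ISO 8601 strings, so the first 10 chars are the day.
    dueDate: due === null ? null : due.slice(0, 10),
  };
}
```

Returning `null` when the task text is missing gives the widget a simple way to ask the user to repeat the phrase instead of creating an empty item.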

After receiving the JSON response from Dialogflow, we simply used monday's SDK to create the item on the board.
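Creating the item boils down to one GraphQL mutation. The sketch below only builds the query and its variables; it is an illustration, not our exact code. The column ids are board-specific and must be supplied by the caller, the `NewItem` and `ColumnIds` types are invented for the example, and the type of `board_id` (`ID!` here) depends on the monday API version in use.

```typescript
interface NewItem {
  name: string;
  assigneeId: number | null; // monday user id of the assignee
  dueDate: string | null;    // YYYY-MM-DD
}

// Board-specific column ids (assumption: they differ from board to board).
interface ColumnIds {
  people: string;
  date: string;
}

const CREATE_ITEM = `
  mutation ($boardId: ID!, $itemName: String!, $columnValues: JSON) {
    create_item(board_id: $boardId, item_name: $itemName, column_values: $columnValues) {
      id
    }
  }
`;

// Builds the mutation and variables; column_values is sent as a JSON-encoded string.
function buildCreateItemRequest(boardId: string, item: NewItem, columns: ColumnIds) {
  const columnValues: Record<string, unknown> = {};
  if (item.dueDate !== null) {
    columnValues[columns.date] = { date: item.dueDate };
  }
  if (item.assigneeId !== null) {
    columnValues[columns.people] = {
      personsAndTeams: [{ id: item.assigneeId, kind: "person" }],
    };
  }
  return {
    query: CREATE_ITEM,
    variables: {
      boardId,
      itemName: item.name,
      columnValues: JSON.stringify(columnValues),
    },
  };
}
```

With the monday SDK, the result would typically be passed along as `monday.api(request.query, { variables: request.variables })`.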

All the work took around 48 hours (most of them on one weekend).

Challenges we ran into

The biggest one was the integration with Dialogflow's API. For starters, the way it's authenticated makes it unfeasible to call it directly from the frontend. So we created a simple backend that just receives the audio blob, calls the API, and returns the JSON response received from Dialogflow. The backend is completely serverless, running on AWS Lambda.
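The backend is thin enough to sketch in a few lines. The version below assumes an API Gateway proxy-style event that delivers the audio as a base64-encoded body, and it takes the actual Dialogflow call as an injected `detectIntent` function (hypothetical, with authentication left out). The `en-US` language code and the sample rate parameter are also assumptions; the request body follows the shape of Dialogflow's REST detect-intent call, which accepts base64 audio in `inputAudio`.

```typescript
// Minimal proxy-style event and result shapes, declared here to stay self-contained.
interface ProxyEvent {
  body: string | null;
  isBase64Encoded: boolean;
}
interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

interface DetectIntentRequest {
  queryInput: {
    audioConfig: { audioEncoding: string; sampleRateHertz: number; languageCode: string };
  };
  inputAudio: string; // base64-encoded audio bytes
}

// Hypothetical: performs the authenticated call to Dialogflow and returns its JSON.
type DetectIntent = (request: DetectIntentRequest) => Promise<unknown>;

function makeHandler(detectIntent: DetectIntent, sampleRateHertz: number = 44100) {
  return async (event: ProxyEvent): Promise<ProxyResult> => {
    const headers = { "Content-Type": "application/json" };
    if (!event.body || !event.isBase64Encoded) {
      return { statusCode: 400, headers, body: JSON.stringify({ error: "Expected binary audio" }) };
    }
    const result = await detectIntent({
      queryInput: {
        audioConfig: {
          audioEncoding: "AUDIO_ENCODING_LINEAR_16", // WAV / linear PCM
          sampleRateHertz,
          languageCode: "en-US", // assumption
        },
      },
      inputAudio: event.body, // already base64, forwarded untouched
    });
    // Return Dialogflow's JSON to the widget unchanged.
    return { statusCode: 200, headers, body: JSON.stringify(result) };
  };
}
```

Injecting `detectIntent` keeps the credentials on the server side and lets the handler be exercised locally with a fake implementation.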

After that, we still had problems with the audio file encoding format. The first component we tried encoded the file in OGG, and Dialogflow only really works with the WAV format (despite what their documentation says). It took us a while to find the component we use now, which already outputs the file in WAV.
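A cheap way to catch this kind of mismatch early is to look at the first bytes of the recording before sending it. WAV files start with the ASCII tag `RIFF` followed by `WAVE` at byte 8, while OGG files start with `OggS`. The helper below is a sketch written for this write-up, not part of the widget, and the function names are made up.

```typescript
type AudioContainer = "wav" | "ogg" | "unknown";

// True when the bytes at `offset` spell out the given ASCII text.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) {
    return false;
  }
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) {
      return false;
    }
  }
  return true;
}

// Identifies the container format from its magic bytes.
function detectAudioContainer(bytes: Uint8Array): AudioContainer {
  if (hasAscii(bytes, 0, "RIFF") && hasAscii(bytes, 8, "WAVE")) {
    return "wav";
  }
  if (hasAscii(bytes, 0, "OggS")) {
    return "ogg";
  }
  return "unknown";
}
```

In the browser, the bytes can be obtained from the recorded Blob with `new Uint8Array(await blob.arrayBuffer())`.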

Accomplishments that we're proud of

We are very satisfied with the architecture we built. The serverless backend scales on demand, so the whole system is limited only by Dialogflow's quotas and limits, which can be increased if the demand gets too high.

What we learned

Each one of us worked with something we had never seen before on this project. The frontend developer didn't know React, the backend developer didn't know serverless, and the AI engineer had never used Dialogflow. So we had to learn everything we used.

We also learned to always read the Terms and Conditions before starting a hackathon project. We are from Brazil and only noticed that the competition was not open to us here after all the work was done. We still submitted the project because we're proud of our work anyway. :)

What's next for monday.com voice interface

Right now the assistant works fine for just one sentence model: "add a task for @someone to @task_text @due_date". For broader use, it's essential that we train the model to understand other formats too.

Besides that, we can think of other interactions for the assistant to do inside monday. Basically we can add an intent for everything the monday SDK can do.

Last but not least: What we did here was a voice interface for monday's API, which uses GraphQL. What we are really doing under the hood is simply transpiling the JSON output from Dialogflow into the proper GraphQL query... So, with the right modeling, this project can turn into a general voice interface for any GraphQL API!
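To give an idea of what "the right modeling" could look like, here is a sketch of a generic intent-to-GraphQL step. Each intent name is registered with a GraphQL operation and a mapping from GraphQL variables to Dialogflow parameter names. Everything here (the registry shape, the `transpile` function, the `create_task` intent and its query) is hypothetical and only illustrates the idea.

```typescript
interface OperationTemplate {
  query: string;
  // GraphQL variable name -> Dialogflow parameter name
  variables: Record<string, string>;
}

type IntentRegistry = Record<string, OperationTemplate>;

interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

// Looks up the operation for an intent and fills its variables from the parameters.
function transpile(
  intent: string,
  params: Record<string, unknown>,
  registry: IntentRegistry
): GraphQLRequest {
  const template = registry[intent];
  if (template === undefined) {
    throw new Error(`No GraphQL operation registered for intent "${intent}"`);
  }
  const variables: Record<string, unknown> = {};
  for (const variableName of Object.keys(template.variables)) {
    const parameterName = template.variables[variableName];
    // Missing parameters become explicit nulls so the API can validate them.
    variables[variableName] = params[parameterName] ?? null;
  }
  return { query: template.query, variables };
}

// Example registry with a single, made-up intent.
const exampleRegistry: IntentRegistry = {
  create_task: {
    query: "mutation ($name: String!) { create_item(item_name: $name) { id } }",
    variables: { name: "task_text" },
  },
};
```

Supporting a new voice command then means adding one registry entry rather than writing new integration code.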

This way we could have a voice-enabled assistant that would instantly interface with any system that uses GraphQL. No time wasted integrating the assistant with every REST endpoint of every product on the market.