At work, conference calls usually involve multiple people on one side sharing a single microphone. It can be hard to tell who is speaking and what their role is, details of the meeting are easily lost, and taking notes on everything is tedious.

What it does

Our app recognizes individual speakers, shows who's speaking, and automatically transcribes the meeting in real time. When the meeting ends, it can also export the meeting minutes (a log of who said what, and when).


  • display who's currently speaking using speaker recognition
  • transcribe what's being said, attributed to each speaker, like a chat application
  • create and train a new speaker profile within 15 seconds
  • stream transcription to services such as Slack
  • export transcription to cloud storage such as Google Sheets
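To illustrate the Slack streaming step, here is a minimal sketch of formatting one transcript line as a Slack message payload. The channel name, field layout, and webhook URL are assumptions for illustration, not the project's actual code:

```python
import json

def build_slack_message(speaker, text, timestamp):
    """Format one transcript line as a Slack message payload.
    Channel name and message layout are hypothetical."""
    return {
        "channel": "#meeting-minutes",  # hypothetical channel
        "text": f"[{timestamp}] *{speaker}*: {text}",
    }

payload = build_slack_message("Alice", "Let's ship on Friday.", "00:03:12")
body = json.dumps(payload)
# Actually posting it would use a Slack incoming webhook, roughly:
# urllib.request.urlopen(urllib.request.Request(
#     "https://hooks.slack.com/services/<your-webhook-path>",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"}))
print(body)
```

One payload per recognized utterance keeps the Slack channel readable as a running chat log of the meeting.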

How I built it

  • Microsoft Speaker Recognition API
  • Microsoft Speech to Text API
  • Google Cloud Speech to Text API
  • Google Sheets API
  • Slack API
  • stdlib for integrating backend services such as Slack and SMS
  • Node.js with Express for the backend
  • Vue for the frontend
  • Python scripts for accessing Microsoft's APIs
  • Love ❤️

Challenges I ran into

Generating audio in the format Microsoft's API expects was tougher than expected: the Mac's built-in microphone doesn't capture audio in the sample format Microsoft requires, so we had to resample it ourselves.
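The workaround boils down to writing the captured samples out as 16 kHz, 16-bit, mono PCM WAV, which is the layout Microsoft's Speech to Text API accepts. A minimal sketch using only the standard library; the filename and the 440 Hz test tone are placeholders standing in for real microphone input:

```python
import math
import struct
import wave

RATE = 16000  # 16 kHz sample rate expected by the API

def write_api_ready_wav(path, samples):
    """Write mono, 16-bit PCM WAV at 16 kHz from a list of int samples."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit (2 bytes) per sample
        w.setframerate(RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# One second of a 440 Hz test tone in place of real microphone capture.
tone = [int(12000 * math.sin(2 * math.pi * 440 * t / RATE))
        for t in range(RATE)]
write_api_ready_wav("probe.wav", tone)

# Verify the file header matches what the API wants: channels, width, rate.
with wave.open("probe.wav", "rb") as w:
    print(w.getnchannels(), w.getsampwidth(), w.getframerate())  # prints: 1 2 16000
```

If the capture arrives at a different rate (e.g. 44.1 kHz from the Mac microphone), it has to be resampled down to 16 kHz before writing, which is where most of the difficulty was.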

Accomplishments that I'm proud of

  • Learning how to use the APIs and Microsoft Azure, and resampling audio input into the format the API needs.
  • Finishing an app before the deadline.

What I learned

How to use many APIs, record speech, and integrate multiple services.

What's next for Who Said What?

A year-long worldwide tour to show it off.
