Inspiration

In a classroom, a meeting, or a conference, we constantly juggle listening to the speaker and taking notes for later reference. If you focus on taking notes, you miss important points from the talk, and vice versa. We want to solve this pain point, which nearly everyone runs into.

What it does

This app lets users sit back and listen carefully to the speaker while it automatically creates notes for them. They just tap once to start and tap again when the session is done.

How we built it

We used several publicly available APIs: Google's speech-to-text API and the Intellexer text summarization API, along with a lot of custom audio and text processing in between.
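As a rough illustration of how these pieces fit together, here is a minimal sketch of the pipeline as three stages chained in order: transcription, punctuation, then summarization. The function name `make_notes` and the three callables are hypothetical placeholders, not the actual Google or Intellexer client calls; real API clients would be plugged in where the stubs are.

```python
from typing import Callable


def make_notes(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # placeholder for a speech-to-text call
    punctuate: Callable[[str], str],      # placeholder for the custom punctuation step
    summarize: Callable[[str], str],      # placeholder for a summarization call
) -> str:
    """Run audio through transcription, punctuation, and summarization in order."""
    raw_text = transcribe(audio)          # unpunctuated transcript
    punctuated = punctuate(raw_text)      # sentences the summarizer can work with
    return summarize(punctuated)


if __name__ == "__main__":
    # Stub stages for illustration only; none of these call a real service.
    notes = make_notes(
        b"fake-audio",
        transcribe=lambda audio: "hello world",
        punctuate=lambda text: text[:1].upper() + text[1:] + ".",
        summarize=lambda text: text,
    )
    print(notes)
```

Passing the stages in as callables keeps the sketch independent of any particular vendor, so a different speech-to-text or summarization service could be swapped in without touching the pipeline itself.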

Challenges we ran into

For speech to text, we explored several APIs, including IBM Watson and Microsoft Oxford. Neither gave proper results in our tests. The Google API gives decent results with Indian accents, but it does not produce any punctuation. To add punctuation, the user has to explicitly say "period", "comma", and so on, which won't work for our use case. As a result, the output of this API is a single run-on sentence with no punctuation. Any summarizer API fails drastically on such text, because no context can be derived from it.

Accomplishments that we're proud of

We solved the above challenge to a reasonable extent by adding our own punctuation layer on top of the Google API's results. Our punctuation library composes the live text stream into grammatical English sentence structures. For example, if a subject, a verb, and an object have already appeared, it is very likely that the sentence is finished. We also apply a combination of heuristic rules and sentence grammar to figure out which punctuation mark is most appropriate. Beyond that, we try to understand the speaker's style of speaking and continuously learn which pauses in speech call for punctuation.
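To make the pause-based part of this idea concrete, here is a minimal sketch, not our actual library. It assumes each word arrives with start and end timestamps, tracks a running average of the gaps between words as a stand-in for the speaker's style, and inserts a comma or a period when a gap is much longer than that average. The `Word` type, the `punctuate` function, and the threshold values are all illustrative assumptions, and the grammar-based checks (subject, verb, object) are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def punctuate(
    words: List[Word],
    comma_factor: float = 2.0,   # assumed: gap >= 2x the average gap gets a comma
    period_factor: float = 4.0,  # assumed: gap >= 4x the average gap gets a period
    smoothing: float = 0.2,      # weight of the newest gap in the running average
) -> str:
    """Insert commas and periods based on pauses relative to the speaker's pace."""
    pieces: List[str] = []
    avg_gap: Optional[float] = None
    capitalize_next = True

    for i, word in enumerate(words):
        token = word.text
        if capitalize_next:
            token = token[:1].upper() + token[1:]
            capitalize_next = False

        if i + 1 < len(words):
            gap = max(0.0, words[i + 1].start - word.end)
            if avg_gap is not None and avg_gap > 0:
                if gap >= period_factor * avg_gap:
                    token += "."
                    capitalize_next = True
                elif gap >= comma_factor * avg_gap:
                    token += ","
            # Adapt to the speaker: exponential moving average of observed gaps.
            if avg_gap is None:
                avg_gap = gap
            else:
                avg_gap = (1 - smoothing) * avg_gap + smoothing * gap
        else:
            token += "."  # always close the final sentence

        pieces.append(token)

    return " ".join(pieces)


if __name__ == "__main__":
    sample = [
        Word("we", 0.0, 0.2), Word("built", 0.3, 0.6), Word("it", 0.7, 0.8),
        Word("then", 1.8, 2.0), Word("we", 2.1, 2.2), Word("tested", 2.3, 2.7),
    ]
    print(punctuate(sample))
```

Because the thresholds are relative to a running average rather than fixed in seconds, a slow speaker and a fast speaker are judged against their own pace. A fuller version would combine this signal with the grammar checks described above before committing to a punctuation mark.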

What we learned

That language processing is a hard field of study. In the context we tested, our algorithms performed better than the IBM Watson and Microsoft Oxford APIs.

What's next for Namma Team

We will open-source the NLP API wrapper we wrote to add the right punctuation to text. We will also work on improving the algorithm's accuracy across different noise conditions and English speaking styles.
