Inspiration

It allows owners of simple phones to use internet search engines through voice.

What it does

The program takes an audio file (in Swahili, because I can speak it) and uses the Wit.ai API to turn it into text. The initial idea was to get a Swahili language pack and train a language model for offline use. That approach would have been more appropriate, but putting it together in 24 hours is next to impossible (at least for me; I hope other teams did a better job).

How I built it

It was already built; I just learned how to use it.

Challenges I ran into

I tried to make it work in offline mode by training our own language model with the CMU Sphinx toolbox. It took me most of the night just to get Sphinx working (the set-up process is fairly complicated).

Accomplishments that I'm proud of

I was able to run the following simple test on my machine. I didn't really solve their problem, but it is a good start. If time allows, I'll look into it more.

Links to important sites for further study

[link] https://github.com/wit-ai/pywit

[link] https://github.com/Uberi/speech_recognition

[link] http://cmusphinx.sourceforge.net

This is where I found out that Wit.ai understands Swahili:

[link] https://wit.ai/faq


#!/usr/bin/env python3
###############################################
#############SCRIPT START#########################
##############################################

import speech_recognition as sr

###############################################
# obtain path to "swahili_sample.wav" in the same folder as this script
###############################################

from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "swahili_sample.wav")


###############################################
# use the audio file as the audio source
###############################################

r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file


###############################################
# recognize speech using Wit.ai
###############################################

WIT_AI_KEY = "INSERT WIT.AI API KEY HERE"  
# Wit.ai keys are 32-character uppercase alphanumeric strings
try:
    print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
except sr.UnknownValueError:
    print("Wit.ai could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Wit.ai service; {0}".format(e))
 
###############################################
###############END OF SCRIPT######################
###############################################
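Since the goal is voice-driven web search, the recognized text still has to be handed to a search engine. The sketch below shows one possible way to do that step. The function name `build_search_url` and the choice of Google's search URL are assumptions made for illustration; they are not part of the script above.

```python
from urllib.parse import urlencode

# Assumption: any search engine that accepts the query in a "q" parameter works here.
DEFAULT_SEARCH_ENDPOINT = "https://www.google.com/search"


def build_search_url(transcript, endpoint=DEFAULT_SEARCH_ENDPOINT):
    """Turn recognized speech into a search URL (hypothetical helper)."""
    query = transcript.strip()
    if not query:
        # Nothing was recognized, so there is nothing to search for.
        raise ValueError("empty transcript")
    # urlencode percent-encodes the query and turns spaces into '+'.
    return endpoint + "?" + urlencode({"q": query})
```

The string returned by `r.recognize_wit(audio, key=WIT_AI_KEY)` could be passed straight into `build_search_url`, and the resulting URL fetched or sent back to the phone.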

What I learned

Convolutional neural networks do not work well for speech recognition.

What's next for Swahili voice-to-text

Dig deeper into the CMU Sphinx toolbox and get a Swahili language pack to build grammar and phonetic models.
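Once a Swahili model exists, it could be plugged into the same speech_recognition library for offline use. This is a rough sketch, not something I have run: as far as I know, `recognize_sphinx` accepts a tuple of (acoustic model directory, language model file, phoneme dictionary file) as its `language` argument, and it needs the pocketsphinx package installed. The folder layout and file names below are assumptions, and both function names are made up for illustration.

```python
from pathlib import Path


def sphinx_language_tuple(model_dir):
    """Collect the three Sphinx model paths from one folder (assumed layout)."""
    model_dir = Path(model_dir)
    paths = (
        model_dir / "acoustic-model",                  # directory of acoustic parameters
        model_dir / "language-model.lm.bin",           # language model file
        model_dir / "pronounciation-dictionary.dict",  # phoneme dictionary file
    )
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError("missing Sphinx model files: " + ", ".join(missing))
    return tuple(str(p) for p in paths)


def transcribe_offline(wav_path, model_dir):
    """Transcribe a WAV file without network access (hypothetical helper)."""
    # Imported here so the path helper above works without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        audio = recognizer.record(source)  # read the entire audio file
    return recognizer.recognize_sphinx(audio, language=sphinx_language_tuple(model_dir))
```

Checking the paths up front gives a clear error when a model file is missing, instead of a failure from somewhere deep inside Sphinx.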

Built With

  • and-a-ton-of-extra-dependencies
  • https://github.com/uberi/speech-recognition
  • python