Inspiration

When attending crowded lectures or tutorials, it's fairly difficult to separate the speaker's voice from ambient noise. What if a streamlined pipeline existed to isolate and amplify vocal audio, transcribe it to text, and provide general context in the form of images? Could this technology be used to improve access to education? These were the questions we asked ourselves when deciding the scope of our project.

The Stack

Front-end : react-native

Back-end : python, flask, sqlalchemy, sqlite

AI + Pipelining Tech : OpenAI, google-speech-recognition, scipy, asteroid, RNNs, NLP

What it does

We built a mobile app that lets users record video and then runs the recording through an AI-powered audio processing pipeline.

Primary use case: an aid for hard-of-hearing users that:

  1. Isolates + amplifies sound from a recorded video (pre-trained RNN model)
  2. Transcribes text from isolated audio (google-speech-recognition)
  3. Generates NLP context from transcription (NLP model)
  4. Generates an image associated with the topic being discussed (OpenAI API)
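
The four steps above can be sketched as a chain of stages in which each stage consumes the previous stage's output. This is only an illustrative sketch, not our actual back-end code: `run_pipeline`, `PipelineResult`, and the `fake_*` stages are hypothetical names, the placeholder stages stand in for the RNN model, the speech recognizer, the NLP model, and the OpenAI call, and it assumes the image step is fed the context produced by step 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    """Everything the app would send back to the phone for one recording."""
    isolated_audio: bytes
    transcript: str
    context: str
    image_url: Optional[str]


def run_pipeline(
    raw_audio: bytes,
    isolate: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
    make_image: Callable[[str], Optional[str]],
) -> PipelineResult:
    """Run the four stages in order; each stage consumes the previous output."""
    clean = isolate(raw_audio)        # step 1: isolate + amplify speech
    transcript = transcribe(clean)    # step 2: speech to text
    context = summarize(transcript)   # step 3: NLP context
    # step 4: only request an image when there is a topic to illustrate
    image_url = make_image(context) if context else None
    return PipelineResult(clean, transcript, context, image_url)


# Placeholder stages (assumptions) so the sketch runs without a model or API key.
def fake_isolate(audio: bytes) -> bytes:
    return audio.strip(b"\x00")  # pretend the zero bytes are background noise


def fake_transcribe(audio: bytes) -> str:
    return audio.decode("ascii")


def fake_summarize(text: str) -> str:
    words = text.split()
    return words[0] if words else ""  # pretend the first word is the topic


def fake_image(topic: str) -> str:
    return f"https://example.invalid/{topic}.png"


result = run_pipeline(
    b"\x00photosynthesis lecture\x00",
    fake_isolate, fake_transcribe, fake_summarize, fake_image,
)
print(result.transcript)
print(result.image_url)
```

Passing the stages in as functions keeps each model swappable, which helps when comparing candidate models for a single step.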

How we built it

  • Wireframed the UI in Figma
  • Started building UI using react-native
  • Researched models for implementation
  • Implemented neural networks and APIs
  • Testing, Testing, Testing

Challenges we ran into

Choosing the optimal model for each processing step required careful planning. Algorithm design was also important, as responses had to be sent back to the mobile device as quickly as possible to keep the app usable.

Accomplishments that we're proud of + What we learned

  • Very high accuracy achieved for transcription, NLP context, and .wav isolation
  • Efficient UI development
  • Effective use of each team member's strengths

What's next for murmr

  • Improve AI pipeline processing by modifying algorithms to decrease computation time
  • Add multiprocessing to return content faster
  • Conduct user interviews and focus more on usability in general
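
One possible shape for the parallel-processing idea above is to split a recording into chunks and run a stage over the chunks concurrently. This is a rough sketch under assumptions, not something we have built: `split_chunks`, `process_in_parallel`, and `fake_transcribe_chunk` are hypothetical names, and it uses threads from the standard library, which suit network-bound API calls. For CPU-bound model inference, `ProcessPoolExecutor` from the same module would be the closer match to true multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_chunks(samples: List[int], chunk_size: int) -> List[List[int]]:
    """Cut a list of audio samples into fixed-size chunks (the last may be shorter)."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def process_in_parallel(
    samples: List[int],
    chunk_size: int,
    stage: Callable[[List[int]], str],
    max_workers: int = 4,
) -> List[str]:
    """Apply one pipeline stage to every chunk concurrently."""
    chunks = split_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the pieces
        # can be joined back together without re-sorting.
        return list(pool.map(stage, chunks))


# Placeholder stage (assumption): a real one would call the speech recognizer.
def fake_transcribe_chunk(chunk: List[int]) -> str:
    return f"{len(chunk)} samples"


parts = process_in_parallel(list(range(10)), 4, fake_transcribe_chunk)
print(" | ".join(parts))
```

Fixed-size chunks can cut a word in half, so a real version would more likely split on pauses in speech before transcribing.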
