The idea came from noticing that we often consume audio content while a screen sits on, doing nothing. We thought we could put that screen to use by showing pictures related to the audio we're listening to.

What it does

Ipsum listens to an audio file or stream, and shows images related to what it hears.

How we built it

For audio analysis, the process consists of three main steps:

  1. Speech recognition: Extract text from the audio
  2. Topic extraction: Decide what the topic of discussion is
  3. Image search: Look for images related to the topic
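
The three steps above can be sketched as a small pipeline. This is only an illustrative sketch, not Ipsum's actual code: the names `extract_topic`, `run_pipeline`, `transcribe` and `search_images` are hypothetical, the speech recognizer and image search are passed in as placeholder functions, and topic extraction is replaced by a simple word-frequency count as a stand-in for whatever algorithm a real system would use.

```python
import re
from collections import Counter
from typing import Callable, List

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "it",
    "that", "this", "on", "for", "with", "as", "was", "were", "be", "by",
    "at", "from", "they", "we", "you", "their", "can", "when",
}


def extract_topic(text: str, top_n: int = 2) -> List[str]:
    """Step 2 (stand-in): return the most frequent non-stopword terms."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],         # Step 1: hypothetical speech recognizer
    search_images: Callable[[str], List[str]],  # Step 3: hypothetical image search
    top_n: int = 2,
) -> List[str]:
    """Chain the three steps: audio -> text -> topic -> image results."""
    text = transcribe(audio)
    topic = extract_topic(text, top_n)
    if not topic:
        return []  # nothing recognizable was said, so show no images
    return search_images(" ".join(topic))


if __name__ == "__main__":
    # Placeholder implementations so the sketch runs without any external service.
    def fake_transcribe(_audio: bytes) -> str:
        return "Volcanoes erupt when magma rises. Volcanoes, volcanoes and magma."

    def fake_search(query: str) -> List[str]:
        return [f"image result for '{query}'"]

    print(run_pipeline(b"", fake_transcribe, fake_search))
```

Passing the recognizer and the image search in as functions keeps the sketch independent of any particular service, so either one can be swapped without touching the rest of the pipeline.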

Challenges we ran into

NLP is a challenge in itself. More than that, the hard part was coming up with the right algorithms.

Accomplishments that we're proud of

Despite the many difficulties we ran into, we can show a working prototype for both streamed audio and audio file analysis, although the two work differently.

What we learned

Even when we think we have solutions, problems can still appear anytime and anywhere!

What's next for Ipsum

Better topic recognition is the most wanted improvement for audio file analysis, and more precise real-time speech recognition is the priority for stream analysis.

