The idea came from noticing that we often consume audio content while a screen sits idle in front of us. We thought we could use that screen to keep us engaged by showing pictures related to the audio we're listening to.
What it does
Ipsum listens to an audio file or stream, and shows images related to what it hears.
How we built it
The audio analysis consists of three main steps:
- Speech recognition: extract text from the audio
- Topic extraction: decide what the topic of discussion is
- Image search: look for images related to the topic
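To make the three steps concrete, here is a minimal Python sketch of how they could be chained together. It is not the code Ipsum actually runs: the names `extract_topic` and `run_pipeline` are hypothetical, the topic step is a naive word-frequency count standing in for whatever algorithm the project uses, and speech recognition and image search are passed in as plain functions because the writeup does not name the services behind them.

```python
import re
from collections import Counter
from typing import Callable, List

# Tiny illustrative stop-word list (an assumption; a real system needs a fuller one).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "that", "this", "for", "with", "as", "at", "by",
    "be", "we", "you", "they", "their", "can", "from",
}


def extract_topic(text: str, max_words: int = 2) -> str:
    """Naive topic guess: the most frequent non-stop-words in the transcript."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return " ".join(word for word, _ in counts.most_common(max_words))


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    search_images: Callable[[str], List[str]],
) -> List[str]:
    """Chain the three steps: speech recognition, topic extraction, image search."""
    transcript = transcribe(audio)        # step 1: audio -> text
    topic = extract_topic(transcript)     # step 2: text -> topic keywords
    # step 3: topic -> image URLs (skipped when no topic could be found)
    return search_images(topic) if topic else []
```

Keeping `transcribe` and `search_images` as parameters means the same pipeline could be fed by a file-based or a streaming recognizer, and it can be tried out with simple stand-in functions before any real service is wired in.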
Challenges we ran into
NLP is a challenge in itself. Beyond that, coming up with the right algorithms was difficult.
Accomplishments that we're proud of
Despite the many difficulties we ran into, we can show a working prototype of both streamed audio analysis and audio file analysis, although the two work in different ways.
What we learned
Even when we think we have a solution, problems can still appear anytime and anywhere!
What's next for Ipsum
Better topic recognition is the most wanted improvement for audio file analysis, and more accurate real-time speech recognition is the priority for stream analysis.