GM already has an app for audiobooks. Why not have one for podcasts?
What it does
It analyzes the content of podcasts and recommends ones similar to those the user has already listened to.
How we built it
We use RevSpeech to transcribe the podcasts, and then use doc2vec to make a vector representation of each transcription. These high-dimensional vectors approximate the semantic meaning of the podcast. We use them to pick podcasts similar to what the user has listened to.
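As a rough illustration of the last step, here is a minimal sketch of picking similar podcasts once each one has a vector. It assumes cosine similarity against the average of the listened-to vectors, which is one common choice and not necessarily what our code does. The function names and the toy three-dimensional vectors are made up for the example; real doc2vec vectors would have many more dimensions.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(vectors, listened_ids, top_n=3):
    """Rank unheard podcasts by similarity to the user's average vector.

    `vectors` maps a podcast id to its vector (assumed to come from doc2vec).
    `listened_ids` is the set of ids the user has already heard.
    """
    profile = mean_vector([vectors[pid] for pid in listened_ids])
    scored = [
        (cosine(profile, vec), pid)
        for pid, vec in vectors.items()
        if pid not in listened_ids
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:top_n]]


# Toy data, invented for illustration only.
podcast_vectors = {
    "tech_a": [0.9, 0.1, 0.0],
    "tech_b": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.1, 0.9],
    "tech_c": [0.7, 0.3, 0.0],
}

print(recommend(podcast_vectors, {"tech_a"}, top_n=2))
```

Averaging the listened-to vectors keeps the lookup cheap, at the cost of blurring together users with very different tastes.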
We trained the vector representations on Spell's machine learning platform, using the API to initiate experiments.
Challenges we ran into
We needed to transcribe a large volume of podcast data. We have been able to do this thanks to the generosity of the RevSpeech team. Thank you!
We also had trouble interfacing the different components of the system, from missing data that should be attached to the transcript to the format of the feature vectors being passed around.
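In hindsight, one way to catch this kind of mismatch early would be a small shared record type that every component validates on construction. The sketch below is hypothetical: the class name, the field names, and the `VECTOR_SIZE` value are assumptions for illustration, not what our code uses.

```python
from dataclasses import dataclass
from typing import List

VECTOR_SIZE = 4  # hypothetical; would match the doc2vec vector size in practice


@dataclass(frozen=True)
class PodcastRecord:
    """One podcast as it is passed between components (hypothetical schema)."""

    podcast_id: str
    title: str
    transcript: str
    vector: List[float]

    def __post_init__(self):
        # Fail fast on metadata that should be attached to the transcript.
        if not self.podcast_id or not self.title:
            raise ValueError("podcast_id and title are required")
        # Fail fast on feature vectors of the wrong shape.
        if len(self.vector) != VECTOR_SIZE:
            raise ValueError(
                f"expected {VECTOR_SIZE} vector components, got {len(self.vector)}"
            )


record = PodcastRecord(
    podcast_id="ep-001",
    title="Example episode",
    transcript="hello and welcome",
    vector=[0.1, 0.2, 0.3, 0.4],
)
```

With a record like this, a missing title or a wrongly sized vector raises an error where the record is created instead of surfacing later in the recommender.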
Accomplishments that we're proud of
We successfully used machine learning algorithms to create a system that recommends related podcasts. We also managed to produce an attractive and usable interface.
What we learned
- Python is a great language for rapid prototyping, but the lack of interface definitions often makes it hard to connect components.
- How to use Bottle
- Jetlag and hackathons don't play nice together.
What's next for HackMIT recommends...
We should be able to produce a more elaborate mathematical model for the recommendations. We might also expand to other platforms.