Ted Similarity Index
This is just a proof-of-concept app. I believe the context of a TED talk can be captured using nothing but the text present in its subtitles. That means we can compare the subtitles of TED talks using NLP and use the comparison score as a similarity metric. I built this proof-of-concept app around that idea.
The comparison is based mainly on the individual important features extracted from each talk's subtitles.
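To make the idea concrete, here is a minimal sketch of scoring two talks by the features in their subtitles. It is not the app's actual code: this README does not say which features or metric the app uses, so the sketch assumes TF-IDF term weights compared with cosine similarity. The function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) and the sample subtitle strings are made up for illustration, and it uses only the standard library so it runs without the dependencies listed below.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using smoothed TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty subtitles
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical subtitle snippets, invented for this example.
talks = [
    "The ocean is warming and coral reefs are dying as the ocean absorbs heat",
    "Coral reefs protect the coast and the ocean needs our help",
    "Startups raise money from investors to grow their business quickly",
]
vectors = tfidf_vectors(talks)
ocean_vs_reef = cosine_similarity(vectors[0], vectors[1])
ocean_vs_startup = cosine_similarity(vectors[0], vectors[2])
```

With these snippets, the two ocean talks share terms and get a positive score, while the ocean talk and the startup talk share none and score 0. A real pipeline would likely also remove stop words and stem tokens (for example with nltk) before weighting.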
A lengthy blog post about the procedure I followed and the idea behind it is available here
Note: To run this app locally you need to have scipy, numpy, scikit-learn, and nltk (with its data) installed.
Local Installation & Running
    git clone email@example.com:drreddy/tedtalks-similarity-index.git
    cd tedtalks-similarity-index
    pip install -r requirements.txt
    python server.py
As mentioned above, this is just a proof-of-concept app, so there is plenty of room for improvement, for example:
- Let users search with a free-text string instead of tags (tag search is the only option implemented now).
- Implement a cron-like service so that the data gets updated periodically.
- Feature extraction and processing can be improved.
- UI improvements and client-side data validation.
- Better algorithms for comparing the subtitles.
Finally, on the technical side, the app is built with:
- Python Tornado
- Bootswatch Lumen Template (with some customization)