Inspiration
Documentaries offer a kind of learning that traditional study methods cannot match. They are immersive and exciting to watch. The problem is that documentaries are hard to find, hard to explain, and hard to watch.
What it does
Chronicles of Alexandria is an all-in-one library for all the documentaries one may ever need. It also supports semantic search, which lets users search the documentaries by their synopses.
How we built it
We built it as a Flask app with an HTML/CSS frontend. The semantic search was made possible by a Pinecone database together with the all-MiniLM-L6-v2 model, provided free of charge by Sentence-Transformers. We gathered the documentaries from different sources across the internet, mostly by web scraping.
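To give a feel for the scraping step, here is a minimal sketch of pulling one documentary record out of a listing page. We actually used bs4 and requests; this version swaps in Python's built-in html.parser so it runs without any installs. The page markup, the CSS class names (doc-title, doc-synopsis, doc-link, doc-photo), and the sample URLs are all hypothetical, since every source site was laid out differently.

```python
from html.parser import HTMLParser


class DocumentaryParser(HTMLParser):
    """Collects title, synopsis, link, and photo link from one listing page.

    The tag and class names below are hypothetical placeholders.
    """

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "synopsis": "", "link": "", "photo_link": ""}
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if tag == "h1" and css_class == "doc-title":
            self._field = "title"
        elif tag == "p" and css_class == "doc-synopsis":
            self._field = "synopsis"
        elif tag == "a" and css_class == "doc-link":
            self.record["link"] = attrs.get("href", "")
        elif tag == "img" and css_class == "doc-photo":
            self.record["photo_link"] = attrs.get("src", "")

    def handle_endtag(self, tag):
        # Stop collecting text once the heading or paragraph closes.
        if tag in ("h1", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.record[self._field] += data.strip()


def parse_documentary(html):
    """Return a dict with the four fields stored for each documentary."""
    parser = DocumentaryParser()
    parser.feed(html)
    parser.close()
    return parser.record


# Made-up page used only to exercise the parser.
SAMPLE_HTML = """
<div class="doc">
  <h1 class="doc-title">Example Documentary</h1>
  <img class="doc-photo" src="https://example.com/poster.jpg">
  <p class="doc-synopsis">A short synopsis of the film.</p>
  <a class="doc-link" href="https://example.com/watch/1">Watch</a>
</div>
"""

record = parse_documentary(SAMPLE_HTML)
```

In a setup like ours, each record's synopsis would then be embedded and stored for search, while the title, link, and photo link are kept alongside it for display.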
Challenges we ran into
Documentaries are so sparsely distributed over the internet that it was extremely difficult to find and collect all of them, or even to extract their metadata. Categorizing the documentaries was also tedious, as there is no standard definition of the categories.
Accomplishments that we're proud of
The semantic search, made possible by cosine similarity, was awesome to see in action. We tried different search queries to see which documentaries it returned. It was awesome working with LLMs. We are also proud of the database of over 800 documentaries with their links, photo_links, synopses, and titles.
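For anyone curious how cosine similarity ranks results, here is a rough sketch in plain Python. It is not our production code: in the app the vectors come from all-MiniLM-L6-v2 (384 dimensions) and the lookup is done by Pinecone. The three-dimensional vectors, the documentary titles, and the function names below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return the top_k (title, score) pairs, highest score first."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for synopsis embeddings (hypothetical values).
TOY_EMBEDDINGS = {
    "Ocean Life": [0.9, 0.1, 0.0],
    "Space Race": [0.0, 0.2, 0.9],
    "Coral Reefs": [0.8, 0.3, 0.1],
}

# Toy stand-in for the embedding of a search query.
query = [1.0, 0.0, 0.0]
results = rank_by_similarity(query, TOY_EMBEDDINGS, top_k=2)
```

Because the score depends only on the angle between vectors, a long synopsis and a short query can still match well as long as they point in a similar direction.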
What we learned
We learned a lot about web scraping, cosine similarity, LLMs, PyTorch, Flask, Render, urllib, bs4, requests, Jupyter, and a lot more. We also learned that building a database for a website is a very tedious task, but it's fun figuring out how to scrape the parts we need.
What's next for Chronicles of Alexandria
- We are looking forward to enlarging the database for the website.
- We want to add a chatbot that explains the plotlines of the documentaries to viewers.
- We want to improve the frontend by adding more pages and dropdowns.
- We have the documentaries categorized in the backend but couldn't use the categories due to time constraints.