"love" in Love in the Time of Cholera
I was inspired by this article to look for quantitative trends within books. Specifically, I wanted to create a tool that lets a user query for a theme and see how it progresses over the course of a book.
What it does
Given a search term and a book, the program traverses the book paragraph by paragraph and assigns each paragraph a score for how strongly the theme appears in it. The scores are then visualized as a line graph.
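To make the paragraph-by-paragraph scoring concrete, here is a minimal sketch. It is not the project's actual scoring scheme, which isn't described here. It assumes paragraphs are separated by blank lines, that each related word carries a weight (the `ASSOCIATIONS` dictionary below is made up for illustration), and that a paragraph's score is the sum of matching weights divided by its word count.

```python
import re

# Hypothetical association weights for the query "love". In the project these
# would come from Twinword; the values here are invented for illustration.
ASSOCIATIONS = {"love": 1.0, "passion": 0.75, "heart": 0.5, "affection": 0.75}


def split_paragraphs(text):
    """Split plain text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def score_paragraph(paragraph, weights):
    """Sum the weights of matching words, normalized by paragraph length."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    if not words:
        return 0.0
    return sum(weights.get(word, 0.0) for word in words) / len(words)


def score_book(text, weights):
    """Return one score per paragraph, in reading order."""
    return [score_paragraph(p, weights) for p in split_paragraphs(text)]


sample = "Love filled his heart.\n\nThe ship left the harbor at dawn."
scores = score_book(sample, ASSOCIATIONS)
print(scores)  # [0.375, 0.0]
```

Normalizing by word count keeps long paragraphs from scoring high just because they contain more words; other schemes (raw counts, smoothing over neighboring paragraphs) would work too.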
How I built it
I used Python to traverse the book, Calibre to convert EPUB files to .txt, Twinword to find weighted associations for the query term, and Plotly to visualize the results.
Challenges I ran into
Connecting all the parts: getting the book, getting the search term, and visualizing the results in the plot.
Accomplishments that I'm proud of
The scoring scheme
What I learned
NLP is a lot more complicated than I once thought.
What's next for BookQL
Connecting the existing parts together and allowing comparison between books, or between multiple query terms within a single book.