People are constantly browsing different websites. Whether for academic or personal reasons, they are reading articles, looking at pictures, and clicking hyperlinks. Users often bookmark the pages they find interesting but rarely actually revisit them. We made TL;DR to give people an opportunity to develop a better understanding of what they've read in the past, to lock in concepts, and to recall information more actively.
What it does
We provide a full system to annotate and view your bookmarks in an easy-to-read format. The first part of our system is a browser extension that lets a user highlight any section of a webpage. The content (text and images) of the highlight is instantly uploaded to our backend and processed for viewing in our web application. In the web app, we offer a selection of feeds based on the amount of time you have to study.
If you have 5 minutes on a bus ride, tap the 5-minute button. You’ll be sent a customized, summarized, and condensed feed of your bookmarks and notes that you can easily read and study in 5 minutes. Once you're done, you can exit until the next session, review the full original articles, or see your notes in the context of the rest of your annotations.
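As a rough sketch, the extension's capture-and-upload step might look like the following. The endpoint URL, payload shape, and helper names here are assumptions for illustration, not our actual API:

```typescript
// Hypothetical shape for a captured highlight (an assumption, not our real schema).
interface Highlight {
  url: string;        // page the highlight came from
  text: string;       // the selected content
  capturedAt: string; // ISO timestamp
}

// Build a note object from the current page URL and the selected text.
function captureSelection(pageUrl: string, selectedText: string): Highlight {
  return {
    url: pageUrl,
    text: selectedText.trim(),
    capturedAt: new Date().toISOString(),
  };
}

// In the extension, this would run on mouseup, e.g.:
//   const sel = window.getSelection()?.toString() ?? "";
//   const note = captureSelection(location.href, sel);
//   fetch("https://example-backend/notes", {   // placeholder URL
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(note),
//   });
```

The key point is that capture is instant and fire-and-forget: the user keeps reading while the note syncs in the background.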
How I built it
The system has two parts: a Chrome extension and a web app. The Chrome extension leverages several jQuery libraries and some applied math to support selection of HTML on the page based on the viewport and content. Once the extension extracts the HTML, we parse and convert it into a text format that can be transferred to a Parse database. From there, we run several update passes that tag the data with the metadata needed for summarization, along with a ranking system that lays out the learning sequence. Notes are then pushed from Parse to the web app. For longer annotations and shorter time periods, we provide an automatic, lightweight summarization technique based on tf-idf (term frequency – inverse document frequency), which produces a readable-length summary of your notes for reading on the go. Once combined with annotation data from across users, cross-validation will allow for better article recommendations and more accurate summaries based on what people actually highlight.
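A minimal version of this kind of tf-idf sentence scoring, treating each sentence as its own "document", could look like the sketch below. This illustrates the general technique, not our exact weighting or implementation:

```typescript
// Split text into lowercase word tokens.
function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Score each sentence by the tf-idf of its words (each sentence is treated
// as a "document"), then keep the top-scoring sentences in original order.
function summarize(text: string, maxSentences: number): string {
  const sentences = (text.match(/[^.!?]+[.!?]?/g) ?? [])
    .map(s => s.trim())
    .filter(s => s.length > 0);
  const sentTokens = sentences.map(tokenize);

  // Document frequency: in how many sentences does each word appear?
  const docFreq = new Map<string, number>();
  for (const tokens of sentTokens) {
    for (const w of new Set(tokens)) docFreq.set(w, (docFreq.get(w) ?? 0) + 1);
  }

  const n = sentences.length;
  const scores = sentTokens.map(tokens => {
    if (tokens.length === 0) return 0;
    const tf = new Map<string, number>();
    for (const w of tokens) tf.set(w, (tf.get(w) ?? 0) + 1);
    let score = 0;
    for (const [w, f] of tf) {
      // tf * idf: rare-across-sentences words boost their sentence's score.
      score += (f / tokens.length) * Math.log(n / (docFreq.get(w) ?? 1));
    }
    return score;
  });

  const ranked = scores
    .map((s, i) => [s, i] as const)
    .sort((a, b) => b[0] - a[0])
    .slice(0, maxSentences)
    .map(([, i]) => i)
    .sort((a, b) => a - b); // restore reading order
  return ranked.map(i => sentences[i]).join(" ");
}
```

Because it only needs term counts, this scales to a 30-second budget on a phone, which is why we chose tf-idf over heavier abstractive models.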
Challenges I ran into
Summarizing text was troublesome because we could not support a heavyweight summarization technique, so we had to implement something modular that could be processed quickly while still giving a good summary of the highlighted data. We also had to look into learning systems to figure out how to display data, and how a user’s notes should be displayed if they select the same time option multiple times (i.e., if someone chooses a 30-second summary of their notes, they should get new summaries each time to maintain a fresh intake of information). Syncing everything to the backend required ample debugging due to repeated highlights and poor formatting; we accounted for these problems as much as possible while keeping the interface easy to read.
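One simple way to keep repeat sessions fresh (a sketch of the idea; our actual rotation logic is more involved) is to track which notes a session has already shown and prefer unseen notes, cycling back only once everything has been seen:

```typescript
// Return the next batch of notes, preferring ones not yet shown this session.
// `seen` is mutated to record what this batch displays.
function nextBatch<T>(notes: T[], seen: Set<T>, batchSize: number): T[] {
  const unseen = notes.filter(n => !seen.has(n));
  // If we've nearly exhausted the unseen notes, top up with already-seen ones.
  const pool =
    unseen.length >= batchSize
      ? unseen
      : unseen.concat(notes.filter(n => seen.has(n)));
  const batch = pool.slice(0, batchSize);
  for (const n of batch) seen.add(n);
  return batch;
}
```

Picking a 30-second summary three times in a row would then surface three different slices of your notes before any repeats.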
Accomplishments that I'm proud of
We were excited when we realized we had a cohesive, end-to-end system, from annotations to summarizations, implemented instead of just one part. We were also proud of creating something that we will both actually use and need on a daily basis (we solved one of our own problems, and hopefully others' too).
What I learned
We learned how to use TypeScript! We also had to learn how to efficiently utilize tf-idf techniques to summarize text quickly. Pulling data from a webpage based on our highlighting tool was full of challenges as well, and we had to Google extensively and debug constantly to get it right.
What's next for TL;DR
We want to add a machine learning layer on top of the web application and database to allow robust sharing of similar articles across all of our users. With more NLP features, such as keyword and concept tagging, we can provide users with even more resources to learn from, deepening their understanding of the world. By aggregating annotated data from all users, we can develop ML models that factor into the summarization itself: we will know what people consider important in a text, which is difficult to capture with a context-independent summarization technique.
We also want to provide more robust highlighting/annotating features, such as tagging, multi-part selections (within the same note), customizable interfaces, and generally more options as we see fit.