Inspiration
One of our group members noticed that when our classes moved online, comparing our thoughts and ideas about the texts we read became more difficult. We couldn't easily sit down and compare what we had each annotated, so we thought we might be able to create a tool that analyzes a group of PDF files and displays useful information about them.
What it does
When run, the program asks for an original PDF file that has no highlights. It then asks for a directory containing copies of that PDF, identical to the original except for the highlights made in each copy. Once both paths are selected, the program collects all the highlights made in the files in the directory. It creates a new PDF in the same location as the original PDF and opens it. The resulting PDF is highlighted based on how often each passage was highlighted: a lighter highlight marks a passage that was highlighted at least once but not often, and the darker the highlight, the more often the passage was highlighted across the files in the directory.
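The frequency-to-darkness idea described above can be sketched in plain Java. This is only an illustration and not our actual code: the class name `HighlightFrequency`, the opacity bounds, and the choice to represent each highlighted passage as a plain string are all assumptions made for the example. In the real program, the passages would come from the highlight annotations read out of each PDF.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts highlights and maps each count to an opacity.
final class HighlightFrequency {

    // Assumed bounds: the lightest and darkest highlight we are willing to draw.
    static final double MIN_OPACITY = 0.2;
    static final double MAX_OPACITY = 0.9;

    // Counts how many files highlighted each passage.
    // Each inner list holds the passages highlighted in one annotated copy.
    static Map<String, Integer> countHighlights(List<List<String>> highlightsPerFile) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> fileHighlights : highlightsPerFile) {
            // De-duplicate so one file contributes at most 1 per passage.
            for (String passage : new LinkedHashSet<>(fileHighlights)) {
                counts.merge(passage, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Linearly maps a count in [1, totalFiles] to [MIN_OPACITY, MAX_OPACITY].
    // A passage nobody highlighted gets 0.0, meaning no highlight at all.
    static double opacityFor(int count, int totalFiles) {
        if (count <= 0 || totalFiles <= 0) {
            return 0.0;
        }
        if (totalFiles == 1) {
            return MAX_OPACITY;
        }
        int clamped = Math.min(count, totalFiles);
        double fraction = (double) (clamped - 1) / (totalFiles - 1);
        return MIN_OPACITY + fraction * (MAX_OPACITY - MIN_OPACITY);
    }
}
```

With five annotated copies, a passage highlighted in only one of them would get the lightest opacity, and a passage highlighted in all five would get the darkest. A linear scale is just one option; a stepped scale with a few fixed shades might be easier to read in the output PDF.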
How we built it
Given that our group has little practice with developing programs like this one, we started by deciding what kind of input/output we wanted and what kinds of classes we might need to get from the input to the output. We knew we didn't have a ton of code to write, but the code we needed to write did a lot of things we didn't know how to do, so there were always at least one or two people googling and experimenting with APIs during the afternoon and night.
Challenges we ran into
PDF libraries are not created equal! We spent a lot of time searching online and experimenting to find a library that did what we needed for reading and editing the files. We changed our minds a couple of times during the project, and eventually settled on PDFBox for reading data and PDF Clown for writing data.
Accomplishments that we're proud of
We are proud of the project as a whole! We are aware that the code is a bit messy and things don't work quite how we want them to, but when we run the program, our goal of distinguishing passages that were highlighted often from those highlighted only once is evident.
What we learned
For most of us in the group, this was our first time using GUIs, using external libraries in a project, working as a team on a project, and trying to read and edit PDF files. Needless to say, we learned a TON about project logistics and how powerful external libraries can be if we understand what they do. We also learned that external libraries can be confusing and difficult to work with, especially if they are not well documented or not widely used, like PDF Clown.
What's next for PDF Highlight Combiner DLX
Given our lack of experience, we decided to create something very basic. In the future, we would like to use JavaScript and expand this project on the web. We imagine that this tool could be used in something like Canvas, where a professor could view a summary of all of the annotations their class made on a particular reading. We would also like to expand the types of annotations to include notes, categories, and tags, and to record who highlighted specific passages, to allow for even more information sharing.