Inspiration

I wanted to do my class readings (semi-technical texts in Economics, such as an Amartya Sen book) and read papers (not books), such as Cryptographic Moral Function, by listening to them. I used Text to Speech (TTS) apps like Speechify and Apple's TTS in Preview, and uploaded PDFs to them. But they read out the junk too: references, headers and footers every time, making them unusable. For older papers, they sometimes read the text character by character instead of word by word!

What it does

Takes in a general PDF and outputs a text file that is cleaned up for input to a TTS engine.

How we built it

  1. Read in the PDF and segment it page by page using pdfminer in Python. Steps 2 and 3 run per page.
  2. Extract the text with textract using OCR (note this is more powerful than typical text extraction from PDFs, which does not use OCR).
  3. Supply the text and a targeted prompt to an OpenAI language model to fix it.
  4. Collect the individual pages into a single text output file.
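
The per-page loop in steps 2 to 4 can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the prompt wording, `build_prompt`, `clean_document`, and the `ask_model` callable are all hypothetical names, and `ask_model` stands in for whatever function sends a prompt to the language model and returns its reply. The page texts are assumed to have been extracted already.

```python
from typing import Callable, Iterable

# Hypothetical prompt wording; the prompt actually used in the project is not shown here.
PROMPT_TEMPLATE = (
    "Clean the following page of text extracted from a PDF so that it can be "
    "read aloud. Remove headers, footers, page numbers and reference lists, "
    "and rejoin words that were split into single characters. "
    "Return only the cleaned text.\n\n{page_text}"
)


def build_prompt(page_text: str) -> str:
    """Wrap one page of raw extracted text in the cleaning instructions."""
    return PROMPT_TEMPLATE.format(page_text=page_text)


def clean_document(pages: Iterable[str], ask_model: Callable[[str], str]) -> str:
    """Clean each page with the model and join the results into one text.

    `ask_model` is an assumed callable: it takes a prompt string and returns
    the model's reply as a string.
    """
    cleaned_pages = []
    for page_text in pages:
        if not page_text.strip():
            continue  # skip blank pages instead of sending them to the model
        cleaned_pages.append(ask_model(build_prompt(page_text)).strip())
    # Separate pages with a blank line so the TTS engine pauses between them.
    return "\n\n".join(cleaned_pages)
```

Passing the model call in as a function keeps the loop easy to test with a stand-in, and makes it simple to swap language models, which was one of the challenges noted below.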

Challenges we ran into

  1. Finding the right language model
  2. Getting the prompt right!!! I worked on this for about 4 hours.

Accomplishments that we're proud of

  1. The software! I'm going to use it every day for my readings.

What we learned

  1. PROMPT ENGINEERING is a thing!
  2. Understood some limitations of certain language models.

What's next for make-doc-readable

  1. Make it more robust for even dirtier PDFs
  2. Integrate it with a simple TTS engine; maybe make it an app?
  3. High-quality summaries at different levels (section-wise, full-paper summary)
