make-doc-listenable

Inspiration

I wanted to do my class readings by listening to them. These are semi-technical texts in economics, such as an Amartya Sen book, and papers (not books) such as Cryptographic Moral Function. I uploaded PDFs to text-to-speech (TTS) apps like Speechify and the TTS built into Apple's Preview, but they read out the junk too: references, headers, and footers, every time, which made them unusable. For older papers, they sometimes read the text character by character instead of word by word!

What it does

Takes in a general PDF and outputs a cleaned-up text file, ready to be fed to a TTS engine.

How we built it

1. Read in the PDF and split it page by page using pdfminer in Python. Steps 2 and 3 are repeated for each page.
2. Extract the text with textract using OCR (note: this is more powerful than typical PDF text extraction, which does not use OCR).
3. Send the text, together with a targeted prompt, to an OpenAI language model to fix it.
4. Collect the individual pages into a single output text file.
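Steps 3 and 4 can be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual code: the prompt wording in `CLEANUP_INSTRUCTION` is hypothetical (the real prompt is not published), and the helper names `build_prompt`, `clean_pages`, and `write_listenable_text` are made up for this example. The language model is passed in as a plain `complete(prompt) -> str` function so the sketch runs without an API key; in the real pipeline, `pages` would come from the pdfminer/textract OCR step and `complete` would wrap a request to an OpenAI model.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical cleanup instruction; the project's actual prompt is not published.
CLEANUP_INSTRUCTION = (
    "The following text was extracted from one page of a PDF with OCR. "
    "Rewrite it as clean prose for a text-to-speech engine: remove headers, "
    "footers, page numbers, and reference lists, rejoin words that were split "
    "into single characters or across line breaks, and do not add or summarize "
    "content. Return only the cleaned text."
)


def build_prompt(page_text: str) -> str:
    """Combine the fixed instruction with one page of raw OCR text."""
    return f"{CLEANUP_INSTRUCTION}\n\n---\n{page_text.strip()}\n---"


def clean_pages(pages: Iterable[str], complete: Callable[[str], str]) -> List[str]:
    """Step 3, applied to each page: skip blank pages, send the prompt, keep the reply."""
    cleaned = []
    for page_text in pages:
        if not page_text.strip():
            continue  # nothing to clean, so skip the model call
        reply = complete(build_prompt(page_text)).strip()
        if reply:
            cleaned.append(reply)
    return cleaned


def write_listenable_text(
    pages: Iterable[str], complete: Callable[[str], str], out_path: Path
) -> str:
    """Step 4: join the cleaned pages into one text file for a TTS engine."""
    text = "\n\n".join(clean_pages(pages, complete))
    out_path.write_text(text + "\n", encoding="utf-8")
    return text


if __name__ == "__main__":
    # Toy OCR output: a page with a running header and spaced-out letters,
    # followed by a blank page.
    pages = ["JOURNAL OF ECONOMICS  12\nT h e  m a r k e t  c l e a r s.", "   "]
    out = Path(tempfile.gettempdir()) / "listenable.txt"
    # Stand-in for the language model: always returns the same cleaned sentence.
    text = write_listenable_text(pages, lambda prompt: "The market clears.", out)
    print(text)  # The market clears.
```

Passing the model in as a function keeps the page loop independent of any particular client library or model version, and cleaning one page per request keeps each prompt short.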

Challenges we ran into

Finding the right language model
Getting the prompt right!!! I worked on this for about 4 hours.

Accomplishments that we're proud of

The software! I'm going to use it every day for my readings.

What we learned

PROMPT ENGINEERING is a thing!
Understood some limitations of certain language models.

What's next for make-doc-listenable

Make it more robust for even dirtier PDFs
Integrate it with a simple TTS engine, and maybe make it an app?
High-quality summaries at different levels (section-wise, full paper summary)

Built With

gpt
openai
python
textract

Updates

Ayush Kanodia started this project — Feb 19, 2023 10:09 AM EST

