Inspiration

We have been stuck at home for over a year now. All our classes and studies have shifted to a largely untested online mode, which has created a lot of problems for both students and teachers. Since most of our books, and even our notes, are now PDFs, this idea came to mind. It helps both students and teachers find the relevant information in a behemoth of a PDF in a matter of seconds. You can even search with an image to see all the related images inside the PDF!

What it does

It takes the PDF you upload, segments it, and feeds the segments to two different neural nets, one trained for text and one for images. They embed the docs, and then Jina does the searching. You can use text to get textual responses, which are ranked by their cosine similarity to the query. You can use images to get image results, which are likewise sorted by similarity. Since I am using neural nets, the search is also context-aware, which makes it a much better searcher than what is already available.
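To give a feel for the ranking step, here is a minimal sketch of cosine-similarity ranking in plain Python. It is not the project's actual Jina code: the function names, the segment labels, and the tiny 3-dimensional vectors are all made up for illustration, and real embeddings from a neural net would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_segments(query_vec, segment_vecs, top_k=3):
    """Return up to top_k (segment_id, score) pairs, best match first."""
    scored = [
        (seg_id, cosine_similarity(query_vec, vec))
        for seg_id, vec in segment_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings standing in for the output of the text encoder.
segments = {
    "page 1, paragraph 2": [0.9, 0.1, 0.0],
    "page 4, paragraph 1": [0.1, 0.8, 0.3],
    "page 7, paragraph 5": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

results = rank_segments(query, segments, top_k=2)
for seg_id, score in results:
    print(f"{seg_id}: {score:.3f}")
```

The same idea applies to image search: only the encoder that produces the vectors changes, while the ranking by similarity stays the same.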

How I built it

I built the backend using Jina, Python (Flask), and Shell. For the frontend I used Electron, so everything there is made with HTML, CSS, and vanilla JS.

Challenges I ran into

Seriously, this hack was far from easy. I can safely say that creating this, however it may be perceived, was a big challenge for me. I began by writing the Jina backend using their Hub executors, but some error or other kept rearing its ugly head, which set me back precious hours. So I had to give up on the idea of having a single flow. I went with the next logical option, cross-modal search. For that, however, images need descriptions, so I decided to learn image captioning and created a simple captioner. The results were very far from ideal, so I finally decided on having separate flows for text and images. Even then, I had a lot of trouble getting the flows to work. There was also an unresolved issue with Chrome getting in the way of making it work the way I wanted. Getting the rest to work wasn't easy either.

Accomplishments that I am proud of

The fact that everything works.

What I learned

I learned a lot about Jina, Python, HTML, and JS.

What's next for PDF-er
