Inspiration and What it is
One of the most tiring tasks of student life is learning from a PDF document. It is tedious, and students spend a large share of their time just looking for the passage they need. How about an assistant that reads the PDF document and answers your questions? This assistant would also give you the page where the answer lies. In an age of intelligent systems, we should not have to rely on keyword searches anymore.
How I built it
We use the Wikipedia Open Domain Retriever to shortlist the page that is most likely to contain the answer. It computes Term Frequency-Inverse Document Frequency (TF-IDF) vectors for the query and the document pages, then ranks the pages by cosine similarity to the query. Once we have the page, two variables, pi and pj, mark the initial and final offsets of the answer span. The query and the document are converted into vectors using GloVe word embeddings. A PyTorch classifier trained on the SQuAD dataset predicts pi and pj, and the text between those two offsets is returned as the answer.
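The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: `best_page` is a simplified stand-in for the retriever (smoothed TF-IDF plus cosine similarity over whole pages), and `best_span` assumes the classifier outputs one start score and one end score per token, from which pi and pj are chosen. All function names, the `max_len` limit, and the sample pages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(pages, query):
    """Build sparse TF-IDF vectors (dicts) for every page and for the query."""
    page_tokens = [tokenize(p) for p in pages]
    n_pages = len(pages)
    # Document frequency: in how many pages does each term occur?
    df = Counter(term for tokens in page_tokens for term in set(tokens))
    # Smoothed IDF, so terms present in every page keep a positive weight.
    idf = {t: math.log((1 + n_pages) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        # Terms that appear in no page have no IDF and are dropped.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    return [vectorize(t) for t in page_tokens], vectorize(tokenize(query))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_page(pages, query):
    """Return (page_index, score) for the page most similar to the query."""
    page_vecs, query_vec = tfidf_vectors(pages, query)
    scores = [cosine(v, query_vec) for v in page_vecs]
    idx = max(range(len(pages)), key=scores.__getitem__)
    return idx, scores[idx]


def best_span(start_scores, end_scores, max_len=15):
    """Pick (pi, pj) maximizing start + end score, with pi <= pj < pi + max_len.

    Assumption: the classifier yields one start score and one end score per token.
    """
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            if s + end_scores[j] > best_score:
                best_score = s + end_scores[j]
                best = (i, j)
    return best


# Hypothetical sample data: one string per PDF page.
pages = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The mitochondria is the powerhouse of the cell and produces ATP.",
    "Newton's laws describe the motion of objects under forces.",
]
page_idx, score = best_page(pages, "What produces ATP in the cell?")
print("best page:", page_idx, "score:", round(score, 3))
print("span:", best_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.5, 0.1, 3.0]))
```

Restricting the search to pi <= pj and a maximum span length keeps the decoder from returning an end offset that precedes the start, or an implausibly long answer. The page index returned by `best_page` is also what lets the assistant report where the answer was found.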
Challenges I ran into
-- Training the model on a cloud server
-- Debugging
-- PDF extraction, splitting and upload
-- API management
-- Displaying the PDF in an iframe (PDF reader)
Accomplishments that I'm proud of
-- Training the model
What's next for Interactive Reader Bot
-- Offer it as a free service to students and companies alike.