My research advisor told me to read 20 research papers by Monday... on a Thursday. When I started, I saw that although the paper I had been assigned earlier cited more than 20 sources, all of them were published in 2008 or earlier and were out of date. The CS field, especially machine learning, advances faster than we can keep track of, and searching Google Scholar or following the citations in old papers isn't enough to find current related work.
What it does
You upload a research paper, and using content-based filtering, the web app gives you a list of recommended research papers, each tagged (thanks to Microsoft Cognitive Services!) and given a similarity score. The most similar papers appear at the top. The app also uses Microsoft's services to provide related links for your paper, in case the recommended papers aren't enough for you.
How I built it
First I scraped arXiv for recent research papers, and then I fed them into my recommendation model (content-based filtering using nltk and TF-IDF). For each paper the model recommended, I generated tags. I used Microsoft Cognitive Services to generate the tags and autosuggest, Microsoft App Services to host my app, an Azure ML workspace for data exploration and experimentation with the recommendation model, Blob storage to save uploaded files, and Bing Search to run searches for the autogenerated URLs. I used Flask to create the web app.
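To make the content-based filtering step concrete, here is a minimal sketch of TF-IDF scoring with cosine similarity. It is not the app's actual code: the real model used nltk, while this version assumes a simple regex tokenizer and hand-rolled TF-IDF so it runs with only the standard library. The names `tokenize`, `tfidf_vectors`, `cosine`, and `recommend`, and the sample corpus, are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; the real app used nltk.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed inverse document frequency, so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, corpus, top_k=3):
    """Rank papers in `corpus` (title -> text) by similarity to `query`."""
    # The query is vectorized together with the corpus so they share IDF weights.
    vectors = tfidf_vectors(list(corpus.values()) + [query])
    query_vec = vectors[-1]
    scored = [(title, cosine(query_vec, vec)) for title, vec in zip(corpus, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    papers = {
        "Deep nets": "deep neural networks for image classification",
        "Graph theory": "shortest path algorithms on weighted graphs",
        "RNN": "recurrent neural networks for language modeling",
    }
    uploaded = "convolutional neural networks for image recognition"
    for title, score in recommend(uploaded, papers):
        print(f"{score:.2f}  {title}")
```

The most similar papers come first, which matches how the app orders its recommendations. Because the score depends only on shared words, two papers that describe the same idea with different vocabulary will score low.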
Challenges I ran into
I had never made an entire hackathon project on my own, but that had its own benefits: I got experience with the entire stack (usually I just create the ML model), and I can proudly say that I am very comfortable with Flask now! I also got to do a project that is very important to me. As a researcher and aspiring ML engineer, I face the issues that come with having to scour the web for relevant research papers in my field, and this is an app that I think would benefit the machine learning community.
Accomplishments that I'm proud of
Finishing my app with (some) time to spare :) And finally having Flask click!
What I learned
What's next for INeed20SourcesByMonday
More robust models that can recognize the similarity between words that are spelled completely differently but have the same meaning.