It's 11:59pm and your IS1103 report is due in 1 minute. You barely made it to the end of the report, and there's definitely no time left to write your citations. Don't worry, exCITEs is here to save the day (and your grades)!
What it does
exCITEs takes in a text document and generates a list of citations in APA format based on the text, saving you the time and effort needed to cite all your sources one by one. You don't even need the original citation links!
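To give a feel for the output, here is a minimal sketch of how one reference could be turned into an APA-style line. It is a simplified illustration, not the exCITEs code: the function names `format_author` and `format_apa` are hypothetical, the author, title, and journal in the usage note are placeholders, and the layout covers only the basic "Author (Year). Title. Source." pattern without italics, volumes, or page ranges.

```python
def format_author(full_name):
    """Turn 'Jane Q Doe' into 'Doe, J. Q.'.

    Assumes a non-empty name whose last word is the family name.
    """
    parts = full_name.split()
    family, given = parts[-1], parts[:-1]
    initials = " ".join(f"{name[0].upper()}." for name in given)
    return f"{family}, {initials}" if initials else family


def format_apa(authors, year, title, source):
    """Build a simplified APA-style reference from plain fields.

    Assumes at least one author. Multiple authors are joined with
    commas, with ', & ' before the last one.
    """
    names = [format_author(author) for author in authors]
    if len(names) == 1:
        author_text = names[0]
    else:
        author_text = ", ".join(names[:-1]) + f", & {names[-1]}"
    return f"{author_text} ({year}). {title}. {source}."
```

Calling `format_apa(["Jane Q Doe", "Richard Roe"], 2015, "An Example Title", "Journal of Examples")` with these placeholder values produces `Doe, J. Q., & Roe, R. (2015). An Example Title. Journal of Examples.`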
How we built it
We used LDA (Latent Dirichlet Allocation), a topic modelling technique, through a Python package to extract keywords from a given text, and Selenium with PhantomJS to automate scraping Google Scholar for those keywords. We then aggregate the citations and present them to the user as text.
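The pipeline described above can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions rather than our actual code: a plain word-frequency count stands in for LDA, the Selenium and PhantomJS scraping step is left out entirely (the sketch only builds the search URL), and the names `extract_keywords`, `scholar_query_url`, and `aggregate` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "from", "into"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent words (a stand-in for LDA, not LDA itself)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def scholar_query_url(keywords):
    """Build a Google Scholar search URL for the keywords (no request is sent)."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": " ".join(keywords)})


def aggregate(citations):
    """Drop duplicate citation strings while keeping first-seen order."""
    return list(dict.fromkeys(citations))
```

For example, `scholar_query_url(["neural", "networks"])` returns `https://scholar.google.com/scholar?q=neural+networks`. In the real tool, a headless browser would load that page and collect the citation text, which is where the CAPTCHAs come in.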
Challenges we ran into
Running the Python script on a web server. Getting past Google CAPTCHAs. Connecting the script to PhantomJS and the web server.
Accomplishments that we're proud of
Getting everything to work.
What we learned
- Plenty of useful Python packages
- Automated web scripts with PhantomJS & Selenium
- Running Python and CGI scripts on a web server
- Some basic Natural Language Processing for Topic Modelling
What's next for exCITEs
Adding in-text APA citations, supporting more citation formats, and improving speed and accuracy.