Inspiration
During modreg, we wanted some way of quickly looking up old vacancy reports to gauge the trend in the number of vacancies left for a particular mod.
There was hardly any archive of past years' vacancy reports (except for one Reddit thread), hence the idea for a vacancy report scraper and database, plus a Python bot to query the database.
What it does
The frontend is a Telegram bot that queries the PostgreSQL database. On the backend, PDFs of old vacancy reports are fed through a scraper to generate the relevant tables, which are then stored in the database.
How we built it
- scrape pdfs using tabula
- perform data cleaning on the scraped data
- insert clean data into postgresql database
- write some functions to query the database
- have a python bot invoke these functions
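The clean, insert and query steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual code: plain Python lists stand in for the DataFrames that tabula produces, the standard-library sqlite3 module stands in for PostgreSQL so the example runs anywhere, and the table layout, column names and function names (clean_rows, store, vacancies_for) are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows, shaped like what a PDF table scraper might emit:
# a repeated header row, stray whitespace, and "-" for a missing count.
RAW_ROWS = [
    ["Module", "Class", "Vacancy"],
    [" CS1010 ", "Lecture 1", "12"],
    ["CS1010", "Tutorial 3", "-"],
    ["Module", "Class", "Vacancy"],  # header repeated on the next PDF page
    ["MA1101R", "Lecture 1", " 40"],
]


def clean_rows(raw_rows):
    """Drop header rows, trim whitespace, and convert vacancy counts to int."""
    cleaned = []
    for module, class_name, vacancy in raw_rows:
        module = module.strip()
        class_name = class_name.strip()
        vacancy = vacancy.strip()
        if module == "Module":
            continue  # skip header rows that the scraper picked up
        # Non-numeric placeholders such as "-" are stored as NULL.
        count = int(vacancy) if vacancy.isdigit() else None
        cleaned.append((module, class_name, count))
    return cleaned


def store(conn, report_label, rows):
    """Insert cleaned rows, tagged with the report they came from."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vacancies ("
        "report TEXT, module TEXT, class_name TEXT, vacancy INTEGER)"
    )
    conn.executemany(
        "INSERT INTO vacancies VALUES (?, ?, ?, ?)",
        [(report_label, m, c, v) for m, c, v in rows],
    )
    conn.commit()


def vacancies_for(conn, module):
    """Query function of the kind the bot would call for one module."""
    cur = conn.execute(
        "SELECT report, class_name, vacancy FROM vacancies "
        "WHERE module = ? ORDER BY report, class_name",
        (module,),
    )
    return cur.fetchall()


# In-memory database: sqlite3 is a stand-in for the hosted PostgreSQL db.
conn = sqlite3.connect(":memory:")
store(conn, "round-1", clean_rows(RAW_ROWS))
print(vacancies_for(conn, "CS1010"))
```

Tagging every row with the report it came from is what would let a query compare vacancies across reports and show the trend for a mod. With a PostgreSQL driver the parameter placeholder syntax would differ from sqlite3's "?".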
Challenges we ran into
Dealing with pandas DataFrames.
Accomplishments that we're proud of
- hosting a postgresql db
- building a good workflow into the scraper, so more vacancy reports can be added to the database as they come out
What we learned
python-telegram-bot, postgresql, data cleaning
What's next for modrekt vacancy reports bot (mvrb)
- we are still missing the vacancy reports for sem 1!
- expand the different ways data can be queried
- move from text-based output to image-based data visualisation for a better viewing experience
- perform normalization on the database
If the bot is not responding
Visit the following URL to wake the bot up, then wait a few seconds before trying again: https://modrekt-vacancies-bot.herokuapp.com/
Built With
- postgresql
- python
- python-telegram-bot