This repository contains Team Merlion's submission for the CDP Hackathon. To reproduce the results, follow the instructions below.
The project is written mainly in Julia. The NLP processing, however, was done in Python, which can be called directly from Julia.
To set up the Julia environment, open a Julia REPL in the project folder (with DrWatson installed in your global environment) and run:
julia> using DrWatson
julia> @quickactivate "cdphackathon"
julia> ] instantiate
This installs all dependencies and creates a Manifest.toml.
This project also requires the Python package spaCy. Activate the Conda environment used by Julia (you can find it with conda env list) and run:
conda install -c conda-forge spacy
python -m spacy download en_core_web_lg
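Before running the NLP step, it can help to confirm that spaCy and the en_core_web_lg model were installed into the Python that Julia uses. The snippet below is a small sketch of that check; check_nlp_setup is a hypothetical helper, not part of the repository, and it assumes you run it with the python from the activated Conda environment. It relies on importlib.util.find_spec, which locates a package without importing it, so a missing spaCy is reported rather than raising an error.

```python
import importlib.util


def check_nlp_setup(packages=("spacy", "en_core_web_lg")):
    """Return {package_name: True/False} for whether each package is importable.

    Hypothetical helper. find_spec only locates the package; nothing is
    imported, so this works even when spaCy itself is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_nlp_setup().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If the model shows as missing, rerun the spacy download command inside the same environment.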
Obtaining Cleaned City List
The list of cities from which CDP obtained data has been cleaned and compiled by the relevant functions in cityresort.jl. To rerun the cleaning, uncomment the following three lines in
resortUScitylist("cities/uscitylist.csv")
resortCAcitylist("cities/cacitylist.csv")
cdpcitycsv("Cities_Data_2017-2019_mb2.csv")
Note that this automated resorting finds details for only about 2/3 to 3/4 of the cities on the list; the longitude and latitude of the remaining 60 cities were found manually.
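The cleaning itself lives in cityresort.jl and is not reproduced here. As a rough illustration of why an automated pass leaves some cities unresolved, here is a minimal sketch in Python that joins CDP city names against a reference list on a normalized (city, region) key and collects the misses for manual lookup. The function names, the column names city/region/lat/lng and the sample rows are all hypothetical; they are not the repository's actual functions or CSV layout.

```python
import csv
import io
import unicodedata


def normalize(name):
    """Hypothetical normalizer: strip accents and periods, lower-case, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.replace(".", "").lower().split())


def match_cities(cdp_cities, reference_rows):
    """Attach (lat, lng) to CDP city names.

    cdp_cities: iterable of (city, region) pairs.
    reference_rows: iterable of dicts with hypothetical keys city, region, lat, lng.
    Returns (matched, unmatched); unmatched entries still need manual lookup.
    """
    lookup = {
        (normalize(r["city"]), normalize(r["region"])): (float(r["lat"]), float(r["lng"]))
        for r in reference_rows
    }
    matched, unmatched = {}, []
    for city, region in cdp_cities:
        key = (normalize(city), normalize(region))
        if key in lookup:
            matched[(city, region)] = lookup[key]
        else:
            unmatched.append((city, region))
    return matched, unmatched


if __name__ == "__main__":
    # Made-up reference data for illustration only.
    reference_csv = io.StringIO(
        "city,region,lat,lng\n"
        "Montréal,QC,45.5017,-73.5673\n"
        "St. Louis,MO,38.6270,-90.1994\n"
    )
    rows = list(csv.DictReader(reference_csv))
    cdp = [("Montreal", "QC"), ("St Louis", "MO"), ("Springfield", "IL")]
    matched, unmatched = match_cities(cdp, rows)
    print(matched)
    print("needs manual lookup:", unmatched)
```

Exact-key matching like this is deliberately conservative: spelling variants that the normalizer does not catch end up in the unmatched list instead of being assigned the wrong coordinates.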
Using the NLP package spaCy, we identify the most common keywords and how often they occur in
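The repository's keyword script is not shown here, so the following is only a minimal sketch of one common way to count keyword frequencies with spaCy, assuming that "keywords" means lemmatized nouns and proper nouns with stop words removed. The helper names spacy_tokens and count_keywords and the sample sentences are hypothetical. spacy.load, nlp.pipe and the token attributes lemma_, pos_, is_stop and is_alpha are standard spaCy API; spaCy is imported inside spacy_tokens so the counting logic can be used without it.

```python
from collections import Counter

# Parts of speech treated as candidate keywords (an assumption; adjust as needed).
KEYWORD_POS = {"NOUN", "PROPN"}


def count_keywords(tokens, top_n=10, keep_pos=KEYWORD_POS):
    """Count keyword lemmas.

    tokens: iterable of (lemma, pos, is_stop, is_alpha) tuples, e.g. from
    spacy_tokens(). Stop words, non-alphabetic tokens and parts of speech
    outside keep_pos are skipped. Returns the top_n (lemma, count) pairs,
    most frequent first.
    """
    counts = Counter(
        lemma.lower()
        for lemma, pos, is_stop, is_alpha in tokens
        if is_alpha and not is_stop and pos in keep_pos
    )
    return counts.most_common(top_n)


def spacy_tokens(texts, model="en_core_web_lg"):
    """Yield (lemma, pos, is_stop, is_alpha) for every token in texts (hypothetical helper)."""
    import spacy  # imported here so count_keywords works without spaCy installed

    # Parser and NER are not needed for lemmas and POS tags, so skip them.
    nlp = spacy.load(model, disable=["parser", "ner"])
    for doc in nlp.pipe(texts):
        for tok in doc:
            yield tok.lemma_, tok.pos_, tok.is_stop, tok.is_alpha


if __name__ == "__main__":
    # Made-up example texts, not CDP data.
    responses = [
        "The city is expanding its flood defences.",
        "Flood risk and heat waves threaten city infrastructure.",
    ]
    for word, n in count_keywords(spacy_tokens(responses)):
        print(f"{word}\t{n}")
```

Keeping tokenization separate from counting makes it easy to change what counts as a keyword (for example, adding adjectives to keep_pos) without reloading the model.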
The raw Tableau workbook can be found in the submission folder, and a public link is available [here].