Team Merlion
Submitted By:
- Fan Wei Liat Egwin (egwin159@hotmail.com)
- Nathanael Wong (nathanaelwong@fas.harvard.edu)
This repository contains Team Merlion's submission for the CDP Hackathon. To reproduce the results, follow the instructions below.
The project is written mainly in Julia. The NLP step, however, was done in Python (which can be called natively in Julia).
Project Initialization
A Julia project can be activated and its dependencies installed from its project files. To set up this project, run:
using DrWatson
@quickactivate "cdphackathon"
] instantiate
This installs all dependencies and creates a Manifest.toml.
Installing spaCy
This project requires the Python package spaCy. Activate the Conda environment used by Julia (run conda env list to see the available environments), then run:
conda install -c conda-forge spacy
python -m spacy download en_core_web_lg
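A quick way to confirm that both commands succeeded is to check, from the same environment, that the package and the model can be found. The following is a minimal sketch using only the Python standard library. The helper name missing_packages is hypothetical and not part of this project.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Models installed with `python -m spacy download` are regular Python packages,
# so the model can be looked up by name in the same way as spaCy itself.
missing = missing_packages(["spacy", "en_core_web_lg"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("spaCy and en_core_web_lg are available.")
```

If either name is reported as not installed, the most likely cause is that a different Conda environment was active when the install commands were run.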
Obtaining Cleaned City List
The list of cities from which CDP obtained data has been cleaned and compiled by functions in cityresort.jl. To regenerate it, uncomment the following three lines in citiesresort.jl:
resortUScitylist("cities/uscitylist.csv")
resortCAcitylist("cities/cacitylist.csv")
cdpcitycsv("Cities_Data_2017-2019_mb2.csv")
Note, however, that this automated resorting finds details for only about two-thirds to three-quarters of the cities on the list. The longitude and latitude of the remaining 60 cities were found manually.
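The split between automatically matched cities and those left for manual lookup can be illustrated with a short sketch. It is written in Python for illustration only and is not the project's Julia code: the function name split_by_match, the name normalisation, and the sample data are all assumptions, and the actual matching logic inside the resort functions may differ.

```python
def split_by_match(cdp_cities, reference):
    """Split city names into matched coordinates and names needing manual lookup.

    `reference` maps a lower-cased city name to a (latitude, longitude) pair.
    """
    matched, unmatched = {}, []
    for name in cdp_cities:
        key = name.strip().lower()  # assumed normalisation: trim and lower-case
        if key in reference:
            matched[name] = reference[key]
        else:
            unmatched.append(name)
    return matched, unmatched


# Illustrative data only: approximate coordinates for two well-known cities.
reference = {"boston": (42.36, -71.06), "toronto": (43.65, -79.38)}
matched, unmatched = split_by_match(["Boston", " Toronto ", "Springfield"], reference)
print(f"{len(matched)} matched automatically, {len(unmatched)} left for manual lookup")
```

Anything that ends up in the unmatched list corresponds to the cities whose coordinates had to be filled in by hand.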
Keywords
Using the NLP package spaCy, we identify the most common keywords and how often each one occurs. This is done in regionresort.jl.
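A minimal sketch of this kind of keyword count is shown below, assuming that keywords are lemmatised, alphabetic, non-stop-word tokens. The function names extract_lemmas and keyword_frequencies are hypothetical, and the filtering used in regionresort.jl may differ.

```python
from collections import Counter


def extract_lemmas(text, model="en_core_web_lg"):
    """Lemmatise `text` with spaCy, dropping stop words and non-alphabetic tokens."""
    import spacy  # imported lazily so the counting helper works without spaCy

    nlp = spacy.load(model)
    return [tok.lemma_.lower() for tok in nlp(text) if tok.is_alpha and not tok.is_stop]


def keyword_frequencies(lemmas, top_n=10):
    """Return the `top_n` most common lemmas as (keyword, count) pairs."""
    return Counter(lemmas).most_common(top_n)
```

With spaCy and the model installed, the two steps chain as keyword_frequencies(extract_lemmas(response_text)), where response_text stands for any free-text field from the CDP data.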
Tableau Dashboard
The raw Tableau workbook is in the submission folder, and a public version is linked [here].