This project analyzes patterns and relationships in a set of city surveys to reveal the most important water- and climate-related risks and the regions most susceptible to them.
What it does
Using a forward-looking measure of climate risk exposure based on textual analysis of city reports, we assess whether the climate and water risks that cities disclose to CDP are important in development.
How I built it
The work proceeds in four stages: data exploration, data preparation, modeling, and evaluation. In the modeling stage, we use TF-IDF to vectorize the corpus into a Document-Term Matrix (DTM). Because topic modeling is unsupervised, no labels are needed; we instead compute low-rank approximations of the DTM. The analysis compares documents in a low-dimensional space (Document Similarity), locates recurring topics across documents (Topic Modeling), and finds relations between terms (Text Synonymity).
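As a rough illustration of the vectorization and Document Similarity steps, here is a minimal sketch in plain Python. It is not the project's actual code: the three example snippets, the whitespace tokenizer, the smoothed IDF formula, and the helper names `tfidf_matrix` and `cosine` are all assumptions made for this example, and the low-rank approximation step is left out for brevity.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for answers from city reports.
docs = [
    "flood risk threatens water supply",
    "drought reduces water supply",
    "heat waves increase energy demand",
]


def tfidf_matrix(texts):
    """Return (vocabulary, document-term matrix) with TF-IDF weights."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (one common variant), so no term gets a zero weight.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        rows.append([counts[w] / total * idf[w] for w in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab, dtm = tfidf_matrix(docs)      # one row per document, one column per term
sim_water = cosine(dtm[0], dtm[1])   # both snippets mention water supply
sim_other = cosine(dtm[0], dtm[2])   # no shared terms

print(f"flood vs drought: {sim_water:.3f}")
print(f"flood vs heat:    {sim_other:.3f}")
```

The two water-related snippets share terms and so score higher than the unrelated pair. On a real corpus, a library implementation of TF-IDF and a matrix factorization would replace these helpers, and similarity would be computed in the reduced space rather than on the raw DTM.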
Accomplishments that I'm proud of
Working with unstructured data, and doing NLP for the first time. NLP can be very useful in information retrieval, especially when we want to retrieve context that is relevant to our goal. This project showed that text mining and natural language processing techniques can be successfully applied to analyze city reports in text format.
What's next for Challenge 2 - Team Rainbow
Combine the available data with water footprints and create a mechanism that highlights the factors responsible for economic water scarcity. With some effort, we can also create an annotation platform where experts interactively click on relevant answers in the snippets, turning a completely unsupervised approach into a supervised learning task.