While going through the CDP data, we wanted to find which textual data we could analyse to provide the most value. Finding the regions and facilities that firms exclude seemed like a good place to start. The data was qualitative (free text), and we saw a way to extract it and map it.
What it does
We take the questionnaire responses to Questions W.06 and W.06a, then remove duplicates and empty entries. We extract named entities from each sentence using spaCy in Python and keep those labelled "GPE", the label spaCy assigns to geopolitical entities such as countries, cities and states. Once a "GPE" location is found, we look at how the other entities in the sentence relate to it, which gives us data on the different types of facilities.
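The steps above can be sketched roughly as follows. This is a minimal illustration, not our exact code: it assumes spaCy v3, and the function names (`clean_responses`, `extract_exclusions`, `build_demo_pipeline`), the "FACILITY" label and the sample sentences are all made up for the example. To keep it runnable without downloading a model, the demo pipeline is a blank English pipeline with a few hand-written rules standing in for a pretrained pipeline that would assign "GPE" labels statistically, and "relation" is simplified to entities that occur in the same sentence.

```python
import spacy


def clean_responses(responses):
    """Drop empty answers and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for text in responses:
        if text is None:
            continue
        normalized = " ".join(str(text).split())  # collapse whitespace
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


def extract_exclusions(nlp, responses):
    """For each sentence with a GPE entity, collect the other entities in it."""
    results = []
    for doc in nlp.pipe(clean_responses(responses)):
        for sent in doc.sents:
            locations = [e.text for e in sent.ents if e.label_ == "GPE"]
            related = [(e.text, e.label_) for e in sent.ents if e.label_ != "GPE"]
            if locations:
                results.append(
                    {"sentence": sent.text, "locations": locations, "related": related}
                )
    return results


def build_demo_pipeline():
    """Hypothetical stand-in pipeline: sentence splitting plus a few entity rules."""
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(
        [
            {"label": "GPE", "pattern": "India"},
            {"label": "GPE", "pattern": "Brazil"},
            {"label": "FACILITY", "pattern": [{"LOWER": "sales"}, {"LOWER": "offices"}]},
            {"label": "FACILITY", "pattern": [{"LOWER": "warehouses"}]},
        ]
    )
    return nlp


if __name__ == "__main__":
    sample = [
        "Sales offices in India are excluded.",
        "",
        "Sales offices in India are excluded.",  # duplicate, dropped
        None,
        "Warehouses in Brazil are not included. No other exclusions.",
    ]
    for row in extract_exclusions(build_demo_pipeline(), sample):
        print(row)
```

Passing the pipeline in as an argument means the same function can be used with a pretrained spaCy pipeline loaded via `spacy.load`, as long as it sets sentence boundaries and entity labels.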
How I built it
Using Python and spaCy.
Challenges I ran into
It was our first time using NLP libraries, so the main challenge was figuring out the basic concepts and applying them to real-world data.
Accomplishments that I'm proud of
The whole project.
What I learned
Environmental data collection, CDP and NLP.
What's next for Challenge 2: Finding excluded regions and facilities
Build more advanced rule-based entity matching models to extract more complex data.
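One possible direction, sketched here with spaCy's `Matcher` (spaCy v3 assumed): a token pattern that captures a facility word, a preposition and a following run of capitalised tokens. The pattern, the facility word list, the rule name `FACILITY_IN_PLACE` and the helper `find_facility_phrases` are illustrative assumptions, not something we have built yet.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline is enough here: the pattern only uses token-level attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical rule: facility word + "in"/"at" + one or more capitalised tokens.
pattern = [
    {"LOWER": {"IN": ["offices", "facilities", "plants", "warehouses"]}},
    {"LOWER": {"IN": ["in", "at"]}},
    {"IS_TITLE": True, "OP": "+"},
]
# greedy="LONGEST" keeps only the longest of any overlapping matches.
matcher.add("FACILITY_IN_PLACE", [pattern], greedy="LONGEST")


def find_facility_phrases(text):
    """Return matched phrases in the order they appear in the text."""
    doc = nlp(text)
    matches = sorted(matcher(doc), key=lambda m: m[1])  # sort by start token
    return [doc[start:end].text for _, start, end in matches]


if __name__ == "__main__":
    print(find_facility_phrases("We exclude offices in South Africa and plants in Chile."))
```

A rule like this is brittle on its own (it would miss lower-cased place names, for example), so it would most likely complement the "GPE" entities rather than replace them.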