As we were going through CDP data, we wanted to find what textual data can be analysed by us to provide most value. We felt finding the regions and facilities that are excluded by firms can be a good place to start. The data was in qualitative form and we saw that we could find a way to extract this data and map it very well.

What it does

We take questionnaire data of of Question W.06 and W.06a. Remove duplicates and empty data. Create entities from each sentence using spaCY in python. We then find entities that match entity "GPE" which are given to entities that are location type. Once the "GPE" type location data is found we find the relation of other entities to that entity to give us data on different type of facilities.

How I built it

Using python and spaCy

Challenges I ran into

It was first time for us to use NLP libraries, figuring out the basic concepts and being able to use them in real life.

Accomplishments that I'm proud of

The whole Project

What I learned

Environment data Collection, CDP and NLP

What's next for Challenge 2: Finding excluded regions and facilities

Make more advanced rule based entity matching modals, to extract complex data.

Share this project: