Inspiration

We were inspired by the Melissa workshop!

What it does

The project cleans messy address data, then defines a function that takes the row number of an address to look up and returns a dataframe of the most likely matching addresses.
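Here is a minimal sketch of what the cleaning step might look like. It assumes that "cleaning" means lowercasing, stripping punctuation, and standardizing street-suffix spellings. The `normalize_address` function and its abbreviation table are hypothetical illustrations, not the project's actual code:

```python
import re

# Hypothetical suffix table; a real cleaner would need a much larger mapping.
STREET_SUFFIXES = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "boulevard": "blvd",
    "drive": "dr",
}


def normalize_address(raw: str) -> str:
    """Lowercase, drop periods/commas/'#', collapse whitespace,
    and map common street-suffix spellings to one abbreviation."""
    text = raw.lower().strip()
    # Turn punctuation noise into spaces so tokens split cleanly.
    text = re.sub(r"[.,#]", " ", text)
    # split() with no argument also collapses repeated whitespace.
    tokens = text.split()
    # Replace known suffixes; leave every other token unchanged.
    tokens = [STREET_SUFFIXES.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


print(normalize_address("123  Main Street, Apt. 4"))  # 123 main st apt 4
```

Normalizing both spellings to the same form (e.g. "Main Street" and "MAIN ST." both become "main st") lets later steps compare addresses on content rather than formatting.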

How we built it

We parsed the addresses to clean them, then used indexing to filter out addresses our input could not match. Finally, we used the jellyfish package's string-similarity measures to rank the remaining candidates from most to least likely match.
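The filter-then-rank search described above can be sketched as follows. The sketch makes several assumptions. The input row comes from a `dirty` table, and candidates come from a clean `reference` table. Both tables have hypothetical, already-cleaned `street` and `zip` columns. A ZIP-code match stands in for the "indexing" filter. The `most_likely_addresses` name and the column names are illustrative, not the team's code. It uses jellyfish's `jaro_winkler_similarity` when the package is installed. Otherwise it falls back to the standard library's `difflib.SequenceMatcher` ratio, which is a different metric.

```python
import difflib

import pandas as pd

try:
    # Jaro-Winkler similarity in [0, 1]; higher means more alike.
    from jellyfish import jaro_winkler_similarity as similarity
except ImportError:
    def similarity(a: str, b: str) -> float:
        # Stdlib fallback (a different metric) if jellyfish is unavailable.
        return difflib.SequenceMatcher(None, a, b).ratio()


def most_likely_addresses(dirty: pd.DataFrame, reference: pd.DataFrame,
                          row: int, top_n: int = 5) -> pd.DataFrame:
    """Return up to top_n reference addresses ranked by similarity
    to the address at positional row `row` of `dirty`."""
    query = dirty.iloc[row]
    # Filter step (assumed): discard candidates with a different ZIP code.
    candidates = reference[reference["zip"] == query["zip"]].copy()
    # Score each remaining street string against the query street.
    candidates["score"] = [similarity(query["street"], s)
                           for s in candidates["street"]]
    # Highest score first, like a search engine's ranked result list.
    return candidates.sort_values("score", ascending=False).head(top_n)


dirty = pd.DataFrame({"street": ["123 main st"], "zip": ["92614"]})
reference = pd.DataFrame({
    "street": ["123 main st", "123 maine st", "456 oak ave", "123 main st"],
    "zip": ["92614", "92614", "92614", "10001"],
})
print(most_likely_addresses(dirty, reference, row=0, top_n=2))
```

Filtering before scoring limits the string comparisons to plausible candidates. Adding more filter fields (city, state) would narrow the list further.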

Challenges we ran into

We first tried a machine learning model, but quickly realized it was too complicated. We also had to shift our goal from identifying an exact address to returning the most likely addresses an input could be. This is much like a search engine, which returns a ranked list of the most relevant results.

Accomplishments that we're proud of

The parsing methods we used were very cool, and we learned about some awesome packages. We were especially excited about the jellyfish package for measuring the similarity between strings.

What we learned

We learned that when on a time crunch, you may not be able to reach the ultimate goal, but can still produce something of value!

What's next for Conquering the Impossible!

Fine-tuning the data cleaning and filtering on more parameters to get closer address matches.

Built With

  • cocalc
  • deepnote
  • jupyternotebook
  • python