Optum posed an interesting challenge, and we thought it would be fun to come up with a solution for it.
What it does
Standardizes mailing addresses according to USPS standards (Section 2 of their Postal Addressing Standards), then validates them against a local database of US mailing addresses. We provide a source for this database but don't actually use it, since only samples of the data set are available to us at this time.
For more details about how the project works, see the README.
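As a rough sketch of the standardize-then-validate flow (not the project's actual code), the snippet below uppercases a delivery line, drops punctuation, and maps a handful of words to USPS abbreviations before looking the result up in a local set of known addresses. The abbreviation table is a small illustrative subset, and the names `standardize`, `validate`, and `USPS_ABBREVIATIONS` are hypothetical.

```python
import re

# Small, illustrative subset of USPS abbreviations (street suffixes,
# directionals, secondary unit designators). The real tables are much longer.
USPS_ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD",
    "DRIVE": "DR", "LANE": "LN", "COURT": "CT",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "APARTMENT": "APT", "SUITE": "STE",
}


def standardize(line: str) -> str:
    """Uppercase, drop punctuation, collapse spaces, and abbreviate known words."""
    # Keep letters, digits, '#', and '-'; everything else becomes a space.
    cleaned = re.sub(r"[^A-Za-z0-9#\- ]", " ", line).upper()
    # split() with no argument also collapses runs of whitespace.
    return " ".join(USPS_ABBREVIATIONS.get(word, word) for word in cleaned.split())


def validate(line: str, known_addresses: set) -> bool:
    """True if the standardized line appears in a local set of known addresses.

    known_addresses is assumed to hold lines already in standardized form.
    """
    return standardize(line) in known_addresses
```

For example, `standardize("123 north Main Street, Apt. 4")` returns `"123 N MAIN ST APT 4"`. Replacing every matching word like this is deliberately naive: `100 North Street` would come out as `100 N ST`, even though `NORTH` is the street name there.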
How I built it
All in Python. We spent a good five hours or so researching possible data sources and brainstorming an implementation.
Challenges I ran into
Keeping the whole thing internal. The hardest part of the project was finding one or more reliable databases of US mailing addresses that did not require any external API calls to access. Our first instinct was to check out OpenStreetMap and download their database for use with our program; however, the size of the raw map files scared us off, and the read-only Overpass API they offer is run as a third-party service. After that, we discovered openaddresses.io and got something we could start working with, but the data we were able to glean from this database was incomplete.
Finally, we went back to the USPS website and found their ZIP + 4® Product, which "contains raw data and does not contain any software", meaning that we could use it and keep our project internal. While we were unable to actually download the database to test with (it requires filling out a PDF form and mailing or faxing it in), we found that both a sample of the database and technical documentation describing how the data is organized (in the "ZIP+4 Product — Detail Record" section) were available for viewing.
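Loading that data locally could look roughly like the sketch below, which assumes each detail record is a fixed-width text line. The field names and offsets in `HYPOTHETICAL_LAYOUT` are placeholders invented for illustration, not the real Detail Record layout, which has to be taken from the USPS documentation; `parse_record` and `load_records` are likewise hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# HYPOTHETICAL layout: (field name, start, end) as 0-based, end-exclusive
# slice positions. These are placeholders, NOT the real ZIP+4 Detail Record
# positions, which must come from the USPS technical documentation.
HYPOTHETICAL_LAYOUT: List[Tuple[str, int, int]] = [
    ("zip_code", 0, 5),
    ("street_name", 5, 33),
    ("street_suffix", 33, 37),
    ("primary_low", 37, 47),
    ("primary_high", 47, 57),
]


def parse_record(line: str) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in HYPOTHETICAL_LAYOUT}


def load_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Lazily parse records so a large file never has to sit in memory whole."""
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():  # skip blank lines
            yield parse_record(line)
```

Because `load_records` is a generator, a large file can be streamed line by line inside a `with open(path) as f:` block instead of being read into memory all at once.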
Accomplishments that I'm proud of
Figuring out a good way to parse the input mailing address fields.
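As a rough illustration of field parsing (not the project's actual parser), the sketch below splits an already-standardized delivery line into components such as primary number, predirectional, street name, suffix, postdirectional, and secondary unit. The keyword sets are small hypothetical samples, and `parse_delivery_line` is a name made up for this example.

```python
from typing import Dict, Optional

# Small hypothetical keyword samples; a real parser needs the full USPS lists.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "BLVD", "RD", "DR", "LN", "CT"}
UNIT_DESIGNATORS = {"APT", "STE", "UNIT", "#"}


def parse_delivery_line(line: str) -> Dict[str, Optional[str]]:
    """Split an uppercased, standardized delivery line into address components."""
    tokens = line.split()
    parts: Dict[str, Optional[str]] = {
        "primary_number": None, "predirectional": None, "street_name": None,
        "suffix": None, "postdirectional": None,
        "unit_type": None, "unit_number": None,
    }
    # Secondary unit (e.g. "APT 4") is expected at the end of the line.
    if len(tokens) >= 2 and tokens[-2] in UNIT_DESIGNATORS:
        parts["unit_type"], parts["unit_number"] = tokens[-2], tokens[-1]
        tokens = tokens[:-2]
    # Primary (house) number comes first.
    if tokens and tokens[0][0].isdigit():
        parts["primary_number"] = tokens.pop(0)
    # Only peel off a directional or suffix if something is left for the name.
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        parts["predirectional"] = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1] in DIRECTIONALS:
        parts["postdirectional"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    parts["street_name"] = " ".join(tokens) or None
    return parts
```

For example, `parse_delivery_line("500 ELM ST NW")` yields street name `ELM`, suffix `ST`, and postdirectional `NW`. Checking the postdirectional before the suffix mirrors the usual order in a delivery line, where the suffix precedes a trailing directional.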
What I learned
Keeping things internal is hard work.
What's next for Address Standardizer - HackRuFall2018
Adding edge-case checks, and making sure it works with the actual, full-size database.