Data validation is a key principle of data engineering, and tools that do it well have the potential to shape the future of the industry and how we derive confidence from data.
What it does
We're contributing a number of custom expectations to the Great Expectations Hackathon:
- expectations that zip code values are valid
- expectations that shapefiles either overlap or do not
- expectations that lines fall either within or outside of boundaries
- expectations that US state and territory codes are valid
- geospatial expectations that check whether geometries are valid, overlap, or have elevation
- an expectation that checks whether the results of a binary classification model are fair
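The field-level checks above boil down to simple predicates over column values. A sketch of the zip code and state/territory logic, in plain Python (illustrative only; the actual expectations wrap checks like these in the Great Expectations custom-expectation template, and the full state/territory code set would come from a reference table):

```python
import re

# 5-digit ZIP or ZIP+4 format (e.g. "90210" or "30301-1234").
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

# A few US state and territory codes for illustration; the real
# expectation would validate against the complete official list.
STATE_CODES = {"AL", "AK", "CA", "NY", "TX", "PR", "GU", "VI"}

def is_valid_zip(value: str) -> bool:
    """Return True if value matches the 5-digit or ZIP+4 format."""
    return bool(ZIP_RE.match(value))

def is_valid_state_code(value: str) -> bool:
    """Return True if value is one of the recognized codes."""
    return value.upper() in STATE_CODES
```

For example, `is_valid_zip("90210")` passes while `is_valid_zip("9021")` does not.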
How we built it
We followed the Great Expectations custom-expectation template and added code to validate the fields we were targeting, using a combination of existing Python libraries and reference data.
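The template pattern can be sketched without the library itself: a column-map expectation amounts to applying a per-value condition across a column and succeeding when at least a given fraction (`mostly`) of values pass. This simplified stand-in mimics that shape (hypothetical names; the real template supplies the metric and expectation classes):

```python
import re
from typing import Callable, Iterable

def column_map_expectation(column: Iterable,
                           condition: Callable[[object], bool],
                           mostly: float = 1.0) -> dict:
    """Simplified stand-in for a column-map expectation: apply
    `condition` to each value and succeed when at least `mostly`
    of the values pass."""
    values = list(column)
    passed = [v for v in values if condition(v)]
    success = len(passed) >= mostly * len(values)
    return {
        "success": success,
        "result": {
            "element_count": len(values),
            "unexpected_count": len(values) - len(passed),
        },
    }

# Example: validate zip codes, tolerating up to 20% bad values.
zips = ["90210", "10001", "9021", "30301-1234", "60614"]
report = column_map_expectation(
    zips,
    lambda v: bool(re.match(r"^\d{5}(-\d{4})?$", str(v))),
    mostly=0.8,
)
```

Here four of the five values pass, so the expectation succeeds at the 0.8 threshold.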
Challenges we ran into
No one on the team had contributed to open source projects before, so a large challenge was not only coding the custom expectations but also making sure we followed style guides, contribution guides, and standard procedures.
Accomplishments that we're proud of
_ number of custom expectations added!
What we learned
Open source projects are beautiful and messy.
What's next for Adding suite of custom expectations
Implementing the expectations we've developed in our own data ecosystem.