Data validation is a key principle of data engineering, and tools that do this well have the potential to shape the future of the industry and how we derive confidence from data.

What it does

We're contributing a number of custom expectations to the Great Expectations Hackathon:

  1. Expectations that ZIP codes are valid
  2. Expectations that shapefiles either overlap or do not overlap
  3. Expectations that lines fall either within or outside of boundaries
  4. Expectations that US state and territory codes are valid
  5. Geospatial expectations that geometries are valid, overlap, or carry elevation data
  6. An expectation that the results of a binary classification model are fair
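To give a flavor of the per-value checks behind the ZIP-code and state-code expectations, here is a minimal stdlib-only sketch. The regex, the (abbreviated) code set, and the function names are our illustration, not the code actually submitted:

```python
import re

# Five-digit ZIP or ZIP+4 (e.g. "02139" or "02139-4307") -- an
# illustrative pattern, not necessarily the one the expectation uses.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

# Abbreviated set of US state and territory codes; the real
# expectation would carry the full USPS list.
US_CODES = {"AL", "AK", "CA", "NY", "TX", "PR", "GU", "VI", "DC"}

def is_valid_zip(value: str) -> bool:
    """Return True if value looks like a valid US ZIP code."""
    return bool(ZIP_RE.match(value))

def is_valid_state_code(value: str) -> bool:
    """Return True if value is a known US state or territory code."""
    return value.upper() in US_CODES
```

For example, `is_valid_zip("02139-4307")` and `is_valid_state_code("ny")` both return True.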

How we built it

We followed the Great Expectations template and added code that validates the fields we targeted, using a combination of existing Python libraries and reference data.
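Schematically, a column-map expectation applies a single-value check to every value in a column and succeeds if enough of them pass. This toy stand-in (the function name, return shape, and `mostly` threshold are ours, not the Great Expectations API) shows the shape of that logic:

```python
from typing import Callable, Iterable

def expect_column_values_to_pass(
    values: Iterable[str],
    check: Callable[[str], bool],
    mostly: float = 1.0,
) -> dict:
    """Toy stand-in for a column-map expectation: run `check` on each
    value and succeed if at least `mostly` of the values pass."""
    values = list(values)
    failures = [v for v in values if not check(v)]
    passed = len(values) - len(failures)
    return {
        "success": not values or passed / len(values) >= mostly,
        "unexpected_values": failures,
    }
```

Paired with a check like the ZIP-code validator, the result reports both whether the column as a whole passed and which individual values did not.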

Challenges we ran into

No one on the team had contributed to an open source project before, so a large challenge was not only coding the custom expectations but also making sure we followed the project's style guides, contribution guides, and standard procedures.

Accomplishments that we're proud of

_ custom expectations added!

What we learned

Open source projects are beautiful and messy.

What's next for Adding suite of custom expectations

Integrating the expectations we've developed into our own ecosystem.

Built With




Some of the pull requests are also ours but weren't linked in time:
