A typed entity is a way of categorizing data by the kind of information it represents. The idea is widely used in real applications, for example when validating email addresses or URLs. Great Expectations, which makes assertions about data, needs a module of this kind. Since I have worked in both data analysis and knowledge graphs, I was highly motivated to bring that knowledge into Great Expectations!

What it does

All the expectations I created are column value validation expectations. Each one makes an assumption about a given column, then validates its values and filters out those that do not conform. For example, when a dataset is scraped from the web, an email address expectation can easily filter out the values that are not valid email addresses.
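To make the idea concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API: the names `is_email` and `check_column` are hypothetical, and the regular expression is a deliberately simplified assumption rather than a full email address grammar.

```python
import re

# Deliberately simplified pattern (an assumption for illustration only);
# real-world email address rules are considerably more permissive.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_email(value):
    """Return True if value is a string that fully matches the simplified pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


def check_column(values, predicate):
    """Hypothetical helper: split a column into expected and unexpected values."""
    expected = [v for v in values if predicate(v)]
    unexpected = [v for v in values if not predicate(v)]
    return {"success": not unexpected, "expected": expected, "unexpected": unexpected}


# A small column as it might come out of a web scrape.
scraped = ["alice@example.com", "not an email", "bob@mail.example.org", None]
result = check_column(scraped, is_email)
print(result["success"])
print(result["unexpected"])
```

The `expected` list is the filtered column, and the `unexpected` list shows which values broke the assumption.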

How we built it

I built each expectation with the following steps:

  • Brainstorm which expectation is needed
  • Implement it from scratch or with existing libraries
  • Create test cases and do code linting
  • Submit the PR
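
The implement-and-test steps above could look roughly like the following sketch. An IPv4 address check is used as a hypothetical example, built on Python's standard `ipaddress` module rather than written from scratch. The function name and the test values are assumptions for illustration, not code from the submitted PRs.

```python
import ipaddress


def is_ipv4_address(value):
    """Return True if value is a string that parses as an IPv4 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.IPv4Address(value)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


# Test cases: values that should pass and values that should fail.
POSITIVE = ["192.168.1.1", "0.0.0.0", "255.255.255.255"]
NEGATIVE = ["256.1.1.1", "1.2.3", "::1", "example.com", "", None]

for value in POSITIVE:
    assert is_ipv4_address(value), f"expected {value!r} to be valid"
for value in NEGATIVE:
    assert not is_ipv4_address(value), f"expected {value!r} to be invalid"
print("all test cases passed")
```

Relying on an existing library keeps the check short and leaves the parsing rules to well-tested code, while the explicit positive and negative lists document the intended behavior.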

Challenges we ran into

For me, the most difficult and interesting part was figuring out which expectations users might need. I had to think about user needs, consult the relevant standards, and create test cases. This is not deeply technical work, but it matters for the user experience.

Accomplishments that we're proud of

I am proud of having submitted many PRs, a number of which have been merged, and I hope they contribute to this great project! There are 41 PRs in total, and some of them are still under review. I categorize my contributions into the following subjects:

Networking Stuff (7 PRs)

Numbers (11 PRs)

Stock Market (5 PRs)

Real-world Entities (10 PRs)

Standard Code or Identifier (8 PRs)

What we learned

I gained valuable experience by submitting my first PR to an open-source project, and I learned how much good communication matters in code review. Thank you, Great Expectations!

What's next for Typed Entity Expectations

I suggest writing detailed specifications and classifying the typed entities into categories. This would allow users to find what they need more easily.
