Inspiration
A typed entity is a way of categorizing data by the type of information it represents. It is widely used in real applications, for example to validate email addresses or URLs. Great Expectations, as a tool for asserting properties of data, needs such a module. Since I have worked in both data analysis and knowledge graphs, I am highly motivated to bring my knowledge to Great Expectations!
What it does
All the expectations I created are column value validation expectations. Each one makes an assumption about a given column, then validates and filters its values. For example, when a dataset is scraped from the web, an email address expectation can easily filter out the values that are not valid email addresses.
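As a rough illustration of the idea, the sketch below validates and filters a column of scraped values in plain Python. It does not use the Great Expectations API; the names `EMAIL_PATTERN`, `is_valid_email`, and `validate_column` are hypothetical, and the regular expression is a deliberately simplified assumption rather than a complete email grammar.

```python
import re

# Simplified pattern for illustration only (an assumption, not a full
# implementation of the email address standard).
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_valid_email(value):
    """Return True if value is a string that matches the simplified pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.match(value) is not None


def validate_column(values, predicate):
    """Check every value in a column and collect the ones that fail."""
    unexpected = [v for v in values if not predicate(v)]
    return {"success": not unexpected, "unexpected_values": unexpected}


# Hypothetical column scraped from the web.
scraped = ["alice@example.com", "not-an-email", "bob@example.org", None]

result = validate_column(scraped, is_valid_email)    # validation report
clean = [v for v in scraped if is_valid_email(v)]    # filtered column
```

Here `result` reports which values failed the check, and `clean` keeps only the values that passed.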
How we built it
I built each expectation using the following steps:
- Brainstorm which expectation is needed
- Implement it from scratch or with existing libraries
- Create test cases and lint the code
- Submit the PR
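The "implement from scratch" and "create test cases" steps can be sketched with one of the number expectations, the pronic number (an integer of the form k(k+1)). This is a minimal sketch under stated assumptions, not the code from the actual PRs: the function name `is_pronic` and the test lists are hypothetical, and zero is treated as pronic.

```python
import math


def is_pronic(n):
    """Return True if n is a non-negative integer of the form k * (k + 1)."""
    # Reject non-integers (and booleans, which are a subclass of int).
    if not isinstance(n, int) or isinstance(n, bool) or n < 0:
        return False
    # If n = k * (k + 1), then k is the integer square root of n.
    k = math.isqrt(n)
    return k * (k + 1) == n


# Hypothetical test cases: values that should pass and values that should fail.
POSITIVE_CASES = [0, 2, 6, 12, 20, 30, 42]
NEGATIVE_CASES = [1, 3, 4, 10, 25, -6, 6.0, "12"]

all_positive_pass = all(is_pronic(n) for n in POSITIVE_CASES)
all_negative_fail = not any(is_pronic(n) for n in NEGATIVE_CASES)
```

Keeping both passing and failing examples next to the check makes it easier to see which edge cases (negative numbers, floats, strings) the expectation is meant to reject.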
Challenges we ran into
For me, the most difficult and interesting part was figuring out which expectations users might need. I had to think about user needs, consult the relevant standards, and create test cases. This work is not very technical, but it is important for the user experience.
Accomplishments that we're proud of
I am proud of having submitted many PRs and had many of them merged, and I hope they contribute to this great project! There are 41 PRs in total, and some of them are still under review. I group my contributions into the following subjects:
Networking Stuff (7 PRs)
Numbers (11 PRs)
- Hex Color Code
- Roman Numeral
- Semiprime Number
- Pronic Number
- Sphenic Number
- Square Free Number
- Powerful Number
- MD5 Hash
- SHA-1 Hash
- Base32
- Base64
Stock Market (5 PRs)
- Dow Jones Stock Ticker
- Nasdaq Stock Ticker
- SP500 Stock Ticker
- Cryptocurrency Ticker
- Ethereum Address
Real-world Entities (10 PRs)
- IBAN Number
- ISO-3166 Country name
- Phone number
- Price
- Social Security Number
- Temperature
- Leap Year
- Hash Tag
- MBTI Code
- IMDB ID
Standard Code or Identifier (8 PRs)
What we learned
I gained valuable experience by submitting my first PR to an open-source project, and I learned how important good communication is in code review. Thank you, Great Expectations!
What's next for Typed Entity Expectations
I suggest writing detailed specifications and classifying the typed entities into categories. This would allow users to find what they need more easily.