EVL Validation Microservice

EVL Validate Microservice enables quick, easy-to-use and cost-effective validation of datasets. It is very useful in situations where complex automated testing tools may be too heavy and expensive. Good candidates for the validation tool are ETL or migration projects or quick quality checks of production data.
- Configuration via Excel or CSV files with pre-configuration option based on metadata
- Automatic data type and null values validation
- Setting of other validation criteria for e.g. number intervals, string lengths
- Possibility to add complex validation functions for entity relations
- Separating “wrong” data and logging of rejection reasons
- Setting conditions for breaking the job flow based on percentages, number of rows …
- Fast implementation and rapid change management
- Low implementation and operating costs
EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read data from various sources; including csv files, databases–Oracle, Teradata, SQL Server,
etc–and Hadoop streaming data like Kafka.
EVL Validation Functions
Data | Function | Description |
String | String length | Min/max string length |
Any | Null value check | Check nullability of a field |
String | Code Page | Identifying characters with wrong code page |
Date | Date Interval | Setting Min and Max date interval |
Date | Date Format | Identifying non-standard date and time format |
Number | Number Interval | Setting Min and Max interval for integers, floats and decimal |
Specific | Entity relation check | Relations checking between 2 or more attributes |
Specific | Validation function | Calling validation functions for complex conditions |
EVL Validation project
A validation project consists of following steps:- unzipping EVL distribution and defining a few variables and paths
- filling-in an Excel or CSV file defining source type (e.g. csv, Oracle, …), table or file name and field names and validations functions to be applied
- automatic generation of EVL jobs for each entity
- running EVL jobs in a batch or individually
- displaying rejected files containing wrong records and logs
Example
Following example shows an implementation of a very simplified validationSet variables:
# Source and target data directories
DATA_SOURCE_DIR="/some/path/source"
DATA_ANON_DIR="/some/path/validation"
Validation definition for files TEST1 and TEST2:
Src | Entity | Attribute | Datatype | Null | Min | Max | Validation condition | Description |
csv | TEST1 | ID | int | No | 0 | 5000 | Setting number interval | |
csv | TEST1 | ACC | int | No | ID > 500 && ACC > 3000 | Attribute relation | ||
csv | TEST1 | RC | string | check_rc(RC) | Calling custom function | |||
csv | TEST1 | Sex | string | 1 | 1 | substr(RC,2,2)>50 && Sex==”F” | Test of sex validity based on RC | |
ORCL | TEST2 | ID | string | No | String length check | |||
ORCL | TEST2 | Postcode | int | 5 | 5 | Postcode must be 5 digits | ||
ORCL | TEST2 | Text | string | Codepage: ISO-8859-2 | Checking characters codepage |
Run:
# generating evl jobs from the config file
evl run/generate_jobs.evl
# running the validation job for an entity “TEST1”
evl run/validate.test1.evl
# running the validation job for an entity “TEST2”
evl run/validate.test2.evl
# Example of custom function rc_check():
# stol(replace(RC,’/’,’’)) % 11 == 0