
EVL Anonymization Microservice

Why Anonymize Data?
Creating anonymized data sets from production data offers several benefits, including GDPR compliance for personal information and the protection of commercially sensitive data from developers, testers, and other outside contractors.
EVL Anonymization Microservice enables fast, automated, and cost-effective anonymization of data sets. It supports both pseudonymization and anonymization of production data in line with GDPR requirements.
EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read data from various sources, including CSV files, databases (Oracle, Teradata, SQL Server, etc.), and Hadoop streaming data such as Kafka.


EVL Anonymization: Key Advantages

  • High productivity
  • Custom functions can be easily designed and embedded into the solution
  • Low implementation and operating costs
  • Combination of anonymization techniques: Encryption, Tokenization, Masking, Randomization

Whitepaper

EVL Anonymization white paper. Function guide and examples.

Documentation

Full EVL Anonymization documentation.

EVL Anonymization Functions

Method | Data type | Function | Description
Masking | string | str_mask_left(), str_mask_right() | str_mask_left("1234 5678 9012 3456",4,'X') -> "XXXX XXXX XXXX 3456", i.e. masks with "X" from the left but keeps 4 characters on the right
Random | any | random(min,max) | returns a random value of the given data type from the given range
Randomization | date, timestamp | randomize() | randomize(date("2019-01-01"),5,6,15) returns a random value with year 2019 plus/minus 5 years, January plus/minus 6 months, and the first day of the month plus/minus 15 days
Anonymization | string | anonymize() [1] | anonymize("abcd",2,8) -> "s8L7df", i.e. returns a string of length between 2 and 8
Anonymization | numbers | anonymize() [1] | anonymize(573,0,1023) -> 850, i.e. returns an integer between 0 and 1023
Anonymization | date, timestamp | anonymize() [1] | anonymize(date("2018-05-25"), 1, 6, 15) -> 2019-09-17, i.e. returns the given date plus/minus 1 year, plus/minus 6 months, and plus/minus 15 days
Unique anonymization | integral data types | anonymize_uniq() [2] | anonymize_uniq((uint)133) -> 85.189.556, i.e. returns a uint; no input other than 133 can return 85.189.556, so the mapping is unique
Encryption | string | encrypt() | encrypt("abcd") -> "99bd … c4u8", i.e. returns the encrypted value, based on the algorithm and its length
Decryption | string | decrypt() | decrypt("99bd … c4u8") -> "abcd", i.e. returns the decrypted value, based on the algorithm and its length
Tokenization | string | anonymize_uniq(str, length(str), length(str)) | Tokenization is just a specific application of the anonymize_uniq() function
Hashing | string | sha256sum() | sha256sum("abcd") -> "fc4b5fd6 … b801d62c"
Salted hash | string | sha256sum(str + salt) | i.e. simply append a salt and compute the checksum; to keep the output at a reasonable length, prefer the anonymize() function

[1] For a given value and a given salt, produces the same output, but two different values may receive the same anonymized value.
[2] For a given value and a given salt, produces the same output in a unique way, so a bijection is guaranteed. Particularly useful for IDs.
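
To make the semantics in the table more concrete, here is a minimal Python sketch of three of the ideas: left masking, a salted hash, and a deterministic but non-unique mapping of the kind footnote 1 describes. It is not EVL code. The function names mask_left, salted_sha256 and anonymize_int, the use of HMAC, and the rule that whitespace is left unmasked are all assumptions made for illustration; the whitespace rule is inferred only from the str_mask_left() example in the table.

```python
import hashlib
import hmac


def mask_left(text: str, keep: int, mask_char: str = "X") -> str:
    """Mask every non-space character except the last `keep` characters."""
    cut = max(len(text) - keep, 0)
    masked = "".join(c if c.isspace() else mask_char for c in text[:cut])
    return masked + text[cut:]


def salted_sha256(value: str, salt: str) -> str:
    """Hex SHA-256 digest of value + salt, as in the "Salted hash" row."""
    return hashlib.sha256((value + salt).encode("utf-8")).hexdigest()


def anonymize_int(value: int, low: int, high: int, salt: bytes) -> int:
    """Map value into [low, high] deterministically; collisions are possible."""
    digest = hmac.new(salt, str(value).encode("utf-8"), hashlib.sha256).digest()
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)


salt = b"demo-salt"  # hypothetical salt, for illustration only

print(mask_left("1234 5678 9012 3456", 4, "X"))  # XXXX XXXX XXXX 3456
print(len(salted_sha256("abcd", "demo-salt")))   # 64 hex characters
print(anonymize_int(573, 0, 1023, salt) == anonymize_int(573, 0, 1023, salt))  # True
```

The salted hash always yields 64 hex characters regardless of input length, which is the length concern the table raises. The range-limited mapping is repeatable for a fixed salt, but because many inputs share a small output range, distinct inputs can end up with the same output.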
One bank needed to provide production data to its development team in a form that could not be re-identified, while keeping the relationships between entities intact. The source consisted of 100+ tables stored in CSV files, SQL Server, Informix, and Oracle. The target for the anonymized development data was an Oracle database. The customer filled in one configuration file containing all data definitions, the anonymization types and parameters, and the locations of the sources (directories for CSV files and connection strings for databases). The EVL anonymization jobs were generated automatically and run in parallel batches; for example, anonymizing one file containing 10 million rows took 50 seconds.
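
Relationships between tables survive anonymization only if every key maps to the same value wherever it appears and no two keys map to the same value. The Python sketch below illustrates one well-known way to get such a keyed bijection on 32-bit integers, a balanced Feistel network. This is an illustrative technique only; the source does not say how anonymize_uniq() is implemented, and the names anonymize_uniq_u32 and _round_value, the salt, and the sample IDs are all hypothetical.

```python
import hashlib
import hmac


def _round_value(half: int, round_no: int, salt: bytes) -> int:
    """Keyed round function: 16 bits taken from an HMAC-SHA256 digest."""
    msg = bytes([round_no]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(salt, msg, hashlib.sha256).digest()[:2], "big")


def anonymize_uniq_u32(value: int, salt: bytes, rounds: int = 4) -> int:
    """Balanced Feistel network: a salt-dependent bijection on [0, 2**32)."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in an unsigned 32-bit integer")
    left, right = value >> 16, value & 0xFFFF
    for i in range(rounds):
        # Each round is invertible, so the whole mapping is a permutation.
        left, right = right, left ^ _round_value(right, i, salt)
    return (left << 16) | right


salt = b"demo-salt"                   # hypothetical salt
persons = [133, 134, 135]             # hypothetical person IDs
accounts = [(9001, 133), (9002, 135)]  # hypothetical (account, owner ID) pairs

anon_persons = {anonymize_uniq_u32(p, salt) for p in persons}
anon_accounts = [
    (anonymize_uniq_u32(acc, salt), anonymize_uniq_u32(owner, salt))
    for acc, owner in accounts
]

# Every anonymized account still points at an anonymized person.
print(all(owner in anon_persons for _, owner in anon_accounts))  # True
```

Because the same salt is applied to both tables, the owner column of the account table can still be joined to the person table after anonymization, while the original IDs no longer appear in the output.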

EVL Anonymization project

An anonymization project consists of the following steps:
  1. unzipping the EVL distribution and defining a few variables and paths
  2. filling in a CSV file that defines the source type (e.g. CSV, Oracle, …), the table or file name, the field names, and the anonymization functions to be applied
  3. automatically generating EVL jobs for each entity
  4. running the EVL jobs in a batch or individually

Example

The following example shows how data was anonymized for the development and test environments of a banking application.

Set variables:

    # Source and target data directories
    DATA_SOURCE_DIR="/some/path/source"
    DATA_ANON_DIR="/some/path/anon"
    # Path to salt
    export EVL_ANON_SALT_PATH="/some/path/.salt"

Anonymization definition for file TEST:

Src | Entity | Ord | Field name | Data type | Null | Anon type | EVL Function | Description
FILE | TEST | 1 | ID | int | No | ANONYMIZE_UNIQ | | Unique identifier of the person
FILE | TEST | 2 | ACCOUNT | int | No | ANONYMIZE_UNIQ | | Unique account number
FILE | TEST | 3 | RC | string | | | anonymize_rc(IN) | Personal ID (must be Mod 11), custom function is used
FILE | TEST | 4 | ST_DATE | date("%m/%d/%Y") | | ANONYMIZE | | Start date of the account
FILE | TEST | 5 | SCORE | decimal(15,2) | | ANONYMIZE | | Score of the account holder
FILE | TEST | 6 | DESC | string | | ANONYMIZE | | Description of the account
FILE | TEST | 7 | TEXT | string | | | | Free text - no anonymization
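
The definition above is essentially a mapping from field names to anonymization rules. The Python sketch below shows that idea on three of the fields (ST_DATE, DESC and TEXT); the other fields are left out for brevity. It is a loose illustration under stated assumptions, not the way EVL generates or runs its jobs: the functions anon_date and anon_text, the RULES dictionary, the salt, and the choice of a shift of up to 365 days are all hypothetical stand-ins.

```python
import hashlib
import hmac
import string
from datetime import datetime, timedelta

SALT = b"demo-salt"  # hypothetical stand-in for the contents of the salt file
ALPHABET = string.ascii_letters + string.digits


def _digest(value: str) -> bytes:
    """Keyed digest, so the same input and salt always give the same output."""
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).digest()


def anon_date(value: str, fmt: str = "%m/%d/%Y") -> str:
    """Shift a date by a deterministic offset of at most 365 days either way."""
    shift = int.from_bytes(_digest(value)[:4], "big") % 731 - 365
    return (datetime.strptime(value, fmt) + timedelta(days=shift)).strftime(fmt)


def anon_text(value: str) -> str:
    """Replace text with 32 pseudo-random alphanumeric characters."""
    return "".join(ALPHABET[b % len(ALPHABET)] for b in _digest(value))


# Field name -> stand-in function; None means "no anonymization" (the TEXT row).
RULES = {"ST_DATE": anon_date, "DESC": anon_text, "TEXT": None}


def anonymize_record(record: dict, rules: dict) -> dict:
    """Apply the rule for each field; fields without a rule pass through."""
    return {
        name: rules[name](value) if rules.get(name) else value
        for name, value in record.items()
    }


record = {
    "ST_DATE": "06/08/2017",
    "DESC": "Account has been established",
    "TEXT": "He prefers blue color",
}
print(anonymize_record(record, RULES))
```

Keeping the rules in data rather than in code is what makes it possible to generate one job per entity from a single definition file.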

Run:

    # generating evl jobs from the config file
    evl run/generate_jobs.evl
    # running the anonymization job for an entity "TEST"
    evl run/anon.test.evl

Example data - one record

Field | Before | After
ID | 87981042 | 998451644
ACCOUNT | 178 | 1716305276
RC | 5606206199 | 5802153599
ST_DATE | 06/08/2017 | 07/09/2016
SCORE | 1539.34 | 741133154.40
DESC | Account has been established on another name then changed | jZy96jPqkiH8GMYhdj9Ti6O8TdPVQKDciDmd8Nyi
TEXT | He prefers blue color | He prefers blue color
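
The personal ID in the record above is required to stay "Mod 11" after anonymization. Assuming this means that the digits, read as a single integer, must be divisible by 11 (an interpretation that both sample values happen to satisfy, but which the source does not spell out), the constraint can be checked with a few lines of Python. The helper name is_mod11 is hypothetical.

```python
def is_mod11(personal_id: str) -> bool:
    """True when the ID is all digits and, read as one integer, divisible by 11."""
    return personal_id.isdigit() and int(personal_id) % 11 == 0


before, after = "5606206199", "5802153599"  # values from the example record
print(is_mod11(before), is_mod11(after))    # True True
```

A generic anonymize() call would be unlikely to preserve this property, which is presumably why the example uses a custom function for this field.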

Architecture

Position in DWH Architecture
EVL Anonymization Detail Architecture