Bigger Data,
Integrated Faster

EVL Anonymization Microservice

Why Anonymize Data?
Creating anonymized data sets based on production data offers several benefits, including GDPR compliance for personal information and protection of commercially sensitive data from developers, testers, and other outside contractors.
The EVL Anonymization Microservice enables fast, automated, and cost-effective anonymization of data sets. It can be used for pseudonymization and anonymization of production data according to GDPR requirements, as well as for protecting commercially sensitive data from developers, testers, and other outside contractors.
EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read data from various sources, including CSV files, databases (Oracle, Teradata, SQL Server, etc.), and Hadoop streaming data such as Kafka and Flume.


EVL Anonymization Key Advantages

  • High productivity
  • Custom functions can be easily designed and embedded into the solution
  • Low implementation and operating costs
  • Combination of anonymization techniques: Encryption, Tokenization, Masking, Randomization
EVL Data Anonymization white paper. Function guide and examples.


EVL Anonymization Functions

| Method | Data | Function | Description |
|---|---|---|---|
| Masking | string | str_mask_left(), str_mask_right() | str_mask_left("1234 5678 9012 3456",4,'X') -> "XXXX XXXX XXXX 3456", i.e. mask with "X" from the left, but keep 4 characters on the right |
| Anonymization | string | anonymize() ¹ | anonymize("abcd",2,8) -> "s8L7df", i.e. returns a string of length between 2 and 8 |
| Anonymization | numbers | anonymize() ¹ | anonymize(573,0,1023) -> 850, i.e. returns an integer between 0 and 1023 |
| Anonymization | date, timestamp | anonymize() ¹ | anonymize(date("2018-05-25"), 1, 6, 15) -> 2019-09-17, i.e. returns the given date plus/minus 1 year, plus/minus 6 months, and plus/minus 15 days |
| Unique anonymization | integral data types | anonymize_uniq() ² | anonymize_uniq((uint)133) -> 85.189.556, i.e. returns a uint; no input other than 133 can return 85.189.556, so the mapping is unique |
| Encryption | string | encrypt() | encrypt("abcd") -> "99bd … c4u8", i.e. returns the encrypted value based on the algorithm and its length |
| Encryption | string | decrypt() | decrypt("99bd … c4u8") -> "abcd", i.e. returns the decrypted value based on the algorithm and its length |
| Tokenization | string | anonymize(str, length(str), length(str)) | Tokenization is just a specific application of the anonymize() function |
| Hashing | string | sha256sum() | sha256sum("abcd") -> "fc4b5fd6 … b801d62c" |
| Salted hash | string | sha256sum(str + salt) | i.e. simply add a salt and compute a checksum. To keep the output at a reasonable length, it is better to use the anonymize() function. |

¹ For a given value and a given salt, produces the same output.
² For a given value and a given salt, produces the same output, but in a unique way, so a bijection is guaranteed. Particularly useful for IDs.
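
To make the masking row concrete, here is a minimal Python sketch of the behaviour shown in the str_mask_left() example. It is not EVL code: the names mask_left and mask_right are hypothetical, and the choice to leave whitespace unmasked is an assumption inferred from the single example in the table.

```python
def mask_left(value: str, keep: int, mask_char: str = "X") -> str:
    """Mask from the left, leaving the last `keep` characters readable."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # value[-0:] would return the whole string, so handle keep == 0 separately.
    head, tail = (value, "") if keep == 0 else (value[:-keep], value[-keep:])
    # Assumption: whitespace is preserved so the grouping stays visible.
    return "".join(c if c.isspace() else mask_char for c in head) + tail


def mask_right(value: str, keep: int, mask_char: str = "X") -> str:
    """Mask from the right, leaving the first `keep` characters readable."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    head, tail = value[:keep], value[keep:]
    return head + "".join(c if c.isspace() else mask_char for c in tail)


print(mask_left("1234 5678 9012 3456", 4, "X"))   # XXXX XXXX XXXX 3456
print(mask_right("1234 5678 9012 3456", 4, "X"))  # 1234 XXXX XXXX XXXX
```

Masking is irreversible and keeps the format recognizable, which is why it suits values such as card numbers where only the last few digits need to stay visible.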
One bank needed to provide production data to its development team in a form that could not be re-identified while still keeping the entity relationships. The sources were 100+ tables stored in CSV files, SQL Server, Informix, and Oracle. The target for the anonymized development data was an Oracle database. The customer filled in one configuration file containing all data definitions, anonymization types, and parameters pointing to the sources (directories for CSV files and connection strings for databases). The EVL anonymization jobs were created automatically and run in parallel batches: for example, the anonymization of one file containing 10 million rows took 50 seconds.
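
The requirement to keep entity relationships follows from the property that the same value and the same salt always produce the same output. The Python sketch below illustrates that idea with a keyed hash (HMAC-SHA256) from the standard library. It is an illustration under stated assumptions, not the EVL implementation: the function pseudonymize, the salt value, and the two small tables are all hypothetical.

```python
import hashlib
import hmac
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def pseudonymize(value: str, salt: bytes, length: int = 12) -> str:
    """Return a token that depends only on the value and the salt."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32")
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()
    # Map digest bytes onto the alphabet (slight modulo bias, fine for a sketch).
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])


SALT = b"demo-salt"  # hypothetical; a real salt would be kept secret

# Two hypothetical source tables that share a customer key.
customers = [{"customer_id": "1001"}, {"customer_id": "1002"}]
accounts = [
    {"account_no": "A-1", "customer_id": "1001"},
    {"account_no": "A-2", "customer_id": "1001"},
    {"account_no": "A-3", "customer_id": "1002"},
]

anon_customers = [{"customer_id": pseudonymize(c["customer_id"], SALT)} for c in customers]
anon_accounts = [
    {
        "account_no": pseudonymize(a["account_no"], SALT),
        "customer_id": pseudonymize(a["customer_id"], SALT),
    }
    for a in accounts
]

# Joins on the anonymized key still line up with the original relationships.
known_keys = {c["customer_id"] for c in anon_customers}
print(all(a["customer_id"] in known_keys for a in anon_accounts))
```

Because each table can be processed independently with the same salt, this style of mapping also fits jobs that run in parallel batches. Note that a truncated hash like this one is not guaranteed to be collision-free.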

EVL Anonymization project

An anonymization project consists of the following steps:
  1. unzipping the EVL distribution and defining a few variables and paths
  2. filling in an Excel or CSV file defining the source type (e.g. csv, Oracle, …), the table or file name, the field names, and the anonymization functions to be applied
  3. automatic generation of EVL jobs for each entity
  4. running EVL jobs in a batch or individually
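
Steps 2 and 3 amount to turning one definition file into a per-entity plan. The Python sketch below shows that idea only. The column names, the inline CSV, and the load_plan function are hypothetical and do not reflect the actual EVL configuration format or job generator.

```python
import csv
import io
from collections import defaultdict

# Hypothetical definition file: one row per attribute.
CONFIG = """src,entity,ord,attr,data_type,anon_type
csv,TEST,2,ACCOUNT,int,ANONYMIZE_UNIQ
csv,TEST,1,ID,int,ANONYMIZE_UNIQ
csv,TEST,3,TEXT,string,
"""


def load_plan(config_text: str) -> dict:
    """Group attributes by entity, ordered by their position in the record."""
    fields_by_entity = defaultdict(list)
    for row in csv.DictReader(io.StringIO(config_text)):
        # An empty anon_type means the attribute is copied unchanged.
        entry = (int(row["ord"]), row["attr"], row["anon_type"] or None)
        fields_by_entity[row["entity"]].append(entry)
    return {
        entity: [(attr, anon) for _, attr, anon in sorted(fields, key=lambda f: f[0])]
        for entity, fields in fields_by_entity.items()
    }


plan = load_plan(CONFIG)
for entity, fields in plan.items():
    print(entity, fields)
```

Each entry in the resulting plan corresponds to one job, which can then be run on its own or together with the others in a batch.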

Example

The following example shows an implementation of data anonymization for the development and test environments of a banking application.

Set variables:

    # Source and target data directories
    DATA_SOURCE_DIR="/some/path/source"
    DATA_ANON_DIR="/some/path/anon"

    # Path to salt
    export EVL_ANON_SALT_PATH="/some/path/.salt"

Anonymization definition for file TEST:

| Src | Entity | Ord | Attr | Data type | Null | Anon type | EVL Function | Description |
|---|---|---|---|---|---|---|---|---|
| csv | TEST | 1 | ID | int | No | ANONYMIZE_UNIQ | | Unique identifier of the person |
| csv | TEST | 2 | ACCOUNT | int | No | ANONYMIZE_UNIQ | | Unique account number |
| csv | TEST | 3 | RC | string | | | anonymize_rc(IN) | Personal ID (must be Mod 11), custom function is used |
| csv | TEST | 4 | ST_DATE | date("%m/%d/%Y") | | ANONYMIZE | | Start date of the account |
| csv | TEST | 5 | SCORE | decimal(15,2) | | ANONYMIZE | | Score of the account holder |
| csv | TEST | 6 | DESC | string | | ANONYMIZE | | Description of the account |
| csv | TEST | 7 | TEXT | string | | | | Free text - no anonymization |
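
The RC attribute above needs a custom function because the anonymized value must still pass the Mod 11 check. The Python sketch below shows one way such a constraint can be preserved. It is not the anonymize_rc() function from the table: the name rc_mod11_sketch is hypothetical, the salt is made up, and the sketch keeps only the properties "10 digits" and "divisible by 11", not any date structure inside the ID.

```python
import hashlib
import hmac


def rc_mod11_sketch(rc: str, salt: bytes) -> str:
    """Map an ID to a 10-digit number divisible by 11, deterministically."""
    digest = hmac.new(salt, rc.encode("ascii"), hashlib.sha256).digest()
    seed = int.from_bytes(digest[:8], "big")
    # Derive a 9-digit prefix from the keyed hash of the input.
    prefix = 100_000_000 + seed % 900_000_000
    while True:
        # Check digit that makes prefix * 10 + check a multiple of 11.
        check = (-prefix * 10) % 11
        if check < 10:
            return f"{prefix}{check}"
        # A check value of 10 has no single digit; try the next prefix.
        prefix = prefix + 1 if prefix < 999_999_999 else 100_000_000


anonymized = rc_mod11_sketch("5606206199", b"demo-salt")
print(len(anonymized), int(anonymized) % 11)
```

The loop runs at most twice, because when one prefix needs a check value of 10, the next prefix needs 0.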

Run:

    # generating evl jobs from the config file
    evl run/generate_jobs.evl

    # running the anonymization job for the entity "TEST"
    evl run/anon.test.evl

Example data - one record

| Before | After |
|---|---|
| 87981042 | 998451644 |
| 178 | 1716305276 |
| 5606206199 | 5802153599 |
| 06/08/2017 | 07/09/2016 |
| 1539.34 | 741133154.40 |
| Account has been established on another name then changed | jZy96jPqkiH8GMYhdj9Ti6O8TdPVQKDciDmd8Nyi |
| He prefers blue color | He prefers blue color |
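
The first two values in the record are mapped with ANONYMIZE_UNIQ, which is described as a bijection: no two inputs can share an output. One well-known way to build a salted bijection on fixed-width integers is a Feistel network, sketched below in Python for 32-bit values. This is a swapped-in illustration of the concept, not the algorithm EVL uses, and it will not reproduce the values shown in the table. The function names and the salt are hypothetical.

```python
import hashlib
import hmac


def _round_value(salt: bytes, rnd: int, half: int) -> int:
    """Keyed round function producing 16 bits from a 16-bit half."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(salt, msg, hashlib.sha256).digest()[:2], "big")


def permute_uint32(value: int, salt: bytes, rounds: int = 4) -> int:
    """Map a 32-bit unsigned integer to another one, one-to-one for a given salt."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    left, right = value >> 16, value & 0xFFFF
    for rnd in range(rounds):
        left, right = right, left ^ _round_value(salt, rnd, right)
    return (left << 16) | right


def unpermute_uint32(value: int, salt: bytes, rounds: int = 4) -> int:
    """Invert permute_uint32 by running the rounds backwards."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    left, right = value >> 16, value & 0xFFFF
    for rnd in reversed(range(rounds)):
        left, right = right ^ _round_value(salt, rnd, left), left
    return (left << 16) | right


SALT = b"demo-salt"  # hypothetical
mapped = permute_uint32(133, SALT)
print(unpermute_uint32(mapped, SALT))  # 133
```

Because every round is invertible, distinct IDs always map to distinct outputs, so primary keys stay unique and foreign keys stay consistent after anonymization.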

Architecture

Position in DWH Architecture
EVL Anonymization Detail Architecture