Anonymization Terms and Techniques

Jul 15, 2019 - Jan Štěpnička

In general, anonymization means converting personal data into anonymized data using various anonymization techniques. The EU GDPR lays down clear rules on the protection of natural persons with regard to the processing of personal data. All companies that handle the personal data of EU citizens must respect those rules.

Anonymization terms and properties

Pseudonymization – a term defined by GDPR: the anonymized personal data can no longer be attributed to a specific data subject without the use of additional information. This means it is still possible to re-identify the original data with the help of that additional information, which should therefore be kept separately.

Reversibility – whether it is possible to recover the original value, or at least how difficult that is. This differs from pseudonymization, because here we assume you don’t have the (secret) additional information. In other words: whether the algorithm can be broken.

Repeatability – the same value is always anonymized to the same anonymized value. That is, rerunning the anonymization always produces the same result, and a given value is replaced by the same anonymized value in every table where it occurs. This is usually very important when anonymizing IDs.

Uniqueness – two different original values are always anonymized into two different anonymized values. In other words, such anonymization is a one-to-one (injective) function. This is quite important when anonymizing IDs, for example.

Preserve data type – anonymized values keep their data type. For example, an integer cannot be anonymized to a string, and an anonymized timestamp must again be a valid timestamp.

Preserve length – anonymized values have the same length as the originals, or at least stay within the maximal length of the given data type. This is very important when anonymizing data for testing purposes.
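
Several of these properties can be seen together in a small Python sketch. It anonymizes a numeric ID so that the result is repeatable, is still a digit string, and has the same length. The function name `anonymize_digits` and the key value are made up for this illustration, and HMAC-SHA256 is just one possible choice. Note that this approach does not guarantee uniqueness: two different inputs can collide.

```python
import hashlib
import hmac


def anonymize_digits(value: str, key: bytes) -> str:
    """Map a digit string to another digit string of the same length.

    Hypothetical helper: repeatable for a given key, preserves type and
    length, but NOT guaranteed unique (collisions are possible).
    """
    if not (value.isascii() and value.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    digest = hmac.new(key, value.encode("ascii"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big")
    # Reduce the 256-bit number to the same number of decimal digits,
    # padding with leading zeros where needed.
    return str(number % 10 ** len(value)).zfill(len(value))


key = b"example-secret-key"  # assumption: kept secret in real use
first = anonymize_digits("0042137", key)
second = anonymize_digits("0042137", key)
print(first, first == second, len(first))
```

The same key always gives the same output for the same input, which is the repeatability property; changing the key changes the whole mapping.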

Salt – an arbitrary additional string or value added to the data, usually before computing a checksum. It is necessary, for example, when hashing a short or otherwise bounded value such as a phone number. Someone who knows that a hash was produced by sha256 from a phone number can use sha256 to build a translation table of all possible values. But when a salt, such as the string "M8SKC7WOFP975WUS", is added to the phone number, then without knowing the salt the sha256 checksum cannot be reverted this way.
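
The translation-table attack described above can be sketched in a few lines of Python. The phone numbers are made up and the value space is kept deliberately tiny (10,000 candidates) so the example runs instantly; prepending the salt to the value is an assumption of this sketch, since the salt could be combined with the data in other ways.

```python
import hashlib


def sha256_hex(value: str, salt: str = "") -> str:
    """SHA-256 checksum of salt + value, as a hex string."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Made-up phone numbers 5550000 .. 5559999 (a tiny value space).
candidates = [f"555{n:04d}" for n in range(10_000)]

secret_phone = "5550199"
unsalted = sha256_hex(secret_phone)
salted = sha256_hex(secret_phone, salt="M8SKC7WOFP975WUS")

# The attacker knows the hash function but not the salt, and builds
# a translation table from every possible value to its checksum.
lookup = {sha256_hex(c): c for c in candidates}

print(lookup.get(unsalted))  # the original number is recovered
print(lookup.get(salted))    # None: the salted hash is not in the table
```

The salt only helps while it stays secret. Anyone who learns it can rebuild the table with the salt included.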

Choosing the appropriate anonymization approach and techniques depends heavily on the purpose and type of personal data processing, e.g. running production systems, archiving, or providing data to partners or development teams. The responsibility lies with the data controller (the natural or legal person, public authority, agency ...) who determines the purposes and means of the processing of personal data.

Anonymization Techniques

Scrambling – a permutation of letters. Quite often, however, the original data can be recovered. Example:

Peter Sellers ---> Teepr Resells
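
A minimal Python sketch of scrambling follows. It permutes the letters inside each word and, unlike the example above, does not adjust capitalization. The function name `scramble` and the seed are assumptions made for this illustration; the seed is there only to make the result repeatable.

```python
import random


def scramble(text: str, seed: int) -> str:
    """Permute the letters inside each word; spaces stay where they are."""
    rng = random.Random(seed)  # seeded, so the same input gives the same output
    words = []
    for word in text.split(" "):
        letters = list(word)
        rng.shuffle(letters)
        words.append("".join(letters))
    return " ".join(words)


scrambled = scramble("Peter Sellers", seed=2019)
print(scrambled)
# The letters themselves are unchanged, only their order differs,
# which is why scrambled values can often be guessed back.
print(sorted(scrambled) == sorted("Peter Sellers"))  # True
```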

Shuffling – permutes values within a whole column. Example of shuffling ID:

id  name                            id  name
2   Pierre Richard                  5   Pierre Richard
3   Richard Matthew Stallman   →    2   Richard Matthew Stallman
5   Donald E. Knuth                 3   Donald E. Knuth

In general this technique is not repeatable, but it is a bijection.
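
Shuffling a column can be sketched in Python as follows, using the rows from the example above. The helper name `shuffle_column` is hypothetical. Because the random generator is not seeded, a rerun will usually give a different permutation, and a random permutation may also leave some values in their original rows.

```python
import random

rows = [
    (2, "Pierre Richard"),
    (3, "Richard Matthew Stallman"),
    (5, "Donald E. Knuth"),
]


def shuffle_column(rows, column, rng):
    """Return new rows with the values of one column permuted."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    result = []
    for row, new_value in zip(rows, values):
        new_row = list(row)
        new_row[column] = new_value
        result.append(tuple(new_row))
    return result


shuffled = shuffle_column(rows, column=0, rng=random.Random())
for row in shuffled:
    print(row)
```

Every original ID still occurs exactly once in the output, which is what makes the technique a bijection on the column.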

Randomization – simply replaces the original value with a random one. Example:

1st run: Michael Raynolds ---> 5dtZ4twxx7896avkf78ad+0p
2nd run: Michael Raynolds ---> 6shk8t9we6fgos7rthj98d

Clearly, randomization is neither repeatable nor a bijective function.
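
A minimal Python sketch of randomization, using the standard `secrets` module. The function name `randomize` and the choice of alphabet are assumptions for this illustration. Keeping the output the same length as the input is also a choice made here, not something the technique requires.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def randomize(value: str) -> str:
    """Replace a value with random characters of the same length.

    The input is used only for its length, so nothing about the
    original value survives in the output.
    """
    return "".join(secrets.choice(ALPHABET) for _ in value)


print(randomize("Michael Raynolds"))  # 1st run
print(randomize("Michael Raynolds"))  # 2nd run, almost certainly different
```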

Encryption – uses a key to encrypt the original value. The key must then either be kept secret or be deleted immediately, depending on whether we want to be able to decrypt the data later.

Masking – hides an important or unique part of the data behind random characters or other data. For example, a credit card number:

9370 4442 9037 4197 ---> **** **** **** 4197

The advantage of masking is that the data can still be identified, for example by the digits left visible, without exposing the actual identity.
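
The credit card example above can be reproduced with a short Python sketch. The function name `mask_card_number` and its parameters are hypothetical. It masks every digit except the last few and leaves separators such as spaces untouched.

```python
def mask_card_number(number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all digits except the last `visible` ones; keep separators."""
    total_digits = sum(ch.isdigit() for ch in number)
    to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in number:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("9370 4442 9037 4197"))  # **** **** **** 4197
```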

Tokenization – replaces sensitive data with non-sensitive substitutes, referred to as tokens; the mapping between values and tokens is usually stored in a secret mapping table. Tokenization keeps the data type and usually also the length of the data, so the result can be processed by legacy systems that are sensitive to data length and type. This is achieved by keeping specific data fully or partially visible for processing and analytics while the sensitive information is kept hidden.
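
The idea of a secret mapping table can be sketched in Python as a toy in-memory token vault. The class name `Tokenizer` and its methods are made up for this illustration. A real system would persist and protect the table, and would have to handle the case where the token space for short values runs out, which this sketch does not.

```python
import secrets


class Tokenizer:
    """Toy token vault: maps values to random digit tokens and back."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._token_by_value:
            return self._token_by_value[value]  # repeatable
        while True:
            # Random digits of the same length as the original value.
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._value_by_token:  # keep tokens unique
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


vault = Tokenizer()
token = vault.tokenize("9370444290374197")
print(token, vault.detokenize(token))
```

Whoever holds the vault can map tokens back to the original values, so the table itself is the additional information that has to be kept secret.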

Table with overview of anonymization techniques

technique                                  revertible  uniqueness  pseudonymization                  repeatable  preserve data type  preserve length
scrambling                                 often yes   not         yes                               depends on  yes                 yes
shuffling                                  not         yes         not                               depends on  yes                 yes
randomization                              not         not         not                               not         yes                 yes
encryption                                 not *)      not         yes, but that’s the purpose here  yes         not                 not
masking                                    not         not         not                               yes         mostly yes          yes
checksum                                   often yes   almost yes  yes                               yes         not                 not
salted checksum                            not *)      almost yes  not                               yes         not                 not
tokenization                               not *)      depends on  yes                               yes         yes                 yes
data type preserving                       not *)      almost yes  not                               yes         yes                 yes
data type preserving unique anonymization  not *)      yes         not                               yes         yes                 yes

*) Additional information, such as a token table, an encryption key or a salt, must be kept secret somewhere, either permanently or temporarily.

EVL Tool version 1.3 is out!

Mar 20, 2019 - Jan Štěpnička

After the successful version 1.2, EVL Tool brings its users new features in version 1.3. What's new?

  • Read/write XML
  • Sample data generator
  • New string manipulation functions
  • New components:
    • Generate – generate records
    • Readxml
    • Writexml

Interested? You can get this latest version from GitHub.

And be ready, 1.4 is coming soon!

EVL Tool goes to Silicon Valley

Feb 14, 2018 - Petr Horčička

What would you say to a three-month stay in a foreign country to develop your business and gain new experience? We say yes – thanks to a chance from Czech Accelerator. The acceleration program offers a three-month incubator full of mentoring, counseling, workshops, networking events, and other services. This is a chance to kick off one’s business while gaining invaluable experience.

EVL Tool, as well as other projects, applied for this unique opportunity in Silicon Valley, and after a demanding and critical jury evaluation, our team was chosen to take part in this program. Starting mid-April, we will be developing our potential in California for three months. We believe it will expand our horizons and bring us a lot of useful insights and skills. Our team is really excited and cannot wait to share this experience with you. During our stay, we will regularly inform you about EVL Tool news. Stay with us!