Version: 2.7


14 EVM Mappings

Important: Any C++ function can be used in an EVL mapping. Many of the following EVL functions exist mainly to help handle ‘nullptr’, which represents NULL values.

All functions are also sorted by name; for better orientation, here is an overview by usage group.

Mapping Functions

Important: Some characters in an EVD field name must be handled differently in an EVM mapping:

Number at the beginning

When a field name starts with a digit, it must be prefixed with an underscore in the mapping. E.g. field 01_bill_type is referenced in the mapping as _01_bill_type.

Non-alphanumeric characters

Every non-alphanumeric character in a field name is replaced by an underscore in the mapping. E.g. field $bill type (9) is referenced in the mapping as _bill_type__9_.
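The two naming rules can be illustrated with a small standalone C++ helper. `to_mapping_name` is a hypothetical name used only for this sketch (it is not an EVL function), and it assumes that "alphanumeric" means ASCII letters and digits.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of EVL): derive the identifier used in a
// mapping from an EVD field name, following the two rules above.
// Assumption: "alphanumeric" means ASCII letters and digits only.
std::string to_mapping_name(const std::string& field_name) {
    std::string name;
    name.reserve(field_name.size() + 1);
    for (const unsigned char c : field_name) {
        // Rule 2: every non-alphanumeric character becomes an underscore.
        name += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    // Rule 1: a name starting with a digit gets an underscore prefix.
    if (!name.empty() && std::isdigit(static_cast<unsigned char>(name[0]))) {
        name.insert(name.begin(), '_');
    }
    return name;
}
```

For example, `to_mapping_name("$bill type (9)")` yields `_bill_type__9_`, matching the field reference above.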


14.1 Output Functions

Several functions can modify the standard component behaviour regarding the output of each record.

14.1.1 discard and reject

Calling ‘discard()’ simply does not output the current record anywhere. (This performs better than writing to /dev/null, i.e. using Trash.)

However, it does not stop processing the mapping, so the typical usage is:

discard(); return;

which immediately ends the current mapping and moves on to the next record.

In contrast to ‘discard()’, the ‘reject()’ function redirects the input record (with the input EVD) to the specified output as follows:

reject();                    // redirect input record to reject port
reject(6); // redirect input record to output /dev/fd/6
reject("out.csv"); // redirect input record to file "out.csv",
// if such exists, will be overwritten
reject("out.csv", open_mode::overwrite); // same as previous
reject("out.csv", open_mode::append); // same as previous,
// but append, not overwrite
reject("out.csv", open_mode::create); // redirect input to "out.csv",
// but fail if such exists

Function headers:

void discard() const;

void reject() const;
void reject(const int file_descriptor) const;
void reject(const char* const path, \
const open_mode mode = open_mode::overwrite);
void reject(const std::string& path, \
const open_mode mode = open_mode::overwrite);

And the variants for a join mapping:

void reject_left(const char* const path, \
const open_mode mode = open_mode::overwrite);
void reject_right(const char* const path, \
const open_mode mode = open_mode::overwrite);
void reject_left(const std::string& path, \
const open_mode mode = open_mode::overwrite);
void reject_right(const std::string& path, \
const open_mode mode = open_mode::overwrite);

14.1.2 add_record and output

Both of these functions produce records with the output EVD. An example first:

add_record();                 // add new record to stdout
add_record(4); // add new record to output /dev/fd/4
add_record("out.csv"); // add new record to file "out.csv",
// if such exists, will be overwritten
add_record("out.csv", open_mode::overwrite); // same as previous
add_record("out.csv", open_mode::append); // same as previous,
// but append, not overwrite
add_record("out.csv", open_mode::create); // add record to "out.csv",
// but fail if such exists

Function ‘add_record()’ sends the current output record to the specified output and continues processing the mapping.

Function ‘output()’ behaves the same way as ‘add_record()’, but it does not produce an additional record; it only redirects the current output record.

The ‘open_mode’ is taken only from the very first call of the function; later calls ignore it, and the file stays open for writing in the mode of the first call. So, for example, calling in a mapping

output("out.csv");

deletes the file out.csv if it exists and then appends every output record to it, keeping the file open.

Function headers:

void add_record() const;
void add_record(const int file_descriptor) const;
void add_record(const char* const path, \
const open_mode mode = open_mode::overwrite) const;
void add_record(const std::string& path, \
const open_mode mode = open_mode::overwrite) const;

void output(const int file_descriptor = 4);
void output(const char* const path, \
const open_mode mode = open_mode::overwrite);
void output(const std::string& path, \
const open_mode mode = open_mode::overwrite);

Note: ‘output(-1)’ is the same as ‘discard()’.

14.1.3 unmatched_left, unmatched_right

The Join component provides these two functions, which catch records left unmatched on the left and/or right side of the join.

A typical example is a Join with --type left:

out->field = left->field;
if (!right) unmatched_left();

This means that the inner-joined records go to the output, and the unjoined records from the left go to the --left-unmatched port.

Function headers:

void unmatched_left(const int output = 7);
void unmatched_right(const int output = 7);

14.1.4 reject_left, reject_right

These work like the ‘unmatched_left/right()’ functions, except that they redirect the input record (with the input EVD).

Function headers:

void reject_left(const int output = 7);
void reject_right(const int output = 7);

14.1.5 warn and fail

Function ‘warn()’ adds a warning message to the standard error of the job, and the mapping continues processing the input records. To terminate the mapping and make the job fail, use function ‘fail()’.

if (!in->name) fail("Name is missing and is mandatory.");
if (!in->email) warn("Careful, e-mail is missing for: " + *in->name);

14.2 String Functions

All string manipulation functions can be used in two ways:

  • with pointers (preferred)
  • without pointers (i.e. as dereferenced values, “with star”)

The pointer variant is preferred because it can handle NULL values (in fact ‘nullptr’). So these two examples:

out->field  = str_function(in->field);
*out->field = str_function(*in->field);

are basically the same, but the second one fails when ‘in->field’ is NULL (i.e. ‘nullptr’).

There are these two rules in all string manipulation functions described in this section:

  • When the first argument is a pointer, the function returns also a pointer.
  • When the first argument is ‘nullptr’, the function returns ‘nullptr’ as well.
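These two rules can be sketched in plain C++ with a hypothetical function `str_reverse` (not an EVL function) that offers a value overload and a pointer overload. The sketch returns `std::unique_ptr` so that it is leak-free on its own; how EVL actually allocates and owns the returned strings is not described here.

```cpp
#include <memory>
#include <string>

// Value overload: works on a plain string (the "with star" style).
std::string str_reverse(const std::string& str) {
    return std::string(str.rbegin(), str.rend());
}

// Pointer overload: a null input yields a null result; otherwise the value
// overload is applied and its result is returned through a pointer.
std::unique_ptr<std::string> str_reverse(const std::string* const str) {
    if (str == nullptr) {
        return nullptr;
    }
    return std::make_unique<std::string>(str_reverse(*str));
}
```

The pointer overload never dereferences a null input, which is exactly why the "with pointers" style is safer for nullable fields.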

14.2.1 length

(since EVL 2.0)

Returns the length of the given string.

For ‘nullptr’ it returns ‘nullptr’ as well.

Example:

length((string)"Some text")     // return 9
length(nullptr) // return nullptr

In a mapping it might look like this (with pointers):

out->str_len = length(in->first_name);

14.2.2 split

(since EVL 1.3)

Example:

split("Some text, another text.", ' ')
// returns vector ["Some", "text,", "another", "text."]

When the first argument is ‘nullptr’, it returns ‘nullptr’.

In mapping it might look like this (without pointers):

static std::vector<std::string> name_vec;

name_vec = split(*in->full_name, ' ');
*out->first_name = name_vec[0];
*out->last_name = name_vec[1];

or (preferably) using pointers:

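static std::vector<std::string*>* name_vec;

name_vec = split(in->full_name, ' ');
// name_vec points to a vector, so dereference it before indexing;
// it is nullptr when in->full_name is NULL
out->first_name = name_vec ? (*name_vec)[0] : nullptr;
out->last_name  = name_vec ? (*name_vec)[1] : nullptr;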

Function headers:

std::vector<std::string>   split(const std::string& str, \
const char delimiter);
std::vector<std::string*>* split(const std::string* const str, \
const char delimiter);
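The value variant of ‘split’ can be approximated in standalone C++ as below. This is a sketch, not EVL's code; it assumes that consecutive delimiters produce empty items and that the delimiter itself is dropped, which agrees with the example above but is not stated by the documentation.

```cpp
#include <string>
#include <vector>

// Cut the string at every occurrence of a single-character delimiter.
// Assumption: consecutive delimiters yield empty items.
std::vector<std::string> split(const std::string& str, const char delimiter) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = str.find(delimiter, start);
        if (pos == std::string::npos) {
            parts.push_back(str.substr(start));  // last item
            break;
        }
        parts.push_back(str.substr(start, pos - start));
        start = pos + 1;  // continue after the delimiter
    }
    return parts;
}
```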

14.2.3 starts_with, ends_with

(since EVL 2.0)

Returns true if a string starts (or ends) with the given substring.

When the first argument is ‘nullptr’, it returns false.

Example:

starts_with("Some text", "Some")   // return True
starts_with("Some text", "x") // return False
starts_with(nullptr, "x") // return False
ends_with("Some text", "ext") // return True
ends_with("Some text", "x") // return False

In mapping it might look like this:

*out->test_field = starts_with(in->test_field, "Some") ? "OK" : "NOK";  // "Some" is an illustrative prefix

Function headers:

bool starts_with(const std::string& str, const char* const prefix);
bool starts_with(const std::string* const str, const char* const prefix);
bool starts_with(const std::string& str, const std::string& prefix);
bool starts_with(const std::string* const str, const std::string& prefix);
bool ends_with(const std::string& str, const char* const suffix);
bool ends_with(const std::string* const str, const char* const suffix);
bool ends_with(const std::string& str, const std::string& suffix);
bool ends_with(const std::string* const str, const std::string& suffix);
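The pointer variants can be sketched in standard C++17 as follows; a null string simply yields false, as described above. This is an illustration of the documented behaviour, not EVL's implementation (it takes `std::string_view` instead of the separate `const char*` and `std::string` overloads).

```cpp
#include <string>
#include <string_view>

// True if *str begins with prefix; a null pointer yields false.
bool starts_with(const std::string* const str, std::string_view prefix) {
    return str != nullptr && str->size() >= prefix.size() &&
           str->compare(0, prefix.size(), prefix) == 0;
}

// True if *str ends with suffix; a null pointer yields false.
bool ends_with(const std::string* const str, std::string_view suffix) {
    return str != nullptr && str->size() >= suffix.size() &&
           str->compare(str->size() - suffix.size(), suffix.size(), suffix) == 0;
}
```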

14.2.4 str_compress, str_uncompress

(since EVL 2.0)

Compress/uncompress the given string. Examples which return pointers:

str_compress(in->string_field_to_compress)       // snappy by default
str_compress(in->string_field_to_compress, compression::gzip)
str_uncompress(in->snappy_field)                 // snappy by default
str_uncompress(in->gzipped_field, compression::gzip)

Examples which return string values:

str_compress(*in->string_field_to_compress)      // snappy by default
str_compress(*in->string_field_to_compress, compression::gzip)
str_uncompress(*in->snappy_field)                // snappy by default
str_uncompress(*in->gzipped_field, compression::gzip)

When the first argument is ‘nullptr’, it returns ‘nullptr’.

In mapping it might look like this:

out->gzipped_field = str_compress(in->string_field, compression::gzip);

Function headers:

std::string str_compress(const std::string& str, \
const compression method = compression::snappy);
std::string* str_compress(const std::string* const str, \
const compression method = compression::snappy);
std::string str_uncompress(const std::string& str, \
const compression method = compression::snappy);
std::string* str_uncompress(const std::string* const str, \
const compression method = compression::snappy);

14.2.5 str_count

(since EVL 1.3)

Counts the number of occurrences of the given substring or character. Example:

str_count("Some text, another text.", ' ')     // returns 3
str_count("Some text, another text.", "text") // returns 2

When the first argument is ‘nullptr’, it returns ‘nullptr’.

In mapping it might look like this (using pointers):

out->jan_cnt  = str_count(in->first_name, "Jan");

or without pointers:

*out->jan_cnt = str_count(*in->first_name, "Jan");

Function headers:

std::size_t  str_count(const std::string& str, const char ch);
std::size_t* str_count(const std::string* const str, const char ch);
std::size_t str_count(const std::string& str, const char* const substr);
std::size_t* str_count(const std::string* const str, \
const char* const substr);
std::size_t str_count(const std::string& str, const std::string& substr);
std::size_t* str_count(const std::string* const str, \
const std::string& substr);
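A standalone sketch of the counting logic follows. It assumes that substring matches do not overlap (the search resumes right after each match); the documentation does not say how overlapping matches such as "aa" in "aaa" are counted.

```cpp
#include <cstddef>
#include <string>

// Count occurrences of a single character.
std::size_t str_count(const std::string& str, const char ch) {
    std::size_t count = 0;
    for (const char c : str) {
        if (c == ch) ++count;
    }
    return count;
}

// Count occurrences of a substring. Assumption: matches do not overlap.
std::size_t str_count(const std::string& str, const std::string& substr) {
    if (substr.empty()) return 0;
    std::size_t count = 0;
    for (auto pos = str.find(substr); pos != std::string::npos;
         pos = str.find(substr, pos + substr.size())) {
        ++count;
    }
    return count;
}
```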

14.2.6 str_index, str_rindex

(since EVL 2.0)

str_index(str,substr)
it returns the index (counted from 0) of the first occurrence of the given substring,

str_rindex(str,substr)
it returns the index (counted from 0) of the last occurrence of the given substring.

When there is no match, ‘-1’ is returned.

When the string is ‘nullptr’, it returns ‘nullptr’.

Examples:

str_index("Some text text", "text")   // return 5
str_index("Some text text", "xyz") // return -1
str_index(nullptr, "x")               // return nullptr
str_rindex("Some text text", "text") // return 10

Function headers:

std::int64_t  str_index(const std::string& str, const char* const substr);
std::int64_t* str_index(const std::string* const str, \
const char* const substr);
std::int64_t str_index(const std::string& str, const std::string& substr);
std::int64_t* str_index(const std::string* const str, \
const std::string& substr);
std::int64_t  str_rindex(const std::string& str, const char* const substr);
std::int64_t* str_rindex(const std::string* const str, \
const char* const substr);
std::int64_t str_rindex(const std::string& str, const std::string& substr);
std::int64_t* str_rindex(const std::string* const str, \
const std::string& substr);
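The value variants map directly onto `std::string::find` and `std::string::rfind`, with `npos` translated to -1. A minimal sketch of that behaviour (not EVL's code):

```cpp
#include <cstdint>
#include <string>

// 0-based position of the first match, or -1 when there is none.
std::int64_t str_index(const std::string& str, const std::string& substr) {
    const auto pos = str.find(substr);
    return pos == std::string::npos ? -1 : static_cast<std::int64_t>(pos);
}

// 0-based position of the last match, or -1 when there is none.
std::int64_t str_rindex(const std::string& str, const std::string& substr) {
    const auto pos = str.rfind(substr);
    return pos == std::string::npos ? -1 : static_cast<std::int64_t>(pos);
}
```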

14.2.7 str_join

(since EVL 2.4)

str_join(vector_of_strings,delimiter)
it returns the string of concatenated vector members, delimited by a specified delimiter.

When the vector is ‘nullptr’, it returns ‘nullptr’.

Examples of a mapping:

static std::vector<std::string> x{"Here", "is", "a", "hardcoded", "vector."};

*out->x_spaced = str_join(x,' ');   // return "Here is a hardcoded vector."
*out->x_dashed = str_join(x,'-');   // return "Here-is-a-hardcoded-vector."
*out->x_longer = str_join(x,"---"); // return "Here---is---a---hardcoded---vector."

Function headers:

std::string  str_join(const std::vector<std::string>& strings, \
const char delimiter);
std::string* str_join(const std::vector<std::string*>* strings, \
const char delimiter);
std::string str_join(const std::vector<std::string>& strings, \
const std::string_view delimiter);
std::string* str_join(const std::vector<std::string*>* strings, \
const std::string_view delimiter);
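The joining rule (delimiter between neighbours, none at either end) can be sketched for the value variant like this; it mirrors the documented results but is not EVL's implementation.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Concatenate the items, inserting the delimiter only between neighbours.
std::string str_join(const std::vector<std::string>& strings,
                     std::string_view delimiter) {
    std::string result;
    for (std::size_t i = 0; i < strings.size(); ++i) {
        if (i > 0) result += delimiter;
        result += strings[i];
    }
    return result;
}

// Single-character delimiter, forwarded to the string_view variant.
std::string str_join(const std::vector<std::string>& strings, const char delimiter) {
    return str_join(strings, std::string_view(&delimiter, 1));
}
```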

14.2.8 str_mask_left, str_mask_right

(since EVL 2.1)

These functions return the string with its visible characters replaced by the given character, working from the given direction, while keeping the specified number of characters unchanged.

Example:

str_mask_left("abcd  text efgh", 6)   // returns "abcd  tex* ****"
str_mask_right("1234567890", 3, '-') // returns "---4567890"

Without the third argument, an asterisk ‘*’ is assumed.

When the first argument is ‘nullptr’, these functions return ‘nullptr’.

Function headers:

std::string  str_mask_left(const std::string& str, \
const std::size_t keep, const char ch = '*');
std::string* str_mask_left(const std::string* const str, \
const std::size_t keep, const char ch = '*');
std::string str_mask_right(const std::string& str, \
const std::size_t keep, const char ch = '*');
std::string* str_mask_right(const std::string* const str, \
const std::size_t keep, const char ch = '*');

14.2.9 str_pad_left, str_pad_right

(since EVL 2.1)

These functions pad the string from the left/right with the specified character (space by default) up to the given length. The length is counted in bytes, not characters, so be careful with multibyte encodings.

Example:

str_pad_left("123",7,'0')     // returns "0000123"
str_pad_right("text",7)       // returns "text   "
str_pad_right("text",2) // returns "text"
str_pad_left("Groß",6,'*') // returns "*Groß" as "ß" has 2 Bytes

When the first argument is ‘nullptr’, these functions return ‘nullptr’.

Function headers:

std::string  str_pad_left(const std::string& str, \
const std::size_t length, const char ch = ' ');
std::string* str_pad_left(const std::string* const str, \
const std::size_t length, const char ch = ' ');
std::string str_pad_right(const std::string& str, \
const std::size_t length, const char ch = ' ');
std::string* str_pad_right(const std::string* const str, \
const std::size_t length, const char ch = ' ');
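Because `std::string::size()` counts bytes, a plain C++ sketch reproduces the byte-based behaviour described above, including the "Groß" case (UTF-8 "ß" is the two bytes 0xC3 0x9F). This illustrates the documented semantics and is not EVL's code.

```cpp
#include <cstddef>
#include <string>

// Pad on the left up to `length` bytes; longer strings are returned unchanged.
std::string str_pad_left(const std::string& str, const std::size_t length,
                         const char ch = ' ') {
    if (str.size() >= length) return str;
    return std::string(length - str.size(), ch) + str;
}

// Pad on the right up to `length` bytes; longer strings are returned unchanged.
std::string str_pad_right(const std::string& str, const std::size_t length,
                          const char ch = ' ') {
    if (str.size() >= length) return str;
    return str + std::string(length - str.size(), ch);
}
```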

14.2.10 str_replace

(since EVL 1.3)

Examples:

str_replace("Some text", ' ', '-')        // returns "Some-text"
str_replace("Some text", "Some", "Any") // returns "Any text"
str_replace("Some text", " ", "SPACE")    // returns "SomeSPACEtext"

When the first argument is ‘nullptr’, it returns ‘nullptr’.

In mapping it might look like this:

out->name = str_replace(in->name, ' ', '-');

Function headers:

std::string  str_replace(const std::string& str, \
const char old_ch, const char new_ch);
std::string* str_replace(const std::string* const str, \
const char old_ch, const char new_ch);
std::string str_replace(const std::string& str, \
const char* const old_substr, const char* const new_substr);
std::string* str_replace(const std::string* const str, \
const char* const old_substr, const char* const new_substr);
std::string str_replace(const std::string& str, \
const std::string& old_substr, const std::string& new_substr);
std::string* str_replace(const std::string* const str, \
const std::string& old_substr, const std::string& new_substr);
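A standalone sketch of the replacement logic follows. It assumes that every occurrence is replaced, scanning left to right without re-scanning inserted text; the examples above contain only one match each, so this is not confirmed by the documentation.

```cpp
#include <string>

// Replace every occurrence of old_substr by new_substr (left to right,
// replaced text is not searched again). Assumption: all matches are replaced.
std::string str_replace(const std::string& str, const std::string& old_substr,
                        const std::string& new_substr) {
    if (old_substr.empty()) return str;
    std::string result;
    std::string::size_type start = 0;
    for (auto pos = str.find(old_substr); pos != std::string::npos;
         pos = str.find(old_substr, start)) {
        result.append(str, start, pos - start);  // text before the match
        result += new_substr;
        start = pos + old_substr.size();
    }
    result.append(str, start, std::string::npos);  // remaining tail
    return result;
}

// Character variant, expressed through the substring variant.
std::string str_replace(const std::string& str, const char old_ch, const char new_ch) {
    return str_replace(str, std::string(1, old_ch), std::string(1, new_ch));
}
```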

14.2.12 str_to_base64, base64_to_str

(since EVL 2.6)

Encode/decode string to/from Base64 form.

When the first argument is ‘nullptr’, it returns ‘nullptr’ as well.

Examples:

str_to_base64("Some\r\nbíňářý text.")   // return "U29tZQ0KYsOtxYjDocWZw70gdGV4dC4="
base64_to_str("U29tZQ0KYsOtxYjDocWZw70gdGV4dC4=") // return "Some\r\nbíňářý text."

Function headers:

std::string  str_to_base64(const std::string& str);
std::string* str_to_base64(const std::string* const str);
std::string base64_to_str(const std::string& str);
std::string* base64_to_str(const std::string* const str);
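For reference, standard Base64 (RFC 4648 alphabet with '=' padding) turns every 3 input bytes into 4 output characters. The sketch below implements that encoding and its inverse for plain strings; it assumes EVL uses the standard alphabet, which is consistent with the example above ("Some\r\n" encodes to "U29tZQ0K"), and it is not EVL's code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace {
const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
}

// Encode: 3 bytes -> 4 characters, '=' padding for a 1- or 2-byte tail.
std::string str_to_base64(const std::string& str) {
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < str.size(); i += 3) {
        const unsigned v = (static_cast<unsigned char>(str[i]) << 16) |
                           (static_cast<unsigned char>(str[i + 1]) << 8) |
                           static_cast<unsigned char>(str[i + 2]);
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += kAlphabet[(v >> 6) & 63];
        out += kAlphabet[v & 63];
    }
    const std::size_t rest = str.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest > 0) {
        unsigned v = static_cast<unsigned char>(str[i]) << 16;
        if (rest == 2) v |= static_cast<unsigned char>(str[i + 1]) << 8;
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += rest == 2 ? kAlphabet[(v >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Decode: collect 6 bits per character, emit a byte whenever 8 are available.
std::string base64_to_str(const std::string& str) {
    std::string out;
    unsigned buffer = 0;
    int bits = 0;
    for (const char c : str) {
        if (c == '=') break;  // padding ends the data
        const char* p = std::char_traits<char>::find(kAlphabet, 64, c);
        if (p == nullptr) throw std::invalid_argument("not Base64");
        buffer = (buffer << 6) | static_cast<unsigned>(p - kAlphabet);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
        }
    }
    return out;
}
```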

14.2.13 str_to_hex, hex_to_str

(since EVL 2.0)

Converts a string or ustring to its hexadecimal representation and vice versa. (Ustring support was added in EVL 2.6.)

When the first argument is ‘nullptr’, it returns ‘nullptr’ as well.

Examples:

str_to_hex("Some text")            // return "536f6d652074657874"
hex_to_str("536f6d652074657874") // return "Some text"

Function headers:

std::string  str_to_hex(const std::string& str);
std::string* str_to_hex(const std::string* const str);
ustring str_to_hex(const __detail::u16str& str);
ustring* str_to_hex(const ustring* const str);
std::string hex_to_str(const std::string& str);
std::string* hex_to_str(const std::string* const str);
ustring hex_to_str(const __detail::u16str& str);
ustring* hex_to_str(const ustring* const str);
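The string variants can be sketched as two lowercase hex digits per byte, which matches the example above. The error handling (throwing on odd length or non-hex characters) is a choice made for this sketch only; the documentation does not say how EVL treats invalid input.

```cpp
#include <stdexcept>
#include <string>

// Two lowercase hex digits per byte.
std::string str_to_hex(const std::string& str) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(str.size() * 2);
    for (const unsigned char c : str) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Inverse conversion; accepts upper- or lowercase digits.
// In this sketch, invalid input is rejected with an exception.
std::string hex_to_str(const std::string& hex) {
    auto value = [](const char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd length");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::string::size_type i = 0; i < hex.size(); i += 2) {
        out += static_cast<char>(value(hex[i]) * 16 + value(hex[i + 1]));
    }
    return out;
}
```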

14.2.15 substr

(since EVL 2.0)

Returns the substring that starts at the given position (counted from 0) and has the specified length.

Example:

substr("123456789",0,2)      // returns "12"
substr("123456789",6) // returns "789"

Without the third argument, it returns the rest of the string.

When the first argument is ‘nullptr’, function returns ‘nullptr’.

Function headers:

std::string  substr(const std::string& str, const std::size_t pos = 0,
const std::int64_t count = std::numeric_limits<std::int64_t>::max());
std::string* substr(const std::string* const str, const std::size_t pos = 0,
const std::int64_t count = std::numeric_limits<std::int64_t>::max());

14.2.16 trim, trim_left, trim_right

(since EVL 1.0)

Example:

trim("  text ")              // returns "text"
trim_left(" text ") // returns "text "
trim_right("--text---", '-') // returns "--text"

These functions trim the character ‘ch’ from both sides, from the left, and from the right, respectively. Without the second argument, a space is assumed.

When the first argument is ‘nullptr’, these functions return ‘nullptr’.

Function headers:

std::string  trim(const std::string& str, const char ch = ' ');
std::string* trim(const std::string* const str, const char ch = ' ');

std::string trim_left(const std::string& str, const char ch = ' ');
std::string* trim_left(const std::string* const str, const char ch = ' ');

std::string trim_right(const std::string& str, const char ch = ' ');
std::string* trim_right(const std::string* const str, const char ch = ' ');
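The three trimming variants can be sketched with `find_first_not_of` and `find_last_not_of`; this mirrors the examples above for the value overloads and is not EVL's implementation.

```cpp
#include <string>

// Remove all leading occurrences of ch.
std::string trim_left(const std::string& str, const char ch = ' ') {
    const auto first = str.find_first_not_of(ch);
    return first == std::string::npos ? std::string() : str.substr(first);
}

// Remove all trailing occurrences of ch.
std::string trim_right(const std::string& str, const char ch = ' ') {
    const auto last = str.find_last_not_of(ch);
    return last == std::string::npos ? std::string() : str.substr(0, last + 1);
}

// Remove ch from both ends.
std::string trim(const std::string& str, const char ch = ' ') {
    return trim_left(trim_right(str, ch), ch);
}
```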

14.2.17 uppercase, lowercase

(since EVL 1.0)

Examples:

uppercase("AbCd")   // returns "ABCD"
lowercase("AbCd") // returns "abcd"

When the argument is ‘nullptr’, these functions return ‘nullptr’.

Without the second parameter, these functions act only on ‘A-Z’ and ‘a-z’.

To act also on national letters (for example letters with diacritics), specify a locale as the second parameter:

static std::locale de_locale("de_DE.utf8");
*out->field_upcase = uppercase(*in->field, de_locale);

The locale can also be passed to the function as a string, but a static locale object is recommended for performance reasons.

Function headers:

std::string  uppercase(const std::string& str);
std::string* uppercase(const std::string* const str);
std::string uppercase(const std::string& str, const std::locale& locale);
std::string* uppercase(const std::string* const str, const std::locale& locale);

std::string lowercase(const std::string& str);
std::string* lowercase(const std::string* const str);
std::string lowercase(const std::string& str, const std::locale& locale);
std::string* lowercase(const std::string* const str, const std::locale& locale);
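The behaviour without a locale (only ‘A-Z’ and ‘a-z’ are changed) can be sketched as a plain byte loop. In UTF-8 every byte of a multibyte character is outside the ASCII range, so national letters pass through unchanged. This is an illustration, not EVL's code.

```cpp
#include <string>

// ASCII-only case mapping: other bytes (including UTF-8 sequences) are kept.
std::string uppercase(std::string str) {
    for (char& c : str) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return str;
}

std::string lowercase(std::string str) {
    for (char& c : str) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return str;
}
```

For example, `uppercase("groß")` yields `"GROß"`, which is why a locale is needed when national letters must be converted too.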

14.3 Date and Time Functions

  • get_millisecond(timestamp*)
  • get_microsecond(timestamp*)
  • get_nanosecond(timestamp*)
  • timezone_shift()

Difference:

auto diff_seconds = dt - datetime(2018,5,31,19,36,57); // 61 (seconds)
auto diff_days    = d - date("2017-04-02");            // -6 (days)

Let’s summarize the logic:

date - int          => date
date - date         => int
datetime - int      => datetime
datetime - datetime => int
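The arithmetic above amounts to counting days (for dates) or seconds (for datetimes) from a fixed epoch. The sketch uses Howard Hinnant's well-known `days_from_civil` algorithm for the proleptic Gregorian calendar; `days_from_civil` and `seconds_from_civil` are hypothetical helpers, not EVL functions, and the values of `dt` and `d` (2018-05-31 19:37:58 and 2017-03-27) are assumptions chosen so that the results match the 61 seconds and -6 days shown above.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (H. Hinnant's algorithm).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // Jan/Feb count as months 13/14 of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Seconds since 1970-01-01 00:00:00 (no time zones, no leap seconds).
constexpr std::int64_t seconds_from_civil(std::int64_t y, unsigned mo, unsigned d,
                                          unsigned h, unsigned mi, unsigned s) {
    return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
}

// date - date => int (days); datetime - datetime => int (seconds)
static_assert(days_from_civil(2017, 3, 27) - days_from_civil(2017, 4, 2) == -6);
static_assert(seconds_from_civil(2018, 5, 31, 19, 37, 58) -
              seconds_from_civil(2018, 5, 31, 19, 36, 57) == 61);
```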

14.4 Randomization Functions

The randomization functions follow the same rules regarding ‘nullptr’ as the string functions.

randomize()

(since EVL 2.1)

Examples:

// random int from whole int range
out->random_int = randomize(in->value);
// random int from interval < value - 1000 , value + 2000 >
out->random_int_range = randomize(in->value,-1000,2000);

random_int()
random_long()
random_short()
random_char()

(since EVL 2.1)

Examples:

// random value from whole int range
out->random_value = random_int();
// random value from interval <1000,2000>
out->random_range = random_int(1000,2000);
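Closed intervals such as <1000,2000> correspond naturally to `std::uniform_int_distribution`, whose bounds are also inclusive. The sketch shows one plausible way to get the behaviour of ‘random_int(min,max)’ and ‘randomize(value,low,high)’ in plain C++; `random_int_in` and `randomize_in` are hypothetical names, and EVL's actual generator and seeding are not documented here.

```cpp
#include <random>

// One generator per thread, seeded once from the system's random device.
inline std::mt19937& generator() {
    thread_local std::mt19937 gen{std::random_device{}()};
    return gen;
}

// Random int from the closed interval [min, max], like random_int(1000,2000).
int random_int_in(const int min, const int max) {
    std::uniform_int_distribution<int> dist(min, max);
    return dist(generator());
}

// Random int from [value + low, value + high], like randomize(value,-1000,2000).
// Note: value + low / value + high may overflow near INT_MIN / INT_MAX.
int randomize_in(const int value, const int low, const int high) {
    return value + random_int_in(low, high);
}
```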

random_float()
random_double()

(since EVL 2.1)

Examples:

// random value from whole float range
out->random_value = random_float();
// random float value from interval <1000,2000>
out->random_range = random_float(1000,2000);

random_decimal()

(since EVL 2.1)

Examples:

// random value from whole decimal range
out->random_value = random_decimal();
// random decimal value from interval <1000,2000>
out->random_range = random_decimal(1000,2000);

random_date()
random_datetime()
random_timestamp()

(since EVL 2.1)

Examples:

// random date between 1970-01-01 and 2069-12-31
out->random_value = random_date();
// random date from this century
out->random_range = random_date(date("2000-01-01"), date("2099-12-31"));

random_string()

(since EVL 2.1)

Examples:

// random string of length between 0 and 10
out->random_value = random_string();
// random string of length 5
out->random_range = random_string(5,5);

14.5 Anonymization Functions

All anonymization functions again follow the same rules as the string functions, i.e.:

  • when the argument is ‘nullptr’, it returns ‘nullptr’ as well;
  • when the (first) argument is a pointer, it returns a pointer as well.

14.5.1 anonymize

anonymize(str, keep_chars, keep_char_class = false)
anonymize(str, min_length, max_length)

(since EVL 2.1)

The first argument ‘str’ is mandatory and is of data type ‘string’ or ‘ustring’. The function returns the same data type.

Parameter ‘keep_chars’ is a string of characters that are kept as they are, i.e. not anonymized. Usually a space makes sense here, but to anonymize an e-mail address, for example, you can specify "@.". For ‘ustring’ input it must be a ‘ustring’ as well, so for the e-mail example u"@.".

When parameter ‘keep_char_class’ is ‘true’, uppercase letters are replaced by uppercase letters, lowercase letters by lowercase letters, and digits by digits.

Arguments ‘min_length’ and ‘max_length’ set the allowed length of the result. Without them, the function returns a string or ustring of the same length as the input.

Mapping examples:

out->anonymized_name = anonymize(in->name);
// "Mircea Eliade" -> "icDoudVhaXYll" (same length)

out->anonymized_name = anonymize(in->name, " ");
// "Mircea Eliade" -> "kJsqzt ZhGFts" (keep space)

out->anonymized_name = anonymize(in->name, " Maeiou");
// "Mircea Eliade" -> "Misqea Jhiade" (keep also letters M,a,e,i,o,u)

out->anonymized_name = anonymize(in->name, " ", true);
// "5 Mircea Eliade" -> "9 Piosdf Kiudpp" (keep space and char class)

out->anonymized_name = anonymize(in->name, 2, 10);
// "Mircea Eliade" -> "jTro" (length between 2 and 10)
// "Franz Kafka" -> "ksgTzDhoQf" (length between 2 and 10)

out->anonymized_name = anonymize(in->name, 0, length(in->name));
// "Mircea Eliade" -> "lkdUuZytSd"
// "Franz Kafka" -> "" // might be a NULL if 'name' is nullable
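To make ‘keep_chars’ and ‘keep_char_class’ concrete, here is a small standalone sketch for plain ASCII strings. `anonymize_sketch` is a hypothetical name; the replacement alphabets (any ASCII letter, or a letter/digit of the same class when the class is kept) are assumptions based on the examples above, and the sketch handles neither ustring nor the length-range variant.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Replace every character not listed in keep_chars by a random one. With
// keep_char_class, uppercase stays uppercase, lowercase stays lowercase and
// digits stay digits; otherwise a random ASCII letter is used.
std::string anonymize_sketch(const std::string& str, const std::string& keep_chars,
                             const bool keep_char_class = false) {
    static const std::string upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const std::string lower = "abcdefghijklmnopqrstuvwxyz";
    static const std::string digits = "0123456789";
    static const std::string letters = upper + lower;
    thread_local std::mt19937 gen{std::random_device{}()};

    // Pick one random character from the given alphabet.
    auto pick = [](const std::string& alphabet) {
        std::uniform_int_distribution<std::size_t> dist(0, alphabet.size() - 1);
        return alphabet[dist(gen)];
    };

    std::string out = str;
    for (char& c : out) {
        if (keep_chars.find(c) != std::string::npos) continue;  // kept as is
        if (keep_char_class && c >= 'A' && c <= 'Z') c = pick(upper);
        else if (keep_char_class && c >= 'a' && c <= 'z') c = pick(lower);
        else if (keep_char_class && c >= '0' && c <= '9') c = pick(digits);
        else c = pick(letters);
    }
    return out;
}
```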

anonymize(ustr, locale, keep_chars, keep_char_class = false)
anonymize(ustr, locale, min_length, max_length)

(since EVL 2.5)

The first argument ‘ustr’ is mandatory and is of data type ‘ustring’. The function returns the same data type.

Arguments ‘keep_chars’, ‘keep_char_class’ and ‘min_length’, ‘max_length’ are the same as for the previous variant of the function, except that ‘keep_chars’ must be of ustring data type here.

Parameter ‘locale’ is an instance of class ulocale defined in the mapping. For example, the following mapping produces anonymized (ustring) output consisting of Spanish letters.

static ulocale my_locale("es_ES");
out->text_field =
anonymize(u"Some text in Spanish.", my_locale, 1, 10);

Mapping examples with name and anonymized_name as ustring data type:

out->anonymized_name = anonymize(in->name);
// "Leoš Janáček" -> "fQlKUHlduGus" (same length)

out->anonymized_name = anonymize(in->name, u" ");
// "Leoš Janáček" -> "hGrT iUjSFeQ" (keep space)

out->anonymized_name = anonymize(in->name, u" š");
// "Leoš Janáček" -> "jTDš oIZqqWv" (keep also letter š)

out->anonymized_name = anonymize(in->name, u" aeiou", true);
// "8 Leoš Janáček" -> "3 Peoi Kařawec" (keep vowels and char class)

out->anonymized_name = anonymize(in->name, 2, 10);
// "Bedřich Smetana" -> "SwpAq" (length between 2 and 10)
// "Antonín Dvořák" -> "Qs" (length between 2 and 10)

out->anonymized_name = anonymize(in->name, 0, length(in->name));
// "Bedřich Smetana" -> "HsgIusTFErq"
// "Antonín Dvořák" -> "" // might be a NULL if 'name' is nullable

anonymize(number, min, max)

(since EVL 2.1)

Can be used for a ‘number’ of any integral data type, as well as for decimals and floats; the function returns the same data type. Examples:

anonymize((int)100, -5, 10);
// return integer between 95 and 110 (incl.)
anonymize( 100.00, -5, 10);
// return float between 95 and 110 (incl.)

14.5.2 anonymize_uniq

anonymize_uniq()

(since EVL 2.1)

Example:

out->anonymized_username = anonymize_uniq(in->id);

14.5.3 anonymize_iban

anonymize_iban()

(since EVL 2.4)

Example:

string iban  = "NL91 ABNA 0417 1643 00";
string iban2 = "NL91ABNA0417164300";

anonymize_iban(iban)
// return .... .... .... .... ..
anonymize_iban(iban2)
// return ..................
anonymize_iban(iban, iban_anon::keep_country)
// return NL.. .... .... .... ..
anonymize_iban(iban, iban_anon::keep_country_and_bank)
// return NL.. ABNA .... .... ..
anonymize_iban(iban, iban_anon::whole, iban_form::grouped)
// return .... .... .... .... ..
anonymize_iban(iban, iban_anon::whole, iban_form::compact)
// return ..................
anonymize_iban(iban, iban_anon::keep_country, iban_form::compact)
// return NL................

14.6 Encryption Functions

All encryption functions again follow the same rules as the string functions, i.e.:

  • when the argument is ‘nullptr’, it returns ‘nullptr’ as well;
  • when the (first) argument is a pointer, it returns a pointer as well.

rsa_encrypt_string(str, public_key)
rsa_encrypt_ustring(ustr, public_key)

(since EVL 2.6)

The first argument ‘(u)str’ is mandatory and is of data type ‘string’ or ‘ustring’. In both cases the function returns the ‘string’ data type.

The second argument is also mandatory and contains the public key, defined earlier in the mapping by

static rsa_public_key public_key("/path/to/key.pub");

The encrypted string is binary data, so to store it as text, e.g. in a CSV file, you can use for example the ‘str_to_base64’ function. See the example below.

rsa_decrypt_string(str, private_key)
rsa_decrypt_ustring(str, private_key)

(since EVL 2.6)

The first argument ‘str’ is mandatory and is a (binary) string previously encrypted by ‘rsa_encrypt_string’ or ‘rsa_encrypt_ustring’. The functions must be used in matching pairs, e.g. use ‘rsa_decrypt_ustring’ for a string encrypted by ‘rsa_encrypt_ustring’. In both cases the function returns the ‘string’ data type.

The second argument is mandatory and contains the private key, defined earlier in the mapping by

static rsa_private_key private_key("/path/to/key.priv");

Mapping example for encryption

Both output fields are of type string, input field ‘name’ is of type ustring.

static rsa_public_key pubkey("/path/to/key.pub");

out->name_encrypted_binary
= rsa_encrypt_ustring(in->name,pubkey);
out->name_encrypted_textual
= str_to_base64(rsa_encrypt_ustring(in->name,pubkey));

// The proper way in this particular example would of course be
//   out->name_encrypted_textual
//     = str_to_base64(out->name_encrypted_binary);
// to avoid applying the encryption function twice

Mapping example for decryption afterwards

Both output fields are of type ustring and store the same value.

static rsa_private_key privkey("/path/to/key.priv");

out->name1 = rsa_decrypt_ustring(in->name_encrypted_binary,privkey);
out->name2 = rsa_decrypt_ustring(
base64_to_str(in->name_encrypted_textual),privkey);

14.7 Conversion Functions

14.7.1 to_<type>

(since EVL 1.0)

to_char(value)
to_uchar(value)
to_short(value)
to_ushort(value)
to_int(value)
to_uint(value)
to_long(value)
to_ulong(value)
return a value of any (reasonable) data type converted to the given integral data type.

to_float(value)
to_double(value)
return a value of any (reasonable) data type converted to float or double.

to_decimal(value,n)
return a value of any (reasonable) data type converted to decimal with scale ‘n’ (i.e. number of decimal places).

to_date(value)
to_time(value)
to_time_ns(value)
to_interval(value)
to_interval_ns(value)
to_datetime(value)
to_timestamp(value)
return a value of any (reasonable) data type converted to the given date/time data type.

str_to_ipv4(str)
str_to_ipv6(str)
(since EVL 2.4)
convert the string ‘str’ to an IPv4 or IPv6 address.


14.8 IP Addresses Functions

Typical IPv4 manipulation usage within a mapping:

// convert and assign IPv4 string into unsigned integer
out->ipv4_uint = str_to_ipv4(in->ipv4_string);
// or the other way
out->ipv4_string = ipv4_to_str(in->ipv4_uint);

Typical IPv6 manipulation usage within a mapping:

// suppose in->ipv6_string = "4567::123"
out->ipv6_normalized = ipv6_normalize(in->ipv6_string);
// returns "4567:0000:0000:0000:0000:0000:0000:0123"

// suppose in->ipv6_string = "0000:0000:0000:0004:5678:9098:0000:0654"
out->ipv6_compressed = ipv6_compress(in->ipv6_string);
// returns "::4:5678:9098:0:654"

Or you can distinguish between the two IP versions:

if ( is_valid_ipv4(in->ip_string) ) {
// act on IPv4
}
else if ( is_valid_ipv6(in->ip_string) ) {
// act on IPv6
}
else {
// act when neither is valid
}

All IP manipulation functions described in this section follow two rules:

  • When the first argument is a pointer, the function also returns a pointer.
  • When the first argument is ‘nullptr’, the function returns ‘nullptr’ as well.

14.8.1 IPv4 Functions

(since EVL 2.4)

ipv4addr
constructor

str_to_ipv4()
convert a string to uint32,

ipv4_to_str()
convert uint32 to an IPv4 string,

is_valid_ipv4()
check whether the string is a valid IPv4 address.

14.8.2 IPv6 Functions

(since EVL 2.4)

str_to_ipv6()
convert a string to uint128,

ipv6_to_str()
convert uint128 to an IPv6 string,

is_valid_ipv6()
check whether the string is a valid IPv6 address,

ipv6_normalize()
convert a string to a normalized IPv6 string,

ipv6_compress()
convert a string to a compressed IPv6 string.

Examples

To get normalized and compressed IPv6:

// suppose in->ipv6_string = "0000:0000:22::0003:4"
out->ipv6_normalized = ipv6_normalize(in->ipv6_string);
// "0000:0000:0022:0000:0000:0000:0003:0004"
out->ipv6_compressed = ipv6_compress(in->ipv6_string);
// "0:0:22::3:4"

14.9 Logical Functions

14.9.1 is_equal

(since EVL 2.7)

is_equal(value1,value2)
return TRUE if ‘value1’ is equal to ‘value2’ or if both are null, otherwise FALSE. The values may also be pointers, so the following is applicable:

is_equal(in->value_field1, in->value_field2)

14.9.2 is_in

(since EVL 2.4)

is_in(value, compare1, compare2, ...)
return TRUE if ‘value’ is equal to ‘compare1’, or to ‘compare2’, etc., otherwise FALSE. ‘value’ doesn’t need to be of the same data type as the compared values, but it must be comparable with them. Both ‘value’ and the compared values may also be pointers, so the following is applicable:

is_in(in->some_uint, 123, in->some_long, 12.00, nullptr)

is_in(value, vector)
return TRUE if ‘value’ is equal to at least one of the ‘vector’ elements, otherwise FALSE.

14.9.3 is_valid_<type>

(since EVL 1.0)

is_valid_char(str)
is_valid_uchar(str)
is_valid_short(str)
is_valid_ushort(str)
is_valid_int(str)
is_valid_uint(str)
is_valid_long(str)
is_valid_ulong(str)
to check whether the given string ‘str’ is a valid value of the given integral data type,

is_valid_float(str)
is_valid_double(str)
to check whether the given string ‘str’ is a valid float or double,

is_valid_decimal(str,m,n,dec_sep,thous_sep)
to check whether the given string ‘str’ is a valid decimal number with precision ‘m’ and scale ‘n’, decimal separator ‘dec_sep’ and thousands separator ‘thous_sep’,

is_valid_date(str,format)
is_valid_datetime(str,format)
is_valid_timestamp(str,format)
to check whether the given string ‘str’ is a valid value of the date/time data type in the specified ‘format’,

is_valid_ipv4(str)
is_valid_ipv6(str)
(since EVL 2.4)
to check whether the string ‘str’ is a valid IPv4 or IPv6 address.


14.10 Checksum Functions

md5sum(str)
sha224sum(str)
sha256sum(str)
sha384sum(str)
sha512sum(str)

(since EVL 1.0)

These standard checksum functions can be used in a mapping, for example:

*out->anonymized_username = sha256sum(*in->username);

When the argument is ‘nullptr’, the function returns ‘nullptr’. To make use of this, pass the pointer itself instead of the dereferenced value:

out->anonymized_username = sha256sum(in->username);

Function headers:

std::string md5sum(const char* const str);
std::string md5sum(const std::string& str);
std::string* md5sum(const std::string* const str);

std::string sha224sum(const char* const str);
std::string sha224sum(const std::string& str);
std::string* sha224sum(const std::string* const str);

std::string sha256sum(const char* const str);
std::string sha256sum(const std::string& str);
std::string* sha256sum(const std::string* const str);

std::string sha384sum(const char* const str);
std::string sha384sum(const std::string& str);
std::string* sha384sum(const std::string* const str);

std::string sha512sum(const char* const str);
std::string sha512sum(const std::string& str);
std::string* sha512sum(const std::string* const str);

14.11 Mathematical Functions

abs(x)

(since EVL 2.8)

min(x)
max(x)

(since EVL 2.8)

round(x)
ceil(x)
floor(x)
trunc(x)

(since EVL 2.8)

‘round’ rounds to the nearest integer; when the value is exactly halfway (e.g. 2.5), it rounds up. ‘ceil’ always rounds up and ‘floor’ always rounds down to the nearest integer. ‘trunc’ drops the fractional part.

*out->rounded   = round(2.56);    // 3.00
*out->ceiling = ceil(2.56); // 3.00
*out->floored = floor(2.56); // 2.00
*out->truncated = trunc(2.56); // 2.00

*out->rounded = round(-2.56); // -3.00
*out->ceiling = ceil(-2.56); // -2.00
*out->floored = floor(-2.56); // -3.00
*out->truncated = trunc(-2.56); // -2.00

When the argument is ‘nullptr’, all these functions return ‘nullptr’. To make use of this, pass the pointer itself instead of the dereferenced value:

out->rounded = round(in->value);

pow(x)
sqrt(x)

(since EVL 2.8)


14.12 Lookup Functions

For a higher-level overview, see Lookup tables.

14.12.1 index, index_range, index_all, get_<type>

(since EVL 2.0)

To avoid running a lookup several times just to return different fields for the same ‘key_value(s)’, use the ‘index’ functions, which return only an index of the whole record found in the lookup.

The ‘index_all’ function returns all occurrences as a vector.

The ‘get_<type>’ functions then return the value of a particular field for a given index.

index(key_value(s))
to look up by the given ‘key_value(s)’ and return the index of an occurrence. The order of occurrences is not considered; it simply returns the first one found. Use it only when you are sure there is just one such value in the lookup, i.e. with the ‘table::unique_key’ flag in the ‘table’ definition.

index_range(key_value(s))
to look up by the given ‘key_value(s)’, where the last key value must fall within a range, and return the index of an occurrence. The order of occurrences is not considered; it simply returns the first one found. Use it only when you are sure there is just one such value in the lookup, i.e. with the ‘table::unique_key’ flag in the ‘table’ definition.

index_all(key_value(s))
to look up by the given ‘key_value(s)’ and return a vector with the indexes of all occurrences.

get_char(field_name,index)
get_uchar(field_name,index)
get_short(field_name,index)
get_ushort(field_name,index)
get_int(field_name,index)
get_uint(field_name,index)
get_long(field_name,index)
get_ulong(field_name,index)
get_int128(field_name,index)
get_uint128(field_name,index)
get_float(field_name,index)
get_double(field_name,index)
get_decimal(field_name,index)
get_date(field_name,index)
get_datetime(field_name,index)
get_timestamp(field_name,index)
get_time(field_name,index)
get_time_ns(field_name,index)
get_interval(field_name,index)
get_interval_ns(field_name,index)
get_string(field_name,index)
get_ustring(field_name,index)
once you have the ‘index’ of a record, these functions return the value of the particular ‘field_name’.

Usage example

// get path to lookup dir from an environment
static string lookup_dir = std::getenv("LOOKUP_DIR");

// define a lookup table (file is sorted and binary, key is unique in the file)
static table CompanyGroupID(lookup_dir + "/DimCompany.CompanyGroupID.hist.evf",
"generated/evd/Lookup/DimCompany.CompanyGroupID.hist.1.evd",
"CompanyGroupID",
table::unique_key);

// assign necessary fields
out->company_group_id = in->company_group_id;

// look up and store as a vector
auto group_ids = CompanyGroupID.index_all(in->company_group_id);

// loop over such vector
for ( auto ind : group_ids ) {
out->company_group = CompanyGroupID.get_ustring("CompanyGroupName",ind);
out->company_id = CompanyGroupID.get_int("CompanyID",ind);
out->company_name = CompanyGroupID.get_ustring("CompanyName",ind);
add_record(); // produce a record for each Company in a group
}
discard(); // avoid outputting the last record of the group twice

14.12.2 lookup_<type>

(since EVL 1.0)

lookup_char(field_name,key_value(s))
lookup_uchar(field_name,key_value(s))
lookup_short(field_name,key_value(s))
lookup_ushort(field_name,key_value(s))
lookup_int(field_name,key_value(s))
lookup_uint(field_name,key_value(s))
lookup_long(field_name,key_value(s))
lookup_ulong(field_name,key_value(s))
lookup_int128(field_name,key_value(s))
lookup_uint128(field_name,key_value(s))
lookup_float(field_name,key_value(s))
lookup_double(field_name,key_value(s))
lookup_decimal(field_name,key_value(s))
lookup_date(field_name,key_value(s))
lookup_datetime(field_name,key_value(s))
lookup_timestamp(field_name,key_value(s))
lookup_time(field_name,key_value(s))
lookup_time_ns(field_name,key_value(s))
lookup_interval(field_name,key_value(s))
lookup_interval_ns(field_name,key_value(s))
lookup_string(field_name,key_value(s))
lookup_ustring(field_name,key_value(s))
to look up by the given ‘key_value(s)’ and return the value of ‘field_name’ as the given data type.

Usage example

// define a lookup table (it is a sorted text file, ignore case of the key)
static table company("/data/dimensions/company.csv",
"evd/dimensions/company.evd",
"Company_ID",
table::text_read | table::ignore_case);

// assign looked-up field
out->company_name = company.lookup_string("Name", in->company_group_id);

14.12.3 lookup_range_<type>

(since EVL 2.0)

lookup_range_char(field_name,key_value(s))
lookup_range_uchar(field_name,key_value(s))
lookup_range_short(field_name,key_value(s))
lookup_range_ushort(field_name,key_value(s))
lookup_range_int(field_name,key_value(s))
lookup_range_uint(field_name,key_value(s))
lookup_range_long(field_name,key_value(s))
lookup_range_ulong(field_name,key_value(s))
lookup_range_int128(field_name,key_value(s))
lookup_range_uint128(field_name,key_value(s))
lookup_range_float(field_name,key_value(s))
lookup_range_double(field_name,key_value(s))
lookup_range_decimal(field_name,key_value(s))
lookup_range_date(field_name,key_value(s))
lookup_range_datetime(field_name,key_value(s))
lookup_range_timestamp(field_name,key_value(s))
lookup_range_time(field_name,key_value(s))
lookup_range_time_ns(field_name,key_value(s))
lookup_range_interval(field_name,key_value(s))
lookup_range_interval_ns(field_name,key_value(s))
lookup_range_string(field_name,key_value(s))
lookup_range_ustring(field_name,key_value(s))
to look up by the given ‘key_value(s)’, where the last one must fall within a range, and return the value of ‘field_name’ as the given data type.


14.13 Other Functions

14.13.1 first_not_null

(since EVL 2.8)

first_not_null(value1,value2,...)
return the first argument which is not null. The last value in the list can be a fixed value.

For example, the following can be used in a mapping:

out->value = first_not_null(in->value_field1,
in->value_field2, in->value_field3);

which means that the output ‘value’ is assigned ‘value_field1’ if it is not null, otherwise ‘value_field2’ if it is not null, otherwise ‘value_field3’ (even if it is null).

Example with default value:

out->value = first_not_null(in->value_field1,
in->value_field2, in->value_field3, "N/A");

which is the same as the previous example, except that when ‘value_field3’ is also null, the string ‘N/A’ is assigned to ‘*out->value’.

14.13.2 getenv_<type>

(since EVL 2.8)

To read an environment variable into the mapping (i.e. in the ‘evm’ file), standard C++ code can be used:

// This is not recommended example!
static const string release = std::getenv("RELEASE");
static int batch_id = atoi(std::getenv("BATCH_ID"));
*out->ID = batch_id++;
*out->release = release;

However, when the variable is not set in the environment, ‘std::getenv’ returns a null pointer, and constructing a string from it or passing it to ‘atoi’ is undefined behaviour, so the mapping does not fail in a controlled way. It is better to use the following EVL functions, which also support default values.

Always use the ‘static’ keyword, and when the variable is not modified in the mapping, also use ‘const’.

getenv_<integer_type>(<env_var>, [<default_value>])

Reads an environment variable as an integral type. The optional second argument is the value to be used when the variable is not set. So, for example,

static int batch_id = getenv_int("BATCH_ID");

fails if ‘BATCH_ID’ is not defined, but

static int batch_id = getenv_int("BATCH_ID",1);

uses the number 1 in that case and does not fail.

getenv_float(<env_var>, [<default_value>])

getenv_double(<env_var>, [<default_value>])

For float types it behaves the same as for integral types; an example with a default value:

static float accumulate = getenv_float("START_VALUE",1000.00);

getenv_decimal(<env_var>, <scale>, [<default_value>])

For decimal, the <scale> must be specified, e.g.:

static decimal X = getenv_decimal("X_VALUE",2);

Otherwise it behaves the same as for integral types.

Example with default value:

static decimal X = getenv_decimal("X_VALUE",2,decimal(1000,2));

getenv_<datetime_type>(<env_var>, [<format>], [<default_value>])

For date and time types there is an optional second argument <format>, which specifies the format of the environment variable's value. By default, the format is taken from the environment variables EVL_DEFAULT_<type>_PATTERN, whose default values are

EVL_DEFAULT_DATE_PATTERN="%Y-%m-%d"
EVL_DEFAULT_DATETIME_PATTERN="%Y-%m-%d %H:%M:%S"
EVL_DEFAULT_TIMESTAMP_PATTERN="%Y-%m-%d %H:%M:%E*S"
EVL_DEFAULT_TIME_PATTERN="%H:%M:%S"
EVL_DEFAULT_TIME_NANO_PATTERN="%H:%M:%E*S"

So, for example, the environment variable CURRENT_TIMESTAMP=20250319154034 would be read in a mapping like this:

static const datetime curr_datetime =
getenv_datetime("CURRENT_TIMESTAMP","%Y%m%d%H%M%S");

and with a default value it would be:

static const datetime curr_datetime = 
getenv_datetime("CURRENT_TIMESTAMP",
"%Y%m%d%H%M%S",
datetime("2000-01-01 00:00:00"));

getenv_string(<env_var>, [<default_value>])

getenv_ustring(<env_var>, [<default_value>])

For string types, the optional second argument specifies the default value to use when the variable is not set in the environment. Example with a default value:

static const ustring author_name =
getenv_ustring("AUTHOR",u"Jan Štěpnička");

When the environment variable is undefined and no default value is given, the mapping fails with

fail("Environment variable " + env_var + " is not set.")

Example of a mapping using getenv_<type> functions

static int batch_id = getenv_int("BATCH_ID");
static const uchar initial_load_flag = getenv_int("INITIAL_LOAD",0);
static const string release = getenv_string("RELEASE","no_release");

*out->ID = batch_id++;
if (initial_load_flag)
out->release = nullptr;
else
*out->release = release;

If BATCH_ID is undefined, the error message would be

Environment variable BATCH_ID is not set.