Appendix-A-EVD-and-Data-Types

Version: 2.8

Appendix A EVD and Data Types

‘EVD’ stands for ‘EVL Data Definition’ and it is the way how to specify structure of data sets in EVL. It can be used either inline, as a component option, or in an *.evd file.

EVL uses mostly standard C++ data types, so most of them are well known.

A.1 EVD Structure

Example first: Let’s have a CSV file:

1;Otto Wichterle;27.10.1913;12,345.78;2025-03-19 14:34:07
2;;1.1.1970;0.00;2025-03-19 14:35:44

then following evd file would describe its structure: ¹

ID           int                sep=";"
Name         string             sep=";"  null=""
"Birth Date" date("%-d.%-m.%Y") sep=";"  null="1.1.1970"
Amount       decimal(12,3)      sep=";"    thousands_sep=","
"Created At" datetime           sep="\n" null="0000-00-00 00:00:00"

In general each nonempty line of EVD file looks like this:

<indent> Field_Name <blank> Data_Type <blank> EVD_Options

where

‘<indent>’
might be empty, 2 spaces, 4 spaces, 6 spaces, etc., to define a substructure of compound data types, see Compound Types for details.

‘Field_Name’
is a sequence of any printable ASCII characters below 128. When a space is used, then whole field name must be quoted by double quotes. Special characters (also only ASCII ones under 128) must be escaped, e.g. ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\b’, ‘\f’, ‘\a’, ‘\"’, ‘\\’, or in hexa ‘\x??’. Characters other than letters, numbers and underscore are replaced by underscore in mappings. All these field names are valid:

                            // Name in mapping:
recommended_field_name      // recommended_field_name
"Field with a Space"        // Field_with_a_Space
'field-with-a-hyphen'       // _field_with_a_hyphen_
"$field_with_dollar"        // _field_with_dollar
'single_quoted'field'       // _single_quoted_field_
"with\nnewline"             // with_newline

‘Data_Type’
is one of:

string, ustring, see String
char, uchar, short, ushort, int, uint, long, ulong, int128, uint128, see Integral Types
decimal, see Decimal
float, double, see Float and Double
date, time, time_ns, interval, interval_ns, datetime, timestamp, see Date and Time

‘EVD_Options’
is <blank> separated list of options, see EVD Options

‘<blank>’
is one or more spaces and/or tabs.

A.1.1 Comments

Standard C-style comments can be used in evd file, for example:

street_id    int
street_name  string
street_code  string  null=""  // but NOT NULL in DB
/* COMBAK: street_code will be replaced by street_num later this year
street_num   long
*/

A.1.2 Inline EVD

For the most of the EVL Components an inline EVD can be specified as an option. In such case comments are not allowed and the format is simply the same as for EVD in a file, just instead of newlines, commas are used to separate each field definition.

The same structure, as in above EVD Example, but as a component option (a comma separated list of fields with data types and options):

--data-definition='id int sep=";",
    name string sep=";" null="",
    birth_date date sep=";" null="1970-01-01",
    amount decimal(12,3) sep=";" thousands_sep=",",
    created_at datetime sep="\n" null="0000-00-00 00:00:00"'

A.2 EVD Options

Structure of the data is described in an EVD file – an EVL data types definition file – with file extension .evd.

A.2.1 Separator Definition

Field separator is defined by ‘sep="X"’, where ‘X’ can be an empty string or an ascii character below 128 specified as normal string or special character ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\b’, ‘\f’, ‘\a’, ‘\"’, ‘\\’, or in hexa ‘\x??’ (0-7E) (as it is always a single character, ‘\x?’ is also possible).

Default separators can be defined:

EVL_DEFAULT_FIELD_SEPARATOR
defines default field separator, when not set, EVL_DEFAULT_FIELD_SEPARATOR='|' is used,

EVL_DEFAULT_RECORD_SEPARATOR
defines default record separator, i.e. the last field separator, when not set, EVL_DEFAULT_RECORD_SEPARATOR='\n' is used.

When these variables are set, then no ‘sep=’ options are needed in the above EVD example and these defaults are used instead.

Note: It is recommended to use these variables only for project-wide settings in project.sh. Try to avoid to set them in jobs. Better use ‘sep=’ option in evd file.

In case we want to have an empty separator, for example after fixed length field, we can use ‘sep=""’.

A.2.2 Null Option

A null string by ‘null="X"’ or list of strings ‘null=["X","Y",...]’ can be specified. Then such string(s) will be read as ‘null’ values when ‘--text-input’ is used by the component.

When writing the ‘null’ value by the output component with ‘--text-output’ option, such string will be used instead.

When the list of null values is specified, then the first one will be used to write.

To type a special character, like newline or ‘TAB’, standard hexadecimal notation can be used: ‘\x??’, or also special notation for often used special characters: ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\b’, ‘\f’, ‘\a’, ‘\"’, ‘\\’. So then to interpret a tabulator as NULL value use ‘null="\t"’.

A.2.3 Quote Option

When reading csv files, fields might be quoted by some character, usually by double quotes: ‘"’.

Proper parsing of such field is done by specifying attributes ‘quote=’ or ‘optional_quote=’.

Specified string might be any ascii character below 128 specified as normal string or special character ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\b’, ‘\f’, ‘\a’, ‘\"’, ‘\\’, or in hexa ‘\x??’ (0-7E) (as it is always a single character, ‘\x?’ is also possible).

quote="<quote_char>"
Use this attribute when the field is always quoted.

optional_quote="<quote_char>"
Using this attribute, the field doesn’t need to be quoted.

A.2.4 Encoding and Locale

(since EVL 2.5)

enc="<encoding>"
To specify an encoding of given field, string functions then behaves according to that.

locale="<locale>"
To specify a locale of given field, components (like sort) then behaves according to that.

Examples

czech_string_in_utf8  string  enc="utf8" locale="cs_CZ"
en_string_in_utf8     string  enc="utf8" locale="en_GB"

When there is no encoding or locale specified in an EVD, then following environment variables can be used:

EVL_DEFAULT_STRING_ENC=""
defines default encoding, when not set, empty encoding is used,

EVL_DEFAULT_STRING_LOCALE="C"
defines default locale, when not set, generic ‘C’ locale is used.

A.2.5 Max string length

(since EVL 2.5)

Attributes which are used to specify maximal length of given string field. So far used only in case of load/unload tables.

max_bytes="<number>"
To specify maximum Bytes of given string field. Is populated when generated based on table definition, e.g. ‘VARCHAR(100 BYTES)’.

max_chars="<number>"
To specify maximum characters of given string field. Is populated when generated based on table definition, e.g. ‘VARCHAR(100 CHARS)’.

Examples:

string_20_bytes  string  enc="utf8" max_bytes="20"  // VARCHAR(20 BYTES)
string_20_chars  string  enc="utf8" max_chars="20"  // VARCHAR(20 CHARS)

Both attributes are currently used in ‘Writeora’ component to know the maximal length of a string field.

A.2.6 QVD options

(since EVL 2.4)

qvd:format="<format_string>"
To specify a format string for ‘timestamp’, ‘datetime’, and ‘date’ data types when read/write Qlik’s QVD files. Example:

request_dt  timestamp   qvd:format="%d/%m/%Y %H:%M:%S"
some_date   date        qvd:format="%d.%m.%Y"

qvd:interval
qvd:time
To be used as an attribute for ‘timestamp’ and ‘datetime’ data types to get an interval or time data type in Qlik’s QVD files. Example:

request_time1  timestamp  qvd:time
request_time2  timestamp  qvd:interval

Compared to Qlik’s time data type, interval can be larger than 24 hours. For example input timestamp ‘1970-01-02 03:05:30’ would be ‘03:05:30’ as time, but ‘27:05:30’ as interval.

A.3 String

Standard C++ library ‘std::basic_string’ is used for strings. For details see

http://en.cppreference.com/w/cpp/string/basic_string

string
size: up to 2⁶⁴ Bytes (i.e. limited only by memory)

An EVD file Example:

field_name1  string(10)
field_name2  string(10) sep=""
field_name3  string     sep=";" null="NULL"
field_name4  string             null=""              quote="\""
field_name5  string             null=["","N/A","NA"]
last_field   string

where

‘field_name1’
cannot be NULL and has fixed length 10 bytes, followed by the value of $EVL_DEFAULT_FIELD_SEPARATOR environment variable.

‘field_name2’
cannot be NULL and has fixed length 10 bytes, with no separator.

‘field_name3’
is nullable and string ‘NULL’ is interpreted as NULL value. End of the field is represented by character ‘;’.

‘field_name4’
is nullable and empty string is interpreted as NULL value. Field is quoted by ‘"’, but for an empty string, quotes are not needed. The end of the field is represented by $EVL_DEFAULT_FIELD_SEPARATOR.

‘field_name5’
is nullable and empty string, ‘N/A’ and ‘NA’ are interpreted as NULL value when reading, but when writing into text file, NULL is represented by the first one, i.e. an empty string. The end of the field is represented by $EVL_DEFAULT_FIELD_SEPARATOR.

‘last_field’
cannot be NULL and the end of the field is represented by $EVL_DEFAULT_RECORD_SEPARATOR.

Example of four records which can be parsed by above EVD file definition.

          |          NULL;"second string field"|NA|last field
0123456789|0123456789first string field;""|N/A|last field
----------|----------;"  ;  second field  |  "|third string field|last field
abcdefghij|abcdefghij       ;||last field

Neither EVL_DEFAULT_FIELD_SEPARATOR nor EVL_DEFAULT_RECORD_SEPARATOR is set, so default values are used, i.e. ‘|’ and ‘\n’.

A.4 Integral Types

All integral data types are standard C++ ones.

char
size: 1 Byte, min: − 128, max: 127

uchar
size: 1 Byte, min: 0, max: 255

short
size: 2 Bytes, min: − 32 768, max: 32 767

ushort
size: 2 Bytes, min: 0, max: 65 535

int
size: 4 Bytes, min: − 2 147 483 648, max: 2 147 483 647

uint
size: 4 Bytes, min: 0, max: 4 294 967 295

long
size: 8 Bytes, min: − 2⁶³ (approx. − 9 × 10¹⁸), max: 2⁶³− 1 (approx. 9 × 10¹⁸)

ulong
size: 8 Bytes, min: 0, max: 2⁶⁴− 1 (approx. 18 × 10¹⁸)

int128
size: 16 Bytes, min: − 2¹²⁷ (approx. − 1.7 × 10³⁸), max: 2¹²⁷− 1 (approx. 1.7 × 10³⁸)

uint128
size: 16 Bytes, min: 0, max: 2¹²⁸− 1 (approx. 3.4 × 10³⁸)

Except ‘sep=’, ‘null=’, ‘quote=’, ‘optional_quote=’, no other options are possible for these data types.

A.5 Decimal

Decimal data type is defined by ‘decimal(m,n)’, where ‘m’ is number of all digits and ‘n’ is the number of decimal places. Decimal is EVL custom data type.

decimal(m,n)
when ‘n’ is missing, zero is supposed
size: 8 Bytes for ‘m’ up to 18 digits
size: 16 Bytes for ‘m’ from 19 to 38 digits

Next to standard EVD options (i.e. ‘sep=’, ‘null=’, ‘quote=’, ‘optional_quote=’) decimal and thousands separator can be specified:

decimal_sep="."
to specify a decimal separator, which can be any single ascii character below 128; by default it is a decimal point

thousands_sep=""
defines how to separate thousands, it can be any single ascii character below 128; by default there is no thousands separator.

An EVD file example:

revenues  decimal(9,4)  decimal_sep="," thousands_sep="."  // e.g. 12.345,6789
expenses  decimal(18)                        // e.g. 123456789012345678
taxes     decimal(18,6) thousands_sep=" "    // e.g. 123 456 789 012.345678
latitude  decimal(10,6)                      // e.g. 49.8197203
longitude decimal(10,6) decimal_sep=","      // e.g. 18,1673552

A.6 Float and Double

Float and double are standard C++ data types.

float
size: 4 Bytes, range: ± 3.4 × 10^±38 (about 7 digits)

double
size: 8 Bytes, range: ± 1.7 × 10^±308 (about 15 digits)

Except ‘sep=’, ‘null=’, ‘quote=’, ‘optional_quote=’, no other options are possible for these data types.

Note: Compared to decimal(m,n) data type, operating with floats and doubles (doing summations for example), usually leads to approximated values. So it is usually good idea to avoid using these data types for money and such.

Example

With EVD file

sent_mb      float  sep="|"  null=""
received_mb  float  sep="\n" null=""

you can read source csv file like this:

0.321e12|1.234E-02
12.78E11|3.798

A.7 Date and Time

Date, time, time_ns, interval, interval_ns and datetime are data types based on standard C++ library ‘std::time’. Timestamp is built upon Google’s ‘cctz’ library.

date

(since EVL 1.0)

to store a date, i.e. day, month and year
size: 4 Bytes, range: 1970-01-01 ± approx. 6 × 10¹¹ years
first 2 Bytes keeps a year, then 1 Byte for month and 1 Byte for day
example: 2008-04-20

time

(since EVL 2.8)

to store a day time, i.e. hour, minute and second
size: 4 Bytes, range: 00:00:00 – 23:59:59
example: 13:35:00

time_ns

(since EVL 2.8)

to store a day time with nanoseconds
size: 8 Bytes, range: 00:00:00.000000000 – 23:59:59.999999999
example: 13:37:00.350000000

interval

(since EVL 2.8)

to store a time interval in hours, minutes and seconds
size: 4 Bytes, min: 00:00:00
example: 165:35:00

interval_ns

(since EVL 2.8)

to store a time interval with nanoseconds
size: 8 Bytes, min: 00:00:00.000000000
example: 165:35:00.123456789

datetime

(since EVL 1.0 as timestamp, since EVL 2.4 as datetime)

to store a date and time, i.e. year, month, day, hour, minute and second
size: 8 Bytes, range: 1970-01-01 00:00:00 ± approx. 6 × 10¹¹ years
example: 2010-07-01 09:02:00

timestamp

(since EVL 2.4)

to store a date and time with nanoseconds and with a time zone, i.e. year, month, day, hour, minute, second, nanoseconds and possibly a time zone
size: 12 Bytes, range: 1970-01-01 00:00:00 ± approx. 6 × 10¹¹ years
example: 2015-05-09 13:37:00.000 +02:00

A.7.1 Format string

As an argument (in curly brackets) formatting pattern can be specified. Standard C notation is used.

When no argument to date and time data types are provided, defaults are used:

EVL_DEFAULT_DATE_PATTERN
to specify default formatting string for ‘date’ data type,
by default it is "%Y-%m-%d"

EVL_DEFAULT_TIME_PATTERN
to specify default formatting string for ‘time’ data type,
by default it is "%H:%M:%S"

EVL_DEFAULT_DATETIME_PATTERN
to specify default formatting string for ‘datetime’ data type,
by default it is "%Y-%m-%d %H:%M:%S"

EVL_DEFAULT_TIMESTAMP_PATTERN
to specify default formatting string for ‘timestamp’ data type,
by default it is "%Y-%m-%d %H:%M:%E*S"

All possible format strings:

%%
a literal ‘%’

%a
locale’s abbreviated weekday name (e.g. ‘Sun’)

%A
locale’s full weekday name (e.g. ‘Sunday’)

%b
locale’s abbreviated month name (e.g. ‘Jan’)

%B
locale’s full month name (e.g. ‘January’)

%c
locale’s date and time (e.g. ‘Thu Mar 3 23:05:25 2005’)

%C
century; like ‘%Y’, except omit last two digits (e.g. ‘20’)

%d
day of month (e.g. ‘01’)

%D
date; same as ‘%m/%d/%y’

%e
day of month, space padded; same as ‘%_d’

%Ez
RFC3339-compatible numeric UTC offset (+hh:mm or -hh:mm)

%E*z
full-resolution numeric UTC offset (+hh:mm:ss or -hh:mm:ss)

%E#S
seconds with # digits of fractional precision

%E*S
seconds with full fractional precision (a literal ’*’)

%E#f
fractional seconds with # digits of precision

%E*f
fractional seconds with full precision (a literal ’*’)

%E4Y
four-character years (-999 ... -001, 0000, 0001 ... 9999)

%ET
the RFC3339 "date-time" separator "T"

%F
full date; same as ‘%Y-%m-%d’

%g
last two digits of year of ISO week number (see ‘%G’)

%G
year of ISO week number (see ‘%V’); normally useful only with ‘%V’

%h
same as ‘%b’

%H
hour (‘00’..‘23’)

%I
hour (‘01’..‘12’)

%j
day of year (‘001’..‘366’)

%k
hour, space padded (‘ 0’..‘23’); same as ‘%_H’

%l
hour, space padded (‘ 1’..‘12’); same as ‘%_I’

%m
month (‘01’..‘12’)

%M
minute (‘00’..‘59’)

%n
a newline

%p
locale’s equivalent of either ‘AM’ or ‘PM’; blank if not known

%P
like ‘%p’, but lower case

%r
locale’s 12-hour clock time (e.g. ‘11:11:04 PM’)

%R
24-hour hour and minute; same as ‘%H:%M’

%s
seconds since ‘1970-01-01 00:00:00 UTC’

%S
second (‘00’..‘60’)

%t
a tab

%T
time; same as ‘%H:%M:%S’

%u
day of week (‘1’..‘7’); ‘1’ is Monday

%U
week number of year, with Sunday as first day of week (‘00’..‘53’)

%V
ISO week number, with Monday as first day of week (‘01’..‘53’)

%w
day of week (‘0’..‘6’); ‘0’ is Sunday

%W
week number of year, with Monday as first day of week (‘00’..‘53’)

%x
locale’s date representation (e.g. ‘12/31/99’)

%X
locale’s time representation (e.g. ‘23:13:48’)

%y
last two digits of year (‘00’..‘99’)

%Y
year

%z
+hhmm numeric time zone (e.g., -0400)

%Z
alphabetic time zone abbreviation (e.g., EDT)

By default, date pads numeric fields with zeroes. The following optional flags may follow ‘%’.

-
(hyphen) do not pad the field

_
(underscore) pad with spaces

0
(zero) pad with zeros

^
use upper case if possible

#
use opposite case if possible

A.7.2 EVD Example

Following dates definition are equivalent.

valid_from  date
valid_from  date("%F")
valid_from  date("%Y-%m-%d")

Following datetimes are all the same.

request_dt  datetime
request_dt  datetime("%F %T")
request_dt  datetime("%Y-%m-%d %H:%M:%S")

Following timestamps are all the same.

request_dt  timestamp
request_dt  timestamp("%F %T.%E9f")
request_dt  timestamp("%Y-%m-%d %H:%M:%S.%E9f")

QVD’s format string can be specified:

request_dt  timestamp  qvd:format="%d/%m/%Y %H:%M:%S"
some_date   date       qvd:format="%d.%m.%Y"

A.7.3 Qlik’s time

When time need to be specified in QVD file, then standard timestamp need to be provided, just with ‘qvd:time’ option. Then the date is simply cut off from the timestamp to be stored in QVD:

request_time  timestamp("%H:%M:%S")  qvd:time

A.7.4 Qlik’s interval

When interval data type need to be specified in QVD file, then standard timestamp need to be provided, just with ‘qvd:interval’ option. Then the time is taken since ‘1970-01-01’:

request_time  timestamp("%Y-%m-%d %H:%M:%S")  qvd:interval

Note: Compared to Qlik’s time data type, interval can be larger than 24 hours. For example input timestamp ‘1970-01-02 03:05:30’ would be ‘03:05:30’ as time, but ‘27:05:30’ as interval.

A.8 Compound Types

Note: Compound data types cannot be used to read/write QVD/QVX files. These file formats do not support it.