Version: 2.6

10 Read Components

The generic ‘Read’ component (see Read) reads source file(s) from a location determined by the URI scheme and parses them based on the file suffix. It can also read a table identified by a URI.

If you need to read and parse a particular file format from an input flow, there are also specialized components for each such format:

Reading various file formats

If you need DBMS-specific options to read a table, there are also database-specific read components:

Reading tables and streams


10.1 Read

(since EVL 1.0)

Reads <source>(s) (a file mask can be specified) and sends the data to output <f_out>. Multiple <source>s are concatenated.

It automatically parses various file formats, ‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘xls’, ‘xlsx’ and ‘xml’, based solely on the file suffix.

When a compression suffix such as ‘gz’, ‘tar’, ‘bz2’, ‘zip’ or ‘Z’ is recognized, the data are decompressed automatically.

In general the <source> is of the form

[scheme:][//[user@]host[:port]]/path/basename[.format][.compression]
[scheme:][//[user@]host[:port]/]database?(table=[schema.]<table>|query=<query>)

When <source> starts with ‘file:’, ‘sftp:’, ‘hdfs:’, ‘s3:’, ‘gs:’ or ‘smb:’, the appropriate utility is used to get data from that location. If no URI scheme is present, it reads from the local file system.

When <source> starts with ‘mysql:’, ‘mssql:’, ‘postgres:’, ‘oracle:’, ‘sqlite:’ or ‘teradata:’, the appropriate utility is used to get data from that database.

Besides the options mentioned below, which change the file suffix behaviour, one can use the generic ‘--cmd=<cmd>’ option, which calls ‘echo <source>... | xargs <cmd>’ to obtain the input for this component. <cmd> can also be a pipeline (that is the reason for xargs). See examples below for inspiration.

Read
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl read
is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVD is EVL data definition file and EVS defines EVL job structure, for details see evl-evd(5) and evl-evs(5).

URI Scheme for file:

Based on the URI Scheme in the <source>, it calls appropriate utility to get files or tables.

no scheme, ‘file:’
assumes the local filesystem

‘gdrive:’
calls the ‘gdrive’ utility

‘gs:’
calls Google’s ‘gsutil’ utility

‘hdfs:’
calls the ‘hdfs dfs’ utility

‘s3:’
calls AWS’s ‘aws s3’ utility

‘sftp:’
calls the ‘ssh’ utility

‘smb:’
calls the ‘smbclient’ utility

URI Scheme for table:

‘mysql:’
calls Readmysql component to read MySQL/MariaDB table

‘mssql:’
calls Readmssql component to read Microsoft SQL Server table

‘postgres:’
calls Readpg component to read PostgreSQL table

‘oracle:’
calls Readora component to read Oracle table

‘sqlite:’
calls Readsqlite component to read SQLite table

‘teradata:’
calls Readtd component to read Teradata table
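
The two scheme lists above amount to a lookup from the URI scheme to a utility or component. The following Python sketch only illustrates that lookup; it is not EVL's implementation, and the helper name `classify_source` is hypothetical.

```python
# Lookup tables transcribed from the two scheme lists above.
FILE_SCHEMES = {
    "file": "local filesystem",
    "gdrive": "gdrive",
    "gs": "gsutil",
    "hdfs": "hdfs dfs",
    "s3": "aws s3",
    "sftp": "ssh",
    "smb": "smbclient",
}
TABLE_SCHEMES = {
    "mysql": "Readmysql",
    "mssql": "Readmssql",
    "postgres": "Readpg",
    "oracle": "Readora",
    "sqlite": "Readsqlite",
    "teradata": "Readtd",
}


def classify_source(source):
    """Hypothetical helper: return (kind, handler) for a <source> string."""
    scheme, colon, _rest = source.partition(":")
    if colon and scheme in FILE_SCHEMES:
        return ("file", FILE_SCHEMES[scheme])
    if colon and scheme in TABLE_SCHEMES:
        return ("table", TABLE_SCHEMES[scheme])
    # No recognised scheme: the source is taken as a local path.
    return ("file", "local filesystem")


print(classify_source("s3://bucket/in/data.csv.gz"))
print(classify_source("postgres://localhost/my_db?table=public.my_table"))
print(classify_source("/data/in/data.csv"))
```

A source without any scheme falls through to the local filesystem, which matches the "no scheme" entry in the list.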

Compression:

Compressed file suffix behaviour (rules are applied in the following order):

‘*.tgz’, ‘*.tar.gz’
calls ‘tar -zxO’

‘*.tar.Z’
calls ‘tar -ZxO’

‘*.tar.bz2’
calls ‘tar -jxO’

‘*.tar’
calls ‘tar -xO’

‘*.gz’, ‘*.GZ’, ‘*.Z’, ‘*.zip’, ‘*.bz2’
calls ‘gunzip -c’

‘*.zip’, ‘*.ZIP’
calls ‘unzip -p’
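
Because the rules are applied in order, a name such as ‘data.tar.gz’ must be tested against ‘*.tar.gz’ before ‘*.gz’. The Python sketch below shows such a first-match lookup. The rule table is copied from the list above; the function name `decompress_command` is an assumption made for illustration, not part of EVL.

```python
# Ordered rules, transcribed from the list above: the first match wins.
COMPRESSION_RULES = [
    ((".tgz", ".tar.gz"), "tar -zxO"),
    ((".tar.Z",), "tar -ZxO"),
    ((".tar.bz2",), "tar -jxO"),
    ((".tar",), "tar -xO"),
    ((".gz", ".GZ", ".Z", ".zip", ".bz2"), "gunzip -c"),
    ((".zip", ".ZIP"), "unzip -p"),
]


def decompress_command(source):
    """Hypothetical helper: return the command for <source>, or None."""
    for suffixes, command in COMPRESSION_RULES:
        # str.endswith accepts a tuple of candidate suffixes.
        if source.endswith(suffixes):
            return command
    return None


print(decompress_command("data.tar.gz"))   # matched by the first rule, not by '.gz'
print(decompress_command("sample.json.gz"))
print(decompress_command("plain.csv"))
```

If the generic ‘*.gz’ rule came first, every ‘*.tar.gz’ file would only be gunzipped and never untarred, which is why the order of the list matters.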

File Type:

The Read component behaves according to the <source> suffix.

Specific file format suffix behaviour:

‘*.avro’, ‘*.AVRO’
calls ‘evl readavro’

‘*.csv’, ‘*.CSV’, ‘*.txt’, ‘*.TXT’
reads file(s) with the ‘--text-input’ option; an end-of-line character other than the standard Unix one (‘\n’) can be specified by option ‘--dos-eol’ or ‘--mac-eol’

‘*.json’, ‘*.JSON’
calls ‘evl readjson’

‘*.parquet’, ‘*.parq’, ‘*.PARQUET’, ‘*.PARQ’
calls ‘evl readparquet’

‘*.qvd’, ‘*.QVD’
calls ‘evl readqvd’

‘*.xls’, ‘*.XLS’
calls ‘evl readxls’

‘*.xlsx’, ‘*.XLSX’
calls ‘evl readxlsx’

‘*.xml’, ‘*.XML’
calls ‘evl readxml’
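
A source name has the shape ‘basename[.format][.compression]’, so the format suffix is only visible once the compression suffix has been peeled off. The Python sketch below illustrates that two-step split. It is a simplified model and not EVL code: `split_suffixes` is a hypothetical name, and the comparison is case-insensitive, which also accepts mixed case such as ‘.Json’ that the list above does not mention.

```python
# Compression suffixes, multi-part ones first so 'tar.gz' wins over 'gz'.
COMPRESSION_SUFFIXES = (
    "tar.bz2", "tar.gz", "tar.Z", "tgz", "tar", "bz2", "zip", "gz", "Z",
)

# Format suffix -> action, transcribed from the list above.
FORMAT_ACTIONS = {
    "avro": "evl readavro",
    "csv": "--text-input",
    "txt": "--text-input",
    "json": "evl readjson",
    "parquet": "evl readparquet",
    "parq": "evl readparquet",
    "qvd": "evl readqvd",
    "xls": "evl readxls",
    "xlsx": "evl readxlsx",
    "xml": "evl readxml",
}


def split_suffixes(basename):
    """Hypothetical helper: return (format_action, compression_suffix)."""
    compression = None
    lowered = basename.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if lowered.endswith("." + suffix.lower()):
            compression = suffix
            # Drop the dot and the compression suffix from the name.
            basename = basename[: -(len(suffix) + 1)]
            break
    _stem, dot, extension = basename.rpartition(".")
    action = FORMAT_ACTIONS.get(extension.lower()) if dot else None
    return action, compression


print(split_suffixes("sample.json.gz"))
print(split_suffixes("example.csv.tar.gz"))
print(split_suffixes("data.PARQUET"))
```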

Synopsis

Read
<source>... <f_out> (<evd>|-d <inline_evd>)
[--footer=<n>] [--header=<n>] [--cmd=<cmd>]
[<file_type_options>]
[--ignore-suffix] [--allow-missing-file]
[-y|--text-output [--dos-eol | --mac-eol] ]
[-w|--where=<condition>] [--filter=<filter>]
[--validate]

evl read
<source>... (<evd>|-d <inline_evd>)
[--footer=<n>] [--header=<n>] [--cmd=<cmd>]
[<file_type_options>]
[--ignore-suffix] [--allow-missing-file]
[-y|--text-output [--dos-eol | --mac-eol] ]
[-w|--where=<condition>] [--filter=<filter>]
[--validate]
[-v|--verbose]

evl read
( --help | --usage | --version )

Options

--allow-missing-file
don’t fail if <source> doesn’t exist, and produce empty output

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d "user_name string, user_sum int"’

--filter=<filter>
when the ‘--where’ option is used and the automatic replacement of SQL syntax does not yield a valid filter, use <filter> instead when reading file(s)

-f, --footer=<n>
skip last <n> records. With multiple files, skip last <n> records in each of them. The command ‘evl head -n-<n> --skip-parse’ is used for this job.

-h, --header=<n>
skip first <n> records. With multiple files, skip first <n> records in each of them. The command ‘evl tail -n+(<n>+1) --skip-parse’ is used for this job.

--cmd=<cmd>
bash command <cmd> is used to read the <source>s. In that case, recognition of the file suffix is switched off. See examples below for inspiration.

--ignore-suffix
ignore <source>’s suffix, act only based on options.

--validate
without this option, no fields are checked against data types. With this option, all output fields are checked

-w, --where=<condition>
use this where condition instead of reading the whole file/table. When reading a table, it sends the query to the database with this where condition. When reading a file, it reads the whole file and applies the evl-filter component right after. For the filter, it replaces these SQL logical operators with their C++ counterparts:

  • ‘AND’ -> ‘&&’
  • ‘OR’ -> ‘||’
  • ‘=’ -> ‘==’
  • ‘<>’ -> ‘!=’

so one can also use SQL notation to specify a condition. It also removes quotes around field names and replaces single quotes with double quotes for proper string notation:

  • ‘"field_name"’ -> ‘field_name’
  • ‘'’ -> ‘"’

Can be useful to have the same syntax for files and for tables.

-x, --text-input
treat the input as text, not binary

--dos-eol
treat the input as text with CRLF as end of line

--mac-eol
treat the input as text with CR as end of line

-y, --text-output
write the output as text, not binary
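
The replacements listed under ‘--where’ can be pictured as a small text rewrite. The Python sketch below applies them naively with regular expressions. It illustrates the documented substitutions only and is not the rewriting code EVL uses; the function name `where_to_filter` is hypothetical.

```python
import re


def where_to_filter(condition):
    """Hypothetical helper: rewrite an SQL-style condition as a C++-style one."""
    # "field_name" -> field_name (drop double quotes around identifiers)
    out = re.sub(r'"(\w+)"', r"\1", condition)
    # ' -> " (string literals)
    out = out.replace("'", '"')
    # Logical operators, matched as whole words only.
    out = re.sub(r"\bAND\b", "&&", out)
    out = re.sub(r"\bOR\b", "||", out)
    out = out.replace("<>", "!=")
    # A lone '=' becomes '=='; '<=', '>=', '!=' and '==' are left alone.
    out = re.sub(r"(?<![<>!=])=(?!=)", "==", out)
    return out


print(where_to_filter("\"id\" = 5 AND name <> 'abc'"))
```

This sketch also rewrites text inside string literals (for example a value containing the word AND). Cases like that are where a hand-written ‘--filter’ would be the safer choice.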

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit

File type options:

--avro
whatever <source>’s suffix, act as reading ‘avro’ file format

--gz
whatever <source>’s suffix, act as reading ‘gz’, ‘Z’, ‘zip’, ‘bz2’ compressed file format

--json
whatever <source>’s suffix, act as reading ‘json’ file format

--parquet
whatever <source>’s suffix, act as reading ‘parquet’ file format

--qvd
whatever <source>’s suffix, act as reading Qlik’s ‘QVD’ file format

--xls, --xlsx
whatever <source>’s suffix, act as reading MS Excel ‘xls’ or ‘xlsx’ file format

--tar
whatever <source>’s suffix, act as reading tar file

--xml
whatever <source>’s suffix, act as reading ‘xml’ file format

QVD, XLS, XLSX, XML and JSON specific option:

--match-fields
this option is ignored for files other than QVD, XLS(X), XML and JSON.

XML and JSON specific option:

--all-fields-exist
this option is ignored for files other than XML and JSON.

XML specific options:

--document-tag=<tag>
this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

--record-tag=<tag>
this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

--vector-element-tag=<tag>
this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

XLS and XLSX specific options:

--sheet-index=<n>
read <n>-th sheet, starting from number 0. ‘--sheet-index=0’ is default

--sheet-name=<name>
read sheet with name <name>

Examples

Standard examples of standalone usage:

  1. Read tar.gz, skip the header line and validate data types. Write into ‘example.csv’ the content of the tarred and gzipped source without the header line and with validated data types:

    evl read -d 'id int sep=";", value string sep="\n"' \
    -h1 -vxy <example.csv.tar.gz >example.csv

  2. Gzipped json file:

    evl read sample.json.gz sample.evd -y >sample.csv

    As the file has the standard file suffixes ‘gz’ and ‘json’, it is automatically gunzipped and parsed as JSON.

Standard examples of usage in EVL Job:

  1. Gzipped json file. The same as standalone example 2, but for use in an EVS file:

    Read   sample.json.gz SRC sample.evd
    Write SRC sample.csv sample.evd

10.2 Readevd

(since EVL 2.5)

Reads an EVD file from standard input and writes it out using this EVD structure:

parents vector null=""
  string
name string
data_type string
format string null=""
comment string null=""
null vector null=""
  string
separator string null=""
quote struct null=""
  char string(1)
  optional uchar
options vector
  struct
    tag string
    value string null=""
decimal struct null=""
  precision uchar
  scale uchar
  decimal_separator string(1) null=""
  thousands_separator string null=""
string struct null=""
  length ulong null=""
  locale string null=""
  encoding string null=""
  max_bytes ulong null=""
  max_chars ulong null=""
ustring struct null=""
  length ulong null=""
  locale string null=""
  encoding string null=""
  max_bytes ulong null=""
  max_chars ulong null=""

Readevd
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readevd
is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readevd
<f_in> <f_out> [-y|--text-output]

evl readevd
[-y|--text-output] [-v|--verbose]

evl readevd
( --help | --usage | --version )

Options

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.3 Readjson

(since EVL 1.2)

Parses <f_in> according to <evd>.

In general, not all fields need to exist in the input JSON, but if they all do, the option ‘--all-fields-exist’ speeds up the processing.

When the fields in the input JSON are not in the same order as defined in <evd>, the option ‘--match-fields’ has to be used.

When reading a JSON file written by ‘Writejson’, it is usually good to call ‘Readjson’ with option ‘-a’, as such a file always contains all fields from <evd>.
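
To make the difference between positional reading and name-based matching concrete, the Python sketch below projects one JSON record onto a list of EVD field names. It is an illustration only, not EVL code: the helper name `project_record` is hypothetical, and returning `None` for a field missing in the input is an assumption rather than documented EVL behaviour.

```python
import json


def project_record(record_json, evd_fields):
    """Hypothetical helper: pick fields by name, in EVD order."""
    record = json.loads(record_json)
    # Lookup by name makes the result independent of the input field order;
    # a field missing in the input yields None here (assumed behaviour).
    return [record.get(name) for name in evd_fields]


# The input order (b, a) differs from the EVD order (a, b, c).
print(project_record('{"b": 2, "a": 1}', ["a", "b", "c"]))
```

When every record is known to carry every field in a fixed order, the name lookup is unnecessary work, which is the intuition behind the speed-up from ‘--all-fields-exist’.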

Readjson
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readjson
is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readjson
<f_in> <f_out> (<evd>|-d <inline_evd>)
[-a|--all-fields-exist] [-m|--match-fields] [-y|--text-output]

evl readjson
(<evd>|-d <inline_evd>)
[--all-fields-exist] [--match-fields] [-y|--text-output]
[-v|--verbose]

evl readjson
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'user_sum long'’

-a, --all-fields-exist
when the input contains all fields (e.g. output of evl-writejson), using this option increases performance

-m, --match-fields
when fields are not in the same order as in the EVD, this option must be used

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.4 Readkafka

(since EVL 1.1)

The component calls the Kafka consumer command specified by ‘EVL_KAFKA_CONSUMER_COMMAND’, which is by default ‘kafka-console-consumer.sh’, and runs it with options:

--bootstrap-server "<server>:<port>" --topic "<topic>" <kafka_consumer_opts>

and sends the output to <f_out>.
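
The way the consumer command line is put together can be sketched as follows. This Python snippet only mirrors the description above and is not how EVL itself is implemented. The helper name `kafka_consumer_argv` is hypothetical, and ‘--from-beginning’ stands in for any extra option passed through as <kafka_consumer_opts>.

```python
import os


def kafka_consumer_argv(topic, server, port, extra_opts=()):
    """Hypothetical helper: build the consumer command as an argument list."""
    command = os.environ.get(
        "EVL_KAFKA_CONSUMER_COMMAND", "kafka-console-consumer.sh"
    )
    return [
        command,
        "--bootstrap-server", f"{server}:{port}",
        "--topic", topic,
        *extra_opts,
    ]


print(kafka_consumer_argv("events", "localhost", 9092, ["--from-beginning"]))
```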

Readkafka
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readkafka
is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVS is the EVL job structure definition file, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readkafka
<topic> <f_out>
-s|--bootstrap-server <server:port>
[<kafka_consumer_opts>]

evl readkafka
<topic>
-s|--bootstrap-server <server:port>
[<kafka_consumer_opts>]
[-v|--verbose]

evl readkafka
( --help | --usage | --version )

Options

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.5 Readmysql

(since EVL 2.4)

Writes the MariaDB/MySQL <table> to standard output or <f_out>.

The password is taken from the file ‘$EVL_PASSFILE’, which is by default ‘$HOME/.evlpass’. If that file does not have permissions 600 (or 400), it is ignored! For details see ‘evl-password’.
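
The permission rule can be checked before running a job. The Python sketch below tests whether a password file would be accepted under the rule stated above. It is an independent illustration for POSIX systems, not EVL code, and both helper names are hypothetical.

```python
import os
import stat


def passfile_path():
    """Return $EVL_PASSFILE, falling back to $HOME/.evlpass."""
    default = os.path.join(os.path.expanduser("~"), ".evlpass")
    return os.environ.get("EVL_PASSFILE", default)


def passfile_is_usable(path):
    """True only if the file exists and its mode is exactly 600 or 400."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except OSError:
        return False
    return mode in (0o600, 0o400)


print(passfile_path(), passfile_is_usable(passfile_path()))
```

A file readable by group or others (for example mode 644) fails the check, which is the usual reason a password appears to be "missing".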

Readmysql
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readmysql
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readmysql
[<schema>.]<table> <f_out> (<evd>|-d <inline_evd>)
[-b|--dbname=<database>] [-h|--host=<hostname>] [-p|--port=<port>]
[-q|--query=<query>] [-u|--username=<mysqluser>]
[--mysql=<mysql-options>] [-y|--text-output]

evl readmysql
[<schema>.]<table> (<evd>|-d <inline_evd>)
[-b|--dbname=<database>] [-h|--host=<hostname>] [-p|--port=<port>]
[-q|--query=<query>] [-u|--username=<mysqluser>]
[--mysql=<mysql-options>] [-y|--text-output]
[-v|--verbose]

evl readmysql
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, user_id string enc=iso-8859-1'’

-q, --query=<query>
Use SQL <query> instead of reading whole table. With this option <table> might be an empty string.

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit

‘mysql’ options:

-b, --dbname=<database>
this option is provided to ‘mysql’ command as ‘--database=<database>’

-h, --host=<hostname>
this option is provided to ‘mysql’ command

-p, --port=<port>
use a port other than the standard 3306. This option is provided to ‘mysql’ command.

-u, --username=<mysqluser>
if not mentioned, then current system username is used as mysql user. This option is provided to ‘mysql’ command as ‘--user=<mysqluser>’.

--mysql=<mysql-options>
other mysql options can be specified here


10.6 Readora

(since EVL 2.0)

Writes the Oracle <table> to standard output or <f_out>.

When <schema> is not present, environment variable ‘ORADATABASE’ is used.

The password is taken from the file ‘$EVL_PASSFILE’, which is by default ‘$HOME/.evlpass’. If that file does not have permissions 600 (or 400), it is ignored! For details see ‘evl-password’.

Readora
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readora
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

SQL*Plus Field Separator
Reading the table by SQL*Plus uses as field separator the value of ‘$EVL_ORACLE_FIELD_SEPARATOR’, which is by default set to ‘\x1f’ (i.e. a Unit Separator), and the last field in each record is terminated by ‘\n’.

SQL*Plus script hook
Custom options might be added to SQL*Plus script by environment variable ‘$EVL_ORACLE_SQLPLUS_HOOK’.
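
To see why a control character such as ‘\x1f’ is a convenient field separator, the Python sketch below splits one record of that shape. It is purely illustrative: the sample record is made up and `split_record` is a hypothetical helper, not part of EVL.

```python
UNIT_SEPARATOR = "\x1f"  # the default separator named above


def split_record(line, separator=UNIT_SEPARATOR):
    """Split one newline-terminated record into its fields."""
    if line.endswith("\n"):
        line = line[:-1]
    return line.split(separator)


# Commas and spaces inside a value do not collide with the separator.
print(split_record("42\x1fDoe, John\x1f2024-01-31\n"))
```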

Synopsis

Readora
[<schema>.]<table> <f_out> <evd>
[--query=<query>] [-w|--where=<condition>]
[ --connect=<connect_identifier> | -b|--dbname=<database> -h|--host=<hostname> [-p|--port=<port>] ]
[-u|--username=<oracle_user>] [-y|--text-output]

evl readora
[<schema>.]<table> <evd>
[--query=<query>] [-w|--where=<condition>]
[ --connect=<connect_identifier> | -b|--dbname=<database> -h|--host=<hostname> [-p|--port=<port>] ]
[-u|--username=<oracle_user>] [-y|--text-output]
[-v|--verbose]

evl readora
( --help | --usage | --version )

Options

--query=<query>
use SQL <query> instead of reading whole table. With this option <table> might be an empty string.

-w, --where=<condition>
use this where condition instead of reading whole table.

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit

‘sqlplus’ options:

--connect=<connect_identifier>
sqlplus will be called in the form:

<username>/<password>@<connect_identifier>

where <connect_identifier> can be in the form

[<net_service_name> | [//]Host[:Port]/<service_name>]

without this option environment variable ‘ORACONN’ (if defined) is used as connection identifier for sqlplus

-b, --dbname=<database>
either this or environment variable ‘ORADATABASE’ should be provided. If the ‘ORADATABASE’ environment variable is also set, this option takes precedence.

-h, --host=<hostname>
either this or environment variable ‘ORAHOST’ should be provided when connecting to a host other than localhost. If the ‘ORAHOST’ variable is also set, this option takes precedence.

-p, --port=<port>
either this or environment variable ‘ORAPORT’ should be provided when using other than standard port ‘1521’.

-u, --username=<oracle_user>
without this option environment variable ‘ORAUSER’ is used as user for sqlplus


10.7 Readparquet

(since EVL 2.0)

Writes the Parquet files from the <parquet> directory to standard output or <f_out>.

Readparquet
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readparquet
is intended for standalone usage, i.e. to be invoked from command line and writing records into standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readparquet
<parquet> <f_out> (<evd>|-d <inline_evd>) [-y|--text-output]

evl readparquet
<parquet> (<evd>|-d <inline_evd>) [-y|--text-output]
[-v|--verbose]

evl readparquet
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, name string, started timestamp'’

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.8 Readpg

(since EVL 2.0)

Writes the PostgreSQL <table> to standard output or <f_out>.

The password is taken:

  1. from file ‘$EVL_PASSFILE’, which is by default ‘$HOME/.evlpass’,
  2. from file ‘$PGPASSFILE’, which is by default ‘$HOME/.pgpass’.

If such a file does not have permissions 600, it is ignored! For details see ‘evl-password’.

Readpg
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readpg
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readpg
[<schema>.]<table> <f_out> (<evd>|-d <inline_evd>)
[-q|--query=<query> | -w|--where=<condition>]
[-b|--dbname=<database>] [-h|--host=<hostname>] [-p|--port=<port>]
[-u|--username=<pguser>] [--psql=<psql_options>] [-y|--text-output]

evl readpg
[<schema>.]<table> (<evd>|-d <inline_evd>)
[-q|--query=<query> | -w|--where=<condition>]
[-b|--dbname=<database>] [-h|--host=<hostname>] [-p|--port=<port>]
[-u|--username=<pguser>] [--psql=<psql_options>] [-y|--text-output]
[-v|--verbose]

evl readpg
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the <evd> file must be provided. Example: ‘-d 'id int, user_id string enc=iso-8859-1'’

-q, --query=<query>
Use SQL <query> instead of reading whole table. With this option <table> might be an empty string.

-w, --where=<condition>
use this where condition instead of reading whole table

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit

‘psql’ options:

-b, --dbname=<database>
either this or environment variable ‘PGDATABASE’ should be provided, if not, then current system username is used as psql database. If also ‘PGDATABASE’ environment variable is set, this option has preference. (This option is provided to ‘psql’ command.)

-h, --host=<hostname>
either this or environment variable ‘PGHOST’ should be provided when connecting to other host than localhost. If also ‘PGHOST’ variable is set, this option has preference. (This option is provided to ‘psql’ command.)

-p, --port=<port>
either this or environment variable ‘PGPORT’ should be provided when using other than standard port ‘5432’. (This option is provided to ‘psql’ command.)

--psql=<psql_options>
all other options to be provided to the psql command. See ‘man psql’ for details.

-u, --username=<pguser>
either this or environment variable ‘PGUSER’ should be provided; if not, the current system username is used as psql user. If the variable ‘PGUSER’ is also set, this option takes precedence. (This option is provided to ‘psql’ command.)

Examples

  1. To read a table from default schema (mostly ‘public’) in EVL job (i.e. in EVS file) from localhost:5432:

    export PGUSER=some_pg_user
    export PGDATABASE=my_db
    Readpg my_table MYTABLE evd/mytable.evd
    Map MYTABLE ...

    The password is taken from ~/.pgpass, which has 600 permissions and looks like this:

    localhost:5432:my_db:some_pg_user:H+SCs9;_@D
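
The line shown above follows the PostgreSQL password file layout ‘hostname:port:database:username:password’. The Python sketch below splits such a line into named parts. It is a simplified illustration, not EVL or psql code: it ignores the backslash escapes and ‘*’ wildcards that PostgreSQL also allows, and `parse_pgpass_line` is a hypothetical helper.

```python
def parse_pgpass_line(line):
    """Split one .pgpass line into its five colon-separated fields."""
    # At most four splits, so a ':' inside the password is preserved.
    host, port, database, user, password = line.rstrip("\n").split(":", 4)
    return {
        "host": host,
        "port": int(port),
        "database": database,
        "user": user,
        "password": password,
    }


entry = parse_pgpass_line("localhost:5432:my_db:some_pg_user:H+SCs9;_@D")
print(entry["database"], entry["user"])
```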

10.9 Readqvd

(since EVL 2.3)

Writes the content of <file.qvd> to standard output or <f_out>. It parses fields in the order in which they are specified in the EVD file, unless ‘--match-fields’ is specified.

If there are fewer fields in the EVD file than in the QVD, only those fields are returned.

Readqvd
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readqvd
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readqvd
<file.qvd> <f_out> (<evd>|-d <inline_evd>)
[-y|--text-output | -a|--text-output-dos-eol | -b|--text-output-mac-eol]
[-m|--match-fields]
[-n|--null-as-string[=<string>]]
[--filter=<condition>]
[--first-record=<n>]
[--guess-uniform-symbol-size]
[--low-memory]

evl readqvd
<file.qvd> (<evd>|-d <inline_evd>)
[-y|--text-output | -a|--text-output-dos-eol | -b|--text-output-mac-eol]
[-m|--match-fields]
[-n|--null-as-string[=<string>]]
[--filter=<condition>]
[--first-record=<n>]
[--guess-uniform-symbol-size]
[--low-memory]
[-v|--verbose]

evl readqvd
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, name string, started timestamp'’

-m, --match-fields
match fields between EVD and QVD by name; otherwise they are taken one by one from the input QVD file. If there are fewer fields in the EVD file than in the QVD, only those fields are returned.

-n, --null-as-string[=<string>]
read <string> as a NULL value, without <string> specified it reads an empty string as NULL

--filter=<condition>
read only records with given <condition>.

--first-record=<n>
start to read from the record number <n>.

--guess-uniform-symbol-size
might speed up indexing of the dictionary, but does not work in all cases. Use it only in special cases where top performance is really needed.

--low-memory
do not read the dictionary into memory. This reduces memory consumption, but slows down reading the source file.

-y, --text-output
write the output as text, not binary

-a, --text-output-dos-eol
produce the output as text with CRLF as end of line

-b, --text-output-mac-eol
produce the output as text with CR as end of line

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.10 Readsqlite

(since EVL 2.7)

Writes the SQLite <table> to standard output or <f_out>.

It takes the whole table with columns in order defined by EVD, unless <query> and/or <condition> is specified.

Path to the database file is taken from environment variable ‘$EVL_SQLITE_DATABASE’, unless <db_file> is specified.

Readsqlite
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readsqlite
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readsqlite
<table> <f_out> (<evd>|-d <inline_evd>)
[--dbname=<db_file>] [--query=<query>] [-w|--where=<condition>]
[-y|--text-output]

evl readsqlite
<table> (<evd>|-d <inline_evd>)
[--dbname=<db_file>] [--query=<query>] [-w|--where=<condition>]
[-y|--text-output]
[-v|--verbose]

evl readsqlite
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, user_id string enc=iso-8859-1'’

--dbname=<db_file>
path to the SQLite database file; if this option is not used, database file is taken from environment variable ‘$EVL_SQLITE_DATABASE’.

--query=<query>
Use SQL <query> instead of reading whole table. With this option <table> might be an empty string.

-w, --where=<condition>
use this where condition instead of reading whole table.

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit

Examples

  1. To read a table ‘my_table’ in EVL job (i.e. in EVS file) from ‘/home/myself/my_db.sqlite’:

    export EVL_SQLITE_DATABASE="/home/myself/my_db.sqlite"
    Readsqlite my_table MYTABLE evd/mytable.evd
    Map MYTABLE ...

  2. Command line usage of sending table ‘my_table’ from ‘/home/myself/my_db.sqlite’ to standard output:

    export EVL_SQLITE_DATABASE="/home/myself/my_db.sqlite"
    evl readsqlite my_table evd/mytable.evd --text-output

    or just

    evl readsqlite --dbname="/home/myself/my_db.sqlite" my_table evd/mytable.evd --text-output


10.11 Readtd

(since EVL 1.1)

Writes the Teradata <table> to standard output or <f_out>.

Readtd
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readtd
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readtd
<database>.<table> <f_out> (<evd>|-d <inline_evd>) [-y|--text-output]

evl readtd
<database>.<table> (<evd>|-d <inline_evd>) [-y|--text-output]
[-v|--verbose]

evl readtd
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, user_id string enc=iso-8859-1'’

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.12 Readxls

(since EVL 2.2)

Reads an XLS sheet and writes it to <f_out>.

Unless ‘--sheet-index’ or ‘--sheet-name’ is specified, it reads only the first sheet from the file.

It skips the header line, unless option ‘--no-header’ or ‘--match-fields’ is used.

Readxls
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readxls
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readxls
<file> <f_out> (<evd>|-d <inline_evd>)
[-m|--match-fields | --no-header]
[--sheet-index=<n> | --sheet-name=<name>]
[-y|--text-output]

evl readxls
<file> (<evd>|-d <inline_evd>)
[-m|--match-fields | --no-header]
[--sheet-index=<n> | --sheet-name=<name>]
[-y|--text-output]
[-v|--verbose]

evl readxls
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, name string, started timestamp'’

-m, --match-fields
read only fields specified by EVD, based on header. All characters other than ‘[a-zA-Z0-9_-]’ are replaced by underscore when matching with EVD field names.

--no-header
assume there is no header

--sheet-index=<n>
read <n>-th sheet, starting from number 0 (i.e. ‘--sheet-index=0’ is the default behaviour)

--sheet-name=<name>
read sheet with name <name>

-y, --text-output
write the output as text, not binary
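
The header normalisation described for ‘--match-fields’ can be illustrated in a few lines. The Python sketch below applies the documented character replacement and then looks up EVD field names among the normalised header cells. It is not EVL code: both helper names are hypothetical, and raising `KeyError` for an EVD field absent from the header is a choice made for this sketch only.

```python
import re


def normalize_header(name):
    """Replace every character outside [a-zA-Z0-9_-] with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)


def match_fields(header, evd_fields):
    """Return the column index of each EVD field, in EVD order."""
    positions = {normalize_header(cell): i for i, cell in enumerate(header)}
    # Raises KeyError when an EVD field has no matching header cell.
    return [positions[field] for field in evd_fields]


print(normalize_header("Order Date (UTC)"))
print(match_fields(["Order Date", "id", "Total (EUR)"], ["id", "Total__EUR_"]))
```

Note that a header such as ‘Total (EUR)’ has to be named ‘Total__EUR_’ in the EVD, because each replaced character produces its own underscore.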

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.13 Readxlsx

(since EVL 2.2)

Reads an XLSX sheet and writes it to <f_out>.

Unless ‘--sheet-index’ or ‘--sheet-name’ is specified, it reads only the first sheet from the file.

It skips the header line, unless option ‘--no-header’ or ‘--match-fields’ is used.

Readxlsx
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readxlsx
is intended for standalone usage, i.e. to be invoked from command line and writing records to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readxlsx
<file> <f_out> (<evd>|-d <inline_evd>)
[-m|--match-fields | --no-header]
[--sheet-index=<n> | --sheet-name=<name>]
[-y|--text-output]

evl readxlsx
<file> (<evd>|-d <inline_evd>)
[-m|--match-fields | --no-header]
[--sheet-index=<n> | --sheet-name=<name>]
[-y|--text-output]
[-v|--verbose]

evl readxlsx
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'id int, name string, started timestamp'’

-m, --match-fields
read only fields specified by EVD, based on header. All characters other than ‘[a-zA-Z0-9_-]’ are replaced by underscore when matching with EVD field names.

--no-header
assume there is no header

--sheet-index=<n>
read <n>-th sheet, starting from number 0 (i.e. ‘--sheet-index=0’ is the default behaviour)

--sheet-name=<name>
read sheet with name <name>

-y, --text-output
write the output as text, not binary

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit


10.14 Readxml

(since EVL 1.3)

Parses XML <f_in> according to <evd>.

In general, not all fields need to exist in the input XML, but if they all do, the option ‘--all-fields-exist’ speeds up the processing.

When the fields in the input XML are not in the same order as defined in <evd>, the option ‘--match-fields’ has to be used.

When reading an XML file written by ‘Writexml’, it is usually good to call ‘Readxml’ with option ‘-a’, as such a file always contains all fields from <evd>.

Readxml
is to be used in EVS job structure definition file. <f_out> is either output file or flow name.

evl readxml
is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Readxml
<f_in> <f_out> (<evd>|-d <inline_evd>)
[-a|--all-fields-exist]
[-m|--match-fields]
[--document-tag=<tag>]
[--record-tag=<tag>]
[--vector-element-tag=<tag>]
[-y|--text-output]

evl readxml
(<evd>|-d <inline_evd>)
[-a|--all-fields-exist]
[-m|--match-fields]
[--document-tag=<tag>]
[--record-tag=<tag>]
[--vector-element-tag=<tag>]
[-y|--text-output]
[-v|--verbose]

evl readxml
( --help | --usage | --version )

Options

-d, --data-definition=<inline_evd>
either this option or the file <evd> must be provided. Example: ‘-d 'user_sum long'’

-a, --all-fields-exist
when the input contains all fields (e.g. output of evl-writexml), using this option increases performance

-m, --match-fields
when fields are not in the same order as in the EVD, this option must be used

--document-tag=<tag>
specifies the name of the main tag; by default the component tries to guess it. The XML file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<document>
...
</document>

where the tag ‘document’ can be of any name.

--record-tag=<tag>
specifies the tag name of a record; by default the component tries to guess it. The XML file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<document>
<record>
...
</record>
<record>
...
</record>
<record>
...
</record>
...
</document>

where the tag ‘record’ can be of any name, but must be the same across the file.

--vector-element-tag=<tag>
the name of the tag for vector elements, e.g. XML file with vector ‘someVector’:

...
<someVector>
<elem>1</elem>
<elem>2</elem>
<elem>3</elem>
</someVector>
...

should be read with option ‘--vector-element-tag=elem’.

-y, --text-output
write the output as text, not binary
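
The roles of the document, record and vector element tags can be seen by walking a small XML file by hand. The Python sketch below uses the standard library parser and is only an illustration of the three tag options, not how ‘evl readxml’ works internally. The sample data and the function name `read_records` are made up, and an empty vector is not handled.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<document>
  <record>
    <id>1</id>
    <someVector>
      <elem>1</elem>
      <elem>2</elem>
      <elem>3</elem>
    </someVector>
  </record>
  <record>
    <id>2</id>
    <someVector>
      <elem>4</elem>
    </someVector>
  </record>
</document>
"""


def read_records(xml_bytes, record_tag, vector_element_tag, document_tag=None):
    """Hypothetical helper: return one dict per record element."""
    root = ET.fromstring(xml_bytes)
    if document_tag is not None and root.tag != document_tag:
        raise ValueError(f"unexpected document tag: {root.tag}")
    records = []
    for record in root.findall(record_tag):
        row = {}
        for field in record:
            elements = field.findall(vector_element_tag)
            if elements:
                # A field containing element tags is read as a vector.
                row[field.tag] = [element.text for element in elements]
            else:
                row[field.tag] = field.text
        records.append(row)
    return records


for row in read_records(SAMPLE, "record", "elem", document_tag="document"):
    print(row)
```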

Standard options:

--help
print this help and exit

--usage
print short usage information and exit

-v, --verbose
print to stderr info/debug messages of the component

--version
print version and exit