First official version released! It follows more than a year of design and development and a first industry implementation at T-Mobile CZ.

New features:

  • ‘Lookup tables’ – lookups are loaded into memory and used in mappings.
  • ‘Checksum functions’ – standard checksum functions for strings: ‘md5’, ‘sha224’, ‘sha256’, ‘sha384’, ‘sha512’.
  • HDFS support
  • Spark code generation – Parquet and Impala integration
  • Job Manager
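
As an illustration of what the listed checksum functions compute for a string, here is a minimal sketch using Python's standard `hashlib`. The helper name `checksum`, its parameters, and the UTF-8 default are assumptions made for this example; the release notes do not show how the functions are called inside a mapping.

```python
import hashlib

# Algorithm names as listed in the release notes.
SUPPORTED = ("md5", "sha224", "sha256", "sha384", "sha512")


def checksum(text: str, algorithm: str = "md5", encoding: str = "utf-8") -> str:
    """Return the hex digest of `text` (hypothetical helper, not the tool's API)."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Hash functions operate on bytes, so the string is encoded first.
    return hashlib.new(algorithm, text.encode(encoding)).hexdigest()


if __name__ == "__main__":
    for name in SUPPORTED:
        print(name, checksum("abc", name))
```

The digest depends on the byte encoding of the string, so the same text hashed under different encodings can give different checksums.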

New components:

  • ‘Aggreg’ – aggregate groups of records.
  • ‘Cat’ – concatenate several input flows into a single output flow.
  • ‘Comp’ – use a custom component, which is actually another job.
  • ‘Cut’ – omit fields from the input according to the output data definition.
  • ‘Filter’ – a simple one- or two-way switch. For more complex cases use ‘Map’.
  • ‘Join’ – join two input flows by the key. Left, right, or even unmatched records can be caught.
  • ‘Map’ – transform input fields and write them into output fields.
  • ‘Read’ – read file(s) into the output flow, uncompressing if needed.
  • ‘Sort’ – sort, deduplicate, or check the sort order; the output is always sorted by the key.
  • ‘Tee’ – replicate one input flow to several output flows.
  • ‘Trash’ – like /dev/null.
  • ‘Write’ – write the flow into a file, compressing if needed.
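
To make the idea of joining two flows by a key and catching unmatched records concrete, here is a small Python sketch. It is one possible reading of the ‘Join’ description, not the component's actual behaviour: the function name `join_flows`, the representation of records as dictionaries, and the three returned lists are all assumptions made for illustration.

```python
from collections import defaultdict


def join_flows(left, right, key):
    """Join two lists of dict records on `key` (hypothetical sketch).

    Returns (matched, left_only, right_only).
    """
    # Index the right flow by key so each left record is matched in one lookup.
    right_index = defaultdict(list)
    for rec in right:
        right_index[rec[key]].append(rec)

    matched, left_only = [], []
    seen_keys = set()
    for lrec in left:
        k = lrec[key]
        if k in right_index:
            seen_keys.add(k)
            for rrec in right_index[k]:
                # On a field-name clash, the left record's value wins.
                matched.append({**rrec, **lrec})
        else:
            left_only.append(lrec)

    # Right records whose key never appeared in the left flow.
    right_only = [r for r in right if r[key] not in seen_keys]
    return matched, left_only, right_only


if __name__ == "__main__":
    customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    cities = [{"id": 2, "city": "x"}, {"id": 3, "city": "y"}]
    print(join_flows(customers, cities, "id"))
```

This sketch holds the right flow in memory as a hash index; a join over flows that are already sorted by the key could instead merge them in a single pass.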

New commands:

  • Mkdir
  • Mv