Skip to main content
Version: Next

Type validation

starlake allows you to validate the types of the data you are loading. This is done by specifying the schema of the data you are writing. This schema, generally infered from sample data, is described using the YAML syntax in the attributessection of the metadata/load/<domain>/<table>.sl.yml file.

The order_line table may be described as follows:


metadata/load/starbake/order_line.sl.yml - validation section
table:
pattern: "order_line.*.csv"
metadata:
- ...
attributes:
- name: "order_id"
type: "int"
- name: "product_id"
type: "int"
- name: "quantity"
type: "int"
- name: "sale_price"
type: "double"

int and double are what we call the basic types and are mapped to the corresponding SQL types.

Basic types

The basic types below are supported by starlake:


metadata/types/default.sl.yml
types:
- name: "string"
primitiveType: "string"
pattern: ".+"
sample: "Hello World"
comment: "Any set of chars"
- name: "int"
pattern: "[-|\\+|0-9][0-9]*"
primitiveType: "long"
sample: "1234"
comment: "Int number"
- name: "integer"
pattern: "[-|\\+|0-9][0-9]*"
primitiveType: "long"
sample: "1234"
comment: "Int number"
- name: "byte"
primitiveType: "byte"
pattern: "."
sample: "x"
comment: "Any set of chars"
- name: "double"
primitiveType: "double"
pattern: "[-+]?\\d*\\.?\\d+[Ee]?[-+]?\\d*"
sample: "-45.78"
comment: "Any floating value"
- name: "long"
primitiveType: "long"
pattern: "[-|\\+|0-9][0-9]*"
sample: "-64564"
comment: "any positive or negative number"
- name: "short"
primitiveType: "short"
pattern: "-?\\d+"
sample: "564"
comment: "any positive or negative number"
- name: "boolean"
primitiveType: "boolean"
pattern: "(?i)true|yes|[y1]<-TF->(?i)false|no|[n0]"
sample: "TruE"
- name: "timestamp"
primitiveType: "timestamp"
pattern: "ISO_DATE_TIME"
sample: "2019-12-31 23:59:02"
comment: "date/time in epoch millis"
- name: "decimal"
primitiveType: "decimal"
pattern: "-?\\d*\\.{0,1}\\d+"
sample: "-45.787686786876"
comment: "Any floating value"
- name: "date"
primitiveType: "date"
pattern: "yyyy-MM-dd"
sample: "2018-07-21"
comment: "Date in the format yyyy-MM-dd"
- name: "basic_iso_date"
primitiveType: "timestamp"
pattern: "BASIC_ISO_DATE"
sample: "20111203"
comment: "Timestamp based on yyyMMdd pattern"
- name: "iso_local_date"
primitiveType: "timestamp"
pattern: "ISO_LOCAL_DATE"
sample: "2011-12-03"
comment: "Timestamp based on yyyy-MM-dd pattern"
- name: "iso_offset_date"
primitiveType: "timestamp"
pattern: "ISO_OFFSET_DATE"
sample: "2011-12-03+02:00"
comment: "Timestamp based on `ISO Date with offset` pattern"
- name: "iso_date"
primitiveType: "timestamp"
pattern: "ISO_DATE"
sample: "2011-12-03+02:00"
comment: "Timestamp based on `ISO Date with or without offset` pattern"
- name: "iso_local_date_time"
primitiveType: "timestamp"
pattern: "ISO_LOCAL_DATE_TIME"
sample: "2011-12-03T10:15:30"
comment: "Timestamp based on `ISO Local Date and Time` pattern"
- name: "iso_offset_date_time"
primitiveType: "timestamp"
pattern: "ISO_OFFSET_DATE_TIME"
sample: "2011-12-03T10:15:30+01:00"
comment: "Timestamp based on `ISO Local Date and Time` pattern"
- name: "iso_zoned_date_time"
primitiveType: "timestamp"
pattern: "ISO_ZONED_DATE_TIME"
sample: "2011-12-03T10:15:30+01:00[Europe/Paris]"
comment: "Timestamp based on `ISO Zoned Date Time` pattern"
- name: "iso_date_time"
primitiveType: "timestamp"
pattern: "ISO_DATE_TIME"
sample: "2011-12-03T10:15:30+01:00[Europe/Paris]"
comment: "Timestamp based on `ISO Date and time with ZoneId` pattern"
- name: "iso_ordinal_date"
primitiveType: "timestamp"
pattern: "ISO_ORDINAL_DATE"
sample: "2012-337"
comment: "Timestamp based on `year and day of year` pattern"
- name: "iso_week_date"
primitiveType: "timestamp"
pattern: "ISO_WEEK_DATE"
sample: "2012-W48-6"
comment: "Timestamp based on `Year and Week` pattern"
- name: "iso_instant"
primitiveType: "timestamp"
pattern: "ISO_INSTANT"
sample: "2011-12-03T10:15:30Z"
comment: "Timestamp based on `Date and Time of an Instant` pattern (UTC only)"
- name: "rfc_1123_date_time"
primitiveType: "timestamp"
pattern: "RFC_1123_DATE_TIME"
sample: "Tue, 3 Jun 2008 11:05:30 GMT"
comment: "Timestamp based on `RFC 1123 / RFC 822` patterns"

Custom types

You may redefine existing standard types or define your own types in the metadata/types directory in a file named custom.sl.yml.

This allows you to run your input data against advanced validation rules.

Below is an example of a custom type definition:


metadata/types/custom.sl.yml

types:
- name: "email"
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
primitiveType: "string"
sample: "
comment: "Email address"
- name: "phone"
pattern: "^[0-9]{10}$"
primitiveType: "string"
sample: "1234567890"
comment: "Phone number"
- name: "zip_code"
pattern: "^[0-9]{5}$"
primitiveType: "string"
sample: "12345"
comment: "Zip code"

Note that you have to map the custom type to a primitive type, which is the type that will be used in the SQL schema.

The following primitive types are supported:

  • string
  • long
  • double
  • byte
  • short
  • boolean
  • timestamp
  • decimal
  • date