Skip to main content
Version: Next

Write strategies

When loading a file to a database table you can specify how to data is written to the table. That's what the metadata.writeStrategy property is for.

The following strategies are available:

StrategyDescription
APPENDInsert all rows into the table. If the table already contains data, the new rows will be appended.
OVERWRITEReplace all rows in the table with the new rows. This will delete all existing rows and insert the new ones.
OVERWRITE_BY_KEYMerge the new rows with the existing rows. If a row with the same key already exists in the table, the new row will overwrite the old one otherwise it will be appended.
OVERWRITE_BY_KEY_AND_TIMESTAMPMerge the new rows with the existing rows. If a row with the same key and an older timestamp already exists in the table, the new row will overwrite the old one otherwise it will be appended.
OVERWRITE_BY_PARTITIONMerge the new rows with the existing rows. All existing partitions present in the new data will be overwritten. All other partitions will be left untouched and new partitions will be appended.
DELETE_THEN_INSERTDelete rows in the target table for which records with the same keys exist in the incoming data, before inserting the incoming rows.
SCD2Merge the new rows with the existing rows using the Slowly Changing Dimension Type 2 strategy. This will keep track of the history of the rows in the table.
ADAPTATIVEThe write strategy will be determined at runtime based on the properties of the file being loaded.

APPEND

If the table does not exist, it will be created. If the table already contains data, the new rows will be appended.


Append to the table.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "APPEND"
attributes:
- ...

OVERWRITE

If the table does not exist, it will be created. Replace all rows in the table with the new rows. This will delete all existing rows and insert the new ones.


Overwrite the table.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE"
attributes:
- ...

OVERWRITE_BY_KEY

If the table does not exist, it will be created. Merge the new rows with the existing rows. If a row with the same key already exists in the table, the new row will overwrite the old one otherwise it will be appended.


Overwrite by key. The column 'id' is used as key.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE_BY_KEY"
key: ["id"]
on: TARGET
attributes:
- ...

OVERWRITE_BY_KEY_AND_TIMESTAMP

If the table does not exist, it will be created. Merge the new rows with the existing rows. If a row with the same key and an older timestamp already exists in the table, the new row will overwrite the old one otherwise it will be appended.


Overwrite by key and timestamp. The column 'id' is used as key.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE_BY_KEY_AND_TIMESTAMP"
key: ["id"]
timestamp: "valid_from"
on: TARGET
attributes:
- ...

OVERWRITE_BY_PARTITION

If the table does not exist, it will be created. Merge the new rows with the existing rows. All existing partitions present in the new data will be overwritten. All other partitions will be left untouched and new partitions will be appended.


Overwrite by partition. Require the partition property to be set in the metadata.sink section.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "OVERWRITE_BY_PARTITION"
on: TARGET
attributes:
- ...

DELETE_THEN_INSERT

If the table does not exist, it will be created. Delete rows in the target table for which records with the same keys exist in the incoming data, before inserting the incoming rows.


Delete then insert
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "DELETE_THEN_INSERT"
key: ["id", "name" ...]
attributes:
- ...

SCD2

If the table does not exist, it will be created. Merge the new rows with the existing rows using the Slowly Changing Dimension Type 2 strategy. This will keep track of the history of the rows in the table.


Slow changing dimension Type 2. The column 'id' is used as key and the column 'date' is used as timestamp.
table:
pattern: "<table>.*.csv"
metadata:
...
writeStrategy:
type: "SCD2"
key: ["id"]
timestamp: "date"
startTs: "valid_from"
endTs: "valid_to"
on: BOTH
attributes:
- ...

the startTs and endTs properties are used to specify the names of the columns that will be used to store the start and end timestamps of the rows.

They may be omitted. In that case, the default values sl_start_ts and sl_end_ts will be used.

These default values may be changed in the metadata/application.sl.yml file by setting the following global variables .


application:
...
scd2StartTimestamp: "sl_start_ts"
scd2EndTimestamp: "sl_end_ts"

Adaptive write strategy

Have you ever needed to change the way you feed your table from time to time or periodically? Adaptive Write may be the solution to your need. This feature allows you to adjust the loading mode at runtime, according to various criteria listed in the table below.

For example, you want to ingest in APPEND mode throughout the week, except on Sundays when the source sends you all of certain tables, as discrepancies may occur with the incremental mode. This can be done automatically by changing the domain or table configuration.

The example below illustrates the change at domain level that will be propagated to all these tables.

# _config.sl.yml
load:
name: "DOMAIN
metadata:
...
writeStrategy:
types:
APPEND: 'dayOfWeek != 7'
OVERWRITE: 'dayOfWeek == 7'

Another example is based on the file name:

# _config.sl.yml
load:
name: "DOMAIN"
metadata:
...
writeStrategy:
types:
OVERWRITE: 'group("mode") == "FULL"'
APPEND: 'group("mode") == "APPEND"'
#my_table.sl.yml
table:
...
pattern: ".*-(?<mode>FULL|APPEND).csv$"

You may want to combine these criteria. If so, just use regular boolean operators with !, &&and || and wrap with parenthesis if necessary.

note

When using String in expression, makes sure to wrap them with double quotes "

List of criterias

CriteriaDescriptionExample
group(index or name)File pattern must use (named) capture groupspattern: `my-file-(F
fileSizeCurrent file size in bytesfileSize > 1000
fileSizeBCurrent file size in bytes. Alias of fileSize
fileSizeKoCurrent file size in Ko
fileSizeMoCurrent file size in Mo
fileSizeGoCurrent file size in Go
fileSizeToCurrent file size in To
isFirstDayOfMonthCurrent day is first day of month
isLastDayOfMonthCurrent day is last day of month
dayOfWeekInteger representing day of week. Monday = 1, ..., Sunday = 7
isFileFirstDayOfMonthFile modification date is first day of month
isFileLastDayOfMonthFile modification date is last day of month
fileDayOfWeekInteger representing file modification day of week. Monday = 1, ..., Sunday = 7
note

For a criteria relying on datetime, you can change its timezone with timezone application settings in application.sl.yml