
Load strategies

Standard strategies

For each domain and table, the starlake load command looks for files that match the pattern given in the table.pattern attribute of the metadata/load/<domain>/<table>.sl.yml file. It searches the directory given in the load.metadata.directory attribute of that same file or, if the attribute is not set there, in the <domain>/_config.sl.yml file.
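To illustrate the matching step, here is a minimal sketch. It assumes that table.pattern is treated as a Java regular expression matched against the whole file name. The object name, the pattern value, and the file names are hypothetical and are not part of Starlake.

```scala
import java.util.regex.Pattern

object PatternSketch {
  // Hypothetical value of table.pattern for an "orders" table.
  val ordersPattern: Pattern = Pattern.compile("orders_.*\\.csv")

  // Keep only the file names that match the pattern in full.
  def matching(fileNames: List[String], pattern: Pattern): List[String] =
    fileNames.filter(name => pattern.matcher(name).matches())
}
```

Under that assumption, a file such as orders_2024.csv.bak is skipped, because the whole name has to match and not just a prefix of it.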

starlake comes with two load strategies:

| Load Strategy | Description |
| --- | --- |
| ai.starlake.job.load.IngestionTimeStrategy | Load the files in chronological order, based on each file's last modification time. This is the default. |
| ai.starlake.job.load.IngestionNameStrategy | Load the files in lexicographical order, based on the file name. |
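The two orderings can give different results for the same set of files. The sketch below is only an illustration of that difference, not Starlake's implementation: PendingFile is a stand-in type, and the file names and timestamps are made up.

```scala
import java.time.Instant

object OrderingSketch {
  // Stand-in for a file waiting to be loaded; not Starlake's own type.
  final case class PendingFile(name: String, lastModified: Instant)

  // Time-based ordering: oldest modification time first.
  def byTime(files: List[PendingFile]): List[PendingFile] =
    files.sortBy(_.lastModified.toEpochMilli)

  // Name-based ordering: plain lexicographical order of the file name.
  def byName(files: List[PendingFile]): List[PendingFile] =
    files.sortBy(_.name)

  // The file for January 2nd was written to disk before the file for January 1st.
  val sample: List[PendingFile] = List(
    PendingFile("orders_2024-01-02.csv", Instant.parse("2024-01-05T08:00:00Z")),
    PendingFile("orders_2024-01-01.csv", Instant.parse("2024-01-05T09:00:00Z"))
  )
}
```

With this sample, byTime puts orders_2024-01-02.csv first, while byName puts orders_2024-01-01.csv first. A name-based load can therefore be preferable when files may arrive out of order but carry a sortable date in their name.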

To use a load strategy, you need to specify the loadStrategyClass attribute in the metadata/application.sl.yml file.


metadata/application.sl.yml: to switch from a time-based load to a name-based load
application:
  # ...
  loadStrategyClass: ai.starlake.job.load.IngestionNameStrategy
  # ...

Custom strategies

You can define your own load strategy by implementing the ai.starlake.job.load.LoadStrategy interface.


src/main/scala/my/own//CustomLoadStrategy.scala
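import ai.starlake.job.load.LoadStrategy
// Package of FileInfo and StorageHandler is assumed; check it against your Starlake version.
import ai.starlake.schema.handlers.{FileInfo, StorageHandler}
import com.typesafe.scalalogging.StrictLogging
import org.apache.hadoop.fs.Path

import java.time.LocalDateTime

object CustomLoadStrategy extends LoadStrategy with StrictLogging {

  // Return the files found under `path`, in the order in which they should be loaded.
  def list(
    storageHandler: StorageHandler,
    path: Path,
    extension: String = "",
    since: LocalDateTime = LocalDateTime.MIN,
    recursive: Boolean
  ): List[FileInfo] = ???
}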

metadata/application.sl.yml: to use a custom load strategy

application:
  # ...
  # Fully qualified name of your LoadStrategy implementation
  loadStrategyClass: ai.starlake.job.load.MyLoadStrategy
  # ...
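As an example of an ordering that neither standard strategy provides, the sketch below sorts file names by a numeric sequence number. It is deliberately independent of Starlake: it works on plain file names, and the naming convention (<prefix>-<sequence>.<extension>) and all identifiers are hypothetical. A real list implementation would presumably apply a similar ordering to the FileInfo values it returns.

```scala
object CustomOrderingSketch {
  // Hypothetical naming convention: <prefix>-<sequence>.<extension>, e.g. part-10.csv.
  // The sequence is capped at 18 digits so that it always fits in a Long.
  private val Sequenced = """.*-(\d{1,18})\.[^.]+""".r

  // Extract the numeric sequence; names that do not follow the convention sort last.
  def sequenceOf(name: String): Long = name match {
    case Sequenced(digits) => digits.toLong
    case _                 => Long.MaxValue
  }

  // Order by numeric sequence, then by name so the result is deterministic.
  def bySequence(names: List[String]): List[String] =
    names.sortBy(name => (sequenceOf(name), name))
}
```

A lexicographical sort places part-10.csv before part-2.csv, whereas bySequence places part-2.csv first. This is the kind of case where a custom strategy is worth writing.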