Quickly Build Optimized Big & Fast Data Pipelines

So your data just keeps moving

Low Code / No Code

Starlake Data Pipeline was designed from the ground up to be easily installed and used to ingest your data and expose your insights quickly.

On Premise and Cloud Native

Starlake Data Pipeline supports almost all on-premise and cloud-native solutions, including Amazon S3, Azure Storage, Google Storage and Apache HDFS, as well as Snowflake, Google BigQuery, Amazon Redshift and Apache Hive.

DevOps friendly

Starlake Data Pipeline was designed to integrate fully into the DevOps ecosystem and take advantage of practices like Git merge requests, incremental CI/CD, text-based configuration and a bring-your-own (BYO) SQL environment.

From any source to any sink at Spark™ speed.

Code-free ingestion of any Spark source or sink, including Snowflake, BigQuery, Parquet, JDBC, TEXT, XML, JSON and POSITIONAL sources. Works on any Spark distribution, including Azure Synapse, Amazon EMR, Cloudera, Google Dataproc and Databricks.

Keep your Lakehouse from becoming a Dataswamp.

With advanced validation and rich metadata, define semantic types and make sure your input fields respect the specified formats.
Mark fields as primary or foreign keys, optional or ignored, apply custom privacy functions and/or rename fields during the ingestion process.
Apply on-the-fly, in-memory transformations using any standard SQL function or custom UDF.
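
To illustrate the idea of semantic types, here is a minimal Python sketch of format validation during ingestion. It is not Starlake's implementation or configuration syntax: the type names, the regular expressions, the `SCHEMA` layout and the `validate` / `split_records` helpers are all hypothetical, chosen only to show how each field can be checked against a declared format and how a rejection can carry the source value and the expected format.

```python
import re

# Hypothetical semantic types: name -> (compiled pattern, expected format).
SEMANTIC_TYPES = {
    "email": (re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"), "user@domain.tld"),
    "iso_date": (re.compile(r"\d{4}-\d{2}-\d{2}"), "YYYY-MM-DD"),
    "integer": (re.compile(r"[-+]?\d+"), "optionally signed digits"),
}

# Hypothetical schema: field name -> (semantic type, required?).
SCHEMA = {
    "id": ("integer", True),
    "email": ("email", True),
    "signup_date": ("iso_date", False),
}


def validate(record):
    """Return a list of (field, source_value, expected_format) rejections."""
    errors = []
    for field, (type_name, required) in SCHEMA.items():
        value = record.get(field)
        pattern, expected = SEMANTIC_TYPES[type_name]
        if value is None or value == "":
            # A missing value is only an error for required fields.
            if required:
                errors.append((field, value, expected))
            continue
        if not pattern.fullmatch(value):
            errors.append((field, value, expected))
    return errors


def split_records(records):
    """Separate records into accepted ones and (record, errors) rejections."""
    accepted, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

Calling `split_records` on a list of string-valued dictionaries returns the clean records plus, for each rejected record, the offending fields with their source values and expected formats.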

Business & Developer Friendly

Get the best of both worlds.
Because business users love Excel and developers love Git and text-based development, share your ingestion metadata with business users, let them make their updates, then get the result back instantly as YAML with full Git versioning support.

Script Free Database Replication

Select the tables and columns in your source database and replicate your data into any warehouse using full and/or incremental modes, with optional pre- and post-load transformations.

Security as a First-Class Citizen

Because you take your data security seriously, Starlake makes it possible to define access control restrictions using access control lists (ACL), row level security (RLS) and column level security (CLS).

Data Observability through Metrics and Auditing

For each file ingested: get the date and time, the number of records accepted / rejected and the process duration.
For each rejected input attribute: get the reason for rejection, the source value and the expected format.
For each discrete column: get the list of distinct values, modality, frequency and missing values.
For each continuous column: get the min, the max, the mean, the median, the variance, the sum, the standard deviation, the 25th percentile and the 75th percentile.
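
The column metrics listed above can be sketched in a few lines of Python using only the standard library. This is an illustration of the statistics themselves, not of how Starlake computes or stores them: the function names and the returned dictionary keys are hypothetical, and the choice of population variance and of the "inclusive" quantile method is an assumption.

```python
import statistics
from collections import Counter


def discrete_metrics(values):
    """Distinct values, per-value frequency and missing count for a column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "distinct": sorted(counts),  # assumes values are mutually comparable
        "frequency": dict(counts),
        "missing": len(values) - len(present),
    }


def continuous_metrics(values):
    """Summary statistics for a numeric column, ignoring missing values.

    Needs at least two non-missing values to compute quartiles.
    """
    present = [v for v in values if v is not None]
    # Quartile cut points; the middle one equals the median.
    q1, _q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "min": min(present),
        "max": max(present),
        "sum": sum(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),  # population variance (assumption)
        "stddev": statistics.pstdev(present),
        "p25": q1,
        "p75": q3,
    }
```

On a large dataset these aggregates would normally be computed by the engine running the pipeline rather than in plain Python; the sketch only makes the definitions concrete.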

Interactive YAML Schema Validation

Work using your favorite VSCode YAML validation extension. Improve productivity and readability with context-sensitive entry helpers and intelligent YAML auto-completion.

Interactive Relationships Editor

Using the Starlake VSCode extension, create interactive entity-relationship diagrams and share them with your business users.
Use the CLI to generate the complete entity-relationship diagram in a searchable SVG format and include it in your website.

Using the CLI, you can also generate the complete access control rules diagram in a searchable SVG format.