Low Code / No Code
Starlake Data Pipeline was designed from the ground up to be easy to install and use, so you can ingest your data and expose your insights quickly.
On Premise and Cloud Native
Starlake Data Pipeline supports most on-premise and cloud-native solutions, including Amazon S3, Azure Storage, Google Cloud Storage, and Apache HDFS for storage, and Snowflake, Google BigQuery, Amazon Redshift, and Apache Hive for warehousing.
Starlake Data Pipeline was designed to integrate fully into the DevOps ecosystem and take advantage of practices such as Git merge requests, incremental CI/CD, text-based configuration, and a BYO SQL environment.
From any source to any sink at Spark™ speed.
Keep your lakehouse from becoming a data swamp.
Mark fields as primary or foreign keys, optional, or ignored; apply custom privacy functions; and/or rename fields during the ingestion process.
Apply on-the-fly, in-memory transformations using any standard SQL function or custom UDF.
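To make the idea concrete, here is a minimal, hypothetical sketch of ingestion-time transformation using Python's in-memory SQLite database: a standard SQL function (`UPPER`), a custom privacy UDF, and a field rename are applied while reading the staged data. The table, column, and function names are illustrative only and are not Starlake APIs.

```python
import sqlite3

def mask_email(value: str) -> str:
    """Custom privacy UDF: keep the domain, hide the local part."""
    local, _, domain = value.partition("@")
    return "***@" + domain

# In-memory database standing in for the ingestion engine
conn = sqlite3.connect(":memory:")
conn.create_function("MASK_EMAIL", 1, mask_email)

conn.execute("CREATE TABLE staging (name TEXT, email TEXT)")
conn.execute("INSERT INTO staging VALUES ('alice', 'alice@example.com')")

# Rename, uppercase, and mask fields on the fly while selecting
row = conn.execute(
    "SELECT UPPER(name) AS customer_name, MASK_EMAIL(email) AS contact "
    "FROM staging"
).fetchone()
print(row)  # ('ALICE', '***@example.com')
```

The same pattern scales to any SQL expression the target engine supports, since the transformation is expressed as plain SQL over the staged rows.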
Business & Developer Friendly
Because business users love Excel and developers love Git and text-based development, share your ingestion metadata with business users as spreadsheets, let them make updates, and get the changes back instantly in YAML with full Git versioning support.
Script Free Database Replication
Security as a First-Class Citizen
Data Observability through Metrics and Auditing
For each rejected input attribute: get the reason for rejection, the source value, and the expected format.
For each discrete column: get the list of distinct values, their frequencies, the mode, and the number of missing values.
For each continuous column: get the min, max, mean, median, variance, sum, standard deviation, and the 25th and 75th percentiles.
Interactive YAML Schema Validation
Interactive Relationships Editor
Use the CLI to generate the complete entity-relationship diagram as a searchable SVG and embed it in your website.