Polars versus Spark
Introduction
Polars is often compared to Spark. In this post, I highlight the main differences between the two and the use cases where each fits best in my data engineering work.
As a Data Engineer, I primarily focus on the following goals:
- Parsing files, validating their input, and loading the data into the target data warehouse.
- Once the data is loaded, applying transformations by joining and aggregating the data to build KPIs.
On a daily basis, I also need to develop on my laptop and test my work locally before delivering it to the CI pipeline and then to production.
What about my data scientist colleagues? They need to run their workloads on production data from their favorite notebook environment.
This post addresses the following points:
- How suitable each tool is for loading files into your data warehouse.
- How easy and powerful each tool is for performing transformations.
- How easy it is to test your code locally before deploying to the cloud.