Transitioning from Pandas to Polars the easy way — by taking a pit stop at SQL.

Ben Feifke

Towards Data Science

The secret’s out! Polars is the hottest thing on the block, and everybody wants a slice 😎

I recently wrote a post, “The 3 Reasons I Permanently Switched From Pandas to Polars”, because replacing Pandas is one of the most common reasons for picking up Polars. Even so, transitioning from Pandas to Polars can feel strange, given how much the syntax differs between the two.

In my earlier blog post, I discussed how Pandas pushes its users toward an object-oriented approach to data queries, while Polars enables a data-oriented approach, much like SQL. So even though Polars most often serves as a replacement for Pandas, if you’re trying to learn Polars, comparing it to SQL is likely a much easier starting point than comparing it to Pandas. The objective of this post is to do just that: compare Polars syntax to SQL syntax as a primer for getting up and running with Polars.

In this post, I first establish a toy dataset, and then compare Polars and SQL syntax across three increasingly complex queries on that dataset.

Note that this blog post uses Google BigQuery as its SQL dialect.

The toy dataset used throughout this post is a table of orders and a table of customers for some restaurant:

orders

| order_date_utc | order_value_usd | customer_id |
|----------------|-----------------|-------------|
| 2024-01-02 | 50 | 001 |
| 2024-01-05 | 30 | 002 |
| 2024-01-20 | 44 | 001 |
| 2024-01-22 | 33 | 003 |
| 2024-01-29 | 25 | 002 |

customers

| customer_id | is_premium_customer | name…


