As a data engineer, a typical working day for me, apart from meetings, is full of SELECT
, FROM
and WHERE
. But these basic statements are not enough, especially for the complex ad hoc analysis that is increasingly common nowadays.
SQL is a powerful language. It is a declarative language where we define what we want and the engine finds a way to achieve it. The language is evolving to adapt to the increasing variety of analysis needs. I wrote an article about an advanced SQL feature to deal with hierarchical data. And today, let’s explore another beyond-the-basic feature: PIVOT.
Problem Statement
Imagine you are working as a data engineer for a retail company. The company wants to analyze product sales data to identify trends and opportunities for growth. The data is stored in a table called Sales
with the following structure:
ProductID | Date | Amount |
---|---|---|
101 | 2024-01-10 | 300 |
101 | 2024-12-15 | 500 |
101 | 2025-01-15 | 700 |
101 | 2025-02-01 | 1100 |
102 | 2024-02-20 | 800 |
102 | 2024-11-03 | 400 |
102 | 2025-01-20 | 900 |
102 | 2025-02-22 | 650 |
103 | 2023-07-25 | 1200 |
103 | 2024-08-15 | 1500 |
103 | 2025-02-10 | 1250 |
104 | 2023-12-05 | 400 |
104 | 2024-06-30 | 800 |
104 | 2025-01-30 | 300 |
104 | 2025-02-25 | 500 |
This structure is not good for reports. The company wants this data served in a format where years are represented as columns for easier comparison across products.
GROUP BY - The Amateur Way
A very straightforward approach to this problem is to use the GROUP BY
statement. We will group by ProductID
, and we will get the sum column for each month. Below is a SQL Server example. Other SQL engines should have similar syntax.
select
ProductID
,sum(iif(year(Date)=2023, Amount, null)) as [2023]
,sum(iif(year(Date)=2024, Amount, null)) as [2024]
,sum(iif(year(Date)=2025, Amount, null)) as [2025]
from Sales
group by ProductID;
GROUP BY query result
ProductID 2023 2024 2025 101 null 800 1800 102 null 1200 1550 103 1200 1500 1250 104 400 800 800
At first glance, it’s simple, and it works. Sometimes just working is enough.
SQL PIVOT - The Complex Way
PIVOT
is a operator in SQL that allows you to transform rows into columns. This transformation is particularly useful when summarizing data and creating a more interpretable format for analysis. The below SQL Server query achieves similar results:
select
*
from (
select
ProductID,
year(Date) as Year,
Amount
from Sales
) as ToPivotSales pivot (
sum(Amount) for Year in ([2023], [2024], [2025])
) as PivotedSales
PIVOT query result
ProductID 2023 2024 2025 101 null 800 1800 102 null 1200 1550 103 1200 1500 1250 104 400 800 800
To do PIVOT
, we first need a subquery to specify the columns we need. In this case they are ProductID
, Year
and Amount
. If you don’t like subqueries, CTE (Common Table Expression) works as well.
In the PIVOT
decalaration:
sum(Amount)
: specifies that the aggregation function to be applied issum
, which will sum theAmount
values.for Year in ([2023], [2024], [2025])
: defines how the pivoting will occur:for Year
: indicates that the values in theYear
column will be used to create new columns in the result set.in ([2023], [2024], [2025])
: specifies the specific years that will become the new columns in the result. Each of these years will have a corresponding column that contains the summed Amount for that year.
To be honest, I don’t like the syntax of PIVOT
. It is terrible to me. It involves subqueries and CTEs, it uses more levels of indentation, and it’s harder to scan through. And if we look at the execution plan, it is not really faster than GROUP BY
.
With that being said, PIVOT
still has advantages over GROUP BY
that it requires less boilerplate code. In the examples above, to add a new year to the query with GROUP BY
, you have to copy, paste, and edit in 2 places. With PIVOT
, all you have to do is add a new value to the list. This makes PIVOT
shine in situations where we have to deal with a long list of values.
Dynamic PIVOT - The Hacker Way
While both GROUP BY and PIVOT are useful, they have the limitation that you must explicitly specify the list of values. When the data is small, this is fine. But it becomes a problem when dealing with evolving, large data where you do not know or do not want to manually list all possible values. Imagine you build a report of annual sales, you clearly don’t want to update the query every year.
Dynamic PIVOT is a technique that allows us to pivot data without hard-coding the pivot column values. It is not a standard SQL operation. Therefore, the syntax may vary between different SQL engines. In Snowflake SQL, you can achieve dynamic pivoting by something as simple as this.
...
pivot (
sum(Amount) for Year in (any order by Year)
)
Most other SQL engines don’t offer this level of simplicity. They require a bit of complexity. Below is a SQL Server query example.
declare
@cols as nvarchar(max),
@query as nvarchar(max);
select @cols = string_agg(quotename(Year), ',') within group (order by Year)
from (select distinct year(Date) as Year from Sales) as YearList;
set @query = (
'select *
from (
select ProductID, year(Date) as Year, Amount
from Sales
) as ToPivotSales pivot (
sum(Amount) for Year in (' + @cols + ')
) as PivotedSales;'
);
exec sp_executesql @query;
Dynamic PIVOT query result
ProductID 2023 2024 2025 101 null 800 1800 102 null 1200 1550 103 1200 1500 1250 104 400 800 800
Although the code is not pleasing to eyes, the concept is simple. We need an additional step to find all the unique values and store them in a variable. Then we construct the query with the above list of unique values. Execute it and we get the expected result.
Final Thought
Mastering SQL is not easy. SQL is more than just a query language; it’s a powerful tool that, when used effectively, can turn raw data into actionable intelligence. Advanced SQL techniques such as PIVOT
are not very common in our day-to-day work. But we should know what we have in our toolbox. So that we can quickly select the right tool from the toolbox where the right job applies.
* You can find the execution of SQL query examples in this post at https://dbfiddle.uk/2jD1lHkL