Cool things you can do using Snowflake SQL

7 minute read

Table of Contents

  1. Introduction
  2. QUALIFY
  3. UNPIVOT
  4. Conclusion

Introduction

Hello! It’s been 5 months since my last post, so I’m excited to be back here writing again (it’s the humanities major in me that gets me writing!).

You probably have heard of Snowflake, but in case you haven’t, it’s an incredibly powerful cloud-based data warehouse. I could go on and on about what Snowflake allows you to do, but in short, Snowflake allows you to store all of your structured and semi-structured data in one place and query that data super, super fast. I mean super, super, super fast: not only does Snowflake optimize query performance like no other, you can also instantaneously scale up the computing power used for your queries to near-infinity.

There’s a reason why Snowflake’s IPO was incredibly successful, and the company counts Warren Buffett’s Berkshire Hathaway and Salesforce as major investors. I know what you’re thinking: I’m not affiliated with Snowflake in any way—I just think it’s really incredible technology. :smile:

What I recently found out is Snowflake offers a free 30-day trial where you don’t provide any payment information, so you can test drive Snowflake and see all the neat things it can do. So that’s what I did!

In this post, I’ll show you a couple cool Snowflake SQL things that you can’t do in PostgreSQL/MySQL.

QUALIFY

The first SQL dialect I ever learned was PostgreSQL and that’s because PostgreSQL was what was taught in my Metis data science bootcamp. I later learned MySQL when I started practicing SQL on LeetCode, and that’s only because LeetCode doesn’t offer PostgreSQL. Between MySQL and PostgreSQL, PostgreSQL is definitely more powerful and popular these days. Then I tried Snowflake SQL, and Snowflake SQL is even better!

Let’s look at the below employee table together. I created this table in Snowflake within minutes by defining the table columns and uploading a CSV file with dummy data. (I was really hungry when I created the dummy data, hence the food-based names.)

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT ANNUAL_SALARY_AMOUNT_USD IS_MANAGER STARTED_WORK_ON
1 Binh Hoang Data 100000 FALSE 2021-01-11
2 Lewis Coffee Finance 90000 FALSE 2021-03-21
3 Bagel Soda Finance 200000 TRUE 2021-04-01
4 Sun Moon Marketing 300000 TRUE 2020-02-22
5 Water Bottle Marketing 150000 TRUE 2019-05-01
6 Cactus Turnip Sales 90000 FALSE 2020-12-01
7 Book Shelf Sales 160000 TRUE 2019-06-21
8 Chicken Congee Sales 80000 FALSE 2021-03-26
9 Soba Noodles Data 180000 TRUE 2020-07-15
10 Dim Sum Marketing 90000 FALSE 2021-05-01

If I’m working in PostgreSQL and want to query the employees who have the highest salary in each department with all columns included (hence can’t use GROUP BY), I’d have to use a combination of a CTE and a window function like DENSE_RANK() (RANK() would work as well, but not ROW_NUMBER() since ROW_NUMBER() wouldn’t recognize ties). This is what the query below does. It works, but it’s definitely verbose for what I’m trying to accomplish.

Query 1 (PostgreSQL)

WITH rank_employee_in_each_department_by_salary AS (
    SELECT 
        *,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY annual_salary_amount_usd DESC) AS employee_rank
    FROM 
        employee    
)
SELECT
    employee_id,
    first_name,
    last_name,
    department,
    annual_salary_amount_usd,
    is_manager,
    started_work_on
FROM 
    rank_employee_in_each_department_by_salary
WHERE 
    employee_rank = 1
ORDER BY 
    department;

Enter Snowflake’s QUALIFY clause.

From Snowflake’s documentation, QUALIFY “does with window functions what HAVING does with aggregate functions and GROUP BY clauses.” Using QUALIFY, I don’t need to use a CTE and materialize an additional employee_rank column using a window function. I simply add a QUALIFY clause to filter my query using a window function. This is a much more elegant and performant query than what you could accomplish in most other SQL dialects.

Query 2 (Snowflake SQL)

SELECT 
    *
FROM 
    employee
QUALIFY 
    DENSE_RANK() OVER (PARTITION BY department ORDER BY annual_salary_amount_usd DESC) = 1
ORDER BY 
    department;

Query 2 Results

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT ANNUAL_SALARY_AMOUNT_USD IS_MANAGER STARTED_WORK_ON
9 Soba Noodles Data 180000 TRUE 2020-07-15
3 Bagel Soda Finance 200000 TRUE 2021-04-01
4 Sun Moon Marketing 300000 TRUE 2020-02-22
7 Book Shelf Sales 160000 TRUE 2019-06-21

I don’t believe Snowflake invented the QUALIFY clause since it’s used in other SQL dialects such as Teradata, but I’m just glad it exists. If you know who was the original inventor of the QUALIFY clause, please comment below. Would love to know!

UNPIVOT

Let’s say Human Resources wants to know the below bespoke metrics. Why? Who knows.

  1. Ratio of combined salary of Finance / total combined salary of Finance, Marketing, Sales
  2. Ratio of combined salary of Data / total combined salary of Data, Finance, Sales
  3. Ratio of combined salary of Marketing / total combined salary of Marketing, Data, Finance
  4. Ratio of combined salary of Sales / total combined salary of Sales, Data, Marketing

What you do know is that you’ve been assigned to work on this request and even though the data request is strange, you have been assured that that’s the data they actually want.

You know that there’s an Employee table that you can query to get this data, but you can’t do a simple GROUP BY or window function since the groups of data vary across each denominator.

You think about it some more and realize you can use a bunch of CASE WHENs, and you write the below query.

Query 3 (Snowflake SQL)

SELECT
    ROUND(SUM(CASE WHEN department = 'Finance' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Finance', 'Marketing', 'Sales') THEN annual_salary_amount_usd END) * 100, 2) AS finance_ratio,
    ROUND(SUM(CASE WHEN department = 'Data' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Data', 'Finance', 'Sales') THEN annual_salary_amount_usd END) * 100, 2) AS data_ratio,
    ROUND(SUM(CASE WHEN department = 'Marketing' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Marketing', 'Data', 'Finance') THEN annual_salary_amount_usd END) * 100, 2) AS marketing_ratio,
    ROUND(SUM(CASE WHEN department = 'Sales' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Sales', 'Data', 'Marketing') THEN annual_salary_amount_usd END) * 100, 2) AS sales_ratio
FROM 
    employee;   

Query 3 Results

FINANCE_RATIO DATA_RATIO MARKETING_RATIO SALES_RATIO
25.00 31.11 48.65 28.70

Great! But what if you want to stack the data vertically (i.e. normalize the data) rather than horizontally. This becomes especially important if you had to calculate 10 of these bespoke metrics instead of 4: then you’d be scrolling from left to right to see all your data, which is not ideal.

Enter Snowflake’s UNPIVOT operator!

From Snowflake’s documentation: “Rotates a table by transforming columns into rows. UNPIVOT is a relational operator that accepts two columns (from a table or subquery), along with a list of columns, and generates a row for each column specified in the list. In a query, it is specified in the FROM clause after the table name or subquery.”

UNPIVOT is super useful for displaying query results vertically, which in most situations is much easier for you and others to read. And all it takes is wrapping your original query in a CTE and a few additional lines of code.

Query 4 (Snowflake SQL)

WITH bespoke_metrics AS (
    SELECT
        ROUND(SUM(CASE WHEN department = 'Finance' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Finance', 'Marketing', 'Sales') THEN annual_salary_amount_usd END) * 100, 2) AS finance_ratio,
        ROUND(SUM(CASE WHEN department = 'Data' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Data', 'Finance', 'Sales') THEN annual_salary_amount_usd END) * 100, 2) AS data_ratio,
        ROUND(SUM(CASE WHEN department = 'Marketing' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Marketing', 'Data', 'Finance') THEN annual_salary_amount_usd END) * 100, 2) AS marketing_ratio,
        ROUND(SUM(CASE WHEN department = 'Sales' THEN annual_salary_amount_usd END) / SUM(CASE WHEN department IN ('Sales', 'Data', 'Marketing') THEN annual_salary_amount_usd END) * 100, 2) AS sales_ratio
    FROM 
        employee    
)
SELECT 
    *
FROM 
    bespoke_metrics
    UNPIVOT (value 
             FOR metric IN (finance_ratio, data_ratio, marketing_ratio, sales_ratio));

Query 4 Results

METRICS VALUE
FINANCE_RATIO 25.00
DATA_RATIO 31.11
MARKETING_RATIO 48.65
SALES_RATIO 28.70

See how much easier this is to read?

PostgreSQL does not have an UNPIVOT operator, but you can mimic the same functionality as UNPIVOT using a LATERAL JOIN as demonstrated in this blog post. However, LATERAL JOINs scare less seasoned SQL coders and are slightly less elegant, so UNPIVOT wins my heart. :heart:

Snowflake didn’t invent UNPIVOT since UNPIVOT also exists in Oracle and SQL Server—I’m just glad the Snowflake people ported it over.

Conclusion

I’m a writer. I spent four years at Yale honing my writing skills, and I’ve also been an editor for a popular Tumblr. Coding, in my opinion, is very similar to writing. You can write messy code that no one understands, and you can write messy essays. On the flip side, you can write clear, concise, idiomatic code, and you can write clear, concise, expressive essays.

I hope this post helps you write more idiomatic and concise Snowflake SQL—it’ll save you and whoever reads your code from the feeling of dread people get when they look at bad code (or bad writing).

Young girl practicing coding on iPad Photo by Kelly Sikkema on Unsplash

Okay, for my next blog post, I plan on writing about LATERAL JOINs since I mentioned they tend to scare people. Let’s make them less scary, for my sake’s and yours. :sweat_smile:

Leave a comment