Overview

Welcome to the Powered by Tasty Bytes - Zero to Snowflake Quickstart focused on Semi-Structured Data Processing!

Within this Quickstart, we will learn about processing Semi-Structured Data in Snowflake by diving into the VARIANT Data Type, Semi-Structured Data Processing combining Dot Notation and Lateral Flattening as well as View Creation and Snowsight Charting.

For more detail on Semi-Structured Data in Snowflake please visit the Semi-Structured Data Overview documentation

Prerequisites

What You Will Learn

What You Will Build

Overview

Within this Quickstart we will follow a Tasty Bytes themed story via a Snowsight SQL Worksheet with this page serving as a side by side guide complete with additional commentary, images and documentation links.

This section will walk you through logging into Snowflake, Creating a New Worksheet, Renaming the Worksheet, Copying SQL from GitHub, and Pasting the SQL we will be leveraging within this Quickstart.

Step 1 - Accessing Snowflake via URL

Step 2 - Logging into Snowflake

Step 3 - Navigating to Worksheets

Step 4 - Creating a Worksheet

Step 5 - Renaming a Worksheet

Step 6 - Accessing Quickstart SQL in GitHub

Step 7 - Copying Setup SQL from GitHub

Step 8 - Pasting Setup SQL from GitHub into your Snowflake Worksheet

Step 9 - Click Next –>

Overview

As a Tasty Bytes Data Engineer, we have been tasked with profiling our Menu data that includes a Semi-Structured Data column. From this menu table we need to produce an Analytics layer View that exposes Dietary and Ingredient data to our end users.

Step 1 - Setting our Context and Querying our Table

To begin, let's execute the first three queries together which will:

USE ROLE tasty_data_engineer;
USE WAREHOUSE tasty_de_wh;

SELECT TOP 10
    m.truck_brand_name,
    m.menu_type,
    m.menu_item_name,
    m.menu_item_health_metrics_obj
FROM frostbyte_tasty_bytes.raw_pos.menu m;

Within our output, we can see that the menu_item_health_metrics_obj must be the Semi-Structured Data we were told contained the metrics we need to provide downstream.

By clicking into one of the cells in this column, we will see Snowsight automatically expand the stats pane to give us a better view of what is inside.

Step 2 - Exploring our Semi-Structured Column

To dive deeper into how this column in defined in Snowflake, please run the next query where we leverage SHOW COLUMNS to explore the Data Types present in our menu table.

SHOW COLUMNS IN frostbyte_tasty_bytes.raw_pos.menu;

Looking at our result set, we see the menu_item_health_metrics_obj is a VARIANT Data Type.

Step 3 - Traversing Semi-Structured Data using Dot Notation

Within our menu_item_health_metrics_obj column, we saw that menu_item_id was included alongside the more nested Ingredients and Dietary Restriction data we need to access.

Please execute the next query where we begin to leverage Dot Notation to traverse our Semi-Structured data.

SELECT 
    m.menu_item_health_metrics_obj:menu_item_id AS menu_item_id,
    m.menu_item_health_metrics_obj:menu_item_health_metrics AS menu_item_health_metrics
FROM frostbyte_tasty_bytes.raw_pos.menu m;

Using Dot Notation we were able to successfully extract menu_item_id in full, but look to still be left with additional semi-structured objects in the menu_item_health_metrics column output.

Once again let's click into one of the cells within this column to take a further look.

We are making progress! Let's see how we can further process menu_item_health_metrics in the next section by using additional Snowflake functions.

Step 4 - Click Next –>

Overview

Having seen how we can easily query Semi-Structured Data as it exists in a Variant column using Dot Notation, our Tasty Data Engineer is well on the way to providing their internal stakeholders with the data they have requested.

Within this section, we will conduct additional Semi-Structured Data processing to meet requirements.

Step 1 - Introduction to Lateral Flatten

To further extract the data our downstream users are asking for from our menu_item_health_metrics_obj column. Please execute the next query which utilizes the Dot Notation functionality we just explored alongside Snowflakes FLATTEN function and LATERAL JOIN capability to provide us with the first ingredient Array we have been asked for.

SELECT 
    m.menu_item_name,
    obj.value:"ingredients"::VARIANT AS ingredients
FROM frostbyte_tasty_bytes.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

Step 2 - Exploring an Array Function

Before we extract the requested Dietary data, please kick off the next query which highlights a Snowflake Array Functions, which will explore the ingredients column for any menu_item_name that includes Lettuce by leveraging ARRAY_CONTAINS.

SELECT 
    m.menu_item_name,
    obj.value:"ingredients"::VARIANT AS ingredients
FROM frostbyte_tasty_bytes.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj
WHERE ARRAY_CONTAINS('Lettuce'::VARIANT, obj.value:"ingredients"::VARIANT);

Based on our output, we see that quite a few of our Menu Items include Lettuce. This sort of analysis would be extremely valuable for our Supply Chain Procurement Managers in the event of any food related recalls in the cities and countries we support.

Step 3 - Structuring Semi-Structured Data at Scale

Having just seen the sort of value we can provide to our organization already, let's now execute the last query of this section.

This query will use the Dot Notation, and our combined Lateral Join plus Flatten Table Function to provide the result set we were initially tasked to generate.

SELECT 
    m.menu_item_health_metrics_obj:menu_item_id::integer AS menu_item_id,
    m.menu_item_name,
    obj.value:"ingredients"::VARIANT AS ingredients,
    obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag,
    obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag,
    obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag,
    obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag
FROM frostbyte_tasty_bytes.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

Great! That output looks to meet the exact requirements our stakeholders have requested. In the next section we will explore how we can promote this to our Analytics layer where they have the ability to access it.

Step 4 - Click Next –>

Overview

In the last section, we constructed a query that provides the exact output our end users require using a suite of Snowflake Semi-Structured Data functionality along the way. Next we will follow the process of promoting this query against our Raw layer through Harmonized and eventually to Analytics where our end users are privileged to read from.

Note: For those more familiar with Bronze, Silver, and Gold Data Models we can think of Raw as Bronze, Harmonized as Silver and Analytics as Gold.

Step 1 - Creating our Harmonized View Using our Semi-Structured Flattening SQL

Taking the exact query we ended our last section with please now execute the next query which contains this SQL plus all additional already structured menu Table Columns.

Within this query we use CREATE VIEW in our Harmonized schema to encapsulate the Semi-Structured processing logic and additional Columns as a Table.

CREATE OR REPLACE VIEW frostbyte_tasty_bytes.harmonized.menu_v
    AS
SELECT 
    m.menu_id,
    m.menu_type_id,
    m.menu_type,
    m.truck_brand_name,
    m.menu_item_health_metrics_obj:menu_item_id::integer AS menu_item_id,
    m.menu_item_name,
    m.item_category,
    m.item_subcategory,
    m.cost_of_goods_usd,
    m.sale_price_usd,
    obj.value:"ingredients"::VARIANT AS ingredients,
    obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag,
    obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag,
    obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag,
    obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag
FROM frostbyte_tasty_bytes.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

With the harmonized.menu_v view created, we could now query it directly without having leverage the more complex SQL we used within. However, since the Analytics layer is where our stakeholders access data, let's take the next step to land this there.

Step 2 - Promoting from Harmonized to Analytics with Ease

Referencing our harmonized.menu_v please now execute the next query which will create our analytics.menu_v that will begin delivering value downstream immediately.

Within this query we see a few new functions we have not yet covered. First off we are adding a COMMENT which can be seen within SHOW VIEWS commands or within the Snowsight interface to document what a user may see when querying this view.

We are also utilizing SELECT * EXCLUDE and RENAME parameters which can make the lives of SQL developers much easier by reducing query or View definition complexity.

Step 3 - Click Next –>

Overview

With our Menu View available in our Analytics layer, let's execute a few queries against it that we will provide to our end users showcasing how Snowflake powers a relational query experience over Semi-Structured data without having to make additional copies or conduct any complex processing.

Step 1 - Analyzing Arrays

We saw the ingredients Column as a result of a query in our previous section, but this is now available without any Dot Notation requirements in our analytics.menu_v View.

With this, please now execute the next query that leverages two additional Snowflake Array Functions ARRAY_INTERSECTION and ARRAYS_OVERLAP to see which of our non-beverage Menu Items overlap for each of our Food Truck Brands Menus.

SELECT 
    m1.menu_type,
    m1.menu_item_name,
    m2.menu_type AS overlap_menu_type,
    m2.menu_item_name AS overlap_menu_item_name,
    ARRAY_INTERSECTION(m1.ingredients, m2.ingredients) AS overlapping_ingredients
FROM frostbyte_tasty_bytes.analytics.menu_v m1
JOIN frostbyte_tasty_bytes.analytics.menu_v m2
    ON m1.menu_item_id <> m2.menu_item_id -- avoid joining the same menu item to itself
    AND m1.menu_type <> m2.menu_type 
WHERE 1=1
    AND m1.item_category <> 'Beverage' -- remove beverages
    AND m2.item_category <> 'Beverage' -- remove beverages
    AND ARRAYS_OVERLAP(m1.ingredients, m2.ingredients) -- evaluates to TRUE if one ingredient is in both arrays
ORDER BY m1.menu_type;

Once again, using exactly the View we have built this sort of query could provide real world business value to our Food Truck operators as they work will Procurement to predict and order their ingredient needs each week.

Step 2 - Providing Metrics to Executives

Now let's take a look at assisting our Tasty Bytes Executives make data driven Menu decisions by providing high-level metrics surrounding the dietary restrictions we are currently addressing across our brands.

Please execute the next query which utilizes COUNT, SUM and conditional CASE statements to aggregate the required metrics from our analytics.menu_v.

SELECT
    COUNT(DISTINCT menu_item_id) AS total_menu_items,
    SUM(CASE WHEN is_healthy_flag = 'Y' THEN 1 ELSE 0 END) AS healthy_item_count,
    SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count,
    SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count,
    SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count
FROM frostbyte_tasty_bytes.analytics.menu_v m;

With the output we just recieved we have successfully went from a Raw table containing Semi-Structured Data to a single, aggregate row that can easily be accessed by anyone in our organization to empower Tasty Bytes to be more data driven.

Step 3 - Turning Results to Charts

As some of our Tasty Bytes Executives prefer visual representations of data, let's now look at how easy it is to turn tabular results into easy to digest visual charts within Snowsight.

Please execute the next query now which adds filters on three of our Truck Brands Names to the SQL we ran previously.

SELECT
    m.brand_name,
    SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count,
    SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count,
    SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count
FROM frostbyte_tasty_bytes.analytics.menu_v m
WHERE m.brand_name IN  ('Plant Palace', 'Peking Truck','Revenge of the Curds')
GROUP BY m.brand_name;

By default, Snowsight returns our query results in tabular form. However one powerful Snowsight feature we have not covered yet is Using Charts

Please now follow the arrows in the screenshot below to create your own Bar Graph showing how these different Food Truck Brands compare when it comes to items on their menu that address certain dietary restrictions.

To finish off this Quickstart, we must point out how easy a Tasty Bytes Executive could do this sort of analysis on their own without ever needing to know about the Semi-Structured Data Processing we have encapsulated in the Views we built. With all of this we can rest assured that we are assisting in driving the democratization of data in our Tasty Bytes Organization.

Step 4 - Click Next –>

Conclusion

Fantastic work! You have successfully completed the Tasty Bytes - Zero to Snowflake - Semi-Structured Data Quickstart.

By doing so you have now:

If you would like to re-run this Quickstart please leverage the Reset scripts in the bottom of your associated Worksheet.

Next Steps

To continue your journey in the Snowflake Data Cloud, please now visit the link below to see all other Powered by Taste Bytes - Quickstarts available to you.