Overview

Welcome to the Powered by Tasty Bytes - Zero to Snowflake Quickstart focused on Semi-Structured Data Processing!

Within this Quickstart, we will learn about processing Semi-Structured Data in Snowflake by diving into the VARIANT Data Type, Semi-Structured Data Processing combining Dot Notation and Lateral Flattening as well as View Creation and Snowsight Charting.

For more detail on Semi-Structured Data in Snowflake please visit the Semi-Structured Data Overview documentation

Prerequisites

What You Will Learn

What You Will Build

Overview

Within this Quickstart we will follow a Tasty Bytes themed story via a Snowsight SQL Worksheet with this page serving as a side by side guide complete with additional commentary, images and documentation links.

This section will walk you through logging into Snowflake, Creating a New Worksheet, Renaming the Worksheet, Copying SQL from GitHub, and Pasting the SQL we will be leveraging within this Quickstart.

Step 1 - Accessing Snowflake via URL

Step 2 - Logging into Snowflake

Step 3 - Navigating to Worksheets

Step 4 - Creating a Worksheet

Step 5 - Renaming a Worksheet

Step 6 - Accessing Quickstart SQL in GitHub

Step 7 - Copying Setup SQL from GitHub

Step 8 - Pasting Setup SQL from GitHub into your Snowflake Worksheet

Step 9 - Click Next –>

Overview

As a Tasty Bytes Data Engineer, we have been tasked with profiling our Menu data that includes a Semi-Structured Data column. From this menu table we need to produce an Analytics layer View that exposes Dietary and Ingredient data to our end users.

Step 1 - Setting our Context and Querying our Table

To begin, let's execute the first three queries together which will:

USE ROLE tb_data_engineer;
USE WAREHOUSE tb_de_wh;
USE DATABASE tb_101;

SELECT TOP 10
    truck_brand_name,
    menu_type,
    menu_item_name,
    menu_item_health_metrics_obj
FROM raw_pos.menu;

Within our output, we can see that the menu_item_health_metrics_obj must be the Semi-Structured Data we were told contained the metrics we need to provide downstream.

Step 2 - Exploring our Semi-Structured Column

To dive deeper into how this column in defined in Snowflake, please run the next query where we leverage SHOW COLUMNS to explore the Data Types present in our menu table.

SHOW COLUMNS IN raw_pos.menu;

Looking at our result set, we see the menu_item_health_metrics_obj is a VARIANT Data Type.

Step 3 - Click Next –>

Overview

Within our menu_item_health_metrics_obj column, we saw that menu_item_id was included alongside the more nested Ingredients and Dietary Restriction data we need to access.

Step 1 - Extract First Level Elements

Please execute the next query where we begin to leverage Dot Notation to traverse our Semi-Structured data.

SELECT
    menu_item_health_metrics_obj:menu_item_id AS menu_item_id,
    menu_item_health_metrics_obj:menu_item_health_metrics AS menu_item_health_metrics
FROM raw_pos.menu;

Using Dot Notation we were able to successfully extract menu_item_id in full, but look to still be left with additional semi-structured objects in the menu_item_health_metrics column output.

Step 2 - Flattening with Dot Notation

To convert Semi-Structured data to a relational representation we can use FLATTEN function and LATERAL JOIN with either Dot or Bracket Notation.

Let's first use Dot Notation to extract our Ingredients into an Array Column by executing the next query.

SELECT
    m.menu_item_name,
    m.menu_item_health_metrics_obj:menu_item_id AS menu_item_id,
    obj.value:"ingredients"::ARRAY AS ingredients
FROM raw_pos.menu m, 
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj
ORDER BY menu_item_id;

Step 3 - Flattening with Bracket Notation

Let's now use Bracket Notation to extract our Ingredients into an Array Column by executing the next query.

SELECT
    m.menu_item_name,
    m.menu_item_health_metrics_obj['menu_item_id'] AS menu_item_id,
    obj.value['ingredients']::ARRAY AS ingredients
FROM raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj
ORDER BY menu_item_id;

Step 4 - Extracting Dietary Columns with Dot Notation

Let's first extract the remaining Dietary Columns using both Dot Notation alongside the Ingredients Array.

SELECT
    m.menu_item_health_metrics_obj:menu_item_id AS menu_item_id,
    m.menu_item_name,
    obj.value:"ingredients"::VARIANT AS ingredients,
    obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag,
    obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag,
    obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag,
    obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag
FROM raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

Step 5 - Extracting Dietary Columns with Bracket Notation

Let's now extract the remaining Dietary Columns using both Dot Notation alongside the Ingredients Array.

SELECT
    m.menu_item_health_metrics_obj['menu_item_id'] AS menu_item_id,
    m.menu_item_name,
    obj.value['ingredients']::VARIANT AS ingredients,
    obj.value['is_healthy_flag']::VARCHAR(1) AS is_healthy_flag,
    obj.value['is_gluten_free_flag']::VARCHAR(1) AS is_gluten_free_flag,
    obj.value['is_dairy_free_flag']::VARCHAR(1) AS is_dairy_free_flag,
    obj.value['is_nut_free_flag']::VARCHAR(1) AS is_nut_free_flag
FROM raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

Step 6 - Click Next –>

Overview

With all of the required data, extracted, flattened and available in tabular form, we will now work to provide access to our Business Users.

Within this step, we will promote a full output of the Menu table with the flattened columns to Views in our Harmonized and Analytics layers.

If you are more familiar with Medallion Architectures, we can think of the Harmonized layer as Silver and the Analytics layer as Gold.

Step 1 - Harmonized/Silver Layer View

To begin, let's add columns to our previous Dot Notation query and leverage it within a new Menu View in our Harmonized or Silver layer. This query will provide a View MENU_V successfully created result.

CREATE OR REPLACE VIEW harmonized.menu_v
COMMENT = 'Menu level metrics including Truck Brands and Menu Item details including Cost, Price, Ingredients and Dietary Restrictions'
    AS
SELECT
    m.menu_id,
    m.menu_type_id,
    m.menu_type,
    m.truck_brand_name,
    m.menu_item_health_metrics_obj:menu_item_id::integer AS menu_item_id,
    m.menu_item_name,
    m.item_category,
    m.item_subcategory,
    m.cost_of_goods_usd,
    m.sale_price_usd,
    obj.value:"ingredients"::VARIANT AS ingredients,
    obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag,
    obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag,
    obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag,
    obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag
FROM raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;

Step 2 - Analytics/Gold Layer View

With the Harmonized View containing the flattening logic in place, let's now promote the data to the Analytics Schema where our various Business Users will be able to access it by running the next query. This query will provide a View MENU_V successfully created result.

CREATE OR REPLACE VIEW analytics.menu_v
COMMENT = 'Menu level metrics including Truck Brands and Menu Item details including Cost, Price, Ingredients and Dietary Restrictions'
    AS
SELECT
    *
    EXCLUDE (menu_type_id) --exclude MENU_TYPE_ID
    RENAME (truck_brand_name AS brand_name) -- rename TRUCK_BRAND_NAME to BRAND_NAME
FROM harmonized.menu_v;

Step 3 - Querying our Downstream View

Before moving on, let's use our view to take a look at the results for our Better Off Bread brand.

SELECT 
    brand_name,
    menu_item_name,
    sale_price_usd,
    ingredients,
    is_healthy_flag,
    is_gluten_free_flag,
    is_dairy_free_flag,
    is_nut_free_flag
FROM analytics.menu_v
WHERE brand_name = 'Better Off Bread';

Step 4 - Grant Access to our Developers

To conclude this step, lets now grant our Developer the ability to run SELECT statements against this view. This query will provide a Statement executed successfully. result set.

GRANT SELECT ON analytics.menu_v TO ROLE tb_dev;

Step 5 - Click Next —>

Overview:

With our Menu View available in our Analytics layer, let's now jump into the life of a Tasty Bytes Developer. Within this step, we will address questions from the the Tasty Bytes Leadership Team related to our Food Truck Menu's.

Along the way we will see how Snowflake can provide a relational query experience over Semi-Structured data without having to make additional copies or conduct any complex data transformations.

Step 1 - Exploring an Array Function

Before we extract the requested Dietary data, please kick off the next query which highlights a Snowflake Array Functions, which will explore the ingredients column for any menu_item_name that includes Lettuce by leveraging ARRAY_CONTAINS.

USE ROLE tb_dev;
USE WAREHOUSE tb_dev_wh;

SELECT
    m.menu_item_id,
    m.menu_item_name,
    m.ingredients
FROM analytics.menu_v m
WHERE ARRAY_CONTAINS('Lettuce'::VARIANT, m.ingredients);

Based on our output, we see that quite a few of our Menu Items include Lettuce. This sort of analysis would be extremely valuable for our Supply Chain Procurement Managers in the event of any food related recalls in the cities and countries we support.

Step 2 - Array Overlap Analysis

Adding on additional Array functions: ARRAY_INTERSECTION, ARRAY_OVERLAP and ARRAY_SIZE, let's now find which Menu Items across Menu Types contain overlapping Ingredients and are those Ingredients?

SELECT
    m1.brand_name,
    m1.menu_item_name,
    m2.brand_name AS overlap_brand,
    m2.menu_item_name AS overlap_menu_item_name,
    ARRAY_INTERSECTION(m1.ingredients, m2.ingredients) AS overlapping_ingredients
FROM analytics.menu_v m1
JOIN analytics.menu_v m2
    ON m1.menu_item_id <> m2.menu_item_id -- avoid joining the same menu item to itself
    AND m1.menu_type <> m2.menu_type
WHERE 1=1
    AND m1.item_category  <> 'Beverage' -- remove beverages
    AND ARRAYS_OVERLAP(m1.ingredients, m2.ingredients) -- return only those that overlap
ORDER BY ARRAY_SIZE(overlapping_ingredients) DESC; -- order by largest number of overlapping ingredients

Step 3 - Providing Metrics to Executives

Now let's take a look at assisting our Tasty Bytes Executives make data driven Menu decisions by providing high-level metrics surrounding the dietary restrictions we are currently addressing across our brands.

Please execute the next query which utilizes COUNT, SUM and conditional CASE statements to aggregate the required metrics from our analytics.menu_v.

SELECT
    COUNT(DISTINCT menu_item_id) AS total_menu_items,
    SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count,
    SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count,
    SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count
FROM analytics.menu_v m;

With the output we just recieved we have successfully went from a Raw table containing Semi-Structured Data to a single, aggregate row that can easily be accessed by anyone in our organization to empower Tasty Bytes to be more data driven.

Step 4 - Turning Results to Charts

As some of our Tasty Bytes Executives prefer visual representations of data, let's now look at how easy it is to turn tabular results into easy to digest visual charts within Snowsight.

Please execute the next query now which adds filters on three of our Truck Brands Names to the SQL we ran previously.

SELECT
    m.brand_name,
    SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count,
    SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count,
    SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count
FROM analytics.menu_v m
WHERE m.brand_name IN ('Plant Palace', 'Peking Truck','Revenge of the Curds')
GROUP BY m.brand_name;

By default, Snowsight returns our query results in tabular form. However one powerful Snowsight feature we have not covered yet is Using Charts

Please now click the Chart Button and create a Bar Chart.

To finish off this Quickstart, we must point out how easy a Tasty Bytes Executive could do this sort of analysis on their own without ever needing to know about the Semi-Structured Data Processing we have encapsulated in the Views we built. With all of this we can rest assured that we are assisting in driving the democratization of data in our Tasty Bytes Organization.

Step 5 - Click Next –>

Conclusion

Fantastic work! You have successfully completed the Tasty Bytes - Zero to Snowflake - Semi-Structured Data Quickstart.

By doing so you have now:

If you would like to re-run this Quickstart please leverage the Reset scripts in the bottom of your associated Worksheet.

Next Steps

To continue your journey in the Snowflake AI Data Cloud, please now visit the link below to see all other Powered by Tasty Bytes - Quickstarts available to you.