Welcome to the Powered by Tasty Bytes - Zero to Snowflake Quickstart focused on Semi-Structured Data Processing!
Within this Quickstart, we will learn about processing Semi-Structured Data in Snowflake by diving into the VARIANT Data Type, Semi-Structured Data Processing combining Dot Notation and Lateral Flattening as well as View Creation and Snowsight Charting.
For more detail on Semi-Structured Data in Snowflake please visit the Semi-Structured Data Overview documentation
Within this Quickstart we will follow a Tasty Bytes themed story via a Snowsight SQL Worksheet with this page serving as a side by side guide complete with additional commentary, images and documentation links.
This section will walk you through logging into Snowflake, Creating a New Worksheet, Renaming the Worksheet, Copying SQL from GitHub, and Pasting the SQL we will be leveraging within this Quickstart.
As a Tasty Bytes Data Engineer, we have been tasked with profiling our Menu data that includes a Semi-Structured Data column. From this menu table we need to produce an Analytics layer View that exposes Dietary and Ingredient data to our end users.
To begin, let's execute the first three queries together which will:
USE ROLE tasty_data_engineer; USE WAREHOUSE tasty_de_wh; SELECT TOP 10 m.truck_brand_name, m.menu_type, m.menu_item_name, m.menu_item_health_metrics_obj FROM frostbyte_tasty_bytes.raw_pos.menu m;
Within our output, we can see that the
menu_item_health_metrics_obj must be the Semi-Structured Data we were told contained the metrics we need to provide downstream.
By clicking into one of the cells in this column, we will see Snowsight automatically expand the stats pane to give us a better view of what is inside.
To dive deeper into how this column in defined in Snowflake, please run the next query where we leverage SHOW COLUMNS to explore the Data Types present in our
SHOW COLUMNS IN frostbyte_tasty_bytes.raw_pos.menu;
Looking at our result set, we see the
menu_item_health_metrics_obj is a VARIANT Data Type.
menu_item_health_metrics_obj column, we saw that
menu_item_id was included alongside the more nested Ingredients and Dietary Restriction data we need to access.
Please execute the next query where we begin to leverage Dot Notation to traverse our Semi-Structured data.
SELECT m.menu_item_health_metrics_obj:menu_item_id AS menu_item_id, m.menu_item_health_metrics_obj:menu_item_health_metrics AS menu_item_health_metrics FROM frostbyte_tasty_bytes.raw_pos.menu m;
Using Dot Notation we were able to successfully extract
menu_item_id in full, but look to still be left with additional semi-structured objects in the
menu_item_health_metrics column output.
Once again let's click into one of the cells within this column to take a further look.
We are making progress! Let's see how we can further process
menu_item_health_metrics in the next section by using additional Snowflake functions.
Having seen how we can easily query Semi-Structured Data as it exists in a Variant column using Dot Notation, our Tasty Data Engineer is well on the way to providing their internal stakeholders with the data they have requested.
Within this section, we will conduct additional Semi-Structured Data processing to meet requirements.
To further extract the data our downstream users are asking for from our
menu_item_health_metrics_obj column. Please execute the next query which utilizes the Dot Notation functionality we just explored alongside Snowflakes FLATTEN function and LATERAL JOIN capability to provide us with the first
ingredient Array we have been asked for.
SELECT m.menu_item_name, obj.value:"ingredients"::VARIANT AS ingredients FROM frostbyte_tasty_bytes.raw_pos.menu m, LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;
Before we extract the requested Dietary data, please kick off the next query which highlights a Snowflake Array Functions, which will explore the
ingredients column for any
menu_item_name that includes Lettuce by leveraging ARRAY_CONTAINS.
SELECT m.menu_item_name, obj.value:"ingredients"::VARIANT AS ingredients FROM frostbyte_tasty_bytes.raw_pos.menu m, LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj WHERE ARRAY_CONTAINS('Lettuce'::VARIANT, obj.value:"ingredients"::VARIANT);
Based on our output, we see that quite a few of our Menu Items include Lettuce. This sort of analysis would be extremely valuable for our Supply Chain Procurement Managers in the event of any food related recalls in the cities and countries we support.
Having just seen the sort of value we can provide to our organization already, let's now execute the last query of this section.
This query will use the Dot Notation, and our combined Lateral Join plus Flatten Table Function to provide the result set we were initially tasked to generate.
SELECT m.menu_item_health_metrics_obj:menu_item_id::integer AS menu_item_id, m.menu_item_name, obj.value:"ingredients"::VARIANT AS ingredients, obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag, obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag, obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag, obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag FROM frostbyte_tasty_bytes.raw_pos.menu m, LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;
Great! That output looks to meet the exact requirements our stakeholders have requested. In the next section we will explore how we can promote this to our Analytics layer where they have the ability to access it.
In the last section, we constructed a query that provides the exact output our end users require using a suite of Snowflake Semi-Structured Data functionality along the way. Next we will follow the process of promoting this query against our Raw layer through Harmonized and eventually to Analytics where our end users are privileged to read from.
Note: For those more familiar with Bronze, Silver, and Gold Data Models we can think of Raw as Bronze, Harmonized as Silver and Analytics as Gold.
Taking the exact query we ended our last section with please now execute the next query which contains this SQL plus all additional already structured
menu Table Columns.
Within this query we use CREATE VIEW in our Harmonized schema to encapsulate the Semi-Structured processing logic and additional Columns as a Table.
CREATE OR REPLACE VIEW frostbyte_tasty_bytes.harmonized.menu_v AS SELECT m.menu_id, m.menu_type_id, m.menu_type, m.truck_brand_name, m.menu_item_health_metrics_obj:menu_item_id::integer AS menu_item_id, m.menu_item_name, m.item_category, m.item_subcategory, m.cost_of_goods_usd, m.sale_price_usd, obj.value:"ingredients"::VARIANT AS ingredients, obj.value:"is_healthy_flag"::VARCHAR(1) AS is_healthy_flag, obj.value:"is_gluten_free_flag"::VARCHAR(1) AS is_gluten_free_flag, obj.value:"is_dairy_free_flag"::VARCHAR(1) AS is_dairy_free_flag, obj.value:"is_nut_free_flag"::VARCHAR(1) AS is_nut_free_flag FROM frostbyte_tasty_bytes.raw_pos.menu m, LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj;
harmonized.menu_v view created, we could now query it directly without having leverage the more complex SQL we used within. However, since the Analytics layer is where our stakeholders access data, let's take the next step to land this there.
harmonized.menu_v please now execute the next query which will create our
analytics.menu_v that will begin delivering value downstream immediately.
Within this query we see a few new functions we have not yet covered. First off we are adding a COMMENT which can be seen within SHOW VIEWS commands or within the Snowsight interface to document what a user may see when querying this view.
We are also utilizing SELECT * EXCLUDE and RENAME parameters which can make the lives of SQL developers much easier by reducing query or View definition complexity.
With our Menu View available in our Analytics layer, let's execute a few queries against it that we will provide to our end users showcasing how Snowflake powers a relational query experience over Semi-Structured data without having to make additional copies or conduct any complex processing.
We saw the
ingredients Column as a result of a query in our previous section, but this is now available without any Dot Notation requirements in our
With this, please now execute the next query that leverages two additional Snowflake Array Functions ARRAY_INTERSECTION and ARRAYS_OVERLAP to see which of our non-beverage Menu Items overlap for each of our Food Truck Brands Menus.
SELECT m1.menu_type, m1.menu_item_name, m2.menu_type AS overlap_menu_type, m2.menu_item_name AS overlap_menu_item_name, ARRAY_INTERSECTION(m1.ingredients, m2.ingredients) AS overlapping_ingredients FROM frostbyte_tasty_bytes.analytics.menu_v m1 JOIN frostbyte_tasty_bytes.analytics.menu_v m2 ON m1.menu_item_id <> m2.menu_item_id -- avoid joining the same menu item to itself AND m1.menu_type <> m2.menu_type WHERE 1=1 AND m1.item_category <> 'Beverage' -- remove beverages AND m2.item_category <> 'Beverage' -- remove beverages AND ARRAYS_OVERLAP(m1.ingredients, m2.ingredients) -- evaluates to TRUE if one ingredient is in both arrays ORDER BY m1.menu_type;
Once again, using exactly the View we have built this sort of query could provide real world business value to our Food Truck operators as they work will Procurement to predict and order their ingredient needs each week.
Now let's take a look at assisting our Tasty Bytes Executives make data driven Menu decisions by providing high-level metrics surrounding the dietary restrictions we are currently addressing across our brands.
Please execute the next query which utilizes COUNT, SUM and conditional CASE statements to aggregate the required metrics from our
SELECT COUNT(DISTINCT menu_item_id) AS total_menu_items, SUM(CASE WHEN is_healthy_flag = 'Y' THEN 1 ELSE 0 END) AS healthy_item_count, SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count, SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count, SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count FROM frostbyte_tasty_bytes.analytics.menu_v m;
With the output we just recieved we have successfully went from a Raw table containing Semi-Structured Data to a single, aggregate row that can easily be accessed by anyone in our organization to empower Tasty Bytes to be more data driven.
As some of our Tasty Bytes Executives prefer visual representations of data, let's now look at how easy it is to turn tabular results into easy to digest visual charts within Snowsight.
Please execute the next query now which adds filters on three of our Truck Brands Names to the SQL we ran previously.
SELECT m.brand_name, SUM(CASE WHEN is_gluten_free_flag = 'Y' THEN 1 ELSE 0 END) AS gluten_free_item_count, SUM(CASE WHEN is_dairy_free_flag = 'Y' THEN 1 ELSE 0 END) AS dairy_free_item_count, SUM(CASE WHEN is_nut_free_flag = 'Y' THEN 1 ELSE 0 END) AS nut_free_item_count FROM frostbyte_tasty_bytes.analytics.menu_v m WHERE m.brand_name IN ('Plant Palace', 'Peking Truck','Revenge of the Curds') GROUP BY m.brand_name;
By default, Snowsight returns our query results in tabular form. However one powerful Snowsight feature we have not covered yet is Using Charts
Please now follow the arrows in the screenshot below to create your own Bar Graph showing how these different Food Truck Brands compare when it comes to items on their menu that address certain dietary restrictions.
To finish off this Quickstart, we must point out how easy a Tasty Bytes Executive could do this sort of analysis on their own without ever needing to know about the Semi-Structured Data Processing we have encapsulated in the Views we built. With all of this we can rest assured that we are assisting in driving the democratization of data in our Tasty Bytes Organization.
Fantastic work! You have successfully completed the Tasty Bytes - Zero to Snowflake - Semi-Structured Data Quickstart.
By doing so you have now:
If you would like to re-run this Quickstart please leverage the Reset scripts in the bottom of your associated Worksheet.
To continue your journey in the Snowflake Data Cloud, please now visit the link below to see all other Powered by Taste Bytes - Quickstarts available to you.