Welcome to the Zero to Snowflake Quickstart! This guide is a consolidated journey through key areas of the Snowflake AI Data Cloud. You will start with the fundamentals of warehousing and data transformation and build an automated data pipeline. You will then explore Snowflake's AI capabilities: experimenting with LLMs in the Cortex Playground to compare models for summarizing text, analyzing customer review sentiment instantly with AISQL Functions and a simple SQL command, accelerating query writing with Snowflake Copilot by asking for the SQL you need in plain English, harnessing Cortex Search for intelligent text discovery, and using Cortex Analyst for conversational business intelligence. Finally, you will learn to secure your data with powerful governance controls and enrich your analysis through seamless data collaboration.
Tasty Bytes' mission is to provide unique, high-quality food options in a convenient and cost-effective manner, emphasizing the use of fresh ingredients from local vendors. Its vision is to become the largest food truck network in the world with a zero carbon footprint.
In this Quickstart, we will use Snowflake Workspaces to organize, edit, and run all the SQL scripts required for this course. We will create a dedicated SQL worksheet for the setup and each vignette. This will keep our code organized and easy to manage.
Let's walk through how to create your first worksheet, add the necessary setup code, and run it.
First, we need a place to put our setup script.
Now that you have your worksheet, it's time to add the setup SQL and execute it.
The process you just completed for creating a new worksheet is the exact same workflow you will use for every subsequent vignette in this course.
For each new vignette, you will:
Within this Vignette, we will learn about core Snowflake concepts by exploring Virtual Warehouses, using the query results cache, performing basic data transformations, leveraging data recovery with Time Travel, and monitoring our account with Resource Monitors and Budgets.
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
Virtual Warehouses are the dynamic, scalable, and cost-effective computing power that lets you perform analysis on your Snowflake data. Their purpose is to handle all your data processing needs without you having to worry about the underlying technical details.
First, let's set our session context. To run the queries, highlight the three queries at the top of your worksheet and click the "► Run" button.
ALTER SESSION SET query_tag = '{"origin":"sf_sit-is","name":"tb_101_v2","version":{"major":1, "minor":1},"attributes":{"is_quickstart":0, "source":"tastybytes", "vignette": "getting_started_with_snowflake"}}';
USE DATABASE tb_101;
USE ROLE accountadmin;
Let's create our first warehouse! This command creates a new X-Small warehouse that will initially be suspended.
CREATE OR REPLACE WAREHOUSE my_wh
COMMENT = 'My TastyBytes warehouse'
WAREHOUSE_TYPE = 'standard'
WAREHOUSE_SIZE = 'xsmall'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 2
SCALING_POLICY = 'standard'
AUTO_SUSPEND = 60
INITIALLY_SUSPENDED = true
AUTO_RESUME = false;
Virtual Warehouses: A virtual warehouse, often referred to simply as a "warehouse", is a cluster of compute resources in Snowflake. Warehouses are required for queries, DML operations, and data loading. For more information, see the Warehouse Overview.
Now that we have a warehouse, we must set it as the active warehouse for our session. Execute the next statement.
USE WAREHOUSE my_wh;
If you try to run the query below, it will fail, because the warehouse is suspended and does not have AUTO_RESUME enabled.
SELECT * FROM raw_pos.truck_details;
Let's resume it and set it to auto-resume in the future.
ALTER WAREHOUSE my_wh RESUME;
ALTER WAREHOUSE my_wh SET AUTO_RESUME = TRUE;
Now, try the query again. It should execute successfully.
SELECT * FROM raw_pos.truck_details;
Warehouses in Snowflake are designed for elasticity. We can scale our warehouse up on the fly to handle a more intensive workload. Let's scale our warehouse to an X-Large.
ALTER WAREHOUSE my_wh SET warehouse_size = 'XLarge';
With our larger warehouse, let's run a query to calculate total sales per truck brand.
SELECT
o.truck_brand_name,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(o.price) AS total_sales
FROM analytics.orders_v o
GROUP BY o.truck_brand_name
ORDER BY total_sales DESC;
This is a great place to demonstrate another powerful feature in Snowflake: the Query Result Cache. When you first ran the ‘sales per truck' query, it likely took several seconds. If you run the exact same query again, the result will be nearly instantaneous. This is because the query results were cached in Snowflake's Query Result Cache.
Run the same ‘sales per truck' query from the previous step. Note the execution time in the query details pane. It should be much faster.
SELECT
o.truck_brand_name,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(o.price) AS total_sales
FROM analytics.orders_v o
GROUP BY o.truck_brand_name
ORDER BY total_sales DESC;
Query Result Cache: Results are retained for any query for 24 hours. Hitting the result cache requires almost no compute resources, making it ideal for frequently run reports or dashboards. The cache resides in the Cloud Services Layer, making it globally accessible to all users and warehouses in the account. For more information, please visit the documentation on using persisted query results.
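If you want to verify the cache hit yourself, one optional way (a minimal sketch; it assumes your role can read the session's query history via the INFORMATION_SCHEMA.QUERY_HISTORY table function) is to compare the elapsed times of the two runs:
-- Compare the two most recent runs of the 'sales per truck' query.
-- The cached run should report a much smaller TOTAL_ELAPSED_TIME (milliseconds).
SELECT
    query_text,
    total_elapsed_time,
    start_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%truck_brand_name%'
ORDER BY start_time DESC
LIMIT 2;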
We will now be working with smaller datasets, so we can scale our warehouse back down to an X-Small to conserve credits.
ALTER WAREHOUSE my_wh SET warehouse_size = 'XSmall';
In this section, we will see some basic transformation techniques to clean our data and use Zero-Copy Cloning to create development environments. Our goal is to analyze the manufacturers of our food trucks, but this data is currently nested inside a VARIANT column.
First, let's take a look at the truck_build column.
SELECT truck_build FROM raw_pos.truck_details;
This table contains data about the make, model and year of each truck, but it is nested, or embedded in a special data type called a VARIANT. We can perform operations on this column to extract these values, but first we'll create a development copy of the table.
Let's create a development copy of our truck_details table. Snowflake's Zero-Copy Cloning lets us create an identical, fully independent copy of the table instantly, without using additional storage.
CREATE OR REPLACE TABLE raw_pos.truck_dev CLONE raw_pos.truck_details;
Zero-Copy Cloning: Cloning creates a copy of a database object without duplicating the storage. Changes made to either the original or the clone are stored as new micro-partitions, leaving the other object untouched.
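As an optional aside (a hedged sketch; the object name raw_pos_dev below is illustrative and not part of this lab's script), the same CLONE keyword also works at the schema and database level, which is handy for spinning up full development environments:
-- Clone an entire schema, including all of its tables, at zero additional storage cost
CREATE OR REPLACE SCHEMA tb_101.raw_pos_dev CLONE tb_101.raw_pos;
-- Drop it again when you are done experimenting
DROP SCHEMA tb_101.raw_pos_dev;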
Now that we have a safe development table, let's add columns for year, make, and model. Then, we will extract the data from the truck_build VARIANT column and populate our new columns.
-- Add new columns
ALTER TABLE raw_pos.truck_dev ADD COLUMN IF NOT EXISTS year NUMBER;
ALTER TABLE raw_pos.truck_dev ADD COLUMN IF NOT EXISTS make VARCHAR(255);
ALTER TABLE raw_pos.truck_dev ADD COLUMN IF NOT EXISTS model VARCHAR(255);
-- Extract and update data
UPDATE raw_pos.truck_dev
SET
year = truck_build:year::NUMBER,
make = truck_build:make::VARCHAR,
model = truck_build:model::VARCHAR;
Let's run a query to see the distribution of truck makes.
SELECT
make,
COUNT(*) AS count
FROM raw_pos.truck_dev
GROUP BY make
ORDER BY make ASC;
Did you notice anything odd about the results from the last query? We can see a data quality issue: ‘Ford' and ‘Ford_' are being treated as separate manufacturers. Let's easily fix this with a simple UPDATE statement.
UPDATE raw_pos.truck_dev
SET make = 'Ford'
WHERE make = 'Ford_';
Here we set the make value to Ford wherever it is currently Ford_. This ensures none of the Ford rows keep the underscore, giving us a unified make count.
Our development table is now cleaned and correctly formatted. We can instantly promote it to be the new production table using the SWAP WITH command, which atomically swaps the two tables.
ALTER TABLE raw_pos.truck_details SWAP WITH raw_pos.truck_dev;
Now that the swap is complete, we can drop the unnecessary truck_build column from our new production table. We also need to drop the old production table, which is now named truck_dev. But for the sake of the next lesson, we will "accidentally" drop the main table.
ALTER TABLE raw_pos.truck_details DROP COLUMN truck_build;
-- Accidentally drop the production table!
DROP TABLE raw_pos.truck_details;
Oh no! We accidentally dropped the production truck_details table. Luckily, Snowflake's Time Travel feature allows us to recover it instantly: the UNDROP command restores dropped objects.
If you run a DESCRIBE command on the table, you will get an error stating it does not exist.
DESCRIBE TABLE raw_pos.truck_details;
Let's restore the truck_details table to the exact state it was in before being dropped.
UNDROP TABLE raw_pos.truck_details;
Time Travel & UNDROP: Snowflake Time Travel enables accessing historical data at any point within a defined period. This allows for restoring data that has been modified or deleted. UNDROP is a feature of Time Travel that makes recovery from accidental drops trivial.
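UNDROP is only one part of Time Travel. As an optional, hedged illustration (the offset below is an assumption; adjust it to your timing and retention window, and replace the placeholder query ID with a real one), you can also query a table as it existed at an earlier point:
-- Query the table as it looked roughly five minutes ago
SELECT * FROM raw_pos.truck_details AT(OFFSET => -60*5);
-- Or query the state just before a given statement ran (replace <query_id> with a real query ID)
SELECT * FROM raw_pos.truck_details BEFORE(STATEMENT => '<query_id>');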
Verify the table was successfully restored by selecting from it. Then, we can safely drop the actual development table, truck_dev.
-- Verify the table was restored
SELECT * from raw_pos.truck_details;
-- Now drop the real truck_dev table
DROP TABLE raw_pos.truck_dev;
Monitoring compute usage is critical. Snowflake provides Resource Monitors to track warehouse credit usage. You can define credit quotas and trigger actions (like notifications or suspension) when thresholds are reached.
Let's create a resource monitor for my_wh. This monitor has a monthly quota of 100 credits and will send notifications at 75% and suspend the warehouse at 90% and 100% of the quota. First, ensure your role is accountadmin.
USE ROLE accountadmin;
CREATE OR REPLACE RESOURCE MONITOR my_resource_monitor
WITH CREDIT_QUOTA = 100
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS ON 75 PERCENT DO NOTIFY
ON 90 PERCENT DO SUSPEND
ON 100 PERCENT DO SUSPEND_IMMEDIATE;
With the monitor created, apply it to my_wh.
ALTER WAREHOUSE my_wh
SET RESOURCE_MONITOR = my_resource_monitor;
For more information on what each configuration handles, please visit the documentation for Working with Resource Monitors.
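As a quick optional check from SQL (not required by the lab), you can confirm the monitor's settings and its assignment to the warehouse:
-- List resource monitors and their quotas and thresholds
SHOW RESOURCE MONITORS;
-- The RESOURCE_MONITOR column in the output should now show MY_RESOURCE_MONITOR
SHOW WAREHOUSES LIKE 'my_wh';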
While Resource Monitors track warehouse usage, Budgets provide a more flexible approach to managing all Snowflake costs. Budgets can track spend on any Snowflake object and notify users when a dollar amount threshold is reached.
Let's first create the budget object in SQL.
CREATE OR REPLACE SNOWFLAKE.CORE.BUDGET my_budget()
COMMENT = 'My Tasty Bytes Budget';
Let's take a look at the Budget Page on Snowsight.
Navigate to Admin » Cost Management » Budgets.
Configuring a budget is done through the Snowsight UI:
Ensure your active role is ACCOUNTADMIN. You can change this in the bottom left corner.
Set the budget's spending limit to 100.
For a detailed guide on Budgets, please see the Snowflake Budgets Documentation.
Universal Search allows you to easily find any object in your account, plus explore data products in the Marketplace, relevant Snowflake Documentation, and Community Knowledge Base articles.
Let's try it now.
Open Search from the navigation menu and type truck into the search bar.
You can also use natural language. For example, search for: Which truck franchise has the most loyal customer base?
Universal search will return relevant tables and views, even highlighting columns that might help answer your question, providing an excellent starting point for analysis.
Within this vignette, we will learn how to build a simple, automated data pipeline in Snowflake. We will start by ingesting raw, semi-structured data from an external stage, and then use the power of Snowflake's Dynamic Tables to transform and enrich that data, creating a pipeline that automatically stays up-to-date as new data arrives.
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
Our raw menu data currently sits in an Amazon S3 bucket as CSV files. To begin our pipeline, we first need to ingest this data into Snowflake. We will do this by creating a Stage to point to the S3 bucket and then using the COPY command to load the data into a staging table.
First, let's set our session context to use the correct database, role, and warehouse. Execute the first few queries in your worksheet.
ALTER SESSION SET query_tag = '{"origin":"sf_sit-is","name":"tb_101_v2","version":{"major":1, "minor":1},"attributes":{"is_quickstart":0, "source":"tastybytes", "vignette": "data_pipeline"}}';
USE DATABASE tb_101;
USE ROLE tb_data_engineer;
USE WAREHOUSE tb_de_wh;
A Stage is a Snowflake object that specifies an external location where data files are stored. We'll create a stage that points to our public S3 bucket. Then, we'll create the table that will hold this raw data.
-- Create the menu stage
CREATE OR REPLACE STAGE raw_pos.menu_stage
COMMENT = 'Stage for menu data'
URL = 's3://sfquickstarts/frostbyte_tastybytes/raw_pos/menu/'
FILE_FORMAT = public.csv_ff;
CREATE OR REPLACE TABLE raw_pos.menu_staging
(
menu_id NUMBER(19,0),
menu_type_id NUMBER(38,0),
menu_type VARCHAR(16777216),
truck_brand_name VARCHAR(16777216),
menu_item_id NUMBER(38,0),
menu_item_name VARCHAR(16777216),
item_category VARCHAR(16777216),
item_subcategory VARCHAR(16777216),
cost_of_goods_usd NUMBER(38,4),
sale_price_usd NUMBER(38,4),
menu_item_health_metrics_obj VARIANT
);
With the stage and table in place, let's load the data from the stage into our menu_staging table using the COPY INTO command.
COPY INTO raw_pos.menu_staging
FROM @raw_pos.menu_stage;
Snowflake excels at handling semi-structured data like JSON using its native VARIANT data type. One of the columns we ingested, menu_item_health_metrics_obj, contains JSON. Let's explore how to query it.
Let's look at the raw JSON. Notice it contains nested objects and arrays.
SELECT menu_item_health_metrics_obj FROM raw_pos.menu_staging;
We can use special syntax to navigate the JSON structure. The colon (:) accesses keys by name, and square brackets ([]) access array elements by index. We can also cast results to explicit data types using the CAST function or the double-colon shorthand (::).
SELECT
menu_item_name,
CAST(menu_item_health_metrics_obj:menu_item_id AS INTEGER) AS menu_item_id, -- Casting using 'AS'
menu_item_health_metrics_obj:menu_item_health_metrics[0]:ingredients::ARRAY AS ingredients -- Casting using double colon (::) syntax
FROM raw_pos.menu_staging;
The FLATTEN function is a powerful tool for un-nesting arrays. It produces a new row for each element in an array. Let's use it to create a list of every ingredient for every menu item.
SELECT
i.value::STRING AS ingredient_name,
m.menu_item_health_metrics_obj:menu_item_id::INTEGER AS menu_item_id
FROM
raw_pos.menu_staging m,
LATERAL FLATTEN(INPUT => m.menu_item_health_metrics_obj:menu_item_health_metrics[0]:ingredients::ARRAY) i;
Our franchises are constantly adding new menu items. We need a way to process this new data automatically. For this, we can use Dynamic Tables, a powerful tool designed to simplify data transformation pipelines by declaratively defining the result of a query and letting Snowflake handle the refreshes.
We'll start by creating a dynamic table that extracts all unique ingredients from our staging table. We set a LAG of ‘1 minute', which tells Snowflake the maximum amount of time this table's data can be behind the source data.
CREATE OR REPLACE DYNAMIC TABLE harmonized.ingredient
LAG = '1 minute'
WAREHOUSE = 'TB_DE_WH'
AS
SELECT
ingredient_name,
menu_ids
FROM (
SELECT DISTINCT
i.value::STRING AS ingredient_name,
ARRAY_AGG(m.menu_item_id) AS menu_ids
FROM
raw_pos.menu_staging m,
LATERAL FLATTEN(INPUT => menu_item_health_metrics_obj:menu_item_health_metrics[0]:ingredients::ARRAY) i
GROUP BY i.value::STRING
);
Let's see the automation in action. One of our trucks has added a Banh Mi sandwich, which contains new ingredients for French Baguette and Pickled Daikon. Let's insert this new menu item into our staging table.
INSERT INTO raw_pos.menu_staging
SELECT
10101, 15, 'Sandwiches', 'Better Off Bread', 157, 'Banh Mi', 'Main', 'Cold Option', 9.0, 12.0,
PARSE_JSON('{"menu_item_health_metrics": [{"ingredients": ["French Baguette","Mayonnaise","Pickled Daikon","Cucumber","Pork Belly"],"is_dairy_free_flag": "N","is_gluten_free_flag": "N","is_healthy_flag": "Y","is_nut_free_flag": "Y"}],"menu_item_id": 157}');
Now, query the harmonized.ingredient table. Within a minute, you should see the new ingredients appear automatically.
-- You may need to wait up to 1 minute and re-run this query
SELECT * FROM harmonized.ingredient
WHERE ingredient_name IN ('French Baguette', 'Pickled Daikon');
Now we can build a multi-step pipeline by creating more dynamic tables that read from other dynamic tables. This creates a chain, or a Directed Acyclic Graph (DAG), where updates automatically flow from the source to the final output.
Let's create a lookup table that maps ingredients to the menu items they are used in. This dynamic table reads from our harmonized.ingredient dynamic table.
CREATE OR REPLACE DYNAMIC TABLE harmonized.ingredient_to_menu_lookup
LAG = '1 minute'
WAREHOUSE = 'TB_DE_WH'
AS
SELECT
i.ingredient_name,
m.menu_item_health_metrics_obj:menu_item_id::INTEGER AS menu_item_id
FROM
raw_pos.menu_staging m,
LATERAL FLATTEN(INPUT => m.menu_item_health_metrics_obj:menu_item_health_metrics[0]:ingredients) f
JOIN harmonized.ingredient i ON f.value::STRING = i.ingredient_name;
Let's simulate an order of two Banh Mi sandwiches by inserting records into our order tables.
INSERT INTO raw_pos.order_header
SELECT
459520441, 15, 1030, 101565, null, 200322900,
TO_TIMESTAMP_NTZ('08:00:00', 'hh:mi:ss'),
TO_TIMESTAMP_NTZ('14:00:00', 'hh:mi:ss'),
null, TO_TIMESTAMP_NTZ('2022-01-27 08:21:08.000'),
null, 'USD', 14.00, null, null, 14.00;
INSERT INTO raw_pos.order_detail
SELECT
904745311, 459520441, 157, null, 0, 2, 14.00, 28.00, null;
Finally, let's create our final dynamic table. This one joins our order data with our ingredient lookup tables to create a summary of monthly ingredient usage per truck. This table depends on the other dynamic tables, completing our pipeline.
CREATE OR REPLACE DYNAMIC TABLE harmonized.ingredient_usage_by_truck
LAG = '2 minute'
WAREHOUSE = 'TB_DE_WH'
AS
SELECT
oh.truck_id,
EXTRACT(YEAR FROM oh.order_ts) AS order_year,
MONTH(oh.order_ts) AS order_month,
i.ingredient_name,
SUM(od.quantity) AS total_ingredients_used
FROM
raw_pos.order_detail od
JOIN raw_pos.order_header oh ON od.order_id = oh.order_id
JOIN harmonized.ingredient_to_menu_lookup iml ON od.menu_item_id = iml.menu_item_id
JOIN harmonized.ingredient i ON iml.ingredient_name = i.ingredient_name
JOIN raw_pos.location l ON l.location_id = oh.location_id
WHERE l.country = 'United States'
GROUP BY
oh.truck_id,
order_year,
order_month,
i.ingredient_name
ORDER BY
oh.truck_id,
total_ingredients_used DESC;
Now, let's query the final table in our pipeline. After a few minutes for the refreshes to complete, you will see the ingredient usage for two Banh Mis from the order we inserted in a previous step. The entire pipeline updated automatically.
-- You may need to wait up to 2 minutes and re-run this query
SELECT
truck_id,
ingredient_name,
SUM(total_ingredients_used) AS total_ingredients_used
FROM
harmonized.ingredient_usage_by_truck
WHERE
order_month = 1
AND truck_id = 15
GROUP BY truck_id, ingredient_name
ORDER BY total_ingredients_used DESC;
Finally, let's visualize our pipeline's Directed Acyclic Graph, or DAG. The DAG shows how our data flows through the tables, and it can be used to monitor the health and lag of our pipeline.
To access the DAG in Snowsight, open one of the dynamic tables in the pipeline (for example, INGREDIENT_USAGE_BY_TRUCK) in the database object explorer and select its Graph tab.
You will now see a visualization of your pipeline, showing how the base tables flow into your dynamic tables.
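If you prefer SQL over the graph view, a hedged alternative for monitoring the pipeline is the DYNAMIC_TABLE_REFRESH_HISTORY information schema table function (output columns may vary by Snowflake version):
-- Inspect recent refreshes for dynamic tables in the current database
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
LIMIT 20;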
Welcome to the Zero to Snowflake Hands-on Lab focused on Snowflake Cortex AI!
Within this lab, we will explore Snowflake's complete AI platform through a progressive journey from experimentation into unified business intelligence. We'll learn AI capabilities by building a comprehensive customer intelligence system using Cortex Playground for AI experimentation, Cortex AISQL Functions for production-scale analysis, Snowflake Copilot for AI-assisted SQL development, Cortex Search for semantic text searching and Cortex Analyst for natural language analytics.
Through this journey, you'll construct a complete customer intelligence analytics platform:
Phase 1: AI Foundation
Phase 2: Intelligent Development & Discovery
Phase 3: Conversational Intelligence
As a data analyst at Tasty Bytes, you need to rapidly explore customer feedback using AI models to identify service improvement opportunities. Traditionally, AI experimentation is complex and time-consuming. Snowflake Cortex Playground solves this by offering a quick, secure environment directly within Snowflake's UI to experiment with diverse AI models, compare their performance on real business data, and export successful approaches as production-ready SQL. This lab guides you through using Cortex Playground for rapid prototyping and seamless integration of AI into your data workflows.
Let's begin by connecting directly to customer review data within Cortex Playground. This keeps your data secure within Snowflake while allowing you to analyze feedback using AI models.
Navigation steps:
What you've accomplished: You now have direct access to customer review data within the AI interface. The filter allows you to focus your analysis on specific truck brands, making your experiment more targeted and relevant.
Now, let's analyze customer reviews to extract specific operational insights and compare how different AI models perform on this business task.
Setup Model Comparison:
Note: Snowflake Cortex provides access to leading AI models from multiple providers, including Anthropic, OpenAI, Meta, and others, giving you choice and flexibility without vendor lock-in.
Enter this strategic prompt:
Analyze this customer review across multiple dimensions: sentiment score with confidence level, key theme extraction, competitive positioning insights, operational impact assessment, and priority ranking for management action
Key Insight: Notice the distinct strengths: Claude provides structured, executive-ready analysis with clear confidence. In contrast, Snowflake's Llama model, optimized specifically for robust business intelligence, delivers comprehensive operational intelligence enriched with strategic context and detailed competitive analysis. This highlights the power of leveraging multiple AI providers, empowering you to choose the ideal approach for your specific business needs.
With our optimal model identified, we now need to fine-tune its behavior for different business scenarios. The same model can produce vastly different results depending on its settings—let's optimize this for our specific analytical requirements.
We want to observe how adjusting parameters, especially "temperature," affects the AI model's responses. Does it lead to more consistent or more creative answers?
How to Set Up This Temperature Test:
Give that a try and see how the responses differ! It's pretty cool to see how these small tweaks can change the AI's "personality."
Observe the Impact:
Notice how adjusting the temperature parameter fundamentally changes the analytical output, even with the same AI model and data.
While temperature influences how the model chooses among likely tokens, top_p (set to 0.8 on the right) restricts the pool of candidate tokens. max_tokens simply caps the response length; be mindful that small values can truncate results. Together, these settings give you precise control over AI creativity versus consistency, letting you match the AI's behavior to your analytical objectives.
Now that we've mastered model selection and parameter optimization, let's examine the technology foundation that makes this experimentation possible. Understanding this will help us transition from playground testing to production deployment.
In this section, let's explore the core technology that takes your AI insights from the playground to production.
Every AI insight you generate in Cortex Playground isn't just magic; it's backed by SQL. Click "View Code" after any model response, and you'll see the exact SQL query, complete with your specified settings like temperature. This isn't just for show; the code is ready for action. You can run it directly in a Snowflake worksheet, automate it with streams and tasks, or integrate it with a dynamic table for live data processing. It's also worth noting that Cortex COMPLETE can be accessed programmatically via Python or a REST API, offering flexible integration options.
Behind every prompt you've run, the SNOWFLAKE.CORTEX.COMPLETE function is hard at work. This is Snowflake Cortex's powerful function providing direct access to industry-leading large language models for text completion. The Cortex Playground simply offers an intuitive interface to test and compare these models before you embed them directly into your SQL. (Heads up: this will evolve to AI_COMPLETE in future releases.)
This seamless integration means your AI experimentation directly translates into production-ready workflows within Snowflake.
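As a hedged sketch of what that exported code typically looks like (the model name, prompt, and option values here are illustrative placeholders; model availability varies by region):
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'claude-3-5-sonnet',  -- substitute any model available in your region
    [
        {'role': 'user', 'content': 'Analyze the sentiment of this customer review: The Banh Mi was excellent, but the wait was far too long.'}
    ],
    {'temperature': 0.2, 'top_p': 0.8, 'max_tokens': 300}  -- the same knobs exposed in Cortex Playground
) AS model_response;
Because the options argument is supplied, the response comes back as a JSON object containing the generated text along with usage metadata.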
The Cortex Playground is an invaluable tool for experimenting with individual reviews, but true large-scale customer feedback analysis demands specialized AI functions. The prompt patterns and model selections you've refined here lay the groundwork for building scalable solutions. Our next step involves processing thousands of reviews using purpose-built AI SQL functions like SENTIMENT(), AI_CLASSIFY(), EXTRACT_ANSWER(), and AI_SUMMARIZE_AGG(). This systematic approach ensures that AI-driven insights seamlessly become a core part of our operational strategy.
You've experimented with AI models in Cortex Playground to analyze individual customer reviews. Now, it's time to scale! This Quickstart shows you how to use AI SQL Functions to process thousands of reviews, turning experimental insights into production-ready intelligence. You'll learn to:
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
First, let's set our session context. We will assume the role of a TastyBytes data analyst with the intention of leveraging AISQL functions to gain insights from customer reviews.
ALTER SESSION SET query_tag = '{"origin":"sf_sit-is","name":"tb_101_v2_aisql","version":{"major":1, "minor":1},"attributes":{"is_quickstart":0, "source":"tastybytes", "vignette": "aisql_functions"}}';
USE ROLE tb_analyst;
USE DATABASE tb_101;
USE WAREHOUSE tb_de_wh;
Analyze customer sentiment across all food truck brands to identify which trucks are performing best and create fleet-wide customer satisfaction metrics. In Cortex Playground, we analyzed individual reviews manually. Now we'll use the SENTIMENT() function to automatically score customer reviews from -1 (negative) to +1 (positive), following Snowflake's official sentiment ranges.
Business Question: "How do customers feel about each of our truck brands overall?"
Please execute this query to analyze customer sentiment across our food truck network and categorize feedback.
SELECT
truck_brand_name,
COUNT(*) AS total_reviews,
AVG(CASE WHEN sentiment >= 0.5 THEN sentiment END) AS avg_positive_score,
AVG(CASE WHEN sentiment BETWEEN -0.5 AND 0.5 THEN sentiment END) AS avg_neutral_score,
AVG(CASE WHEN sentiment <= -0.5 THEN sentiment END) AS avg_negative_score
FROM (
SELECT
truck_brand_name,
SNOWFLAKE.CORTEX.SENTIMENT (review) AS sentiment
FROM harmonized.truck_reviews_v
WHERE
language ILIKE '%en%'
AND review IS NOT NULL
LIMIT 10000
)
GROUP BY
truck_brand_name
ORDER BY total_reviews DESC;
Key Insight: Notice how we transitioned from analyzing reviews one at a time in Cortex Playground to systematically processing thousands. The SENTIMENT() function automatically scored every review and categorized them into Positive, Negative, and Neutral - giving us instant fleet-wide customer satisfaction metrics.
Sentiment Score Ranges: in the query above, scores of 0.5 and above are treated as positive, scores between -0.5 and 0.5 as neutral, and scores of -0.5 and below as negative.
Now, let's categorize all reviews to understand what aspects of our service customers are talking about most. We'll use the AI_CLASSIFY() function, which automatically categorizes reviews into user-defined categories based on AI understanding, rather than simple keyword matching. In this step, we will categorize customer feedback into business-relevant operational areas and analyze their distribution patterns.
Business Question: "What are customers primarily commenting on - food quality, service, or delivery experience?"
Execute the Classification Query:
WITH classified_reviews AS (
SELECT
truck_brand_name,
AI_CLASSIFY(
review,
['Food Quality', 'Pricing', 'Service Experience', 'Staff Behavior']
):labels[0] AS feedback_category
FROM
harmonized.truck_reviews_v
WHERE
language ILIKE '%en%'
AND review IS NOT NULL
AND LENGTH(review) > 30
LIMIT
10000
)
SELECT
truck_brand_name,
feedback_category,
COUNT(*) AS number_of_reviews
FROM
classified_reviews
GROUP BY
truck_brand_name,
feedback_category
ORDER BY
truck_brand_name,
number_of_reviews DESC;
Key Insight: Observe how AI_CLASSIFY() automatically categorized thousands of reviews into business-relevant themes such as Food Quality, Service Experience, and more. We can instantly see that Food Quality is the most discussed topic across our truck brands, providing the operations team with clear, actionable insight into customer priorities.
Next, to gain precise answers from unstructured text, we'll utilize the EXTRACT_ANSWER() function. This powerful function enables us to ask specific business questions about customer feedback and receive direct answers. In this step, our goal is to identify precise operational issues mentioned in customer reviews, highlighting specific problems that require immediate attention.
Business question: "What specific improvement or complaint is mentioned in this review?"
Let's execute the next query:
SELECT
truck_brand_name,
primary_city,
LEFT(review, 100) || '...' AS review_preview,
SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
review,
'What specific improvement or complaint is mentioned in this review?'
) AS specific_feedback
FROM
harmonized.truck_reviews_v
WHERE
language = 'en'
AND review IS NOT NULL
AND LENGTH(review) > 50
ORDER BY truck_brand_name, primary_city ASC
LIMIT 10000;
Key Insight: Notice how EXTRACT_ANSWER() distills specific, actionable insights from long customer reviews. Rather than manual review, this function automatically identifies concrete feedback like "friendly staff was saving grace" and "hot dogs are cooked to perfection." The result is a transformation of dense text into specific, quotable feedback that the operations team can leverage instantly.
Finally, to create concise summaries of customer feedback, we'll use the AI_SUMMARIZE_AGG() function. This powerful function generates short, coherent summaries from lengthy unstructured text. In this step, our goal is to distill the essence of customer reviews for each truck brand into digestible summaries, providing quick overviews of overall sentiment and key points.
Business Question: "What are the key themes and overall sentiment for each truck brand?"
Execute the Summarization Query:
SELECT
truck_brand_name,
AI_SUMMARIZE_AGG (review) AS review_summary
FROM
(
SELECT
truck_brand_name,
review
FROM
harmonized.truck_reviews_v
LIMIT
100
)
GROUP BY
truck_brand_name;
Key Insight: The AI_SUMMARIZE_AGG() function condenses lengthy reviews into clear, brand-level summaries. These summaries highlight recurring themes and sentiment trends, providing decision-makers with quick overviews of each food truck's performance and enabling faster understanding of customer perception without reading individual reviews.
We've successfully demonstrated the transformative power of AI SQL functions, shifting customer feedback analysis from individual review processing to systematic, production-scale intelligence. Our journey through these four core functions clearly illustrates how each serves a distinct analytical purpose, transforming raw customer voices into comprehensive business intelligence that is systematic, scalable, and immediately actionable. What once required individual review analysis now processes thousands of reviews in seconds, providing both the emotional context and specific details crucial for data-driven operational improvements.
As our analysis grows in sophistication, combining multiple AI functions in complex queries can become challenging. This is precisely where Snowflake Copilot offers powerful assistance. Let's explore how Copilot can help us create complex queries using natural language and significantly accelerate our work, empowering us to build even more intricate analytical workflows with ease.
You've successfully built powerful AI SQL queries to analyze customer reviews. Now, the challenge is to connect these insights to real business impact: Do negative reviews affect loyalty? Do positive reviews boost orders? This requires complex queries joining multiple data sources and correlating AI results.
This lab introduces Snowflake Copilot, an LLM-powered assistant designed to simplify this challenge. Running securely within Snowflake Cortex, Copilot helps you construct sophisticated analytical workflows, translating complex business questions into powerful SQL.
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
First, let's set our session context. We will assume the role of a TastyBytes data analyst with the intention of leveraging Snowflake Copilot to build sophisticated queries and gain deeper business insights.
ALTER SESSION SET query_tag = '{"origin":"sf_sit-is","name":"tb_101_v2_copilot","version":{"major":1, "minor":1},"attributes":{"is_quickstart":0, "source":"tastybytes", "vignette": "snowflake_copilot"}}';
USE ROLE tb_analyst;
USE WAREHOUSE tb_de_wh;
USE DATABASE tb_101;
USE SCHEMA harmonized;
To begin, let's explore what types of analysis questions we can ask about our Tasty Bytes dataset using Copilot's natural language understanding capabilities.
Type the following question in the Copilot message box and click RUN to see the result of the query:
Prompt 1: How do I structure a query that correlates customer review sentiment with customer loyalty metrics and order behavior? I have review sentiment data and customer loyalty metrics tables with order information. I need to understand the relationship between what customers say and their actual purchasing patterns.
Key Insight: Copilot provides schema-aware business intelligence, directly analyzing your specific Tasty Bytes tables and suggesting strategic analysis using your actual column names. This isn't generic AI; it's purpose-built intelligence that profoundly understands your data structure.
Now let's use Copilot to generate complex SQL with multiple table joins from a simple business question.
In the same Copilot panel, paste the following business question and click RUN to see the result:
Prompt 2: Show me high-value customers who have been leaving negative reviews. I want to understand which customers spend a lot of money with us but seem unhappy based on their feedback
Key Insight: Notice how Copilot transforms a simple business question into production-ready customer intelligence with complex analytical logic and actionable results—all without requiring SQL expertise. This showcases Copilot's core value: you can ask strategic questions in plain English and instantly receive the enterprise-grade analytics that typically demand data engineering skills.
Snowflake Copilot profoundly transforms business intelligence by enabling users to effortlessly translate complex business questions into sophisticated SQL queries. As demonstrated with Tasty Bytes, it empowers both technical and non-technical users to derive actionable insights from their data without deep SQL expertise. This LLM-powered assistant delivers schema-aware, purpose-built intelligence, ensuring robust data governance and keeping all enterprise data securely within Snowflake. Copilot isn't just generic AI; it's a strategic tool that bridges operational insights with business intelligence.
While Copilot excels at generating complex analytical queries, a common daily challenge for customer service teams is quickly finding specific customer reviews for complaints or compliments. Traditional keyword search often falls short, missing the nuances of natural language.
Snowflake Cortex Search solves this by providing low-latency, high-quality "fuzzy" search over your Snowflake text data. It quickly sets up hybrid (vector and keyword) search engines, handling embeddings, infrastructure, and tuning for you. Under the hood, Cortex Search combines semantic (meaning-based) and lexical (keyword-based) retrieval with intelligent re-ranking to deliver the most relevant results. In this lab, you will configure a search service, connect it to customer review data, and run semantic queries to proactively identify key customer feedback.
This opens the search service configuration interface, where you'll define how Snowflake indexes and interprets your text data.
In the initial configuration screen, enter:
Role: TB_DEV
Warehouse: TB_DEV_WH
Database: TB_101
Schema: HARMONIZED
Name: customer_feedback_intelligence
Click Next: Select data.
This wizard will guide you through several configuration screens:
Select data: choose the TRUCK_REVIEWS_V view.
Select search column: REVIEW (the text column to search).
Select attributes (filterable columns): TRUCK_BRAND_NAME, PRIMARY_CITY, REVIEW_ID.
Select columns to return in results: DATE, LANGUAGE, etc.
Note: Creating the search service includes building the index, so the initial setup may take a little longer - but once created, queries run with low latency and scale seamlessly.
Behind this simple UI, Cortex Search is performing a complex task. It analyzes the text in your "REVIEW" column, using an AI model to generate semantic embeddings, which are numerical representations of the text's meaning. These embeddings are then indexed, allowing for high-speed conceptual searches later on. In just a few clicks, you have taught Snowflake to understand the intent behind your reviews.
When the service shows as "Active", click on Playground and enter the natural language prompt in the search bar:
Prompt - 1: Customers getting sick
Key Insight: Notice Cortex Search isn't just finding customers - it's finding CONDITIONS that could MAKE customers sick. That is the difference between reactive keyword search and proactive semantic understanding.
Now try another query:
Prompt - 2: Angry customers
Key Insight: These customers are about to churn, but they never said "I'm angry." They expressed frustration in their own words. Cortex Search understands the emotion behind the language, helping you identify and save at-risk customers before they leave.
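Beyond the Playground, the same service can be queried directly from SQL. Here is a minimal hedged sketch using the SEARCH_PREVIEW function; the fully qualified service name assumes the database, schema, and service name used earlier in this setup:
SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'TB_101.HARMONIZED.CUSTOMER_FEEDBACK_INTELLIGENCE',
        '{"query": "customers getting sick", "columns": ["REVIEW", "TRUCK_BRAND_NAME", "PRIMARY_CITY"], "limit": 5}'
    )
)['results'] AS search_results;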
Ultimately, Cortex Search transforms how Tasty Bytes analyzes customer feedback. It empowers the customer service manager to move beyond simply sifting through reviews, to truly understand and proactively act upon the voice of the customer at scale, driving better operational decisions and enhancing customer loyalty.
In the next module - Cortex Analyst - you'll use natural language to query structured data.
A business analyst at Tasty Bytes needs to enable self-service analytics, allowing the business team to ask complex questions in natural language and get instant insights without relying on data analysts to write SQL. While previous AI tools helped with finding reviews and complex query generation, the demand now is for conversational analytics that directly transforms structured business data into immediate insights.
Cortex Analyst empowers business users to ask sophisticated questions directly, seamlessly extracting value from their analytics data through natural language interaction. This lab will guide you through designing a semantic model, connecting it to your business data, configuring relationships and synonyms, and then executing advanced business intelligence queries using natural language.
Let's begin by navigating to Cortex Analyst in Snowsight and configuring our semantic model foundations.
Switch to the TB_ADMIN role.
Select the TB_CORTEX_WH warehouse.
Name the new semantic model tasty_bytes_business_analytics.
Note: To make a request to Cortex Analyst, you must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role.
In the ‘Select tables' step, let's choose our pre-built analytics views:
Database: TB_101
Schema: SEMANTIC_LAYER
Views: Customer_Loyalty_Metrics_v and Orders_v
Add business-friendly names or synonyms for key columns where helpful (for example, TOTAL_SALES_AMOUNT could be "Total Revenue").
Now let's add table synonyms for better natural language understanding:
Customer_Loyalty_Metrics_v: Customers, customer_data, loyalty, customer_metrics, customer_info
Orders_v: Orders, transactions, sales, purchases, order_data
After creating the semantic model, let's establish the relationship between our logical tables and add business-friendly synonyms.
Let's configure our table relationship by creating:
Relationship name: orders_to_customer_loyalty_metrics
Join type: Left outer
Relationship cardinality: many-to-one
Left table: ORDERS_V
Right table: CUSTOMER_LOYALTY_METRICS_V
Join condition: CUSTOMER_ID = CUSTOMER_ID
Upon completion, we will have a semantic model ready for sophisticated natural language queries.
With our semantic model and relationship active, let's demonstrate sophisticated natural language analysis by running our first complex business query.
Navigate to the Cortex Analyst query interface.
Let's execute our customer segmentation analysis:
Prompt 1: Tell me, which customer groups, broken down by marital status and gender, are spending the most per customer? I'd like to see this across our different cities and regions. Also, can we compare their long-term spending habits to identify our most valuable customer demographics for focused marketing efforts?
Key Insight: Instantly delivers comprehensive intelligence by combining multi-table joins, demographic segmentation, geographic insights, and lifetime value analysis - insights that would require 40+ lines of SQL and hours of analyst effort.
Having seen basic segmentation, let's now demonstrate enterprise-grade SQL that showcases the full power of conversational business intelligence.
Let's execute our multi-layered customer analysis:
Prompt 2: I want to understand our customer base better. Can you group customers by how much they've spent with us over time, then show me how their ordering patterns differ between top spenders and lower spenders? Also compare how our franchise locations perform versus company-owned stores for each customer group
Key Insight: Notice how Cortex Analyst seamlessly bridges the gap between a business user's simple, natural language question and the sophisticated, multi-faceted SQL query required to answer it. It automatically constructs the complex logic, including CTEs, window functions, and detailed aggregations, that would typically demand a skilled data analyst.
Through these rigorous steps, we've forged a robust Cortex Analyst semantic model. This isn't just an improvement; it's a transformative tool designed to liberate users across various industries from the constraints of SQL, enabling them to surface profound business intelligence through intuitive natural language queries. Our multi-layered analyses, while showcased through the Tasty Bytes use case, powerfully illustrate how this model drastically cuts down on the time and effort traditionally needed for deep insights, thereby democratizing access to data and fueling a culture of informed, agile decision-making on a broad scale.
Within this vignette, we will explore some of the powerful governance features within Snowflake Horizon. We will begin with a look at Role-Based Access Control (RBAC), before diving into features like automated data classification, tag-based masking policies for column-level security, row-access policies, data quality monitoring, and finally, account-wide security monitoring with the Trust Center.
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
Snowflake's security model is built on a framework of Role-based Access Control (RBAC) and Discretionary Access Control (DAC). Access privileges are assigned to roles, which are then assigned to users. This creates a powerful and flexible hierarchy for securing objects.
First, let's set our context for this exercise and view the roles that already exist in the account.
USE ROLE useradmin;
USE DATABASE tb_101;
USE WAREHOUSE tb_dev_wh;
SHOW ROLES;
We will now create a custom tb_data_steward role. This role will be responsible for managing and protecting our customer data.
CREATE OR REPLACE ROLE tb_data_steward
COMMENT = 'Custom Role';
The typical hierarchy of system and custom roles might look something like this:
+---------------+
| ACCOUNTADMIN |
+---------------+
^ ^ ^
| | |
+-------------+-+ | ++-------------+
| SECURITYADMIN | | | SYSADMIN |<------------+
+---------------+ | +--------------+ |
^ | ^ ^ |
| | | | |
+-------+-------+ | | +-----+-------+ +-------+-----+
| USERADMIN | | | | CUSTOM ROLE | | CUSTOM ROLE |
+---------------+ | | +-------------+ +-------------+
^ | | ^ ^ ^
| | | | | |
| | | | | +-+-----------+
| | | | | | CUSTOM ROLE |
| | | | | +-------------+
| | | | | ^
| | | | | |
+----------+-----+---+--+--------------+-----------+
|
+----+-----+
| PUBLIC |
+----------+
Snowflake System Defined Role Definitions:
ACCOUNTADMIN: The top-level role in the account; it encapsulates SYSADMIN and SECURITYADMIN.
SECURITYADMIN: Manages object grants globally and can create, monitor, and manage users and roles.
USERADMIN: Dedicated to user and role creation and management.
SYSADMIN: Has privileges to create warehouses, databases, and other objects in the account.
PUBLIC: A pseudo-role automatically granted to every user and role; objects owned by PUBLIC are available to everyone.
We can't do much with our role without granting privileges to it. Let's switch to the securityadmin role to grant our new tb_data_steward role the necessary permissions to use a warehouse and access our database schemas and tables.
USE ROLE securityadmin;
-- Grant warehouse usage
GRANT OPERATE, USAGE ON WAREHOUSE tb_dev_wh TO ROLE tb_data_steward;
-- Grant database and schema usage
GRANT USAGE ON DATABASE tb_101 TO ROLE tb_data_steward;
GRANT USAGE ON ALL SCHEMAS IN DATABASE tb_101 TO ROLE tb_data_steward;
-- Grant table-level privileges
GRANT SELECT ON ALL TABLES IN SCHEMA raw_customer TO ROLE tb_data_steward;
GRANT ALL ON SCHEMA governance TO ROLE tb_data_steward;
GRANT ALL ON ALL TABLES IN SCHEMA governance TO ROLE tb_data_steward;
Finally, we grant the new role to our own user. Then we can switch to the tb_data_steward role and run a query to see what data we can access.
-- Grant role to your user
SET my_user = CURRENT_USER();
GRANT ROLE tb_data_steward TO USER IDENTIFIER($my_user);
-- Switch to the new role
USE ROLE tb_data_steward;
-- Run a test query
SELECT TOP 100 * FROM raw_customer.customer_loyalty;
Looking at the query results, it's clear this table contains a lot of Personally Identifiable Information (PII). In the next sections, we'll learn how to protect it.
A key first step in data governance is identifying and classifying sensitive data. Snowflake Horizon's auto-tagging capability can automatically discover sensitive information by monitoring columns in your schemas. We can then use these tags to apply security policies.
Using the accountadmin role, we'll create a pii tag in our governance schema. We will also grant the necessary privileges to our tb_data_steward role to perform classification.
USE ROLE accountadmin;
CREATE OR REPLACE TAG governance.pii;
GRANT APPLY TAG ON ACCOUNT TO ROLE tb_data_steward;
GRANT EXECUTE AUTO CLASSIFICATION ON SCHEMA raw_customer TO ROLE tb_data_steward;
GRANT DATABASE ROLE SNOWFLAKE.CLASSIFICATION_ADMIN TO ROLE tb_data_steward;
GRANT CREATE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE ON SCHEMA governance TO ROLE tb_data_steward;
Now, as the tb_data_steward, we'll create a classification profile. This profile defines how auto-tagging will behave.
USE ROLE tb_data_steward;
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CLASSIFICATION_PROFILE
governance.tb_classification_profile(
{
'minimum_object_age_for_classification_days': 0,
'maximum_classification_validity_days': 30,
'auto_tag': true
});
Next, we'll define a mapping that tells the classification profile to apply our governance.pii tag to any column whose SEMANTIC_CATEGORY matches common PII types like NAME, PHONE_NUMBER, EMAIL, etc.
CALL governance.tb_classification_profile!SET_TAG_MAP(
{'column_tag_map':[
{
'tag_name':'tb_101.governance.pii',
'tag_value':'pii',
'semantic_categories':['NAME', 'PHONE_NUMBER', 'POSTAL_CODE', 'DATE_OF_BIRTH', 'CITY', 'EMAIL']
}]});
Let's manually trigger the classification process on our customer_loyalty table. Then, we can query the INFORMATION_SCHEMA to see the tags that were automatically applied.
to see the tags that were automatically applied.
-- Trigger classification
CALL SYSTEM$CLASSIFY('tb_101.raw_customer.customer_loyalty', 'tb_101.governance.tb_classification_profile');
-- View applied tags
SELECT
column_name,
tag_database,
tag_schema,
tag_name,
tag_value,
apply_method
FROM TABLE(INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('raw_customer.customer_loyalty', 'table'));
Notice that columns identified as PII now have our custom governance.pii tag applied.
Now that our sensitive columns are tagged, we can use Dynamic Data Masking to protect them. A masking policy is a schema-level object that determines whether a user sees the original data or a masked version at query time. We can apply these policies directly to our pii tag.
We'll create two policies: one to mask string data and one to mask date data. The logic is simple: if the user's role is not privileged (i.e., not ACCOUNTADMIN or TB_ADMIN), return a masked value. Otherwise, return the original value.
-- Create the masking policy for sensitive string data
CREATE OR REPLACE MASKING POLICY governance.mask_string_pii AS (original_value STRING)
RETURNS STRING ->
CASE WHEN
CURRENT_ROLE() NOT IN ('ACCOUNTADMIN', 'TB_ADMIN')
THEN '****MASKED****'
ELSE original_value
END;
-- Now create the masking policy for sensitive DATE data
CREATE OR REPLACE MASKING POLICY governance.mask_date_pii AS (original_value DATE)
RETURNS DATE ->
CASE WHEN
CURRENT_ROLE() NOT IN ('ACCOUNTADMIN', 'TB_ADMIN')
THEN DATE_TRUNC('year', original_value)
ELSE original_value
END;
The power of tag-based governance comes from applying the policy once to the tag. This action automatically protects all columns that have that tag, now and in the future.
ALTER TAG governance.pii SET
MASKING POLICY governance.mask_string_pii,
MASKING POLICY governance.mask_date_pii;
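To confirm which masking policies ended up attached to which columns, an optional verification step (not part of the original script) is to query the POLICY_REFERENCES information schema table function:
SELECT policy_name, policy_kind, ref_column_name
FROM TABLE(INFORMATION_SCHEMA.POLICY_REFERENCES(
    REF_ENTITY_NAME => 'tb_101.raw_customer.customer_loyalty',
    REF_ENTITY_DOMAIN => 'TABLE'
));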
Let's test our work. First, switch to the unprivileged public role and query the table. The PII columns should be masked.
USE ROLE public;
SELECT TOP 100 * FROM raw_customer.customer_loyalty;
Now, switch to a privileged role, tb_admin. The data should now be fully visible.
USE ROLE tb_admin;
SELECT TOP 100 * FROM raw_customer.customer_loyalty;
In addition to masking columns, Snowflake allows you to filter which rows are visible to a user with Row Access Policies. The policy evaluates each row against rules you define, often based on the user's role or other session attributes.
A common pattern for row access policies is to use a mapping table that defines which roles can see which data. We'll create a table that maps roles to the country values they are permitted to see.
USE ROLE tb_data_steward;
CREATE OR REPLACE TABLE governance.row_policy_map
(role STRING, country_permission STRING);
-- Map the tb_data_engineer role to only see 'United States' data
INSERT INTO governance.row_policy_map
VALUES('tb_data_engineer', 'United States');
Now we create the policy itself. This policy returns TRUE (allowing the row to be seen) if the user's role is an admin role, or if the user's role exists in our mapping table and matches the country value of the current row.
CREATE OR REPLACE ROW ACCESS POLICY governance.customer_loyalty_policy
AS (country STRING) RETURNS BOOLEAN ->
CURRENT_ROLE() IN ('ACCOUNTADMIN', 'SYSADMIN')
OR EXISTS
(
SELECT 1 FROM governance.row_policy_map rp
WHERE
UPPER(rp.role) = CURRENT_ROLE()
AND rp.country_permission = country
);
Apply the policy to the country column of our customer_loyalty table. Then, switch to the tb_data_engineer role and query the table.
-- Apply the policy
ALTER TABLE raw_customer.customer_loyalty
ADD ROW ACCESS POLICY governance.customer_loyalty_policy ON (country);
-- Switch role to test the policy
USE ROLE tb_data_engineer;
-- Query the table
SELECT TOP 100 * FROM raw_customer.customer_loyalty;
The result set should now only contain rows where the country is ‘United States'.
Data governance isn't just about security; it's also about trust and reliability. Snowflake helps maintain data integrity with Data Metric Functions (DMFs). You can use system-defined DMFs or create your own to run automated quality checks on your tables.
Let's use a few of Snowflake's built-in DMFs to check the quality of our order_header table.
USE ROLE tb_data_steward;
-- This will return the percentage of null customer IDs.
SELECT SNOWFLAKE.CORE.NULL_PERCENT(SELECT customer_id FROM raw_pos.order_header);
-- We can use DUPLICATE_COUNT to check for duplicate order IDs.
SELECT SNOWFLAKE.CORE.DUPLICATE_COUNT(SELECT order_id FROM raw_pos.order_header);
-- Average order total amount for all orders.
SELECT SNOWFLAKE.CORE.AVG(SELECT order_total FROM raw_pos.order_header);
We can also create custom DMFs for our specific business logic. Let's create one that checks for orders where the order_total does not equal unit_price * quantity.
CREATE OR REPLACE DATA METRIC FUNCTION governance.invalid_order_total_count(
order_prices_t table(
order_total NUMBER,
unit_price NUMBER,
quantity INTEGER
)
)
RETURNS NUMBER
AS
'SELECT COUNT(*)
FROM order_prices_t
WHERE order_total != unit_price * quantity';
Let's insert a bad record to test our DMF. Then, we'll call the function to see if it catches the error. The record we will insert orders 2 items at a unit price of $5, with a total price of $5 instead of the correct $10.
-- Insert a record with an incorrect total price
INSERT INTO raw_pos.order_detail
SELECT 904745311, 459520442, 52, null, 0, 2, 5.0, 5.0, null;
-- Call the custom DMF on the order detail table.
SELECT governance.invalid_order_total_count(
SELECT price, unit_price, quantity FROM raw_pos.order_detail
) AS num_orders_with_incorrect_price;
To automate this check, we can set a schedule so it runs automatically whenever the data changes, and then add the DMF to the order_detail table.
ALTER TABLE raw_pos.order_detail
SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
ALTER TABLE raw_pos.order_detail
ADD DATA METRIC FUNCTION governance.invalid_order_total_count
ON (price, unit_price, quantity);
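Once the DMF is attached with TRIGGER_ON_CHANGES, its measurements are recorded in a built-in results view. A hedged way to review them (the view name follows current Snowflake documentation; results can take a few minutes to appear):
SELECT *
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name = 'ORDER_DETAIL'
LIMIT 10;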
The Trust Center provides a centralized dashboard for monitoring security risks across your entire Snowflake account. It uses scheduled scanners to check for issues like missing Multi-Factor Authentication (MFA), over-privileged roles, or inactive users, and then provides recommended actions.
First, an ACCOUNTADMIN needs to grant the TRUST_CENTER_ADMIN application role to a user or role. We'll grant it to our tb_admin role.
USE ROLE accountadmin;
GRANT APPLICATION ROLE SNOWFLAKE.TRUST_CENTER_ADMIN TO ROLE tb_admin;
USE ROLE tb_admin;
Now, navigate to the Trust Center in the Snowsight UI:
By default, most scanner packages are disabled. Let's enable them to get a comprehensive view of our account's security posture.
For each scanner package you enable, set the frequency to Monthly and click Continue.
and click Continue.After the scanners have had a moment to run, navigate back to the Findings tab.
This powerful tool gives you a continuous, actionable overview of your Snowflake account's security health.
In this vignette, we will explore how Snowflake facilitates seamless data collaboration through the Snowflake Marketplace. We will see how easy it is to acquire live, ready-to-query third-party datasets and immediately join them with our own internal data to unlock new insights—all without the need for traditional ETL pipelines.
Copy and paste the SQL from this file in a new Worksheet to follow along in Snowflake.
One of our analysts wants to see how weather impacts food truck sales. To do this, they'll use the Snowflake Marketplace to get live weather data from Weather Source, which can then be joined directly with our own sales data. The Marketplace allows us to access live, ready-to-query data from third-party providers without any data duplication or ETL.
First, let's set our context to use the accountadmin role, which is required to acquire data from the Marketplace.
USE DATABASE tb_101;
USE ROLE accountadmin;
USE WAREHOUSE tb_de_wh;
Follow these steps in the Snowsight UI to get the Weather Source data:
Make sure your active role is ACCOUNTADMIN.
Navigate to Data Products » Marketplace and search for Weather Source frostbyte, then select the listing.
Click Get and name the database ZTS_WEATHERSOURCE.
This process makes the Weather Source data instantly available in our account as a new database, ready to be queried.
With the Weather Source data now in our account, our analyst can immediately begin joining it with our existing Tasty Bytes data. There's no need to wait for an ETL job to run.
Let's switch to the tb_analyst role and begin exploring the new weather data. We'll start by getting a list of all distinct US cities available in the share, along with some average weather metrics.
USE ROLE tb_analyst;
SELECT
DISTINCT city_name,
AVG(max_wind_speed_100m_mph) AS avg_wind_speed_mph,
AVG(avg_temperature_air_2m_f) AS avg_temp_f,
AVG(tot_precipitation_in) AS avg_precipitation_in,
MAX(tot_snowfall_in) AS max_snowfall_in
FROM zts_weathersource.onpoint_id.history_day
WHERE country = 'US'
GROUP BY city_name;
Now, let's create a view that joins our raw country data with the historical daily weather data from the Weather Source share. This gives us a unified view of weather metrics for the cities where Tasty Bytes operates.
CREATE OR REPLACE VIEW harmonized.daily_weather_v
COMMENT = 'Weather Source Daily History filtered to Tasty Bytes supported Cities'
AS
SELECT
hd.*,
TO_VARCHAR(hd.date_valid_std, 'YYYY-MM') AS yyyy_mm,
pc.city_name AS city,
c.country AS country_desc
FROM zts_weathersource.onpoint_id.history_day hd
JOIN zts_weathersource.onpoint_id.postal_codes pc
ON pc.postal_code = hd.postal_code
AND pc.country = hd.country
JOIN raw_pos.country c
ON c.iso_country = hd.country
AND c.city = hd.city_name;
Using our new view, the analyst can query for the average daily temperature in Hamburg, Germany for February 2022. Run the query below, then we'll visualize this as a line chart directly in Snowsight.
SELECT
dw.country_desc,
dw.city_name,
dw.date_valid_std,
AVG(dw.avg_temperature_air_2m_f) AS average_temp_f
FROM harmonized.daily_weather_v dw
WHERE dw.country_desc = 'Germany'
AND dw.city_name = 'Hamburg'
AND YEAR(date_valid_std) = 2022
AND MONTH(date_valid_std) = 2
GROUP BY dw.country_desc, dw.city_name, dw.date_valid_std
ORDER BY dw.date_valid_std DESC;
In the results pane, select Chart, set the chart type to Line, choose DATE_VALID_STD for the X-Axis, and use AVERAGE_TEMP_F as the data column.
Let's take it a step further and combine our orders_v view with our new daily_weather_v view to see how sales correlate with weather conditions.
CREATE OR REPLACE VIEW analytics.daily_sales_by_weather_v
COMMENT = 'Daily Weather Metrics and Orders Data'
AS
WITH daily_orders_aggregated AS (
SELECT DATE(o.order_ts) AS order_date, o.primary_city, o.country,
o.menu_item_name, SUM(o.price) AS total_sales
FROM harmonized.orders_v o
GROUP BY ALL
)
SELECT
dw.date_valid_std AS date, dw.city_name, dw.country_desc,
ZEROIFNULL(doa.total_sales) AS daily_sales, doa.menu_item_name,
ROUND(dw.avg_temperature_air_2m_f, 2) AS avg_temp_fahrenheit,
ROUND(dw.tot_precipitation_in, 2) AS avg_precipitation_inches,
ROUND(dw.tot_snowdepth_in, 2) AS avg_snowdepth_inches,
dw.max_wind_speed_100m_mph AS max_wind_speed_mph
FROM harmonized.daily_weather_v dw
LEFT JOIN daily_orders_aggregated doa
ON dw.date_valid_std = doa.order_date
AND dw.city_name = doa.primary_city
AND dw.country_desc = doa.country
ORDER BY date ASC;
Our analyst can now answer complex business questions, such as: "How does significant precipitation impact our sales figures in the Seattle market?"
SELECT * EXCLUDE (city_name, country_desc, avg_snowdepth_inches, max_wind_speed_mph)
FROM analytics.daily_sales_by_weather_v
WHERE
country_desc = 'United States'
AND city_name = 'Seattle'
AND avg_precipitation_inches >= 1.0
ORDER BY date ASC;
Let's also visualize the results again in Snowsight, but as a bar chart this time.
Select Chart again, set the chart type to Bar, choose MENU_ITEM_NAME for the X-Axis, and use DAILY_SALES as the data column.
Our analyst now wants more insight into the specific locations of our food trucks. We can get Point-of-Interest (POI) data from Safegraph, another provider on the Snowflake Marketplace, to enrich our analysis even further.
Follow the same procedure as before to acquire the Safegraph data from the Marketplace.
Ensure you are using the ACCOUNTADMIN role, search the Marketplace for the safegraph frostbyte listing, and get it, naming the new database ZTS_SAFEGRAPH.
Let's create a view that joins our internal location data with the Safegraph POI data.
CREATE OR REPLACE VIEW harmonized.tastybytes_poi_v
AS
SELECT
l.location_id, sg.postal_code, sg.country, sg.city, sg.iso_country_code,
sg.location_name, sg.top_category, sg.category_tags,
sg.includes_parking_lot, sg.open_hours
FROM raw_pos.location l
JOIN zts_safegraph.public.frostbyte_tb_safegraph_s sg
ON l.location_id = sg.location_id
AND l.iso_country_code = sg.iso_country_code;
Now we can combine all three datasets: our internal data, the weather data, and the POI data. Let's find our top 3 windiest truck locations in the US in 2022.
SELECT TOP 3
p.location_id, p.city, p.postal_code,
AVG(hd.max_wind_speed_100m_mph) AS average_wind_speed
FROM harmonized.tastybytes_poi_v AS p
JOIN zts_weathersource.onpoint_id.history_day AS hd
ON p.postal_code = hd.postal_code
WHERE
p.country = 'United States'
AND YEAR(hd.date_valid_std) = 2022
GROUP BY p.location_id, p.city, p.postal_code
ORDER BY average_wind_speed DESC;
Finally, let's conduct a more complex analysis to determine brand resilience. We'll use a Common Table Expression (CTE) to first find the windiest locations, and then compare sales on "calm" vs. "windy" days for each truck brand at those locations. This can help inform operational decisions, like offering "Windy Day" promotions for brands that are less resilient.
WITH TopWindiestLocations AS (
SELECT TOP 3
p.location_id
FROM harmonized.tastybytes_poi_v AS p
JOIN zts_weathersource.onpoint_id.history_day AS hd ON p.postal_code = hd.postal_code
WHERE p.country = 'United States' AND YEAR(hd.date_valid_std) = 2022
GROUP BY p.location_id, p.city, p.postal_code
ORDER BY AVG(hd.max_wind_speed_100m_mph) DESC
)
SELECT
o.truck_brand_name,
ROUND(AVG(CASE WHEN hd.max_wind_speed_100m_mph <= 20 THEN o.order_total END), 2) AS avg_sales_calm_days,
ZEROIFNULL(ROUND(AVG(CASE WHEN hd.max_wind_speed_100m_mph > 20 THEN o.order_total END), 2)) AS avg_sales_windy_days
FROM analytics.orders_v AS o
JOIN zts_weathersource.onpoint_id.history_day AS hd
ON o.primary_city = hd.city_name AND DATE(o.order_ts) = hd.date_valid_std
WHERE o.location_id IN (SELECT location_id FROM TopWindiestLocations)
GROUP BY o.truck_brand_name
ORDER BY o.truck_brand_name;
Streamlit is an open-source Python library designed for easily creating and sharing web applications for machine learning and data science. It allows for the rapid development and deployment of data-driven apps.
Streamlit in Snowflake empowers developers to securely build, deploy, and share applications directly within Snowflake. This integration lets you build apps that process and use data stored in Snowflake without moving the data or application code to an external system.
Let's create our first Streamlit app: one that displays and charts sales data for each menu item in Japan for February 2022.
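As a minimal sketch of the data behind such an app, the query below aggregates February 2022 sales by menu item for Japan, using harmonized.orders_v columns referenced earlier in this guide; the app would run a query like this and chart the results. The 'Japan' country value is an assumption based on the app's description.
-- Sales by menu item in Japan for February 2022 (a sketch; adjust to your app's needs)
SELECT
    menu_item_name,
    SUM(price) AS total_sales
FROM harmonized.orders_v
WHERE country = 'Japan'
    AND YEAR(order_ts) = 2022
    AND MONTH(order_ts) = 2
GROUP BY menu_item_name
ORDER BY total_sales DESC;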
Congratulations! You have successfully completed the entire Tasty Bytes - Zero to Snowflake journey.
You have now built and configured warehouses, cloned and transformed data, recovered a dropped table with Time Travel, and built an automated data pipeline for semi-structured data. You've also unlocked insights using AI by generating analysis with simple AISQL functions and accelerating your workflow with Snowflake Copilot. Furthermore, you have implemented a robust governance framework with roles and policies and seamlessly enriched your own data with live datasets from the Snowflake Marketplace.
Along the way, you also used UNDROP to restore dropped objects, worked with semi-structured VARIANT data, and built automated ELT pipelines with Dynamic Tables.
If you would like to re-run this Quickstart, please run the complete RESET script located at the bottom of your worksheet.