By completing this quickstart, you will learn how to easily transform raw data into an optimal format for analysis within Power BI. It builds upon the An Introduction to Tasty Bytes quickstart. We'll begin by profiling the data within Snowsight, then create new roles and grant appropriate privileges for our fictitious global BI Analyst team. Next, we'll enrich our data with third-party location data from the Snowflake Marketplace in a matter of minutes. From there, we'll transform our raw data into an optimal model for downstream analysis within Power BI and add column- and row-level data protections to it. Lastly, we'll connect the provided Power BI template (.pbit) file to our Snowflake data model and analyze sales transactions live.

Overview_Diagram

Prerequisites

What You'll Learn

What You'll Need

What You'll Build

In this section, we'll review the Tasty Bytes dataset and use Snowsight to easily profile the data we'll be working with.

Setup

We'll start by creating a new folder and our first SQL worksheet within it.

In Snowsight, start by navigating to the Projects –> Worksheets menu using the left-hand navigation pane.

Projects_Worksheets

Next, create a new folder using the drop-down arrow in the upper-right hand corner. This folder will be used to store our SQL worksheets.

Create_Folder

Name the folder whatever you like. We'll name ours "End-to-End Analytics with Snowflake and Power BI" to match the name of this guide.

Name_Folder

Next, we'll create our first SQL worksheet within our folder. Use the upper-right hand navigation to create a new worksheet. We'll name our worksheet "1 - Data Profiling".

With our worksheet now open, let's set the worksheet context so we can query our Tasty Bytes raw data tables.

/*-------------------------------------------------

--SECTION 2 - REVIEWING THE DATASET

-------------------------------------------------*/

/* set database schema, role, and warehouse context */
use role sysadmin;
use database tb_101;
use schema raw_pos;
use warehouse tb_dev_wh;

With our worksheet context set, let's execute a few queries to get a better feel for the dataset we'll be working with.

Data Profiling with Snowsight

/*-------------------------------------------------

--BASIC DATA PROFILING

-------------------------------------------------*/

/* Review the volume of data in each table */
show tables in schema tb_101.raw_pos;

Notice the cardinality of the order_detail and order_header tables: 670M+ and 248M records, respectively.

Next, we'll sample the data and review the Snowsight results pane.

/* Sample the data */
select *
from order_header
where customer_id is not null
limit 1000;

Take a few moments to review the query results pane and familiarize yourself with all the things you can do. Shown below is an example where the Stats and Query History panes are visible. We can see distribution charts for every column in our SELECT list, filter the result set further in the UI, access the query profile, visualize the results as a chart, download the results to Excel, and more!

Snowsight_Results_Pane

Next, let's review a query that can be used to identify any duplicate records in a table based on the source ID. This is a common data profiling step.

/* Identify if a table contains duplicate records */
with dupes as
(
select franchise_id, count(*)
from tb_101.raw_pos.franchise
group by franchise_id
having count(*) > 1
)
select * 
from tb_101.raw_pos.franchise f 
where exists (select 1 from dupes d where d.franchise_id = f.franchise_id);

We can see the Franchise table contains duplicate records. Some are true duplicates, while others have different city values. We'll clean this data up in a later section when we develop our Dynamic Tables.
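If you'd like to dig a little deeper before we fix this, below is an optional profiling query (not part of the original worksheet) that separates exact duplicates from copies that disagree on city.

/* optional: distinguish exact duplicates from copies that disagree on city */
select
    franchise_id,
    count(*)             as copies,
    count(distinct city) as distinct_cities
from tb_101.raw_pos.franchise
group by franchise_id
having count(*) > 1
order by distinct_cities desc, copies desc;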

Creating New Objects

In this section, we'll create a dedicated schema and warehouse for Power BI, create a test user and four BI analyst roles, and grant those roles the privileges they'll need for the rest of this guide.

/*-------------------------------------------------

--CREATE POWERBI SCHEMA AND WAREHOUSE

-------------------------------------------------*/

/*
Now that we've profiled the dataset, let's create a new schema to store our 
Dynamic Tables that will be queried by our Power BI semantic model.
*/
use role sysadmin;

create or replace schema tb_101.powerbi;
use schema tb_101.powerbi;

/*
-create a dedicated warehouse for Power BI
-we won't use this warehouse just yet, but it will be the warehouse used within our Power BI semantic model
*/
create or replace warehouse tb_powerbi_wh
warehouse_size = 'MEDIUM'
max_cluster_count = 1
min_cluster_count = 1
auto_suspend = 300
initially_suspended = true
comment = 'Warehouse used for the TB Power BI DQ semantic model';


/* ensure we're still using our tb_dev_wh */
use warehouse tb_dev_wh;


/*-------------------------------------------------

--CREATE TEST USER, ROLES, AND GRANT PRIVILEGES

-------------------------------------------------*/

/* re-create the powerbi schema in the tb_101 database (already created above; safe to re-run while the schema is still empty) */
use role sysadmin;

create or replace schema tb_101.powerbi;


/*
Create a new user called tb_bi_analyst that will be used to connect 
to Snowflake from Power BI

Use a strong, randomly generated password
*/
use role useradmin;

create or replace user tb_bi_analyst
/* populate with your own password */
  password = '';

Next, we'll create our four test roles and grant them to the user created in the previous step as well as the SYSADMIN role.

/* create testing roles */
use role securityadmin;
use warehouse tb_dev_wh;

create or replace role tb_bi_analyst_global;
create or replace role tb_bi_analyst_emea;
create or replace role tb_bi_analyst_na;
create or replace role tb_bi_analyst_apac;

/* grant the roles to the user we created above */
grant role tb_bi_analyst_global to user tb_bi_analyst;
grant role tb_bi_analyst_emea to user tb_bi_analyst;
grant role tb_bi_analyst_na to user tb_bi_analyst;
grant role tb_bi_analyst_apac to user tb_bi_analyst;

/* assign roles to sysadmin */ 
grant role tb_bi_analyst_global to role sysadmin;
grant role tb_bi_analyst_emea to role sysadmin;
grant role tb_bi_analyst_na to role sysadmin;
grant role tb_bi_analyst_apac to role sysadmin;

Lastly, we'll grant the necessary privileges to our four new roles.

/* grant permissions to database */
grant usage on database tb_101 to role tb_bi_analyst_global;
grant usage on database tb_101 to role tb_bi_analyst_emea;
grant usage on database tb_101 to role tb_bi_analyst_na;
grant usage on database tb_101 to role tb_bi_analyst_apac;

/* next, we'll add permissions to our powerbi schema */
grant all on schema tb_101.powerbi to role tb_data_engineer;
grant all on schema tb_101.powerbi to role tb_bi_analyst_global;
grant all on schema tb_101.powerbi to role tb_bi_analyst_emea;
grant all on schema tb_101.powerbi to role tb_bi_analyst_na;
grant all on schema tb_101.powerbi to role tb_bi_analyst_apac;

/* next, we'll add future grants so our analyst roles have access to any newly created objects */
grant all on future tables in schema tb_101.powerbi to role tb_data_engineer;
grant all on future tables in schema tb_101.powerbi to role tb_bi_analyst_global;
grant all on future tables in schema tb_101.powerbi to role tb_bi_analyst_emea;
grant all on future tables in schema tb_101.powerbi to role tb_bi_analyst_na;
grant all on future tables in schema tb_101.powerbi to role tb_bi_analyst_apac;

/* future grants for Dynamic Tables */
grant all on future dynamic tables in schema tb_101.powerbi to role tb_data_engineer;
grant all on future dynamic tables in schema tb_101.powerbi to role tb_bi_analyst_global;
grant all on future dynamic tables in schema tb_101.powerbi to role tb_bi_analyst_emea;
grant all on future dynamic tables in schema tb_101.powerbi to role tb_bi_analyst_na;
grant all on future dynamic tables in schema tb_101.powerbi to role tb_bi_analyst_apac;

/* lastly, grant usage on the tb_powerbi_wh so it can be used by each role */
grant usage on warehouse tb_powerbi_wh to role tb_bi_analyst_global;
grant usage on warehouse tb_powerbi_wh to role tb_bi_analyst_emea;
grant usage on warehouse tb_powerbi_wh to role tb_bi_analyst_na;
grant usage on warehouse tb_powerbi_wh to role tb_bi_analyst_apac;

grant usage on warehouse tb_dev_wh to role tb_bi_analyst_global;
grant usage on warehouse tb_dev_wh to role tb_bi_analyst_emea;
grant usage on warehouse tb_dev_wh to role tb_bi_analyst_na;
grant usage on warehouse tb_dev_wh to role tb_bi_analyst_apac;

Congrats! You have completed the data profiling section of this guide. Additionally, we have created the database schema, user, and roles that will be used throughout the rest of this guide. In the next section we'll review the Snowflake Marketplace and leverage a free marketplace listing to enhance our Tasty Bytes dataset further.

In this section, we'll provide an overview of the Snowflake Marketplace and leverage a free listing to enrich our Tasty Bytes dataset with a few mouse clicks.

Snowflake Marketplace

The Snowflake Marketplace is a component of the Snowflake AI Data Cloud that allows consumers (e.g., you) to explore and access data products such as third-party datasets, connectors, native applications, and more. You can browse the Snowflake Marketplace here without having to be logged into your Snowflake account.

From our Snowflake Documentation, Marketplace consumers can do the following:

Searching the Snowflake Marketplace for Location Data

The Tasty Bytes dataset already contains a location table that details where each sales transaction occurred. However, that table lacks key location information, including street addresses, postal codes, and latitude and longitude coordinates. We'll use the Snowflake Marketplace to access a free dataset that can fill in these gaps.

Access the Snowflake Marketplace from the left-hand navigation menu –> Data Products –> Marketplace.

Snowflake_Marketplace

Search the Marketplace for the SafeGraph listing used by Tasty Bytes and select Get.

SafeGraph_Frostbyte_Listing

A new pop-up window will appear where you can change the name of the shared database that will be created in your account, as well as select the roles that can access and query the database.

SafeGraph_Get_Listing

Querying the Shared Database Created from the Marketplace

/*-------------------------------------------------

--SECTION 3 - THIRD PARTY DATA FROM SNOWFLAKE MARKETPLACE

-------------------------------------------------*/

/* set the worksheet context */
use role tb_dev;
use schema safegraph_frostbyte.public;
use warehouse tb_dev_wh;


/* sample the dataset */
select *
from frostbyte_tb_safegraph_s;

/* view location counts by country */
select 
    country,
    count(*)
from frostbyte_tb_safegraph_s
group by all;

/* issue a cross-database join to the raw_pos.location table and try joining on placekey */
select
    l.location_id,
    l.location,
    l.city as location_city,
    l.country as location_country,
    l.iso_country_code as location_country_iso,
    sg.top_category as location_category,
    sg.sub_category as location_subcategory,
    sg.latitude as location_latitude,
    sg.longitude as location_longitude,
    sg.street_address as location_street_address,
    sg.postal_code as location_postal_code
from tb_101.raw_pos.location l
left join safegraph_frostbyte.public.frostbyte_tb_safegraph_s sg
    ON sg.placekey = l.placekey;


/* 
create a copy of the shared SafeGraph data in the raw_pos schema so 
it can be included in our Dynamic Table definition in the next section 
*/
create or replace table tb_101.raw_pos.safegraph_frostbyte_location
as
select *
from safegraph_frostbyte.public.frostbyte_tb_safegraph_s;

Based on the results of the last query, it looks like the placekey field can be used to accurately join the SafeGraph dataset with our Tasty Bytes dataset. We have successfully enriched our location data!
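If you'd like to quantify the match rate before relying on the join, here's an optional check (not part of the original worksheet):

/* optional: share of Tasty Bytes locations with a SafeGraph match on placekey */
select
    count(*)                                      as total_locations,
    count(sg.placekey)                            as matched_locations,
    round(count(sg.placekey) / count(*) * 100, 1) as match_pct
from tb_101.raw_pos.location l
left join safegraph_frostbyte.public.frostbyte_tb_safegraph_s sg
    on sg.placekey = l.placekey;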

Congrats! You have completed the Snowflake Marketplace section of this guide. You should now be familiar with the Snowflake Marketplace and how it allows data consumers to easily enrich and augment their organizational assets with zero ETL, no API development, and no legacy FTP processes. In just a few mouse clicks, we were able to add a new shared database to our Snowflake account and issue cross-database queries to enhance our Tasty Bytes data.

In this section, we'll cover two important topics in our end-to-end solution: Star Schemas and Dynamic Tables. We'll start by reviewing star schemas, how they're useful, and specifically, why they are so important for Power BI semantic models. Then, we'll review Dynamic Tables and how they can be used to easily transform your data into a suitable format for downstream analytics. If you're already familiar with these topics, feel free to skip to the "Transforming our Tasty Bytes Data with Dynamic Tables" step in this section.

What is a Star Schema?

Star schemas are the output of adhering to dimensional modeling in your data transformation process. Dimensional modeling is a database modeling technique popularized by Ralph Kimball in the 1990s. While dimensional modeling is the technique, star schemas are the byproduct, and many people say "star schemas" to refer to the entire process these days. The star schema name was coined because, when visualized, the database tables resemble a star with the fact table in the middle and dimension tables surrounding it.

Star_Schema

A star schema consists of two primary types of tables:

Facts: tables that record business events or measurements (for Tasty Bytes, each order and its order lines), typically made up of foreign keys to the dimensions plus numeric measures such as quantity and sales amount.

Dimensions: tables that describe the context surrounding those events (the who, what, where, and when), such as customer, truck, location, menu item, and date, and that are used to filter and group the facts.
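To make that shape concrete, the sketch below shows the typical query pattern against a star schema, using the tables we'll build later in this section: the fact table supplies the measures, and the dimension tables supply the attributes we slice by.

/* illustrative star-schema query: measures from the fact, attributes from the dimensions */
select
    d.year,
    d.month_name,
    t.truck_brand_name,
    sum(f.order_total) as total_sales
from tb_101.powerbi.dt_fact_order_header f
join tb_101.powerbi.dim_date     d on d.date_id  = f.date_id
join tb_101.powerbi.dt_dim_truck t on t.truck_id = f.truck_id
group by d.year, d.month_name, t.truck_brand_name;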

Why Star Schemas for Power BI Semantic Models?

Star schemas are important when designing your Power BI semantic models for a number of reasons, including:

For an exhaustive breakdown on the importance of star schemas when developing Power BI semantic models, check out Power BI's documentation here.

What are Dynamic Tables?

With the star schema primer out of the way, let's shift our focus to Dynamic Tables. While fact and dimension tables are agnostic to any data platform, Dynamic Tables are a Snowflake-specific table type that we can use to easily transform our data. Dynamic Tables simplify data engineering pipelines by using declarative SQL to persist query results and scheduled refreshes to keep those results up to date. A Dynamic Table looks and feels like a regular Snowflake table but has an underlying SQL definition, similar to a view. Unlike a view, however, that SQL logic is executed on a regular refresh schedule and the query results are persisted, yielding performance similar to a regular Snowflake table.
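As a minimal illustration of the syntax (the table name and logic here are placeholders; the real definitions come later in this section), a Dynamic Table is just a SELECT statement plus a target lag and a warehouse:

/* minimal dynamic table sketch (illustrative only) */
create or replace dynamic table my_order_totals_dt
  target_lag = '1 hour'      -- maximum staleness allowed for the results
  warehouse  = tb_dev_wh     -- compute used for the automatic refreshes
  as
    select order_id, sum(price) as order_total
    from tb_101.raw_pos.order_detail
    group by order_id;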

Dynamic_Tables

Dynamic Tables also provide the following benefits:

Automatic and transparent orchestration

Transforming our Tasty Bytes Data with Dynamic Tables

Execute the full script below to build our date and time dimensions and our Dynamic Tables. Once the script is complete, use Snowsight to navigate to your Dynamic Tables and view information about them. Check out the built-in lineage view!

/*-------------------------------------------------

--SECTION 4 - TRANSFORMING DATA WITH DYNAMIC TABLES

-------------------------------------------------*/

/* set worksheet context */
use role sysadmin;
use schema tb_101.powerbi;
use warehouse tb_dev_wh;



/*-------------------------------------------------

--CREATE STATIC DATE & TIME DIMENSIONS

-------------------------------------------------*/

/*
    --let's temporarily scale up our tb_de_wh to quickly create the fact tables and perform the initial data load
    --we'll scale this back down at the end
    --notice how Snowflake's elastic compute is available instantly!
*/
alter warehouse tb_de_wh set warehouse_size = '2x-large';


/*--------------------------------------------
--dim_date
--simple date dimension script sourced from - https://community.snowflake.com/s/question/0D50Z00008MprP2SAJ/snowflake-how-to-build-a-calendar-dim-table
--A free date dimension can also be sourced from Marketplace providers
--------------------------------------------*/

/* set the date range to build date dimension */
set min_date = to_date('2018-01-01');
set max_date = to_date('2024-12-31');
set days = (select $max_date - $min_date);

create or replace table tb_101.powerbi.dim_date
(
   date_id int,
   date date,
   year string, 
   month smallint,    
   month_name string,    
   day_of_month smallint,    
   day_of_week  smallint,  
   weekday string,
   week_of_year smallint,    
   day_of_year  smallint,
   weekend_flag boolean
)
as
  with dates as 
  (
    select dateadd(day, SEQ4(), $min_date) as my_date
    from TABLE(generator(rowcount=> $days))  -- Number of days after reference date in previous line
  )
  select 
        to_number(replace(to_varchar(my_date), '-')),
        my_date,
        year(my_date),
        month(my_date),
        monthname(my_date),
        day(my_date),
        dayofweek(my_date),
        dayname(my_date),
        weekofyear(my_date),
        dayofyear(my_date),
        case when dayofweek(my_date) in (0,6) then 1 else 0 end as weekend_flag
    from dates;



/*--------------------------------------------
--dim_time
--simple time dimension (hour:min:seconds) script 
--------------------------------------------*/

/* set the time range to build the time dimension */
set min_time = to_time('00:00:00');
set max_time = to_time('23:59:59');
set seconds = 86400;

create or replace table tb_101.powerbi.dim_time
(
  time_id int,
  time time,
  hour smallint,   
  minute smallint,    
  second smallint,   
  am_or_pm string,   
  hour_am_pm  string  
)
as
  with seconds as 
  (
    select timeadd(second, SEQ4(), $min_time) as my_time
    from table(generator(rowcount=> $seconds))  -- Number of seconds in a day
  )
  select
         to_number(left(to_varchar(my_time), 2) || substr(to_varchar(my_time),4, 2) || right(to_varchar(my_time), 2)),
         my_time,
         hour(my_time),
         minute(my_time),
         second(my_time),
         case
            when hour(my_time) < 12 THEN 'AM'
            else 'PM'
         end as am_or_pm,
         case
             when hour(my_time) = 0 THEN '12AM'
             when hour(my_time) < 12 THEN hour(my_time) || 'AM'
             when hour(my_time) = 12 THEN '12PM'
             when hour(my_time) = 13 THEN '1PM'
             when hour(my_time) = 14 THEN '2PM'
             when hour(my_time) = 15 THEN '3PM'
             when hour(my_time) = 16 THEN '4PM'
             when hour(my_time) = 17 THEN '5PM'
             when hour(my_time) = 18 THEN '6PM'
             when hour(my_time) = 19 THEN '7PM'
             when hour(my_time) = 20 THEN '8PM'
             when hour(my_time) = 21 THEN '9PM'
             when hour(my_time) = 22 THEN '10PM'
             when hour(my_time) = 23 THEN '11PM'
         end as Hour_am_pm
    from seconds;    


/*-------------------------------------------------

--DYNAMIC TABLES FOR OUR BUSINESS DIMENSIONS AND FACTS

-------------------------------------------------*/

/* dim_truck */
create or replace dynamic table dt_dim_truck
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
    select distinct
        t.truck_id,
        t.franchise_id,
        m.truck_brand_name,
        t.primary_city as truck_city,
        t.region as truck_region,
        t.iso_region as truck_region_iso,
        t.country as truck_country,
        t.iso_country_code as truck_country_iso,
        t.franchise_flag,
        year as truck_year,
        (2023 - year) as truck_age,
        replace(t.make, 'Ford_', 'Ford') as truck_make,
        t.model as truck_model,
        t.ev_flag,
        t.truck_opening_date
    from tb_101.raw_pos.truck t
    join tb_101.raw_pos.menu m on m.menu_type_id = t.menu_type_id;

    
/* dim_franchise */
create or replace dynamic table dt_dim_franchise
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
  with remove_duplicates as
  (
    select distinct
        f.franchise_id,
        f.first_name as franchise_first_name,
        f.last_name as franchise_last_name,
        f.city as franchise_city,
        f.country as franchise_country,
        f.e_mail as franchise_email,
        f.phone_number as franchise_phone_number,
        row_number()over(partition by franchise_id order by franchise_city ) as row_num
    from tb_101.raw_pos.franchise f
    )

    select *
    from remove_duplicates
    where row_num = 1;


/* dim_menu_item */
create or replace dynamic table dt_dim_menu_item
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
     select 
        menu_item_id,
        menu_type_id,
        menu_type,
        item_category as menu_item_category,
        item_subcategory as menu_item_subcategory,
        menu_item_name,
        cost_of_goods_usd as cogs_usd,
        sale_price_usd,
        menu_item_health_metrics_obj:menu_item_health_metrics[0].ingredients as ingredients,
        menu_item_health_metrics_obj:menu_item_health_metrics[0].is_dairy_free_flag as is_dairy_free_flag,
        menu_item_health_metrics_obj:menu_item_health_metrics[0].is_gluten_free_flag as is_gluten_free_flag,
        menu_item_health_metrics_obj:menu_item_health_metrics[0].is_healthy_flag as is_healthy_flag,
        menu_item_health_metrics_obj:menu_item_health_metrics[0].is_nut_free_flag as is_nut_free_flag
    from tb_101.raw_pos.menu m;


/* dim_location */
create or replace dynamic table dt_dim_location
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
    select
        l.location_id,
        l.location,
        l.city as location_city,
        case
            when l.country in ('England', 'France', 'Germany', 'Poland', 'Spain', 'Sweden') then 'EMEA'
            when l.country in ('Canada', 'Mexico', 'United States') then 'North America'
            when l.country in ('Australia', 'India', 'Japan', 'South Korea') then 'APAC'
            else 'Other'
        end as location_region,
        l.country as location_country,
        l.iso_country_code as location_country_iso,
/*
    Commented out the SafeGraph data points as not every user following this guide
    will have the ability to leverage the Snowflake Marketplace and create new
    shared databases if they are working out of their organization's Snowflake account
    
        sg.top_category as location_category,
        sg.sub_category as location_subcategory,
        sg.latitude as location_latitude,
        sg.longitude as location_longitude,
        sg.street_address as location_street_address,
        sg.postal_code as location_postal_code
*/
    from tb_101.raw_pos.location l;
/*    left join tb_101.raw_pos.safegraph_frostbyte_location sg on sg.placekey = l.placekey; */


/* dim_customer */
create or replace dynamic table dt_dim_customer
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
    select
        cl.customer_id,
        cl.first_name as customer_first_name,
        cl.last_name as customer_last_name,
        cl.first_name || ' ' || cl.last_name as customer_full_name,
        cl.last_name || ', ' || cl.first_name as customer_last_first_name,
        cl.city as customer_city,
        cl.country as customer_country,
        cl.postal_code as customer_postal_code,
        cl.preferred_language as customer_preferred_language,
        cl.gender as customer_gender,
        cl.favourite_brand as customer_favorite_brand,
        cl.marital_status as customer_marital_status,
        cl.children_count as customer_children_count,
        cl.sign_up_date as customer_signup_date,
        cl.birthday_date as customer_dob,
        cl.e_mail as customer_email,
        cl.phone_number as customer_phone_number
    from tb_101.raw_customer.customer_loyalty cl;


/* fact_order_detail */
create or replace dynamic table dt_fact_order_detail
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as


    select 
        od.order_id,
        od.order_detail_id,
        oh.truck_id,
        t.franchise_id,
        cast(oh.location_id as int) as location_id,
        od.menu_item_id,
        dd.date_id,
        ti.time_id,
        oh.customer_id,
        od.quantity,
        od.unit_price,
        od.price as line_total
    from tb_101.raw_pos.order_detail od
    join tb_101.raw_pos.order_header oh on oh.order_id = od.order_id
    join tb_101.raw_pos.truck t on t.truck_id = oh.truck_id
    join tb_101.powerbi.dim_date dd on dd.date =  to_date(oh.order_ts)
    join tb_101.powerbi.dim_time ti on ti.time = to_time(oh.order_ts);


    
/* fact_order_header */
create or replace dynamic table dt_fact_order_header
  target_lag = 'DOWNSTREAM'
  warehouse = tb_de_wh
  refresh_mode = incremental
  initialize = on_create
  as
    select
        order_id, 
        truck_id,
        franchise_id,
        location_id,
        customer_id,
        date_id,
        time_id,
        count(order_detail_id) as order_line_count,
        sum(quantity) as order_qty,
        sum(line_total) as order_total
    from dt_fact_order_detail
    group by 
        order_id, 
        truck_id,
        franchise_id,
        location_id,
        customer_id,
        date_id,
        time_id;


/* fact_order_agg */
create or replace dynamic table dt_fact_order_agg
  target_lag = '1 hour'
  warehouse = tb_de_wh
  as
    select 
        truck_id,
        franchise_id,
        location_id,
        customer_id,
        date_id,
        count(order_id) as order_count, 
        sum(order_line_count) as order_line_count,
        sum(order_qty) as order_qty,
        sum(order_total) as order_total 
    from dt_fact_order_header
    group by
        truck_id,
        franchise_id,
        location_id,
        customer_id,
        date_id;


/* scale the data engineering warehouse back down */
alter warehouse tb_de_wh set warehouse_size = 'small';
alter warehouse tb_de_wh suspend;

DT_Graph
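If you prefer SQL over the UI, you can also inspect your Dynamic Tables and their recent refreshes directly from the worksheet (a quick sketch; the Snowsight lineage view surfaces the same information graphically):

/* list the dynamic tables we just created */
show dynamic tables in schema tb_101.powerbi;

/* review recent refresh activity for dynamic tables in the current database */
select *
from table(information_schema.dynamic_table_refresh_history())
limit 100;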

Congrats, you have completed this section! You now should have a solid understanding of star schemas, why they're important when preparing your data for Power BI, and how we can leverage unique Snowflake features like Dynamic Tables to easily transform our data into a simple star schema design. Our Dynamic Tables are set to refresh every hour to keep our data fresh for analysis in Power BI. Depending on our requirements, we can always refresh these more or less frequently.
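For example, if hourly freshness turns out to be too slow (or more frequent than we need), the target lag can be adjusted in place; the statement below is a sketch using a hypothetical 30-minute lag:

/* adjust how fresh the aggregate fact table is kept (value shown is illustrative) */
alter dynamic table tb_101.powerbi.dt_fact_order_agg
    set target_lag = '30 minutes';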

In this final section within Snowflake, we'll leverage Snowflake Horizon features to protect our data before allowing downstream analysis within Power BI. Specifically, we'll review how we can tag our sensitive data, obfuscate sensitive fields at query time, and enforce row-level security, so users are only able to see what they're allowed to see.

What is Snowflake Horizon?

Snowflake Horizon is Snowflake's internal catalog that allows organizations to discover, govern, secure, and take action on their data.

Snowflake Horizon offers a variety of features, but this section will focus on the following Data Governance capabilities:

Data_Governance

Tagging and Classification

Tags are Snowflake objects that can be assigned to other Snowflake objects (e.g., database, schema, table, view, warehouse, etc.). Tags create key-value pairs that allow data stewards to perform tasks such as tracking sensitive fields, monitoring resource consumption by cost center, and auditing activities.
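As a side note, here's a small sketch of what a custom tag for cost-center tracking could look like (the tag name and value are hypothetical and not part of this guide's worksheets):

/* hypothetical cost_center tag used to attribute warehouse spend to a team */
create tag if not exists tb_101.powerbi.cost_center
    comment = 'Team responsible for the compute spend';

alter warehouse tb_powerbi_wh
    set tag tb_101.powerbi.cost_center = 'bi_analytics';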

Classification is an automated process within Snowflake that can assign Snowflake-defined system tags to your data columns by analyzing your database fields and metadata for personal and sensitive data. The system tags assigned by Classification are tags that Snowflake creates, maintains, and makes available in the Snowflake metadata database.

There are two primary categories for system tags:

Semantic Category

Privacy Category

Now that we have a general understanding of tags within Snowflake, let's run the Classification process on our Power BI schema to have Snowflake automatically assign system tags to our data.

You can classify data using SQL commands or via the Snowsight UI. To streamline this step, we'll classify our schema using SQL commands so you can easily copy and paste in your own worksheet.

/*-------------------------------------------------

--SECTION 5 - PROTECTING SENSITIVE DATA WITH SNOWFLAKE HORIZON

-------------------------------------------------*/

/* set worksheet context */
use role sysadmin;
use schema tb_101.powerbi;
use warehouse tb_dev_wh;



/*-------------------------------------------------

--OBJECT TAGGING & CLASSIFICATION

-------------------------------------------------*/

show tables in schema tb_101.powerbi;

/* 
run the built-in procedure below to execute the 
classification process against our customer dimension */
call system$classify('tb_101.powerbi.dt_dim_customer', {'auto_tag': true});

/*
--view the system tags generated by Snowflake
--these can be viewed in the Snowsight UI as well via the Governance dashboard or directly on the DT_Dim_Customer table
*/
select *
from table(
  tb_101.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
    'tb_101.powerbi.dt_dim_customer',
    'table'
));

We can also create custom tags and assign them to our table columns. Run the SQL below to create custom tags.

/* create custom tags */
create or replace tag tb_101.powerbi.pii_name_tag
    comment = 'PII Tag for Name Columns';
    
create or replace tag tb_101.powerbi.pii_phone_number_tag
    comment = 'PII Tag for Phone Number Columns';
    
create or replace tag tb_101.powerbi.pii_email_tag
    comment = 'PII Tag for E-mail Columns';

create or replace tag tb_101.powerbi.pii_dob_tag
    comment = 'PII Tag for Date of Birth Columns';

    
/* 
with the custom tags created, assign them to the relevant 
columns in our customer dimension dynamic table 
*/
alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_first_name 
        set tag tb_101.powerbi.pii_name_tag = 'First Name';

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_last_name 
        set tag tb_101.powerbi.pii_name_tag = 'Last Name';

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_full_name 
        set tag tb_101.powerbi.pii_name_tag = 'Full Name';

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_last_first_name 
        set tag tb_101.powerbi.pii_name_tag = 'Full Name';       

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_phone_number 
        set tag tb_101.powerbi.pii_phone_number_tag = 'Phone Number';

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_email
        set tag tb_101.powerbi.pii_email_tag = 'E-mail Address';

alter table tb_101.powerbi.dt_dim_customer 
    modify column customer_dob
        set tag tb_101.powerbi.pii_dob_tag = 'Date of Birth';

/* 
view the system and custom tags 
created on the customer dimension 
*/
select *
from table(
  tb_101.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
    'tb_101.powerbi.dt_dim_customer',
    'table'
));


Masking Policies

With our data tagged, we can now implement column-level security with Snowflake's dynamic data masking policies. Dynamic Data Masking is a column-level security feature that obfuscates, or "masks", sensitive data at query time. When masking policies are applied, the data is still stored unmasked in the table but is masked or hidden at query time.

Benefits of using dynamic data masking policies include:

Snowflake also has tag-based masking policies, which allow us to marry these two concepts together. With tag-based masking policies, we are assigning our masking policies to a tag. That way, whenever that tag is used on a column, the data is automatically protected with column-level security.

Continue building out the "4 - Data Governance" worksheet script by executing the code below to create masking policies and, ultimately, tag-based masking policies.

/*-------------------------------------------------

--MASKING POLICIES

-------------------------------------------------*/

/*
-- with our tags in place, we can now create our masking policies that will mask data for all but privileged roles
    --> create our name_mask to return '**~MASKED~**'
*/
create or replace masking policy tb_101.powerbi.name_mask AS (val STRING) RETURNS STRING ->
    case 
        when CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'TB_BI_ANALYST_GLOBAL') THEN val
    else '**~MASKED~**'
end;


/* create our phone_mask to return only the first 3 numbers unmasked */
create or replace masking policy tb_101.powerbi.phone_mask AS (val STRING) RETURNS STRING ->
    case
        when CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'TB_BI_ANALYST_GLOBAL') THEN val
    else CONCAT(LEFT(val,3), '-***-****')
end;


/* create our email_mask to return ******@<provider>.com */
create or replace masking policy tb_101.powerbi.email_mask AS (val STRING) RETURNS STRING ->
    case 
        when CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
    else CONCAT('******','@', SPLIT_PART(val, '@', -1))
end;


/* create our date of birth mask to truncate to first of month */
create or replace masking policy tb_101.powerbi.dob_mask AS (val date) RETURNS date ->
    case 
        when CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'TB_BI_ANALYST_GLOBAL') THEN val
    else date_trunc('month', val)
end;

/* 
use an alter tag statement to set the masking policies and create tag-based masking policies
NOTE: since the tags have already been set on PII columns, the masking policies will be implicitly applied
*/

alter tag tb_101.powerbi.pii_name_tag 
    set masking policy tb_101.powerbi.name_mask;
    
alter tag tb_101.powerbi.pii_phone_number_tag
    set masking policy tb_101.powerbi.phone_mask;
    
alter tag tb_101.powerbi.pii_email_tag
    set masking policy tb_101.powerbi.email_mask;

alter tag tb_101.powerbi.pii_dob_tag
    set masking policy tb_101.powerbi.dob_mask; 

/* validate that our tag-based masking policies work */
use role sysadmin;

/* 
notice how actual values are displayed when 
we use an elevated role like SYSADMIN 
*/
select 
    customer_first_name,
    customer_last_name,
    customer_full_name,
    customer_dob,
    customer_email,
    customer_phone_number
from tb_101.powerbi.dt_dim_customer limit 5;

/* now use one of the analyst roles created earlier */
use role tb_bi_analyst_na;

/* 
notice how masked values are displayed when 
we use our custom role
*/
select
    customer_first_name,
    customer_last_name,
    customer_full_name,
    customer_dob,
    customer_email,
    customer_phone_number
from tb_101.powerbi.dt_dim_customer limit 5;

Row Access Policies

Snowflake row access policies are objects that are used to implement row-level security within Snowflake. Similar to data masking policies, these can be maintained centrally and reused across many tables/views. Row access policies are commonly used in tandem with a "mapping table" to dynamically determine which subset of data a role is able to see at query time.

Continuing with the existing script, we'll implement row-level security by creating a mapping table of roles to location IDs, populating it from our location dimension, defining a row access policy that references the mapping table, and applying that policy to our location dimension and fact tables.

/*-------------------------------------------------

--ROW ACCESS POLICIES

-------------------------------------------------*/

use role sysadmin;

create or replace table tb_101.powerbi.row_policy_map
    (
    role STRING, 
    location_id NUMBER
    );

/*
  - with the table in place, we will now insert the relevant role to location mappings
  - the TB_BI_ANALYST_GLOBAL role will have unrestricted access, while the region-specific roles will be mapped to specific regions
*/
insert into tb_101.powerbi.row_policy_map (role, location_id)
    select
        case 
            when location_region = 'EMEA' THEN 'TB_BI_ANALYST_EMEA'
            when location_region = 'North America' THEN 'TB_BI_ANALYST_NA'
            when location_region = 'APAC' THEN 'TB_BI_ANALYST_APAC'
        end AS role,
        location_id
    from dt_dim_location;


create or replace row access policy tb_101.powerbi.rap_dim_location_policy
    as (location_id NUMBER) RETURNS BOOLEAN ->
       CURRENT_ROLE() in 
       /* list of roles that will not be subject to the policy  */
           (
            'ACCOUNTADMIN','SYSADMIN', 'TB_BI_ANALYST_GLOBAL'
           )
        or exists
        /* this clause references our mapping table from above to handle the row level filtering */
            (
            select rp.role 
                from tb_101.powerbi.row_policy_map rp
            where 1=1
                and rp.role = CURRENT_ROLE()
                and rp.location_id = location_id
            );

/* Lastly, apply the row policy to our dim_location and fact tables */
alter table tb_101.powerbi.dt_dim_location
    add row access policy tb_101.powerbi.rap_dim_location_policy ON (location_id);

alter table tb_101.powerbi.dt_fact_order_detail
    add row access policy tb_101.powerbi.rap_dim_location_policy ON (location_id);

alter table tb_101.powerbi.dt_fact_order_header
    add row access policy tb_101.powerbi.rap_dim_location_policy ON (location_id);

alter table tb_101.powerbi.dt_fact_order_agg
    add row access policy tb_101.powerbi.rap_dim_location_policy ON (location_id);


/* validate that our row access policy works */

/* test with our global role*/
use role tb_bi_analyst_global;

select
    l.location_country,
    count(*) as record_count,
    sum(f.order_total) as sales_amt
from tb_101.powerbi.dt_fact_order_agg f
join tb_101.powerbi.dt_dim_location l on l.location_id = f.location_id
group by all;


/* test with our North America role */
use role tb_bi_analyst_na;

select
    l.location_country,
    count(*) as record_count,
    sum(f.order_total) as sales_amt
from tb_101.powerbi.dt_fact_order_agg f
join tb_101.powerbi.dt_dim_location l on l.location_id = f.location_id
group by all;


/* test with our APAC role */
use role tb_bi_analyst_apac;

select
    l.location_country,
    count(*) as record_count,
    sum(f.order_total) as sales_amt
from tb_101.powerbi.dt_fact_order_agg f
join tb_101.powerbi.dt_dim_location l on l.location_id = f.location_id
group by all;


Congrats! You have now added protections to your data. While this section only scratched the surface on Snowflake Horizon features, you should now be familiar with key features you can leverage to easily identify and protect your organization's sensitive data.

In this section, we'll leverage Power BI Desktop to connect to the data model we've developed in Snowflake. We'll review several key benefits of using Snowflake as your enterprise data platform to power your Power BI workloads. To keep things simple, a pre-built Power BI Template file can be downloaded here. Complete the steps below to connect the template file to your Snowflake account.

Download the Power BI Template File

Connecting to your Snowflake account

1 - Open the .pbit file from Power BI Desktop

2 - A parameters window will appear

3 - Once the parameters have been entered, you'll be prompted to authenticate

4 - Once authenticated, the summary report page will load, and you'll be able to see queries generated from Power BI within the Snowflake Query History UI

Exploring the Semantic Model

Navigate to the "model view" within Power BI Desktop. The model should look familiar because it's a visual representation of the same tables we created in section 4. Power BI makes it easy to focus on a subset of the overall semantic model. Click the various tabs at the bottom to view a single fact table, and its surrounding dimensions, at a time.

As you review the semantic model, notice that the Order Header and Order Agg tables are marked as hidden. This is because we are taking advantage of Power BI's user-defined aggregations feature. This feature improves the performance of DirectQuery semantic models by allowing Power BI to route queries to the appropriate fact or aggregate table. For instance, if a query can be fully resolved using the Order Agg table (14M records), Power BI will do so. If it cannot, it will check whether it can use the Order Header table (248M records). Lastly, if a given query cannot be resolved by either of those tables, it will use the most granular Order Details table (670M records). When designing your star schema in both Snowflake and your Power BI semantic model, it's important to understand the granularity of each fact table.
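To illustrate the grain difference (Power BI generates its own SQL at runtime; the queries below are just hand-written stand-ins), a question at daily/location grain can be answered entirely from the aggregate table, while a question involving menu items or time of day has to fall through to the detail table:

/* answerable from the ~14M-row aggregate: daily sales by location */
select date_id, location_id, sum(order_total) as sales_amt
from tb_101.powerbi.dt_fact_order_agg
group by date_id, location_id;

/* requires the ~670M-row detail table: quantities by menu item and time of day */
select menu_item_id, time_id, sum(quantity) as total_qty
from tb_101.powerbi.dt_fact_order_detail
group by menu_item_id, time_id;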

For reference:

Powerbi_Semantic_Model

Publishing to the Power BI Service

You can publish your work to the Power BI Service directly from Power BI Desktop using the "Publish" button on the Home ribbon. Alternatively, you can manually upload your .pbix file from your Power BI workspace. Once published, your single .pbix file will create two artifacts in your Power BI workspace: a "semantic model" and a "report". Open the "Settings" for your semantic model. This is where you will update your connection parameters, as well as use your TB_BI_ANALYST credentials to connect to Snowflake.

Semantic_Model_Params

Exploring Tasty Bytes data via Power BI

The semantic model we published is 100% DirectQuery mode. This means that Power BI generates SQL queries that are sent to Snowflake on the fly whenever a report is rendered, a user changes a slicer/filter, or a data point is selected on a visual (cross-filtering). When troubleshooting Power BI activity from Snowflake, the Query History UI (and underlying SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view) will be your best aid. Snowflake's Query History automatically logs every query executed in your Snowflake account, the user that issued the query, the warehouse used, the client driver used, and more.
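If you want to see exactly what Power BI is sending, a query like the sketch below surfaces the DirectQuery traffic. It assumes a role with access to the SNOWFLAKE database and uses the TB_BI_ANALYST user and TB_POWERBI_WH warehouse created earlier; note that ACCOUNT_USAGE views can lag behind real time, while the Query History UI is immediate.

/* recent queries issued by our Power BI connection user */
select
    start_time,
    user_name,
    warehouse_name,
    execution_status,
    total_elapsed_time,   -- in milliseconds
    query_text
from snowflake.account_usage.query_history
where user_name = 'TB_BI_ANALYST'
  and warehouse_name = 'TB_POWERBI_WH'
order by start_time desc
limit 50;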

When you first open your report, notice that it takes a moment for your visuals to render. Behind the scenes, Power BI is issuing queries to Snowflake, resuming the TB_POWERBI_WH (assuming it was suspended), and retrieving the results. As you click around on the report, several forms of caching are taking place.

These layers of caching, combined with Power BI's user-defined aggregations, make it possible to analyze hundreds of millions to billions of records in real time from Power BI, all without wrestling with scheduled refreshes or purchasing larger Power BI SKUs to fit the data into memory.

Let's begin interacting with the report:

Drillthrough

Verifying our data protections work

Verifying column-level security

Powerbi_Masking

Verifying row-level security

What we've just simulated is a scenario where we have a global user base running reports against the same semantic model, but they all belong to different roles. Since our semantic model is in DirectQuery mode, we are able to implicitly take advantage of our Snowflake masking and row access policies.

EMEA_Sales

Congrats! You have completed the final section of this guide. In this section, we connected our pre-built Power BI template to our Snowflake star schema. We then published this semantic model to a Power BI workspace, established our connection, and explored the Order Summary report which showcased Snowflake's instant elasticity, built-in caching, automatic logging, and column & row level security.

Congratulations! You have completed the End-to-End Analytics with Snowflake and Power BI Quickstart!

What You Learned

Related Resources and Quickstarts