Sharing information between departments, business units and subsidiaries of a company is critical for success, particularly when there are organizational silos in place. A modern data platform must provide decentralized ownership, universal discovery, access control, federated governance, and observability.
Snowflake Horizon is a unified suite of governance and discovery capabilities organized into five pillars.
This Quickstart focuses on the Horizon Access pillar.
The objective of the Access pillar is to make it simple to share, discover, understand, trust, and access listings across any boundary, internal or external to the organization, and to make loose objects discoverable across account boundaries within an organization, all supported by the tools necessary to ensure policy compliance, security, and data quality.
In this lab you will experience the latest Snowflake Horizon Access pillar features for sharing data and native apps intra-company: organizational listings, unified search & discovery, data quality monitoring, role-based governance policies and programmatic management of data products. We will cover structured and unstructured data that is stored on-platform or on external storage.
Sign up for an AWS trial account here, using horizonadmin as the login user name.
Note: alternatively, you can use an existing account instead of a trial account, provided that account has the ORGADMIN system role enabled. In that case, create a user horizonadmin and grant it the ACCOUNTADMIN and ORGADMIN roles.
Execute the following SQL commands in the Account Setup worksheet to bootstrap:
USE ROLE accountadmin;
SET my_user_var = CURRENT_USER();
ALTER USER identifier($my_user_var) SET DEFAULT_ROLE = accountadmin;
CREATE OR REPLACE WAREHOUSE compute_wh WAREHOUSE_SIZE=small INITIALLY_SUSPENDED=TRUE;
GRANT ALL ON WAREHOUSE compute_wh TO ROLE public;
CREATE DATABASE IF NOT EXISTS snowflake_sample_data FROM SHARE sfc_samples.sample_data;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake_sample_data TO public;
-- Create an AWS Consumer account
USE ROLE orgadmin;
CREATE ACCOUNT horizon_lab_aws_consumer
admin_name = horizonadmin
admin_password = 'FILL_IN_PASSWORD'
email = 'FILL_IN_EMAIL'
must_change_password = false
edition = business_critical
region = AWS_US_WEST_2;
-- Create an Azure Consumer account
CREATE ACCOUNT horizon_lab_azure_consumer
admin_name = horizonadmin
admin_password = 'FILL_IN_PASSWORD'
email = 'FILL_IN_EMAIL'
must_change_password = false
edition = business_critical
region = AZURE_WESTEUROPE;
-- Verify that all three accounts are now created. Also, get the URLs
-- from the column account_url to log in to your consumer accounts later
SHOW ORGANIZATION ACCOUNTS;
-- Enable global data sharing (auto-fulfillment) for this account; this requires the ORGADMIN role
USE ROLE orgadmin;
SELECT current_account_name();
SELECT SYSTEM$ENABLE_GLOBAL_DATA_SHARING_FOR_ACCOUNT('!FILL IN CURRENT_ACCOUNT_NAME()!');
-- PLEASE NOTE DOWN: "orgname-accountname" is the account format needed to add a connection to the Snowflake CLI
SELECT current_organization_name() || '-' || current_account_name();
Log in to the HORIZON_LAB_AWS_CONSUMER and HORIZON_LAB_AZURE_CONSUMER accounts as horizonadmin and run the following in a worksheet in each of the two accounts:
USE ROLE accountadmin;
SET my_user_var = CURRENT_USER();
ALTER USER identifier($my_user_var) SET DEFAULT_ROLE = accountadmin;
CREATE OR REPLACE WAREHOUSE compute_wh WAREHOUSE_SIZE=medium INITIALLY_SUSPENDED=TRUE;
GRANT ALL ON WAREHOUSE compute_wh TO ROLE public;
CREATE DATABASE IF NOT EXISTS snowflake_sample_data FROM SHARE sfc_samples.sample_data;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake_sample_data TO public;
USE ROLE useradmin;
CREATE OR REPLACE ROLE sales_emea_role
COMMENT = 'EMEA Sales role for Tasty Bytes';
CREATE OR REPLACE ROLE sales_americas_role
COMMENT = 'Americas Sales role for Tasty Bytes';
CREATE OR REPLACE ROLE sales_apj_role
COMMENT = 'APJ Sales role for Tasty Bytes';
CREATE OR REPLACE ROLE sales_manager_role
COMMENT = 'Sales Manager (all-access) role for Tasty Bytes';
-- grant all these roles to the login user
GRANT ROLE sales_emea_role TO USER identifier($my_user_var);
GRANT ROLE sales_americas_role TO USER identifier($my_user_var);
GRANT ROLE sales_apj_role TO USER identifier($my_user_var);
GRANT ROLE sales_manager_role TO USER identifier($my_user_var);
SHOW ROLES;
Verify that Python is installed on your machine:
python --version
On macOS you can install Python and pipx with Homebrew, then install the Snowflake CLI with pipx:
% brew install python
% sudo ln -s /usr/bin/python3 /usr/local/bin/python
% brew install pipx
% pipx ensurepath
Next, create three Snowflake CLI connections named horizon-aws-provider, horizon-aws-consumer, and horizon-azure-consumer. You will need your organization name and account name, which you can obtain with:
SELECT current_organization_name() || '-' || current_account_name();
snow connection add -n horizon-aws-provider -a [orgname]-[provider_accountname] -u horizonadmin -r accountadmin -w compute_wh
snow connection add -n horizon-aws-consumer -a [orgname]-horizon_lab_aws_consumer -u horizonadmin -r accountadmin -w compute_wh
snow connection add -n horizon-azure-consumer -a [orgname]-horizon_lab_azure_consumer -u horizonadmin -r accountadmin -w compute_wh
snow connection set-default "horizon-aws-provider"
snow connection test --connection horizon-aws-provider
snow connection test --connection horizon-aws-consumer
snow connection test --connection horizon-azure-consumer
# list all connections and check that default is horizon-aws-provider
snow connection list
# you can also manually edit the config.toml file to remove or modify connections
snow --info
The raw text data provided for this lab is an extract from the IMDB Large Movie Review Dataset.
Use of this dataset requires that we cite this ACL 2011 paper by Andrew Maas, et al.:
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}
The "Weather Source LLC: frostbyte" Marketplace listing requires accepting the terms of use of the Provider and Snowflake.
Clone our Horizon Quickstart Scripts repository to your local machine with git:
mkdir ~/snowflakelabs
cd ~/snowflakelabs
git clone git@github.com:Snowflake-Labs/sfguide-horizon-intra-organization-sharing.git horizon-intra-org-scripts
cd horizon-intra-org-scripts
If you prefer not to use git, use Download ZIP from the Lab Scripts GitHub site.
Load the SQL scripts in the code/sql directory into Snowsight Worksheets, one script per worksheet.
100_Setup_Data_Model: creates the TastyBytes foundational data model. TastyBytes is a fictitious global food truck network that operates in 30 major cities located in 15 countries with localized menu options and brands. The single database FROSTBYTE_TASTY_BYTES is organized into the following schemas:
- RAW_CUSTOMER: raw customer loyalty data with personally identifiable information (PII)
- RAW_POS: raw point-of-sale data denormalized by orders, menu, franchise and country
- HARMONIZED: blended metrics for customers and orders
- ANALYTICS: analytic data that delivers insights for aggregate trends and drill-down
Use the Run All pulldown command to run 100_Setup_Data_Model.
200_Setup_Data_Products: builds the data assets to share in a Listing. In step 1(a) of the script 200_Setup_Data_Products you will acquire the Weather Source LLC listing from the Marketplace and install it as a shared database FROSTBYTE_WEATHERSOURCE.
-- Step 1(a) - Acquire "Weather Source LLC: frostbyte" Snowflake Marketplace Listing
/*---
1. Click -> Data Products (Cloud Icon in left sidebar)
2. Click -> Marketplace
3. Search -> frostbyte
4. Click -> Weather Source LLC: frostbyte
5. Click -> Get
6. Click -> Options
7. Database Name -> FROSTBYTE_WEATHERSOURCE (all capital letters)
8. "Which roles, in addition to ACCOUNTADMIN, can access this database?" -> PUBLIC
9. Click -> Get
---*/
Then proceed to execute all remaining steps in this script. This will create secure views, materialized views, functions and dynamic tables in the ANALYTICS schema, and an internal stage for sharing text data.
Check out all the new objects created in the ANALYTICS and HARMONIZED schemas in the Snowsight Object Explorer panel. You will later create a Listing to share all of these objects.
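You can also list the new objects with standard SHOW commands, for example:
SHOW VIEWS IN SCHEMA frostbyte_tasty_bytes.analytics;
SHOW DYNAMIC TABLES IN SCHEMA frostbyte_tasty_bytes.analytics;
SHOW STAGES IN DATABASE frostbyte_tasty_bytes;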
We have extracted 100 text files from the IMDB Large Movie Review Dataset into the repo that was cloned to your local machine earlier.
Now we can copy those text files into the internal stage movie_stage that was created by the SQL setup scripts. Run the snow CLI commands at the root of your repo, where the git clone was done:
cd ~/snowflakelabs/horizon-intra-org-scripts
snow stage copy data/imdb_reviews @frostbyte_tasty_bytes.movie_reviews.movie_stage
# verify that there are now 100 files in the stage
snow stage list-files @frostbyte_tasty_bytes.movie_reviews.movie_stage
Setup is now complete!
In this section you will create, publish, consume, alter, and monitor a listing.
When adding consumer accounts to the listing, use the format org-name.account-name, which you can obtain as follows:
select current_organization_name() ||'.'|| current_account_name();
A handy SQL feature used in the next step is SELECT * REPLACE: the form SELECT * REPLACE (<expression> AS <column>) returns all columns but replaces the named column with the value of the expression.
.-- re-insert existing data for Berlin but give it today's date as the valid date
INSERT INTO FROSTBYTE_TASTY_BYTES.WEATHER.HISTORY_DAY
SELECT * REPLACE (current_date AS DATE_VALID_STD)
FROM FROSTBYTE_TASTY_BYTES.WEATHER.HISTORY_DAY
WHERE city_name = 'Berlin'
ORDER BY date_valid_std DESC;
SELECT *
FROM tasty_bytes_analytics.HARMONIZED.DAILY_WEATHER_V
WHERE city_name = 'Berlin'
ORDER BY date_valid_std DESC;
Time to revisit the second consumer account ("horizon_lab_azure_consumer") and the replication into that Azure region. By now, the one-time replication setup has been completed in the background and the data product is now ready to use.
use database SNOWFLAKE;
select * from DATA_SHARING_USAGE.LISTING_ACCESS_HISTORY;
select * from DATA_SHARING_USAGE.LISTING_AUTO_FULFILLMENT_DATABASE_STORAGE_DAILY;
select * from DATA_SHARING_USAGE.LISTING_AUTO_FULFILLMENT_REFRESH_DAILY;
select * from DATA_SHARING_USAGE.LISTING_EVENTS_DAILY;
select * from DATA_SHARING_USAGE.LISTING_TELEMETRY_DAILY;
select * from ORGANIZATION_USAGE.LISTING_AUTO_FULFILLMENT_USAGE_HISTORY;
select * from ORGANIZATION_USAGE.REPLICATION_USAGE_HISTORY;
The provider of a listing can choose to enable change tracking on some or all of the tables or views that are shared in a listing. This enables the consumer to track the data changes. Let's do that with the view DAILY_WEATHER_V:
-- in the provider account:
ALTER VIEW FROSTBYTE_TASTY_BYTES.HARMONIZED.DAILY_WEATHER_V
SET CHANGE_TRACKING = TRUE;
The consumer can now define a Stream to capture the data changes in this view:
-- in the consumer account:
CREATE DATABASE tasty_bytes_local;
USE DATABASE tasty_bytes_local;
CREATE STREAM stream_daily_weather_changes ON VIEW tasty_bytes_analytics.HARMONIZED.DAILY_WEATHER_V;
To see change tracking in action, you can now insert, update, or delete some of the weather-related data in the AWS Provider account. For example:
-- in the provider account:
INSERT INTO FROSTBYTE_TASTY_BYTES.WEATHER.HISTORY_DAY
SELECT * REPLACE (current_date AS DATE_VALID_STD)
FROM FROSTBYTE_TASTY_BYTES.WEATHER.HISTORY_DAY
WHERE city_name = 'San Mateo'
ORDER BY date_valid_std DESC;
Now switch to the consumer account and query the stream:
-- in the consumer account:
USE DATABASE tasty_bytes_local;
SELECT METADATA$ACTION, METADATA$ISUPDATE, *
FROM stream_daily_weather_changes;
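Note that merely querying a stream does not advance its offset; the offset only advances when the stream is consumed in a DML statement. A minimal sketch of persisting the captured changes into a local table (the table name is illustrative):
-- in the consumer account: create an empty table with the stream's shape
CREATE TABLE IF NOT EXISTS daily_weather_changes_log AS
SELECT METADATA$ACTION AS change_action, METADATA$ISUPDATE AS is_update, *
FROM stream_daily_weather_changes
WHERE FALSE;
-- consuming the stream in DML advances its offset past the captured changes
INSERT INTO daily_weather_changes_log
SELECT METADATA$ACTION, METADATA$ISUPDATE, *
FROM stream_daily_weather_changes;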
This section of the lab introduces several capabilities for data providers to restrict the usage of their products by consumers.
Frosty the data steward is concerned that the listing we have shared includes the view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V, which contains sensitive information that must not be accessible to all data consumers. He requests the following restrictions:
Let's create a row-level access policy to implement the required access control. Note the use of the context function current_account_name() to detect which consumer account is accessing the shared view.
Fill in AWS Provider Account Name below
use database frostbyte_tasty_bytes;
use schema analytics;
CREATE OR REPLACE ROW ACCESS POLICY country_filter AS (country string)
RETURNS boolean ->
CASE
WHEN current_account_name() IN ('HORIZON_LAB_AWS_CONSUMER')
AND country IN ('United States', 'Canada')
THEN true
WHEN current_account_name() IN ('HORIZON_LAB_AZURE_CONSUMER')
AND country IN ('France', 'Germany', 'Poland', 'Sweden', 'Spain')
THEN true
WHEN current_account_name() IN ('*** FILL IN AWS Provider Account Name ***')
THEN true
ELSE false
END;
Then apply the policy to the shared view:
ALTER VIEW CUSTOMER_LOYALTY_METRICS_V ADD ROW ACCESS POLICY country_filter ON (country);
Now switch to the consumer account HORIZON_LAB_AWS_CONSUMER to confirm that only US and Canadian client data is visible in the view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V.
After the replication interval of 1 minute you will also see that the consumer account HORIZON_LAB_AZURE_CONSUMER can only see the European clients.
But, Frosty the data steward is not yet satisfied:
Ok, let's get to work.
To make things easy, let's first create a tag that you can use to indicate which columns contain PII data.
CREATE SCHEMA IF NOT EXISTS tags;
CREATE OR REPLACE TAG tags.tasty_pii
ALLOWED_VALUES 'NAME', 'PHONE_NUMBER', 'EMAIL', 'BIRTHDAY'
COMMENT = 'Tag for PII, allowed values are: NAME, PHONE_NUMBER, EMAIL, BIRTHDAY';
With the tag created, let's assign it to the relevant columns in the Customer Loyalty view:
ALTER VIEW ANALYTICS.CUSTOMER_LOYALTY_METRICS_V
MODIFY COLUMN
first_name SET TAG tags.tasty_pii = 'NAME',
last_name SET TAG tags.tasty_pii = 'NAME',
phone_number SET TAG tags.tasty_pii = 'PHONE_NUMBER',
e_mail SET TAG tags.tasty_pii = 'EMAIL';
Optionally, you can also use the UI to add or see the tags on these columns:
Now let's create a slightly more advanced policy that masks the PII columns depending on their tag value and the consumer account:
Fill in AWS Provider Account Name below
CREATE OR REPLACE MASKING POLICY pii_string_mask AS (value STRING) RETURNS STRING ->
CASE
-- two roles in the provider account have access to unmasked values
WHEN CURRENT_ACCOUNT_NAME() IN ('*** FILL IN AWS Provider Account Name ***')
AND CURRENT_ROLE() IN ('ACCOUNTADMIN','SYSADMIN')
THEN value
-- For consumers in the 2nd AWS account: if a column is tagged with
-- TASTY_PII=PHONE_NUMBER then mask everything except the first 3 digits
WHEN CURRENT_ACCOUNT_NAME() IN ('HORIZON_LAB_AWS_CONSUMER')
AND SYSTEM$GET_TAG_ON_CURRENT_COLUMN('TAGS.TASTY_PII') = 'PHONE_NUMBER'
THEN CONCAT(LEFT(value,3), '-***-****')
-- For consumers in the Azure account: if a column is tagged with
-- TASTY_PII=EMAIL then mask everything before the @ sign
WHEN CURRENT_ACCOUNT_NAME() IN ('HORIZON_LAB_AZURE_CONSUMER')
AND SYSTEM$GET_TAG_ON_CURRENT_COLUMN('TAGS.TASTY_PII') = 'EMAIL'
THEN CONCAT('**~MASKED~**','@', SPLIT_PART(value, '@', -1))
-- all other cases and columns, such as first and last name, should be fully masked
ELSE '**~MASKED~**'
END;
Next, apply the policy to the tag so that the policy takes effect on all tagged columns:
ALTER TAG tags.tasty_pii SET MASKING POLICY pii_string_mask;
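Optionally, you can verify where the policy is now attached with the POLICY_REFERENCES Information Schema table function. For a tag-based assignment, the reference returned is the tag itself; adjust the policy's schema below if you created it elsewhere:
SELECT policy_name, ref_entity_name, ref_entity_domain
FROM TABLE(frostbyte_tasty_bytes.INFORMATION_SCHEMA.POLICY_REFERENCES(
  POLICY_NAME => 'frostbyte_tasty_bytes.analytics.pii_string_mask'));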
Now switch to the consumer account HORIZON_LAB_AWS_CONSUMER and look at the view CUSTOMER_LOYALTY_METRICS_V to confirm that phone numbers are partially masked while the other PII columns are fully masked.
After the replication interval of 1 minute you will see in the account HORIZON_LAB_AZURE_CONSUMER that emails are partially masked while phone numbers and names are fully masked.
Just when we thought we had all the necessary governance controls in place, Frosty has a new requirement for us.
So far we have been using the context function CURRENT_ACCOUNT_NAME() in our governance policies to control which consumer account can see which data. Now Frosty tells us that this needs to be more fine-grained, down to individual roles on the consumer side.
We will be using roles and database roles for three different continents. Let's check that we have the correct roles in place.
On the AWS Provider account, use SHOW DATABASE ROLES to confirm that you have the four database roles in place. If not, create them now.
show database roles in database FROSTBYTE_TASTY_BYTES;
USE DATABASE frostbyte_tasty_bytes;
CREATE OR REPLACE DATABASE ROLE tastybytes_emea_role;
CREATE OR REPLACE DATABASE ROLE tastybytes_americas_role;
CREATE OR REPLACE DATABASE ROLE tastybytes_apj_role;
CREATE OR REPLACE DATABASE ROLE tastybytes_manager_role;
We can use these roles to define more granular, role-based access control for the data consumers. First, we need to give these roles access to the provider's schema and the objects that we want to govern, in this case ANALYTICS.CUSTOMER_LOYALTY_METRICS_V:
use database FROSTBYTE_TASTY_BYTES;
grant usage on schema ANALYTICS to database role tastybytes_emea_role;
grant usage on schema ANALYTICS to database role tastybytes_americas_role;
grant usage on schema ANALYTICS to database role tastybytes_apj_role;
grant usage on schema ANALYTICS to database role tastybytes_manager_role;
grant select on view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V
to database role tastybytes_emea_role;
grant select on view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V
to database role tastybytes_americas_role;
grant select on view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V
to database role tastybytes_apj_role;
grant select on view ANALYTICS.CUSTOMER_LOYALTY_METRICS_V
to database role tastybytes_manager_role;
Next, use the context function IS_DATABASE_ROLE_IN_SESSION() to recreate our row access policy so that it defines which role can see customer loyalty data from which country.
Fill in Provider Account Name in the last WHEN clause of the policy below
use database frostbyte_tasty_bytes;
use schema analytics;
ALTER VIEW CUSTOMER_LOYALTY_METRICS_V DROP ROW ACCESS POLICY country_filter;
-- Fill in Provider Account Name below
CREATE OR REPLACE ROW ACCESS POLICY country_filter AS (country string)
RETURNS boolean ->
CASE
-- users with the AMERICAS role can see data from the US, Canada, and Brazil
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_AMERICAS_ROLE')
AND country IN ('United States', 'Canada', 'Brazil')
THEN true
-- users with the EMEA role can see data from these EMEA countries
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_EMEA_ROLE')
AND country IN ('France', 'Germany', 'Poland', 'Sweden',
'Spain' , 'South Africa', 'Egypt')
THEN true
-- users with the APJ role can see data from these APJ countries
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_APJ_ROLE')
AND country IN ('Japan', 'Australia', 'India', 'South Korea')
THEN true
-- users with the manager role can see all data
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_MANAGER_ROLE')
THEN true
-- the account admin in the provider account can see all data
WHEN current_account_name() IN ('*** FILL IN AWS Provider Account Name ***')
AND current_role() = 'ACCOUNTADMIN'
THEN true
ELSE false
END;
ALTER VIEW CUSTOMER_LOYALTY_METRICS_V
ADD ROW ACCESS POLICY country_filter ON (country);
For a larger number of roles and countries you can use a mapping table from role to country, so that the policy simply performs a lookup in the mapping table, as sketched below.
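Here is a minimal sketch of that approach. The schema, table name, and contents are illustrative (not part of the lab scripts), and it assumes IS_DATABASE_ROLE_IN_SESSION() accepts a column reference inside a policy body, as its documentation describes for mapping-table lookups:
-- hypothetical mapping table: one row per (database role, visible country)
CREATE OR REPLACE TABLE governance_map.role_country_map (role_name VARCHAR, country VARCHAR);
INSERT INTO governance_map.role_country_map VALUES
  ('TASTYBYTES_EMEA_ROLE', 'Germany'),
  ('TASTYBYTES_EMEA_ROLE', 'France'),
  ('TASTYBYTES_APJ_ROLE',  'Japan');
CREATE OR REPLACE ROW ACCESS POLICY country_filter AS (country STRING)
RETURNS BOOLEAN ->
  EXISTS (
    SELECT 1
    FROM governance_map.role_country_map map
    WHERE IS_DATABASE_ROLE_IN_SESSION(map.role_name)
      AND map.country = country
  );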
The final step in the provider account is to share the database roles to the consumer accounts along with the data product. This is achieved by granting the database roles to the share:
-- Use your share name in these commands.
-- See below for hints on how to find your share name.
GRANT DATABASE ROLE tastybytes_emea_role TO SHARE <share_name>;
GRANT DATABASE ROLE tastybytes_americas_role TO SHARE <share_name>;
GRANT DATABASE ROLE tastybytes_apj_role TO SHARE <share_name>;
GRANT DATABASE ROLE tastybytes_manager_role TO SHARE <share_name>;
Here are two options for finding the share name for your listing:
Option 1:
In the provider account, navigate to the Provider Studio, select "Listings" from the horizontal menu at the top, and open your listing. In the "Data Product" section you will find the name of the Secure Share that bundles the shared data objects.
Option 2:
Use the SHOW SHARES command:
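SHOW SHARES;
-- outbound shares created for listings have kind = OUTBOUND;
-- the share name is in the "name" column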
Copy the share name to a text file or worksheet because you will need it again later.
To complete the configuration of cross-account role-based access control, you need to assign the shared database roles to local account roles in the consumer accounts.
Switch to your consumer accounts.
In each of your consumer accounts you should already have the following four roles that you created during the account setup. Verify that these roles are in place; if not, create them now.
SHOW ROLES LIKE 'SALES%';
CREATE OR REPLACE ROLE sales_emea_role COMMENT = 'EMEA Sales role';
CREATE OR REPLACE ROLE sales_americas_role COMMENT = 'Americas Sales role ';
CREATE OR REPLACE ROLE sales_apj_role COMMENT = 'APJ Sales role ';
CREATE OR REPLACE ROLE sales_manager_role COMMENT = 'Manager (all-access) role';
In a real-world scenario you would now assign each of these roles to different users. For simplicity in this lab, grant all of these roles to yourself:
SET my_user_var = CURRENT_USER();
GRANT ROLE sales_emea_role TO USER identifier($my_user_var);
GRANT ROLE sales_americas_role TO USER identifier($my_user_var);
GRANT ROLE sales_apj_role TO USER identifier($my_user_var);
GRANT ROLE sales_manager_role TO USER identifier($my_user_var);
And finally, grant the shared database roles to the local account roles. This connects the local consumer roles to the row access policy that you created on the provider side.
use database TASTY_BYTES_ANALYTICS;
grant database role tastybytes_emea_role to role sales_emea_role;
grant database role tastybytes_americas_role to role sales_americas_role;
grant database role tastybytes_apj_role to role sales_apj_role;
grant database role tastybytes_manager_role to role sales_manager_role;
The following picture illustrates the use of our database roles in this data sharing scenario.
Now switch between the different local roles (sales_emea_role, sales_apj_role, etc.) in each of your consumer accounts to verify that each local role can only see those rows in the CUSTOMER_LOYALTY_METRICS_V view that are permitted by the row-level access policy in the provider account. A minimal verification sketch follows.
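In a consumer account (the database name is as imported from the listing earlier):
USE DATABASE tasty_bytes_analytics;
USE ROLE sales_emea_role;
SELECT DISTINCT country FROM analytics.CUSTOMER_LOYALTY_METRICS_V;  -- EMEA countries only
USE ROLE sales_manager_role;
SELECT DISTINCT country FROM analytics.CUSTOMER_LOYALTY_METRICS_V;  -- all countries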
Frosty the data steward has a new requirement for us. In the consumer accounts, only admins and managers may see the detailed per-customer loyalty data. Anyone else may see aggregated data only.
Create the following aggregation policy to implement this requirement:
Fill in AWS Provider Account Name below
CREATE OR REPLACE AGGREGATION POLICY tasty_aggregation_policy
AS () RETURNS AGGREGATION_CONSTRAINT ->
CASE
WHEN current_account_name() IN ('*** FILL IN AWS Provider Account Name ***')
AND current_role() = 'ACCOUNTADMIN'
THEN NO_AGGREGATION_CONSTRAINT()
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_MANAGER_ROLE')
THEN NO_AGGREGATION_CONSTRAINT()
ELSE AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 50) -- at least 50 rows in aggregate
END;
ALTER VIEW analytics.CUSTOMER_LOYALTY_METRICS_V
SET AGGREGATION POLICY tasty_aggregation_policy;
Now switch to your consumer account horizon_lab_aws_consumer to verify the effect of the aggregation policy:
use role sales_apj_role;
-- sales_apj_role gets blocked from accessing any individual records:
SELECT * FROM analytics.CUSTOMER_LOYALTY_METRICS_V;
-- sales_apj_role can execute aggregation queries:
SELECT city, count(*) as num_cust_per_city
FROM analytics.CUSTOMER_LOYALTY_METRICS_V
GROUP BY city;
-- sales_manager_role is permitted to access individual records:
use role sales_manager_role;
SELECT * FROM analytics.CUSTOMER_LOYALTY_METRICS_V LIMIT 100;
Switch back to your AWS Provider account and issue the following command to deactivate the aggregation policy:
ALTER VIEW analytics.CUSTOMER_LOYALTY_METRICS_V UNSET AGGREGATION POLICY;
Next, let's also create a projection policy that prevents the city column from appearing in a result set but still allows its use in predicates to restrict a query result:
Fill in AWS Provider Account Name below
CREATE OR REPLACE PROJECTION POLICY tasty_projection_policy
AS () RETURNS PROJECTION_CONSTRAINT ->
CASE
WHEN current_account_name() IN ('*** FILL IN AWS Provider Account Name ***')
AND current_role() = 'ACCOUNTADMIN'
THEN PROJECTION_CONSTRAINT(ALLOW => true)
WHEN IS_DATABASE_ROLE_IN_SESSION('TASTYBYTES_MANAGER_ROLE')
THEN PROJECTION_CONSTRAINT(ALLOW => true)
ELSE PROJECTION_CONSTRAINT(ALLOW => false)
END;
ALTER VIEW analytics.CUSTOMER_LOYALTY_METRICS_V
MODIFY COLUMN city
SET PROJECTION POLICY tasty_projection_policy;
Switch to your consumer account horizon_lab_aws_consumer again to explore the effect of the projection policy on the results of the following queries:
use role sales_apj_role;
SELECT * FROM analytics.CUSTOMER_LOYALTY_METRICS_V;
SELECT * EXCLUDE city FROM analytics.CUSTOMER_LOYALTY_METRICS_V;
SELECT * EXCLUDE city
FROM analytics.CUSTOMER_LOYALTY_METRICS_V
WHERE city IN ('Delhi','Tokyo','Seoul','Melbourne','Sydney','Mumbai');
SELECT city, count(*) as num_cust_per_city
FROM analytics.CUSTOMER_LOYALTY_METRICS_V
GROUP BY city;
Note that a projection policy by itself does not prevent users from detecting information about individuals. For example, the following query is permitted (and returns customer details if you remove the masking policy pii_string_mask):
SELECT * EXCLUDE city
FROM analytics.CUSTOMER_LOYALTY_METRICS_V
WHERE city = 'Melbourne' AND last_name = 'Arellano';
In this section the data provider will capture data quality metrics and share them with the data consumers. In particular, we want to monitor the data quality in the view ANALYTICS.ORDERS_BY_POSTAL_CODE_V.
On the AWS Provider account, execute the following commands to create a database where we will define our custom quality functions.
use role accountadmin;
create or replace database tasty_bytes_quality;
use database tasty_bytes_quality;
create schema dq_functions;
Next, let's define how often the quality of ORDERS_BY_POSTAL_CODE_V should be checked. For a table, the quality checks can be triggered by data changes or executed on a schedule. For views, the quality metrics can (currently) only be evaluated on a schedule.
Let's set the schedule to the shortest possible interval, which is 5 minutes:
ALTER VIEW FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V
SET DATA_METRIC_SCHEDULE = '5 MINUTE';
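For reference, a table (unlike a view) also supports change-triggered evaluation. An illustrative example on one of the raw tables of the data model, not needed for this lab:
ALTER TABLE frostbyte_tasty_bytes.raw_pos.order_header
SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';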
Now, let's use two of Snowflake's built-in data quality functions to count the number of NULL values in the column POSTAL_CODE as well as the number of distinct cities reported in this view:
ALTER VIEW FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V
ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (POSTAL_CODE);
ALTER VIEW FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V
ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.UNIQUE_COUNT ON (CITY);
Additionally, let's create a custom data quality function that counts the number of outliers, i.e. postal areas with an exceptionally high or low number of orders:
CREATE OR REPLACE DATA METRIC FUNCTION tasty_bytes_quality.dq_functions.postal_code_order_outliers (t TABLE (count_order INTEGER) )
RETURNS INTEGER
AS
$$
select count(*)
from t
where count_order > 300000
or count_order < 30
$$;
The owner of the object that is being monitored needs to have the privilege to execute the custom data metric function and use the database and schema where that function resides. Additional privileges are required to execute the built-in metric functions or view their results. To keep it simple, let's grant the following privileges to all users:
GRANT ALL ON FUNCTION tasty_bytes_quality.dq_functions.postal_code_order_outliers(TABLE(INTEGER)) to role public;
GRANT USAGE ON DATABASE tasty_bytes_quality to role public;
GRANT USAGE ON SCHEMA tasty_bytes_quality.dq_functions to role public;
GRANT EXECUTE DATA METRIC FUNCTION ON ACCOUNT to role public;
GRANT DATABASE ROLE SNOWFLAKE.DATA_METRIC_USER to role public;
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER to role public;
Now we can apply our custom quality function to the view ORDERS_BY_POSTAL_CODE_V:
ALTER VIEW FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V
ADD DATA METRIC FUNCTION tasty_bytes_quality.dq_functions.postal_code_order_outliers
ON (count_order);
Use the following command to verify that all three quality metrics have been scheduled correctly. Any permission problems would be reflected in the column "schedule_status". Possible status values are documented here.
SELECT schedule_status, *
FROM TABLE(
INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
REF_ENTITY_NAME => 'FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V',
REF_ENTITY_DOMAIN => 'VIEW' )
);
After 5 minutes you can start observing quality metrics in the default event table where all quality results are recorded:
Unfortunately, accessing snowflake.local.data_quality_monitoring_results is not yet available in Snowflake trial accounts! Please skip ahead to the next section, Sharing Data Quality Metrics, if you are using a trial account.
SELECT scheduled_time, measurement_time, metric_name, metric_schema,
value, table_name, table_schema, table_database
FROM snowflake.local.data_quality_monitoring_results /* not yet available in trial accounts! */
ORDER BY measurement_time DESC;
Additionally, you could define Alerts to watch the data quality metrics and take action automatically if acceptable thresholds are exceeded. For example, if the number of outliers reported by our custom quality function exceeds a certain value, an alert could copy the offending rows into an exception table for review and send an email notification, as sketched below.
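A minimal sketch of such an alert. It assumes a notification integration named my_email_int already exists, and that the monitoring results table is accessible (it is not in trial accounts); the names and threshold are illustrative:
CREATE OR REPLACE ALERT tasty_bytes_quality.dq_functions.outlier_alert
  WAREHOUSE = compute_wh
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
        SELECT 1
        FROM snowflake.local.data_quality_monitoring_results
        WHERE metric_name = 'POSTAL_CODE_ORDER_OUTLIERS'
          AND value > 10  -- illustrative threshold
          AND measurement_time > DATEADD(minute, -5, CURRENT_TIMESTAMP())))
  THEN CALL SYSTEM$SEND_EMAIL(
        'my_email_int',  -- hypothetical notification integration
        'data.steward@example.com',
        'Data quality alert: order outliers',
        'postal_code_order_outliers exceeded its threshold in the last 5 minutes.');
ALTER ALERT tasty_bytes_quality.dq_functions.outlier_alert RESUME;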
How can we share the quality metrics from the event table with data consumers? At the time of authoring this lab (May 2024), event tables cannot be shared in a Listing directly. Similarly, views, streams, and dynamic tables are not yet an option for sharing data quality events.
And since Snowflake trial accounts cannot access the event table (yet!), let's set up a task that regularly inserts data quality metrics into a table for sharing:
USE DATABASE FROSTBYTE_TASTY_BYTES;
CREATE SCHEMA FROSTBYTE_TASTY_BYTES.dq;
-- this table will hold and share 7 days worth of quality metrics:
CREATE OR REPLACE TABLE FROSTBYTE_TASTY_BYTES.dq.shared_quality_events
(measurement_time TIMESTAMP,
table_name VARCHAR,
table_schema VARCHAR,
table_database VARCHAR,
metric_name VARCHAR,
metric_schema VARCHAR,
value INTEGER );
-- this task will maintain the table above:
CREATE OR REPLACE TASK FROSTBYTE_TASTY_BYTES.dq.subset_quality_events
SCHEDULE = '3 MINUTE'
AS BEGIN
DELETE FROM dq.shared_quality_events
WHERE measurement_time < current_date - 7;
INSERT INTO dq.shared_quality_events
SELECT current_timestamp, 'ORDERS_BY_POSTAL_CODE_V',
'ANALYTICS', 'FROSTBYTE_TASTY_BYTES',
'NULL_COUNT', 'SNOWFLAKE.CORE',
SNOWFLAKE.CORE.NULL_COUNT(
SELECT POSTAL_CODE
FROM ANALYTICS.ORDERS_BY_POSTAL_CODE_V);
INSERT INTO FROSTBYTE_TASTY_BYTES.dq.shared_quality_events
SELECT current_timestamp, 'ORDERS_BY_POSTAL_CODE_V',
'ANALYTICS', 'FROSTBYTE_TASTY_BYTES',
'UNIQUE_COUNT', 'SNOWFLAKE.CORE',
SNOWFLAKE.CORE.UNIQUE_COUNT(
SELECT city
FROM ANALYTICS.ORDERS_BY_POSTAL_CODE_V);
INSERT INTO FROSTBYTE_TASTY_BYTES.dq.shared_quality_events
SELECT current_timestamp, 'ORDERS_BY_POSTAL_CODE_V',
'ANALYTICS', 'FROSTBYTE_TASTY_BYTES',
'postal_code_order_outliers', 'dq_functions',
tasty_bytes_quality.dq_functions.postal_code_order_outliers(
SELECT count_order
FROM ANALYTICS.ORDERS_BY_POSTAL_CODE_V);
END;
ALTER TASK subset_quality_events RESUME;
Now you can add the table shared_quality_events to the shared data product. Here are two options for doing this.
Option 1: Programmatically
Grant the share the necessary access to the shared_quality_events table. You should already have the share name from the earlier section on database roles; otherwise, get the share name as described in the first step of Option 2 below.
GRANT USAGE ON SCHEMA FROSTBYTE_TASTY_BYTES.dq TO SHARE <share_name>;
GRANT SELECT ON FROSTBYTE_TASTY_BYTES.dq.shared_quality_events TO SHARE <share_name>;
Option 2: In the UI
Take the following 3 steps in the UI:
-- When you are done exploring, suspend the task and unset the metric schedule:
ALTER TASK subset_quality_events SUSPEND;
ALTER VIEW FROSTBYTE_TASTY_BYTES.ANALYTICS.ORDERS_BY_POSTAL_CODE_V
UNSET DATA_METRIC_SCHEDULE;
The Snowflake Native Application Framework provides developers the ability to create data-intensive applications that run within the Snowflake platform, with versioning controls.
You will build a native app that visualizes Tasty Bytes food truck Sales over Time by City.
The app will allow filtering on Year and City, while displaying the underlying raw data and associated SQL query. Consumer account admins responsible for installing the application will supply a country lookup table that restricts what is shown by Country using a row-access policy.
Here is the directory structure of the code repository that was cloned or downloaded from GitHub earlier, showing the artifacts relevant to the native app:
|-- app
| |-- src
| |-- libraries
| | |-- environment.yml
| | |-- frosty_bytes_sis.py
| | |-- procs.py
| | |-- udf.py
| |-- manifest.yml
| |-- setup_script.sql
| |-- readme.md
|-- scripts
| |-- setup_package_script.sql
|-- snowflake.yml
src - stores all source code, including stored procedures, user-defined functions (UDFs), the Streamlit application, and the installation script setup_script.sql.
setup_script.sql - defines all Snowflake objects used within the application. Runs every time a user installs the application, such as when a Consumer account installs the listing.
Here is a snippet from setup_script.sql that shows how a row-access policy limits country sales based on consumer region and role.
-- Create Row-Access Policy to limit access by Consumer region using a local mapping table
CREATE OR REPLACE TABLE app_instance_schema.region_country_map(region STRING, country STRING);
INSERT INTO app_instance_schema.region_country_map
VALUES ('PUBLIC.AZURE_WESTEUROPE','Germany'),
('PUBLIC.AZURE_WESTEUROPE','Spain'),
(......);
CREATE OR REPLACE ROW ACCESS POLICY app_instance_schema.country_row_policy
AS (country STRING) RETURNS BOOLEAN ->
country = 'Canada'
OR current_role() IN ('ACCOUNTADMIN','SYSADMIN','SALES_MANAGER_ROLE')
OR EXISTS
(
SELECT 1
FROM app_instance_schema.region_country_map map
WHERE 1=1
AND map.region = current_region()
AND map.country = country
)
COMMENT = 'Policy to limit rows returned based on region';
ALTER VIEW app_instance_schema.orders_v
ADD ROW ACCESS POLICY app_instance_schema.country_row_policy ON (country);
manifest.yml - defines version metadata, the location of artifacts (setup script, Streamlit), and the configuration of logging/tracing.
setup_package_script.sql - shares provider data that is needed by the application; executed when the application package is deployed or upgraded.
snowflake.yml - Snowflake CLI configuration file that contains the native app name, the location of the source code directory, and the location of the package setup script setup_package_script.sql.
frosty_bytes_sis.py - Streamlit visualization Python script that accesses the shared data within the application package.
Refer to the Getting Started with Native Apps Quickstart for more details.
Now let's create the Snowflake Application Package from the project files in our repo!
Open a new terminal at the root of the repository cloned/downloaded from Github and execute the following commands:
# verify that 3 account connections exist, with AWS Provider Account as DEFAULT
snow connection list
+-------------------------------------------------------------------------------------------------------------------------------+
| connection_name | parameters | is_default |
|------------------------+-----------------------------------------------------------------------------------------+------------|
| horizon-aws-provider | {'account': '***orgname-accountname***', 'user': 'horizonadmin', 'password': | True |
| | '****', 'warehouse': 'compute_wh', 'role': 'accountadmin'} | |
| horizon-aws-consumer | {'account': '**orgname**-horizon_lab_aws_consumer', 'user': 'horizonadmin', | False |
| | 'password': '****', 'warehouse': 'compute_wh', 'role': 'accountadmin'} | |
| horizon-azure-consumer | {'account': '***orgname***-horizon_lab_azure_consumer', 'user': 'horizonadmin', | False |
| | 'password': '****', 'warehouse': 'compute_wh', 'role': 'accountadmin'} | |
+-------------------------------------------------------------------------------------------------------------------------------+
snow app version create --skip-git-check
snow sql -q 'ALTER APPLICATION PACKAGE horizon_quickstart_package SET DEFAULT RELEASE DIRECTIVE version=V1 patch=0'
+---------------------------------------------------------+
| status |
|---------------------------------------------------------|
| Default release directive set to version 'V1', patch 0. |
+---------------------------------------------------------+
The app version create command will upload source files to a stage and create the application package. Setting the default release version and patch is a required step before the application package can be published in a listing. Documentation for publishing native applications is here.
You should now see the HORIZON_QUICKSTART_PACKAGE listed under the Databases panel in Snowsight.
Return to Snowsight and ensure your role is ACCOUNTADMIN (check, and set if necessary, at the bottom left corner).
Navigate to Provider Studio (under Data Products) and click + Listing to author a native app listing.
Now select the HORIZON_QUICKSTART_PACKAGE and fill in a brief description using the rich text editor. You can use this text to get started, but feel free to get creative!
Tasty Bytes is a fictitious global food truck network that is on a mission to serve unique food options with high quality items in a safe, convenient and cost effective way. In order to drive forward on their mission, Tasty Bytes is beginning to leverage the Snowflake Data Cloud.
This application provides Total Sales (USD) by Year for selected cities in an interactive graphical visualization. The cities available are dependent upon the Consumer account region: only sales for regional cities are shown. Also displayed are the underlying raw sales data and associated SQL query.
Add the two Consumer Accounts: HORIZON_LAB_AWS_CONSUMER and HORIZON_LAB_AZURE_CONSUMER.
The addition of a consumer account in a different cloud (or region) will reveal the auto-fulfillment panel. Provide your email to receive notifications, then click Publish.
Initially the auto-fulfillment frequency is set to 1 Day; you can change it to 15 minutes after the listing is published.
We are now at the moment of truth!
Switch to the HORIZON_LAB_AZURE_CONSUMER account
With listing auto-fulfillment, replication from the Provider AWS region to the Consumer Azure region is only initiated when there is a request. Navigate to Snowsight Data Products -> Private Sharing to acquire the Tasty Bytes Global Sales application:
While the application objects are being delivered cross-cloud, we can now switch to the local AWS region consumer and install immediately.
Switch to the HORIZON_LAB_AWS_CONSUMER account
Open the application and, while it is spinning up, click on the Manage Access button at top-right and give PUBLIC access.
Now try adding a few cities in addition to Vancouver:
Tasty Bytes is sold in 15 countries and 30 cities worldwide. Consider these questions and see if you can come up with convincing answers:
- Where does the list of selectable cities come from? [Hint: frosty_bytes_sis.py]
- Why do you only see regional cities? [Hint: setup_script.sql]
- Switch between the roles [ACCOUNTADMIN, SALES_MANAGER_ROLE, PUBLIC] - does the city list change?
Now it is time to return to HORIZON_LAB_AZURE_CONSUMER and check if auto-fulfillment has completed...
Open the application and, again while it is spinning up, click on the Manage Access button to give PUBLIC access.
Try adding a few cities in addition to Vancouver, as we did with the AWS US West consumer account.
Streamlit does not currently allow role-based policies because of security concerns, so only the region mapping in the policy will apply.
To observe how the row-access policy evaluation of current_role() works in an application instance, outside of Streamlit, step through this SQL snippet in a Worksheet on the HORIZON_LAB_AWS_CONSUMER and HORIZON_LAB_AZURE_CONSUMER accounts:
USE APPLICATION TASTY_BYTES_GLOBAL_SALES;
USE ROLE ACCOUNTADMIN;
select distinct primary_city from tasty_bytes_global_sales.app_instance_schema.orders_v;
USE ROLE SALES_AMERICAS_ROLE;
select distinct primary_city from tasty_bytes_global_sales.app_instance_schema.orders_v;
USE ROLE SALES_APJ_ROLE;
select distinct primary_city from tasty_bytes_global_sales.app_instance_schema.orders_v;
USE ROLE SALES_EMEA_ROLE;
select distinct primary_city from tasty_bytes_global_sales.app_instance_schema.orders_v;
USE ROLE SALES_MANAGER_ROLE;
select distinct primary_city from tasty_bytes_global_sales.app_instance_schema.orders_v;
Iceberg tables in Snowflake combine the performance and query semantics of regular Snowflake tables with external cloud storage managed by the customer. Snowflake supports Iceberg tables that use the Apache Parquet file format.
Creating and writing data into a Snowflake Iceberg table is beyond the scope of this lab; we will focus only on the Consumer side of Iceberg table sharing.
Using the method outlined in Getting Started with Iceberg Tables Quickstart, create Iceberg Tables and policies based on what we created in the previous lab section.
On the Provider AWS Account, execute the steps listed in code/sql/reference/iceberg_provider.sql, cloned from the Horizon Quickstart Scripts repository earlier. The script:
- creates the FROSTBYTE_ICEBERG database with schemas for ANALYTICS, RAW_POS, RAW_CUSTOMER, GOVERNANCE, and TPCH
- creates the Iceberg tables CUSTOMER_LOYALTY_ICEBERG, ORDER_HEADER_ICEBERG, CUSTOMER_TPCH_ICEBERG, and NATION_TPCH_ICEBERG
- creates the secure view CUSTOMER_LOYALTY_METRICS_V that joins multiple Iceberg tables
- creates the database roles TASTYBYTES_MANAGER_ROLE and TASTYBYTES_ANALYST_ROLE to restrict access for consumers
- applies the row access policy CUSTOMER_COUNTRY_ROW_POLICY to the CUSTOMER_LOYALTY_ICEBERG.COUNTRY column, filtering based on database role using the is_database_role_in_session() context function
- creates the listing ICEBERG_LAB_ANALYTICS and attaches the Iceberg tables and secure view
, attach iceberg tables and secure view.Now switch to the HORIZON_LAB_AWS_CONSUMER account.
Navigate to Private Sharing in Snowsight and install the ICEBERG_LAB_ANALYTICS listing that was shared by the AWS Provider. Then run these post-installation steps in a worksheet:
USE ROLE accountadmin;
SHOW DATABASE ROLES IN DATABASE iceberg_lab_analytics;
GRANT DATABASE ROLE iceberg_lab_analytics.tastybytes_manager_role to ROLE sales_manager_role;
GRANT DATABASE ROLE iceberg_lab_analytics.tastybytes_analyst_role to ROLE public;
Create two users to test access controls on the incoming Iceberg Analytics listing.
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE USER horizonengineer
PASSWORD='FILL_IN_PASSWORD'
DEFAULT_ROLE = PUBLIC
MUST_CHANGE_PASSWORD = FALSE
DEFAULT_WAREHOUSE = COMPUTE_WH;
CREATE OR REPLACE USER horizonmanager
PASSWORD='FILL_IN_PASSWORD'
DEFAULT_ROLE = SALES_MANAGER_ROLE
MUST_CHANGE_PASSWORD = FALSE
DEFAULT_WAREHOUSE = COMPUTE_WH;
GRANT ROLE SALES_MANAGER_ROLE TO USER horizonmanager;
Now run these queries, which were entered as Sample Queries in the Iceberg listing. Compare the results as the horizonengineer and horizonmanager users; the row-access policy limits the sales analytics each user can see.
// Customer Sales by City
/*
Total food truck sales in USD by city
*/
SELECT
clm.city,
ROUND(SUM(clm.total_sales), 0) AS total_sales_usd
FROM analytics.customer_loyalty_metrics_v clm
GROUP BY clm.city
ORDER BY total_sales_usd DESC;
// Total Orders by Gender
/*
What are the total order amounts in each city by gender?
*/
SELECT
cl.gender,
cl.city,
COUNT(oh.order_id) AS count_order,
ROUND(SUM(oh.order_amount),0) AS order_total,
current_time()
FROM raw_pos.order_header_iceberg oh
JOIN raw_customer.customer_loyalty_iceberg cl
ON oh.customer_id = cl.customer_id
GROUP BY ALL
ORDER BY order_total DESC;
// Visible Countries and Cities
/*
How many cities in what countries can I view analytics data for?
*/
SELECT DISTINCT COUNTRY, CITY FROM analytics.customer_loyalty_metrics_v ORDER BY COUNTRY;
// TPCH Benchmark - Returned Item Reporting Query (Q10)
/*
The Returned Item Reporting Query finds the top 10 customers, in terms of their effect on lost revenue for a given quarter, who have returned parts. The customers are listed in descending order of lost revenue.
*/
SELECT
c_custkey,
c_name,
TRUNCATE(SUM(l_extendedprice * (1 - l_discount))) AS lost_revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM
tpch.customer_tpch_iceberg,
SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS,
SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.LINEITEM,
tpch.nation_tpch_iceberg
WHERE
c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= to_date('1993-10-01')
AND o_orderdate < dateadd(month, 3, to_date('1993-10-01'))
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
ORDER BY
lost_revenue DESC
LIMIT 10
;
We have previously staged 100 movie reviews extracted from the IMDB Large Movie Review Dataset.
Directory tables are built-in Snowflake objects that provide an up-to-date, tabular file catalog for staged data, making it easy to search and look up files.
USE ROLE sysadmin;
USE WAREHOUSE tasty_de_wh;
USE SCHEMA frostbyte_tasty_bytes.movie_reviews;
ALTER STAGE movie_stage refresh;
SELECT * FROM DIRECTORY(@movie_stage);
Snowflake offers access to unstructured data through three types of URLs, based on the access control that is required: scoped URLs, file URLs, and pre-signed URLs.
We will use scoped URLs to share these text files within and across cloud regions. Unlike with data and app sharing, the unstructured data will not be physically replicated cross-cloud.
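For reference, each URL type has a corresponding builder function; a quick illustration against our stage (standard Snowflake functions, shown for orientation):
SELECT relative_path,
       BUILD_SCOPED_FILE_URL(@movie_stage, relative_path) AS scoped_url,     -- temporary; access checked through the calling context
       BUILD_STAGE_FILE_URL(@movie_stage, relative_path)  AS file_url,       -- permanent; requires privileges on the stage
       GET_PRESIGNED_URL(@movie_stage, relative_path)     AS presigned_url   -- temporary; usable without authenticating to Snowflake
FROM DIRECTORY(@movie_stage);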
On the AWS Provider Account, open a SQL Worksheet called Unstructured Data and execute these commands sequentially:
USE ROLE sysadmin;
USE WAREHOUSE tasty_de_wh;
USE SCHEMA frostbyte_tasty_bytes.movie_reviews;
-- Here are the 100 review text files: click on any scoped URL to download and view
SELECT relative_path
, build_scoped_file_url(@movie_stage, relative_path) as scoped_url
from directory(@movie_stage);
-- Snowpark Python UDF to extract review contents from each file URL
CREATE FUNCTION extract_review(file_path string)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'parse_text_file'
AS
$$
def parse_text_file(file_path):
    from snowflake.snowpark.files import SnowflakeFile
    with SnowflakeFile.open(file_path, 'r') as f:
        lines = [line.strip() for line in f]
    # join the lines so a single value is returned, matching RETURNS STRING
    return ' '.join(lines)
$$
;
-- Share secure view that extracts review contents for the Consumer
CREATE OR REPLACE SECURE VIEW movie_reviews_v
AS
SELECT relative_path as review_file_name,
extract_review(build_scoped_file_url(@movie_stage, relative_path)) as review_content,
build_scoped_file_url(@movie_stage, relative_path) as scoped_url
FROM DIRECTORY(@movie_stage);
SELECT * FROM movie_reviews_v limit 10;
-- Use the built-in SUMMARIZE and SENTIMENT Cortex LLM functions to derive insights from these reviews
SELECT review_file_name,
SNOWFLAKE.CORTEX.SUMMARIZE(review_content) as Summary,
TO_DECIMAL(SNOWFLAKE.CORTEX.SENTIMENT(review_content),3,2) as Sentiment,
scoped_url
FROM movie_reviews_v
ORDER BY Sentiment DESC
LIMIT 10;
Now this secure view can be shared in a private listing targeted at horizon_lab_aws_consumer and horizon_lab_azure_consumer, just as we did in earlier sections with data and apps.
Create a new listing called IMDB Movie Reviews in Provider Studio to publish the review files and content:
- Add the MOVIE_REVIEWS_V view
- Add queries using the SUMMARIZE and SENTIMENT Cortex LLM functions in the Sample Queries section to make it easy for Consumers
Publishing fails - do you understand why?
Hint: MOVIE_REVIEWS_V uses a Python UDF for review content extraction, which can only be shared in a Native App!
Return to the HORIZON_QUICKSTART_PACKAGE we created in the previous section and publish an update after adding MOVIE_REVIEWS_V. Here are the steps:
- In scripts/setup_package_script.sql: share the movie_reviews_v view similar to how the orders_v view was handled
- In app/src/setup_script.sql: create a proxy view movie_reviews_v in app_instance_schema similar to orders_v, and grant select on movie_reviews_v to the application role app_instance_role to make it visible
Then create the new version and set the release directive:
to make it visiblesnow app version create V2 --skip-git-check
snow sql -q 'ALTER APPLICATION PACKAGE horizon_quickstart_package SET DEFAULT RELEASE DIRECTIVE version=V2 patch=0'
And that is all it takes to upgrade a native app! (here is the App Upgrade Workflow doc)
On the AWS Consumer Account run this in a SQL Worksheet:
USE ROLE ACCOUNTADMIN;
-- install the app upgrade
ALTER APPLICATION TASTY_BYTES_GLOBAL_SALES UPGRADE;
USE APPLICATION TASTY_BYTES_GLOBAL_SALES;
SELECT * FROM app_instance_schema.movie_reviews_v limit 10;
SELECT review_file_name,
SNOWFLAKE.CORTEX.SUMMARIZE(review_content) as Summary,
TO_DECIMAL(SNOWFLAKE.CORTEX.SENTIMENT(review_content),3,2) as Sentiment,
scoped_url
FROM app_instance_schema.movie_reviews_v
ORDER BY Sentiment DESC
LIMIT 10;
-- Use this query to monitor usage of AI Services
-- Note: this will only populate on a daily basis
SELECT *
FROM snowflake.account_usage.metering_daily_history
WHERE SERVICE_TYPE='AI_SERVICES';
Try the same steps on the Azure Consumer Account:
How do you expect the refresh frequency for cross-cloud fulfillment to impact the application upgrade? (documentation on cross-region application upgrades here)
Congratulations, you made it through our Horizon Access journey! You have exercised a broad range of data sharing and governance capabilities. You have worked with different types of data products including structured data, unstructured data, and native applications. And you have deployed different types of governance policies to implement data access and data privacy restrictions.