"By failing to prepare, you are preparing to fail" - Benjamin Franklin

A sound business continuity/disaster recovery (BC/DR) plan is an important milestone for organizations looking to build resilient data platforms. Data-driven insights grow revenue, provide services, support customers and aid critical decision making. Businesses face many forms of disruption: fires, floods, cyberattacks, ransomware and more. These disruptions can spiral into undesirable outcomes, from operational systems coming to a standstill and causing financial losses, to something worse, like reputational damage.

Technology is the centerpiece of every business, and mission-critical data can never be unavailable. As businesses move critical applications to the public cloud, continuous global operations rely even more on cloud dependability.

Snowflake's Replication and Failover/Failback feature enables account metadata, including everything from user identity and role-based access controls to governance policies, warehouses and resource monitors, to be automatically synchronized across clouds and regions for continuous availability. Multiple databases, shares and account metadata can be replicated as a unit, ensuring point-in-time consistency with the primary region.

Snowflake's Client Redirect feature facilitates seamless failover from primary to secondary so that apps and users can continue functioning without disruption. In the past, to recover, the connection string in every client application had to be manually changed to connect to the new primary site (the former secondary site) in a different region or cloud. With Client Redirect, administrators can use a single server-side command to redirect client applications to the new primary site, without disrupting business teams.
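
To preview what that looks like, the redirect itself is a single statement run on the account being promoted; a minimal sketch using the prodsnowgrid connection created later in this guide:

USE ROLE accountadmin;
-- Promote this account's connection replica so the shared connection URL resolves here
ALTER CONNECTION prodsnowgrid PRIMARY;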

Prerequisites

What You'll Learn

In this quickstart you will learn

What You'll Need

What You'll Build

Download the ZIP or clone the sfguide_failover_scripts GitHub repo to your local machine

Code Button

To match provided scripts, copy data and scripts directories to /tmp

cd sfguide_failover_scripts-main
cp -r data /tmp
cp -r scripts /tmp

Run SQL scripts 100-600 in sequential order on the AWS Primary account

Create Snowsight Worksheets from SQL File - use one worksheet per SQL script

Create Worksheet

  1. 100_create_replication_objects.sql: create roles, databases, warehouses, resource monitors
  2. 200_create_users_source.sql: create users with basic password authentication
  3. 300_grant_privileges.sql: grant object and account level privileges
  4. 400_load_data.sql: create and load tables, views, functions, stored procedures, shares
  5. 500_add_governance_policies.sql: assign tags, masking and row access policies
  6. 600_update_primary_task.sql: set up a task that updates the primary tables every few minutes (a hypothetical sketch of this pattern appears after this list)
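
For orientation, a scheduled task in Snowflake generally follows the shape below. This is a hypothetical sketch only; the task name, target table and warehouse pairing are illustrative stand-ins, not the actual contents of 600_update_primary_task.sql.

-- Hypothetical sketch of a scheduled task (see 600_update_primary_task.sql for the real one)
CREATE TASK IF NOT EXISTS demo_update_primary_task      -- hypothetical task name
  WAREHOUSE = sales_wh                                  -- assumes an existing warehouse
  SCHEDULE = '5 MINUTE'                                 -- "every few minutes"
AS
  INSERT INTO global_sales.online_retail.orders_audit   -- hypothetical (order_key, updated_at) table
  SELECT o_orderkey, current_timestamp()
  FROM global_sales.online_retail.orders
  LIMIT 10;

ALTER TASK demo_update_primary_task RESUME;             -- tasks are created suspended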

Ingest Payroll Data

Load the data file data/hr_data_sample.csv into the table PAYROLL.NOAM_NORTHEAST.EMPLOYEE_DETAIL using the Snowsight UI.

load_payroll_1

Browse to find the data/hr_data_sample.csv file and set the CSV format correctly

load_payroll_2

You should have successfully loaded 100 records

load_payroll_3
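
If you prefer SQL to the load wizard, the same load can be scripted; a minimal sketch assuming the file is first uploaded to the table stage from SnowSQL and that the CSV has a header row:

-- Run from SnowSQL (PUT is not available in a Snowsight worksheet)
-- PUT file:///tmp/data/hr_data_sample.csv @PAYROLL.NOAM_NORTHEAST.%EMPLOYEE_DETAIL;

-- Then, in a worksheet, copy from the table stage into the target table
COPY INTO PAYROLL.NOAM_NORTHEAST.EMPLOYEE_DETAIL
  FROM @PAYROLL.NOAM_NORTHEAST.%EMPLOYEE_DETAIL
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');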

Create ELT Pipeline using the TPC-DI benchmark schema

This modified TPC-DI benchmark flow simulates the process of extracting data from operational (OLTP) systems in a variety of data formats, loading into a set of staging tables, then transforming into a unified model for analytical reporting and insights.

We will observe how Snowflake's pipeline replication features (in Public Preview) support seamless failover and idempotent restart of the pipeline on the new primary.

  1. 700_setup_tpcdi_pipeline.sql: create an external stage and an internal stage with a directory table, create staging tables and a batch load plan, and use a dynamic table DAG to build a declarative pipeline from staging to warehouse (a hypothetical example of this pattern appears after this list). Stop after creating the internal stage, run Step 8, then continue.
switch-to-step8
  2. Use the SnowSQL CLI to load batches from data/tpcdi-scale5 and data/tpcdi-scale10 to the internal stage. Run the following in a Terminal window (after installing SnowSQL); this script assumes file locations of /tmp/data and /tmp/scripts, so modify as appropriate.
% snowsql -a ORGNAME-ACCOUNTNAME -u snowgrid -f /tmp/scripts/snowsql/tpcdi_load.sql -o output_format=csv -o output_file=output_file.csv
  1. 800_execute_pipeline.sql: call the batch load procedure that simulates an ingest pipeline. ONLY load Batch 1 (for Scale 5, Scale 10, Scale 100).
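
The staging-to-warehouse hop that 700_setup_tpcdi_pipeline.sql builds declaratively relies on dynamic tables. Here is a hypothetical example of that pattern; the names, columns and lag are illustrative, not the script's actual definitions.

-- Hypothetical example of the dynamic-table pattern used in the TPC-DI pipeline
CREATE OR REPLACE DYNAMIC TABLE tpcdi_wh.public.dim_customer_demo   -- hypothetical name
  TARGET_LAG = '5 minutes'            -- illustrative refresh target
  WAREHOUSE  = sales_wh               -- assumes an existing warehouse
AS
  SELECT customer_id, first_name, last_name, update_ts    -- hypothetical columns
  FROM tpcdi_stg.public.customer_stage_demo;               -- hypothetical staging table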

The scripts in the previous step have created a production-like Snowflake environment for us. Here's a list of the objects you just created when you ran those scripts:

Phew! That's quite a list. Here's what all of this looks like in a picture:

snowflake_source_account

Let's review our source account

Observe your RBAC

Below is what the role hierarchy should look like in your source account. Note that the role hierarchy is only available via the new UI, Snowsight. Log in to Snowsight, assume the "securityadmin" role, and then navigate to Home -> Admin -> Users & Roles -> Roles.

snowsight_screenshot
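
You can also inspect the same hierarchy with SQL; a quick sketch using standard SHOW commands and one of the lab's roles:

USE ROLE securityadmin;
SHOW ROLES;
-- privileges granted to a role, and who the role has been granted to
SHOW GRANTS TO ROLE sales_analyst;
SHOW GRANTS OF ROLE sales_analyst;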

Verify Row Access Policy

Our row access policy is applied to the global_sales.online_retail.customer table with the following rules:

Try running this query with the roles sysadmin, sales_analyst, sales_admin and product_manager, and notice how data is returned based on the row access policy.

use role sysadmin;
use warehouse sales_wh;
select * from global_sales.online_retail.customer limit 100;

When we replicate our data and account objects, row-level security is applied to the target account as well. This ensures that your data access rules are retained even on the DR instance.
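
The actual policy body lives in 500_add_governance_policies.sql. For intuition, a Snowflake row access policy generally follows the shape below; the schema, mapping table and column here are hypothetical.

-- Hypothetical sketch of a row access policy (not the lab's actual definition)
CREATE OR REPLACE ROW ACCESS POLICY governance_demo.policies.region_filter_demo
  AS (region VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() IN ('SYSADMIN', 'SALES_ADMIN')                  -- privileged roles see all rows
    OR EXISTS (SELECT 1
               FROM governance_demo.policies.region_map_demo m     -- hypothetical mapping table
               WHERE m.role_name = CURRENT_ROLE()
                 AND m.region = region);

-- Attaching it would look like this (illustrative column name):
-- ALTER TABLE global_sales.online_retail.customer
--   ADD ROW ACCESS POLICY governance_demo.policies.region_filter_demo ON (c_region);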

Verify dynamic data masking policy

Run the query below with two different roles, hr_analyst and hr_admin, and observe all fields in the returned results. What values does hr_analyst see for the email, iban, cc and salary columns? What values does hr_admin see?

use role hr_analyst;
use warehouse hr_wh;
select * from payroll.noam_northeast.employee_detail limit 100;
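
Behind those differences are dynamic data masking policies created in 500_add_governance_policies.sql; a minimal sketch of the pattern with hypothetical names:

-- Hypothetical sketch of a dynamic data masking policy (not the lab's exact definition)
CREATE OR REPLACE MASKING POLICY governance_demo.policies.email_mask_demo AS
  (val STRING) RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('HR_ADMIN') THEN val   -- privileged role sees clear text
      ELSE '*** MASKED ***'                          -- everyone else sees a masked value
    END;

-- Attaching it would look like this (illustrative):
-- ALTER TABLE payroll.noam_northeast.employee_detail
--   MODIFY COLUMN email SET MASKING POLICY governance_demo.policies.email_mask_demo;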

Verify data shares

We've created a few data shares with different variations. We'll observe what permissions each of these shares has and whether they are replicated as-is to our secondary account.

The code snippet below displays the permissions on three data shares: GLOBAL_SALES_SHARE, INVENTORY_SHARE and CROSS_DATABASE_SHARE.

use role accountadmin;
show grants to share global_sales_share;
show grants to share inventory_share;
show grants to share cross_database_share;
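
For context, these shares were built with the standard direct-share workflow (the exact grants come from 400_load_data.sql); a hedged sketch of the pattern:

-- Sketch of the direct-share pattern; actual grants are defined in 400_load_data.sql
CREATE SHARE IF NOT EXISTS global_sales_share;
GRANT USAGE ON DATABASE global_sales TO SHARE global_sales_share;
GRANT USAGE ON SCHEMA global_sales.online_retail TO SHARE global_sales_share;
GRANT SELECT ON TABLE global_sales.online_retail.customer TO SHARE global_sales_share;
-- ALTER SHARE global_sales_share ADD ACCOUNTS = <consumer_org.consumer_account>;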

Verify location, type and owner of governance policies

We have built 6 masking policies, 4 object tags and 2 row access policies to protect our data. Observe their details, such as which schema the policies live in and who owns them.

use role governance_admin;
show masking policies;
show row access policies;
show tags;

The output of the three commands should look something like this:

Masking Policies: masking_policy

Row Access Policies: row_access_policy

Object Tags: object_tags
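
Object tags follow the same create-then-attach pattern as the policies above; a hypothetical sketch (the schema, tag name and values are illustrative, not the lab's four tags):

-- Hypothetical sketch of creating and attaching an object tag
CREATE TAG IF NOT EXISTS governance_demo.tags.data_sensitivity_demo
  ALLOWED_VALUES 'public', 'confidential', 'restricted';

ALTER TABLE payroll.noam_northeast.employee_detail
  SET TAG governance_demo.tags.data_sensitivity_demo = 'restricted';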

View Data Pipeline built from TPC-DI spec

dynamic-table-graph

Now that our primary AWS account has been populated with users, database objects, governance policies, account metadata and data transformation pipelines, we are ready to configure our Azure account as a secondary target.

Snowflake BCDR is simple to set up, maintain and test. The key capabilities are Replication with Failover and Client Redirect, available only with Business Critical edition (or higher).

We will create two first-class Snowflake objects that deliver business continuity: a connection, which provides the account-agnostic URL that Client Redirect points at the current primary, and a failover group, which defines the set of account objects and databases that are replicated and failed over as a unit.

Run in a SQL worksheet named BCDR Configuration on the Primary (AWS)

SUBSTITUTE organization_name and target account_name in these commands to create connection and failover group objects:

USE ROLE accountadmin;
SHOW REPLICATION ACCOUNTS;

-- record organization_name, target_account_name (Azure) here

CREATE CONNECTION IF NOT EXISTS prodsnowgrid;
ALTER CONNECTION prodsnowgrid 
    ENABLE FAILOVER TO ACCOUNTS <organization_name.target_account_name>;

SHOW CONNECTIONS;

-- record connection-url here

-- no databases with stages should be included yet
CREATE FAILOVER GROUP sales_payroll_financials
    OBJECT_TYPES = users, roles, warehouses, resource monitors, databases, shares, account parameters, network policies
    ALLOWED_DATABASES = common, crm, cross_database, global_sales, inventory, loyalty, payroll, products, externals, references, sales, salesforce, snowflake_ha_monitor, stores, suppliers, support, web_logs
    ALLOWED_SHARES = global_sales_share,sales_history_share, cross_database_share, crm_share, inventory_share
    ALLOWED_ACCOUNTS = XLWGQVZ.SNOWFLAKE_AZURE_TARGET;

-- enable replication of stages, storage integrations, pipes, load history
ALTER FAILOVER GROUP sales_payroll_financials SET ENABLE_ETL_REPLICATION = TRUE;

-- now add integrations, along with all existing types
ALTER FAILOVER GROUP sales_payroll_financials SET
    OBJECT_TYPES = users, roles, warehouses, resource monitors, databases, shares, account parameters, network policies, integrations
    ALLOWED_INTEGRATION_TYPES = STORAGE INTEGRATIONS;

-- now add databases with pipeline objects
ALTER FAILOVER GROUP sales_payroll_financials ADD tpcdi_ods, tpcdi_stg, tpcdi_wh TO ALLOWED_DATABASES;

-- check that all object types, databases and shares are there
SHOW FAILOVER GROUPS;
SHOW SHARES IN FAILOVER GROUP sales_payroll_financials;
SHOW DATABASES IN FAILOVER GROUP sales_payroll_financials;

Run in a SQL worksheet named BCDR Configuration on the Secondary (Azure)

Here you'll create a secondary connection and a secondary failover group.

SUBSTITUTE organization_name and source account_name to create replica connection and failover group on Secondary (Azure)

USE ROLE accountadmin;

SHOW CONNECTIONS;

CREATE CONNECTION prodsnowgrid
    AS REPLICA OF <orgname.source_account_name.prodsnowgrid>;

SHOW REPLICATION ACCOUNTS;

CREATE FAILOVER GROUP sales_payroll_financials
    AS REPLICA OF <orgname.source_account_name.sales_payroll_financials>;

Note down the CONNECTION_URL and observe that it is account- and region-agnostic (orgname-connection_name.snowflakecomputing.com).

Client applications will use the CONNECTION_URL that can be redirected to either account.

Welcome App Developers!

We have options for you today to build quick and simple, Snowflake-powered apps that will continue to be operational through a region failure.

You can choose to build one or all of these.

  1. Build a Sigma dashboard: step by step instructions are available in the video below.
  2. Build a python-based Streamlit app to visualize sales payroll data. If prerequisite libraries and connectors have been installed, you can use the code provided below and connect to the Snowflake connection object.
  3. Connect with the SnowSQL CLI and run a few simple queries to showcase how client failover, pipeline replication and idempotent replay work.

Sigma BI Dashboard

In this step, we'll build a Sigma dashboard that relies on data in our primary account. Once we replicate this data to our secondary account and fail over, we'll see the Sigma dashboard seamlessly point to the failover region, powered by our secondary account.

In the video, you will see how we connect Sigma to Snowflake by creating a connection. When filling in the connection details, ensure that the account name is the same as the value recorded for "connection_url" at the end of step 4.

Query used to build the global sales dashboard

Here's the code snippet used for charting the global revenue figures.

select n_name nation
    , sum(o_totalprice) revenue
from global_sales.online_retail.orders 
inner join global_sales.online_retail.customer on c_custkey = o_custkey
inner join global_sales.online_retail.nation on c_nationkey = n_nationkey
where o_orderdate between dateadd(day,-2,current_date) and current_date()
group by 1
order by 2 desc

Final Dashboard

Once you're done building the dashboard, the end result should look something like this: sigma_dashboard

Python Streamlit App

Python code for the Streamlit app is located at scripts/streamlit_app/streamlit_failover_app.py

Edit the Python file to fill in user, password and account:

# navigate to project directory where streamlit and libraries were installed
% cd streamlit
% streamlit run /tmp/scripts/streamlit_app/streamlit_failover_app.py

# may fail if a specific version of pyarrow is required
% pip install pyarrow=={version required in error}

Here is what the Streamlit App should look like!

streamlit_dash

All that work behind us: we set up our account resources like users, roles, the RBAC hierarchy, databases (not to mention the crucial data they contain), compute warehouses, governance policies with RLS and CLS features, direct shares, and then some beautiful customer-facing apps. Phew!

Now we have to do this all over again just to have a DR instance - how much more work will this be? Oh right, negligible. With Snowflake, a single command will bring all that setup from our source account to our target account in a matter of minutes, if not seconds (depending on the size and number of objects).

But wait a minute, our source account and target account are on different public cloud providers - won't that factor into the setup or cause some kind of extra work? Nope, not with Snowflake - we're cloud agnostic and we hide all that complexity from you.

Replicate to Secondary

Run the commands below on your target/secondary account to begin replication and monitor its progress:

USE ROLE accountadmin;

ALTER FAILOVER GROUP sales_payroll_financials REFRESH;

SELECT * FROM TABLE(snowflake.information_schema.replication_group_refresh_progress('SALES_PAYROLL_FINANCIALS'));

SELECT start_time, end_time, replication_group_name, credits_used, bytes_transferred
  FROM TABLE(snowflake.information_schema.replication_group_usage_history(date_range_start=>dateadd('day', -7, current_date())));

SELECT * FROM snowflake.account_usage.REPLICATION_GROUP_REFRESH_HISTORY;

Did the replication fail?

Why do you think the first replication attempt failed? Notice that there's an externals database containing an external table, which is not supported for replication; that's why the replication failed.

Let's fix this by removing the externals database from our failover group. Run the command below on the primary account.

USE ROLE accountadmin;
ALTER FAILOVER GROUP sales_payroll_financials REMOVE externals from ALLOWED_DATABASES;

Now let's re-run our replication; it should succeed this time. Run the command below on the secondary account.

USE ROLE accountadmin;
ALTER FAILOVER GROUP sales_payroll_financials REFRESH;

This command should take about a minute to run. But wait, where is it getting the compute from? Is it using any of the warehouses we've provisioned? Nope, we've got you covered - this is serverless compute that Snowflake provides, and it autoscales depending on the amount of work that needs to get done. There will be a separate line item under "Replication" for this on your bill.

After this command has run, you should see all of the databases and other objects that we included in the failover group definition available in your secondary account.

Verify Replication

To ensure that replication worked, go back to step 3 and run all of the commands under "Let's review our source account" on your target account, and confirm that you see exactly the same results as you did on your source account. This confirms that replication worked as expected.
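
Before re-running everything, a couple of quick sanity checks on the secondary can confirm the refresh landed; a minimal sketch:

-- Quick sanity checks on the secondary after the refresh
SHOW DATABASES;      -- the replicated databases should now be listed
SHOW SHARES;         -- the replicated shares should now be listed
SELECT COUNT(*) FROM payroll.noam_northeast.employee_detail;   -- expect the 100 payroll rows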

Replicate on a schedule

With the initial replication successfully completed, we now want to replicate on a schedule so that any additional changes on the primary account are regularly made available to the secondary. Let's assume a strict RPO and replicate every 3 minutes. Note that if there are no changes on the primary, nothing will be replicated to the secondary and no replication cost will be incurred. Run the command below (on the primary account) to replicate our group every three minutes.

USE ROLE accountadmin;
ALTER FAILOVER GROUP sales_payroll_financials SET REPLICATION_SCHEDULE = '3 MINUTES';

Moment of truth! Our data and account objects are safely replicated to the secondary account. Now let's assume disaster has struck: our primary account is experiencing an outage, and the business and tech teams (the awesome failover HoL participants) have invested time and money to ensure they are prepared for this very event.

So what do we do? Again, something very simple - fire two commands.

Run the two commands below on the Secondary Account

USE ROLE accountadmin;

ALTER CONNECTION prodsnowgrid PRIMARY;
ALTER FAILOVER GROUP sales_payroll_financials PRIMARY;

Let's Revisit Our Apps

With your connection_url now pointing to a new primary account, refresh your BI dashboards and/or Streamlit apps and notice which account they are powered by. But let's not blindly believe the visualizations! Log in to the new primary account and ensure the queries are executing there.
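
A quick way to verify from SQL is to check which account the session is on and which side now owns the primary objects; a minimal sketch run on the new primary (Azure):

-- Confirm the session is on the Azure account and that it now holds the primaries
SELECT CURRENT_ORGANIZATION_NAME(), CURRENT_ACCOUNT_NAME(), CURRENT_REGION();
SHOW CONNECTIONS;       -- is_primary should now be true for prodsnowgrid on this account
SHOW FAILOVER GROUPS;   -- is_primary should now be true for sales_payroll_financials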

Restart the TPC-DI Pipeline

Check status of Dynamic Tables (Snowsight Activity tab):
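
You can also check the same state with SQL instead of the Activity tab; a minimal sketch:

-- Check dynamic table state and recent refreshes on the new primary
SHOW DYNAMIC TABLES IN DATABASE tpcdi_wh;

SELECT *
  FROM TABLE(tpcdi_wh.information_schema.dynamic_table_refresh_history())
  ORDER BY refresh_start_time DESC;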

Load script 800_execute_pipeline.sql into a SQL Worksheet on the new Primary (Azure)

Failback to original Primary (AWS)
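
Failback mirrors the failover. A minimal sketch, reusing the commands from the steps above, run on the original AWS account (currently acting as secondary): first refresh to pull any changes made while Azure served as primary, then promote.

USE ROLE accountadmin;

-- Pull changes made on Azure while it was primary
ALTER FAILOVER GROUP sales_payroll_financials REFRESH;

-- Promote AWS back to primary and point the connection URL at it again
ALTER FAILOVER GROUP sales_payroll_financials PRIMARY;
ALTER CONNECTION prodsnowgrid PRIMARY;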