Welcome to Snowflake! This entry-level guide, designed for database and data warehouse administrators and architects, will help you navigate the Snowflake interface and introduce you to some of our core capabilities. Sign up for a free 30-day trial of Snowflake and follow along with this lab exercise. Once we cover the basics, you'll be ready to start processing your own data and diving into Snowflake's more advanced features like a pro.

Free Virtual Hands-on Lab

This Snowflake Guide is available as a free, instructor-led Virtual Hands-on Lab. Sign up for the VHOL today.

Prerequisites:

What You'll Learn:

Steps to Prepare Your Lab Environment

If you haven't already, register for a Snowflake free 30-day trial.

The Snowflake edition (e.g., Standard, Enterprise, Business Critical), cloud provider (e.g., AWS, Azure), and region (e.g., US East, EU) do not matter for this lab. We suggest you select the region physically closest to you and the Enterprise Edition, our most popular offering. After registering, you will receive an email with an activation link and your Snowflake account URL.

Logging Into the Snowflake User Interface (UI)

Open a browser window and enter the URL of your Snowflake 30-day trial environment that was sent with your registration email.

You should see the login screen below. Enter the username and password you used for registration.

login screen

Close any Welcome Boxes and Tutorials

You may see welcome and helper boxes appear when you log in for the first time, as well as an "Enjoy your free trial..." ribbon at the top of the screen. Minimize and close these boxes.

welcome messages

Navigating the Snowflake UI

Let's get you acquainted with Snowflake! This section covers the basic components of the user interface. We will move left to right from the top of the UI.

The navigation bar allows you to switch between the different areas of Snowflake:

snowflake navbar

The Databases tab shows information about the databases you have created or have permission to access. You can create, clone, drop, or transfer ownership of databases, as well as load data in the UI. Notice that several databases already exist in your environment. However, we will not be using these in this lab.

databases tab

The Shares tab is where data sharing can be configured to easily and securely share Snowflake tables among separate Snowflake accounts or external users, without having to create a copy of the data. We will cover data sharing in Section 10.

shares tab

The Warehouses tab is where you set up and manage compute resources known as virtual warehouses to load or query data in Snowflake. A warehouse called COMPUTE_WH (XL) already exists in your environment.

warehouses tab

The Worksheets tab provides an interface for submitting SQL queries, performing DDL and DML operations, and viewing results as your queries or operations complete. The default Worksheet 1 appears when this tab is accessed.

The left pane contains the database objects browser which enables users to explore all databases, schemas, tables, and views accessible by the role selected for a worksheet. The bottom pane displays the results of queries and operations.

The various sections of this page can be resized by adjusting their sliders. If during the lab you need more room in the worksheet, collapse the database objects browser in the left pane. Many of the screenshots in this guide will keep this section closed.

worksheets tab

The History tab allows you to view the details of all queries executed in the last 14 days from your Snowflake account. Click on a Query ID to drill into it for more information.

history tab

Clicking on your username in the top right of the UI allows you to change your password, roles, and preferences. Snowflake has several system defined roles. You are currently in the default role of SYSADMIN and will stay in this role for the majority of the lab.

user preferences dropdown

The Lab Story

This lab is based on the analytics team at Citi Bike, a real, citywide bike sharing system in New York City, USA. The team wants to run analytics on data from their internal transactional systems to better understand their riders and how to best serve them.

We will first load structured .csv data from rider transactions into Snowflake. Later we will work with open-source, semi-structured JSON weather data to determine if there is any correlation between the number of bike rides and the weather.

Let's start by preparing to load the structured Citi Bike rider transaction data into Snowflake.

This section will walk you through the steps to:

The data we will be using is bike share data provided by Citi Bike NYC. The data has been exported and pre-staged for you in an Amazon AWS S3 bucket in the US-EAST region. The data consists of information about trip times, locations, user type, gender, age, etc. On AWS S3, the data represents 61.5M rows, 377 objects, and 1.9GB compressed.

Below is a snippet from one of the Citi Bike CSV data files:

data snippet

The data is in comma-delimited format with a single header line and double quotes enclosing field values. This will come into play later in this section when we configure the Snowflake table that will store the data.

Create a Database and Table

First, let's create a database called CITIBIKE that will be used for loading the structured data.

Navigate to the Databases tab. Click Create, name the database CITIBIKE, then click Finish.

worksheet creation
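
If you prefer SQL to the UI, the same database can be created with a single statement in a worksheet (a minimal equivalent of the step above; the lab itself uses the Databases tab here):

create database citibike;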

Now navigate to the Worksheets tab. You should see an empty worksheet. You can copy the SQL from each step below and paste it here to run.

new worksheet

We need to set the context appropriately within the worksheet. In the top right, click the drop-down arrow next to the Context section to show the worksheet context menu. This menu controls which role, warehouse, database, and schema the worksheet uses. We are using the UI here to set the context; later in the lab we will accomplish this via SQL commands within the worksheet.

Select the following context settings:

Role: SYSADMIN
Warehouse: COMPUTE_WH (XL)
Database: CITIBIKE
Schema: PUBLIC

context settings
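
These UI selections map directly to SQL context commands, which we will use later in the lab. For reference, the equivalent statements are:

use role sysadmin;
use warehouse compute_wh;
use database citibike;
use schema public;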

Next we'll create a table called TRIPS that will be used for loading the comma-delimited data. We will use the UI within the Worksheets tab to run the DDL that creates the table. Copy the SQL text below into your worksheet:

create or replace table trips
(tripduration integer,
starttime timestamp,
stoptime timestamp,
start_station_id integer,
start_station_name string,
start_station_latitude float,
start_station_longitude float,
end_station_id integer,
end_station_name string,
end_station_latitude float,
end_station_longitude float,
bikeid integer,
membership_type string,
usertype string,
birth_year integer,
gender integer);

Run the query by placing your cursor anywhere in the command and clicking the blue Run button at the top of the page, or by using the keyboard shortcut Ctrl/Cmd+Enter.

command select and run

If you highlighted the entire SQL text of the command and ran it, a confirmation box asking "Do you want to run the following queries?" should appear. Click the blue Run button in the box. In the future you can continue using this confirmation box or check the "Don't ask me again (All Worksheets)" option.

run confirmation box

Verify that your TRIPS table has been created. At the bottom of the worksheet you should see a Results section displaying a "Table TRIPS successfully created" message.

TRIPS confirmation message

Navigate to the Databases tab and click on the CITIBIKE database link. You should see your newly created TRIPS table. If you do not see any databases, widen your browser window; the list may be hidden.

TRIPS table

Click on the TRIPS hyperlink to see the table structure you just configured.

TRIPS table structure

Create an External Stage

We are working with structured, comma-delimited data that has already been staged in a public, external S3 bucket. Before we can use this data, we first need to create a Stage that specifies the location of our external bucket.

From the Databases tab, click on the CITIBIKE database, select Stages, then Create...

stages create

Select the option for Existing Amazon S3 Location and click Next.

Existing Amazon S3 Location option

On the Create Stage box, enter the following settings, then click Finish.

Name: citibike_trips
Schema Name: PUBLIC
URL: s3://snowflake-workshop-lab/citibike-trips

create stage settings
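
The same external stage can also be created with one SQL statement, as we will do later for the weather data. A sketch of the equivalent command:

create stage citibike_trips
url = 's3://snowflake-workshop-lab/citibike-trips';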

Now let's take a look at the contents of the citibike_trips stage. Navigate to the Worksheets tab, then execute the following statement:

list @citibike_trips;

worksheet command

You should see the output in the Results window in the bottom pane:

worksheet result

Create a File Format

Before we can load the data into Snowflake, we have to create a File Format that matches the data structure.

From the Databases tab, click on the CITIBIKE database hyperlink. Select File Formats and Create.

create file format

On the resulting page, we create the file format. In the box that appears, leave all the default settings as-is and make only the changes below:

Name: CSV
Field optionally enclosed by: Double Quote
Null string: [ ]
Error on Column Count Mismatch:

If you do not see the "Error on Column Count Mismatch" setting, scroll down in the dialog box.

When you are done, the box should look like:

create file format settings

Click Finish to create the file format.
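
For reference, the same file format can be expressed in SQL. The sketch below is an approximation of the dialog settings above using standard CREATE FILE FORMAT options; it is not the exact DDL the UI generates:

create or replace file format csv
  type = 'csv'
  field_optionally_enclosed_by = '"'  -- "Field optionally enclosed by: Double Quote"
  null_if = ('');                     -- "Null string" setting
-- also set error_on_column_count_mismatch to match the checkbox value you chose in the dialog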

In this section, we will use a virtual warehouse and the COPY command to bulk load the structured data into the Snowflake table we just created.

Resize and Use a Warehouse for Data Loading

Compute power is needed for loading data. Snowflake's compute resources are called virtual warehouses, and they can be dynamically sized up or out according to workload, whether the workload is loading data, running a query, or performing a DML operation. Each workload can have its own virtual warehouse, so there is no resource contention.

Navigate to the Warehouses tab. This is where you can view all of your existing warehouses, as well as analyze their usage trends.

Note the Create... option at the top is where you can quickly add a new warehouse. However, we want to use the existing warehouse COMPUTE_WH included in the 30-day trial environment.

Click anywhere on the COMPUTE_WH row (not the blue COMPUTE_WH hyperlink) to highlight the entire row, then click the Configure... text above it to see the configuration details of COMPUTE_WH. We will use this warehouse to load the data from AWS S3.

compute warehouse configure

Let's walk through the settings of this warehouse and learn some of Snowflake's unique functionality.

configure settings

We are going to use this warehouse to load the structured data into Snowflake. First, however, we will decrease the size of the warehouse to reduce the compute power it contains. In later steps we will note the time this load takes, then redo the same load operation with a larger warehouse and observe its faster load time.

Change the Size of this data warehouse from X-Large to Small. Then click the Finish button.

configure settings small
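
Resizing can also be scripted with ALTER WAREHOUSE; a minimal equivalent of the UI change above:

alter warehouse compute_wh set warehouse_size = 'small';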

Load the Data

Now we can run a COPY command to load the data into the TRIPS table we created earlier.

Navigate back to the Worksheets tab. In the top right of the worksheet, make sure the context is correct with these settings:

Role: SYSADMIN
Warehouse: COMPUTE_WH (S)
Database: CITIBIKE
Schema: PUBLIC

worksheet context

Execute the following COPY statement in the worksheet to load the staged data into the table. This may take up to 30 seconds.

copy into trips from @citibike_trips
file_format=CSV;

In the Results window, you should see the status of the load:

results load status

Once the load is done, at the bottom right of the worksheet click on the small arrow next to the Open History text to show the history of Snowflake operations performed in that worksheet.

open history arrow

In the History window, find the copy into trips from @citibike_trips file_format=CSV; query you just ran and note its duration, bytes scanned, and rows loaded. Use the slider on the left side of the pane to expand it if needed.

history and duration

Go back to the worksheet to clear the table of all data and metadata by using the TRUNCATE TABLE command.

truncate table trips;

Open the worksheet context menu, then click on Resize to increase the warehouse size to Large and click Finish. The Large size provides four times the compute power of Small.

resize context to large

Go back to the worksheet and execute the following statement to load the same data again.

copy into trips from @citibike_trips
file_format=CSV;

Once the load is done, compare the durations of the two loads in the History window at the bottom of the worksheet. The load using the Large warehouse was significantly faster.

compare load durations

Create a New Warehouse for Data Analytics

Going back to the lab story, let's assume the Citi Bike team wants to eliminate resource contention between their data loading/ETL workloads and the analytical end users using BI tools to query Snowflake. As mentioned earlier, Snowflake can easily do this by assigning different, appropriately-sized warehouses to various workloads. Since Citi Bike already has a warehouse for data loading, let's create a new warehouse for the end users running analytics. We will use this warehouse to perform analytics in the next section.

Navigate to the Warehouses tab, click Create..., and name the new warehouse ANALYTICS_WH with size Large. If you have Snowflake's Enterprise Edition or higher, you will see a setting for Maximum Clusters. Set this to 1.

Leave the other settings at their defaults. It should look like:

warehouse settings

Click on the Finish button to create the warehouse.
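
As a sketch of what the Create dialog does, the equivalent SQL looks like the following (max_cluster_count applies only to Enterprise Edition and above):

create warehouse analytics_wh
  warehouse_size = 'large'
  max_cluster_count = 1;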

In the previous exercises, we loaded data into the TRIPS table using Snowflake's bulk-load COPY command and the COMPUTE_WH warehouse. Now we are going to take on the role of the analytics users at Citi Bike, who need to query data in that table using the worksheet and the second warehouse, ANALYTICS_WH.

Execute SELECT Statements and Result Cache

Go to the Worksheets tab. Within the worksheet, verify your context is the following:

Role: SYSADMIN
Warehouse: ANALYTICS_WH (L)
Database: CITIBIKE
Schema: PUBLIC

Run the query below to see a sample of the trips data:

select * from trips limit 20;

sample data query results

First let's look at some basic hourly statistics on Citi Bike usage. Run the query below in the worksheet. It will show for each hour the number of trips, average trip duration, and average trip distance.

select date_trunc('hour', starttime) as "date",
count(*) as "num trips",
avg(tripduration)/60 as "avg duration (mins)",
avg(haversine(start_station_latitude, start_station_longitude, end_station_latitude, end_station_longitude)) as "avg distance (km)"
from trips
group by 1 order by 1;

hourly query results

Snowflake has a result cache that holds the results of every query executed in the past 24 hours. These are available across warehouses, so query results returned to one user are available to any other user on the system who executes the same query, provided the underlying data has not changed. Not only do these repeated queries return extremely fast, but they also use no compute credits.
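
As an optional aside, you can toggle the result cache for your session to compare cached and uncached timings. Make sure it is turned back on before continuing with the lab:

alter session set use_cached_result = false;  -- disable the result cache for this session
alter session set use_cached_result = true;   -- turn it back on before continuing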

Let's see the result cache in action by running the exact same query again.

select date_trunc('hour', starttime) as "date",
count(*) as "num trips",
avg(tripduration)/60 as "avg duration (mins)",
avg(haversine(start_station_latitude, start_station_longitude, end_station_latitude, end_station_longitude)) as "avg distance (km)"
from trips
group by 1 order by 1;

In the History window note that the second query runs significantly faster because the results have been cached.

cached query duration

Next, let's run this query to see which months are the busiest:

select
monthname(starttime) as "month",
count(*) as "num trips"
from trips
group by 1 order by 2 desc;

months query results

Clone a Table

Snowflake allows you to create clones, also known as "zero-copy clones," of tables, schemas, and databases in seconds. A snapshot of the data in the source object is taken when the clone is created and is made available to the cloned object. The cloned object is writable and independent of the clone source, so changes made to either the source object or the clone are not reflected in the other.

A popular use case for zero-copy cloning is to clone a production environment so Development & Testing teams can test and experiment without adversely impacting the production environment, eliminating the need to set up and manage two separate environments.
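
Cloning is not limited to tables; entire schemas and databases can be cloned the same way. A hypothetical example, not part of this lab:

create database citibike_dev clone citibike;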

Run the following command in the worksheet to create a development (dev) table:

create table trips_dev clone trips;

If closed, expand the database objects browser on the left of the worksheet. Click the small Refresh button in the left-hand panel and expand the object tree under the CITIBIKE database. Verify that you can see a new table under the CITIBIKE database named TRIPS_DEV. The development team now can do whatever they want with this table, including delete it, without impacting the TRIPS table or any other object.

trips_dev table

Going back to the lab's example, the Citi Bike analytics team wants to determine how weather impacts ride counts. To do this, in this section we will:

The JSON data consists of weather information provided by OpenWeatherMap detailing the historical conditions of New York City from 2016-07-05 to 2019-06-25. It is also staged on AWS S3, where it consists of 57.9K rows, 61 objects, and 2.5MB compressed. Viewed in a text editor, the raw JSON in the GZ files looks like this:

raw JSON sample

Create a Database and Table

First, via the worksheet, let's create a database called WEATHER that will be used for storing the semi-structured JSON data.

create database weather;

Set the context appropriately within the worksheet.

use role sysadmin;
use warehouse compute_wh;
use database weather;
use schema public;

Next, let's create a table called JSON_WEATHER_DATA that will be used for loading the JSON data. In the worksheet, run the SQL text below. Snowflake has a special column type called VARIANT which will allow us to store the entire JSON object and eventually query it directly.

create table json_weather_data (v variant);

Verify that your table JSON_WEATHER_DATA has been created. At the bottom of the worksheet you should see a Results section which says "Table JSON_WEATHER_DATA successfully created."

success message

Navigate to the Databases tab and click on the WEATHER database link. You should see your newly created JSON_WEATHER_DATA table.

JSON_WEATHER_DATA table

Create an External Stage

Via the worksheet, create a stage that points to the S3 bucket where the semi-structured JSON data is stored.

create stage nyc_weather
url = 's3://snowflake-workshop-lab/weather-nyc';

Now let's take a look at the contents of the nyc_weather stage. Navigate to the Worksheets tab. Execute the following statement with a LIST command to display the list of files:

list @nyc_weather;

You should see the output in the Results window in the bottom pane with many gz files from S3:

results output

Loading and Verifying the Semi-Structured Data

For this section, we will use a warehouse to load the data from the S3 bucket into the Snowflake table we just created.

Via the worksheet, run a COPY command to load the data into the JSON_WEATHER_DATA table we created earlier.

Note how in the SQL command we can specify a FILE FORMAT object inline. In the previous section, where we loaded structured data, we had to define a file format in detail. Because the JSON data here is well-formed, we can use the default settings and simply specify the JSON type.

copy into json_weather_data
from @nyc_weather
file_format = (type=json);

Take a look at the data that has been loaded.

select * from json_weather_data limit 10;

query result

Click on one of the values. Notice how the data is stored in raw JSON. Click Done when finished.

JSON data snippet

Create a View and Query Semi-Structured Data

Let's look at how Snowflake allows us to create a view and also query the JSON data directly using SQL.

Run the following command from the Worksheets tab. It creates a columnar view of the semi-structured JSON weather data so that it is easier for analysts to understand and query. The city_id 5128638 corresponds to New York City.

create view json_weather_data_view as
select
v:time::timestamp as observation_time,
v:city.id::int as city_id,
v:city.name::string as city_name,
v:city.country::string as country,
v:city.coord.lat::float as city_lat,
v:city.coord.lon::float as city_lon,
v:clouds.all::int as clouds,
(v:main.temp::float)-273.15 as temp_avg,
(v:main.temp_min::float)-273.15 as temp_min,
(v:main.temp_max::float)-273.15 as temp_max,
v:weather[0].main::string as weather,
v:weather[0].description::string as weather_desc,
v:weather[0].icon::string as weather_icon,
v:wind.deg::float as wind_dir,
v:wind.speed::float as wind_speed
from json_weather_data
where city_id = 5128638;

SQL dot notation (v:city.coord.lat) is used in this command to pull out values at lower levels within the JSON hierarchy. This allows us to treat each field as if it were a column in a relational table.
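
The same notation works in ad hoc queries against the raw VARIANT column, without creating a view. For example, assuming the table loaded above:

select
  v:city.name::string as city_name,
  (v:main.temp::float)-273.15 as temp_avg
from json_weather_data
limit 5;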

The new view should appear just under the table json_weather_data at the top left of the UI. You may need to expand or refresh the database objects browser in order to see it.

JSON_WEATHER_DATA_VIEW in dropdown

Via the worksheet, verify the view with the following query. Notice the results look just like a regular structured data source. Your result set may have different observation_time values.

select * from json_weather_data_view
where date_trunc('month',observation_time) = '2018-01-01'
limit 20;

query results with view

Use a Join Operation to Correlate Against Data Sets

We will now join the JSON weather data to our CITIBIKE.PUBLIC.TRIPS data to answer our original question of how weather impacts the number of rides.

Run the command below to join WEATHER to TRIPS and count the number of trips associated with certain weather conditions.

select weather as conditions
,count(*) as num_trips
from citibike.public.trips
left outer join json_weather_data_view
on date_trunc('hour', observation_time) = date_trunc('hour', starttime)
where conditions is not null
group by 1 order by 2 desc;

weather results

The initial goal was to determine if there was any correlation between the number of bike rides and the weather by analyzing both ridership and weather data. Per the table above we have a clear answer. As one would imagine, the number of trips is significantly higher when the weather is good!

Snowflake's Time Travel capability enables access to historical data at any point within a configurable retention period. The default window is 24 hours, and with Snowflake's Enterprise Edition it can be up to 90 days. Most data warehouses cannot offer this functionality, but - you guessed it - Snowflake makes it easy!
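
For example, a table can be queried as it existed at an earlier point using the AT or BEFORE clause; a minimal sketch (the lab uses the BEFORE form later in this section):

select * from trips at(offset => -60*5);  -- the table as it looked 5 minutes ago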

Some useful applications include:

Drop and Undrop a Table

First let's see how we can restore data objects that have been accidentally or intentionally deleted.

From the worksheet, run the following DROP command to remove the json_weather_data table:

drop table json_weather_data;

Run a SELECT statement on the json_weather_data table. In the Results pane you should see an error because the underlying table has been dropped.

select * from json_weather_data limit 10;

table dropped error

Now restore the table:

undrop table json_weather_data;

The json_weather_data table should be restored.

restored table result

Roll Back a Table

Let's roll back a table to a previous state to fix an unintentional DML error that replaces all the station names in the CITIBIKE database's TRIPS table with the word "oops."

First make sure your worksheet has the proper context:

use role sysadmin;
use warehouse compute_wh;
use database citibike;
use schema public;

Run the following command to replace all of the station names in the table with the word "oops".

update trips set start_station_name = 'oops';

Now run a query that returns the top 20 stations by number of rides. Notice that the result contains only a single station name, because every row now has the same value.

select
start_station_name as "station",
count(*) as "rides"
from trips
group by 1
order by 2 desc
limit 20;

one row result

Normally we would need to scramble and hope we have a backup lying around. In Snowflake, we can simply run commands to find the query ID of the last UPDATE command and store it in a variable called $QUERY_ID.

set query_id =
(select query_id from table(information_schema.query_history_by_session (result_limit=>5))
where query_text like 'update%' order by start_time limit 1);

Re-create the table with the correct station names:

create or replace table trips as
(select * from trips before (statement => $query_id));

Run the SELECT statement again to verify that the station names have been restored:

select
start_station_name as "station",
count(*) as "rides"
from trips
group by 1
order by 2 desc
limit 20;

restored names result

In this section we will explore aspects of Snowflake's role-based access control (RBAC), such as creating a new role and granting it specific permissions. We will also cover the ACCOUNTADMIN (Account Administrator) role.

To continue with the Citi Bike story, let's assume a junior DBA has joined Citi Bike and we want to create a new role for them with fewer privileges than the system-defined, default SYSADMIN role.

Create a New Role and Add a User

In the worksheet switch to the ACCOUNTADMIN role to create a new role. ACCOUNTADMIN encapsulates the SYSADMIN and SECURITYADMIN system-defined roles. It is the top-level role in the system and should be granted only to a limited number of users in your account. In the worksheet, run:

use role accountadmin;

Notice that at the top right of the worksheet the context has changed to the ACCOUNTADMIN role.

ACCOUNTADMIN context

In order for any role to function, we need at least one user assigned to it. So let's create a new role called junior_dba and assign your username to it. Your username appears at the top right of the UI. In the screenshot below it is USER123.

username display

Create the role and add your username to it:

create role junior_dba;
grant role junior_dba to user YOUR_USER_NAME_GOES_HERE;

Change your worksheet context to the new junior_dba role:

use role junior_dba;

At the top right of the worksheet, note that the context has changed to reflect the junior_dba role.

JUNIOR_DBA context

On the left side of the UI in the database object browser pane the CITIBIKE and WEATHER databases no longer appear. This is because the junior_dba role does not have access to view them.

object browser pane without databases

Switch back to the ACCOUNTADMIN role and grant the junior_dba role the ability to view and use the CITIBIKE and WEATHER databases:

use role accountadmin;
grant usage on database citibike to role junior_dba;
grant usage on database weather to role junior_dba;

Switch to the junior_dba role:

use role junior_dba;

Note that the CITIBIKE and WEATHER databases now appear. Try clicking the refresh icon if they do not.

object browser pane with databases

Account Administrator View

Let's change our security role to ACCOUNTADMIN to see other parts of the UI accessible only to this role.

In the top right corner of the UI, click on your username to show the User Preferences menu. Go to Switch Role, then select the ACCOUNTADMIN role.

switch role

Notice at the top of the UI you will now see a sixth tab called Account that you can only view in the ACCOUNTADMIN role.

Click on this Account tab. Towards the top of the page click on Usage. Here you can find information on credits, storage, and daily or hourly usage for each warehouse, including cloud services. Select a day to review its usage.

account usage

To the right of Usage is Billing, where you can add a credit card and continue beyond your free trial's $400 of credits. Further to the right is information on Users, Roles, and Resource Monitors. Resource monitors set limits on your account's credit consumption so you can appropriately monitor and manage your credits.

Stay in the ACCOUNTADMIN role for the next section.

Snowflake enables data access between accounts through shares. Shares are created by data providers and imported by data consumers, either through their own Snowflake account or a provisioned Snowflake Reader account. The consumer could be an external entity or a different internal business unit that is required to have its own unique Snowflake account.

With Secure Data Sharing:

Snowflake uses secure data sharing to provide account usage data and sample data sets to all Snowflake accounts. In this capacity, Snowflake acts as the provider of the data and all other accounts act as consumers.

Secure data sharing also powers the Snowflake Data Marketplace, which is available to all Snowflake customers and allows you to discover and access third-party datasets from numerous data providers and SaaS vendors. Again, the data doesn't leave the provider's account, and you can use the datasets without any transformation.

See Existing Shares

Click on the blue Snowflake logo at the very top left of the UI. On the left side of the page in the database object browser, notice the database SNOWFLAKE_SAMPLE_DATA. The small arrow on the database icon indicates this is a share.

arrow over database icon

At the top right of the UI, verify that you are in the ACCOUNTADMIN role. Navigate to the Shares tab and notice you are looking at your Inbound Secure Shares. There are two shares provided by Snowflake. One contains your account usage and the other has sample data. This is data sharing in action - your Snowflake account is a consumer of data shared by Snowflake!

secure Snowflake shares

Create an Outbound Share

Let's go back to the Citi Bike story and assume we are the Account Administrator for Snowflake at Citi Bike. We have a trusted partner who wants to analyze the data in our TRIPS database on a near real-time basis. This partner also has their own Snowflake account in our region. So let's use Snowflake Data Sharing to allow them to access this information.

Navigate to the Shares tab. Further down on the page click on the Outbound button.

shares outbound button

Click the Create button and fill in the following fields:

Secure Share Name: TRIPS_SHARE
Database: CITIBIKE
Tables & Views: CITIBIKE > PUBLIC > TRIPS

share fields

Click Apply then Create.

Note that the window indicates the secure share was created successfully.

success message

Realistically, the Citi Bike Account Administrator would click on the Next: Add Consumers button to input their partner's Snowflake account name and type. We will stop here for the purposes of this lab.

Click on the Done button at the bottom of the box.

Note this page now shows the TRIPS_SHARE secure share. It only took seconds to give other accounts access to data in Snowflake in a secure manner with no copies of the data required!

TRIPS_SHARE share
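
For reference, the same outbound share can be created entirely in SQL. A sketch of the equivalent statements, run as ACCOUNTADMIN:

create share trips_share;
grant usage on database citibike to share trips_share;
grant usage on schema citibike.public to share trips_share;
grant select on table citibike.public.trips to share trips_share;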

Snowflake provides several ways to securely share data without compromising confidentiality. You can share not only tables and views, but also Secure Views, Secure UDFs (User Defined Functions), and Secure Joins. For more details on how to use these methods for sharing data while preventing access to sensitive information, see the Snowflake documentation.

Snowflake Data Marketplace

Navigate to the Data Marketplace tab.

data marketplace tab

Select "Explore the Snowflake Data Marketplace." If it's your first time using the Data Marketplace the following login screens will appear:

login page

Enter your credentials to access the Snowflake Data Marketplace.

Snowflake data marketplace

To change your role, follow the steps below:

check context

Find a listing

The search bar at the top center allows you to search for listings. The menu below the search box lets you filter data listings by Provider, Business Needs, and Category. Type COVID into the search box, then select the Starschema COVID-19 Epidemiological Data tile.

health tab

Here you can learn more about the data set and see some example usage queries. Now click on the Get Data button to access this information within your Snowflake account.

get data fields

Now you can run any of the sample queries provided by Starschema. Select the query you want to run and click the Run button in the upper-right corner. You can view the results in the bottom pane. Once you are done running the sample queries, click on the Home icon in the upper-left corner.

get data fields

Now navigate back to the Databases view: in the navigation panel on the left, click Data, then Databases, and then click on the COVID19_BY_STARSCHEMA_DM database. You can now see the schemas, tables, and views that are available to query.

covid19 databases

You have now successfully subscribed to the COVID-19 dataset from Starschema, which is updated daily with global COVID data. Note that we didn't have to create databases, tables, views, or an ETL process. We can simply search for and access shared data from the Snowflake Data Marketplace.

If you would like to reset your environment by deleting all the objects created as part of this lab, run the SQL below in a worksheet.

First set the worksheet context:

use role accountadmin;
use warehouse compute_wh;
use database weather;
use schema public;

Then run this SQL to drop all the objects we created in the lab:

drop share if exists trips_share;
drop database if exists citibike;
drop database if exists weather;
drop warehouse if exists analytics_wh;
drop role if exists junior_dba;

Congratulations on completing this introductory lab exercise! You've mastered the Snowflake basics and are ready to apply these fundamentals to your own data. Be sure to reference this guide if you ever need a refresher.

We encourage you to continue with your free trial by loading your own sample or production data and by using some of the more advanced capabilities of Snowflake not covered in this lab.

Additional Resources:

What we've covered: