Monitoring your real-time data assets in Snowflake is crucial: it lets you spot anomalies quickly and respond to incoming issues proactively to minimize the blast radius. A wide range of use cases can benefit from tracking your live data streams.
In this workshop, we will visualize real-time commercial flight data over the San Francisco Bay Area from the OpenSky Network with Amazon Managed Service for Grafana and Streamlit. The architecture diagram below illustrates the deployment; the area that is not greyed out is the focus of this workshop, while the greyed-out area is covered in the Snowpipe Streaming quickstarts.
The data in the Snowflake table is visualized in real time with AMG (Amazon Managed Service for Grafana) and Streamlit. Note that Streamlit is containerized and managed by Amazon Elastic Container Service (ECS), with an Application Load Balancer (ALB) front-ending the container.
DO NOT use a production Snowflake account for this workshop. Always use a trial or test account for demo purposes.
To participate in the virtual hands-on lab, attendees need the following resources.
If you have completed the quickstarts for Snowpipe Streaming and Amazon MSK and haven't cleaned up the database, no further action is required; just skip ahead to the section Monitor with Amazon Managed Service for Grafana (AMG) on the left pane.
At this point, you can either complete Option 1 above first and then skip ahead to the section Monitor with Amazon Managed Service for Grafana (AMG) on the left pane, or start by processing some sample data, described in the steps below, before using AMG or Streamlit.
First, log in to your Snowflake account as a power user with the ACCOUNTADMIN role. Then run the following SQL commands in a worksheet to create the user, database, and role that we will use in the lab.
```sql
-- Set default value for multiple variables
-- For purpose of this workshop, it is recommended to use these defaults during the exercise to avoid errors
-- You should change them after the workshop
SET PWD = 'Test1234567';
SET USER = 'STREAMING_USER';
SET DB = 'MSK_STREAMING_DB';
SET SCHEMA = 'MSK_STREAMING_SCHEMA';
SET WH = 'MSK_STREAMING_WH';
SET ROLE = 'MSK_STREAMING_RL';

USE ROLE ACCOUNTADMIN;

-- CREATE USERS
CREATE USER IF NOT EXISTS IDENTIFIER($USER) PASSWORD=$PWD COMMENT='STREAMING USER';

-- CREATE ROLES
CREATE OR REPLACE ROLE IDENTIFIER($ROLE);

-- CREATE DATABASE AND WAREHOUSE
CREATE DATABASE IF NOT EXISTS IDENTIFIER($DB);
USE IDENTIFIER($DB);
CREATE OR REPLACE WAREHOUSE IDENTIFIER($WH) WITH WAREHOUSE_SIZE = 'SMALL';

-- GRANTS
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE IDENTIFIER($ROLE);
GRANT ROLE IDENTIFIER($ROLE) TO USER IDENTIFIER($USER);
GRANT OWNERSHIP ON DATABASE IDENTIFIER($DB) TO ROLE IDENTIFIER($ROLE);
GRANT USAGE ON WAREHOUSE IDENTIFIER($WH) TO ROLE IDENTIFIER($ROLE);
GRANT ROLE IDENTIFIER($ROLE) TO ROLE ACCOUNTADMIN;

-- SET DEFAULTS
ALTER USER IDENTIFIER($USER) SET DEFAULT_ROLE=$ROLE;
ALTER USER IDENTIFIER($USER) SET DEFAULT_WAREHOUSE=$WH;

-- SET TIME ZONE TO UTC
ALTER ACCOUNT SET TIMEZONE = 'UTC';

-- CREATE SCHEMA
USE DATABASE IDENTIFIER($DB);
USE ROLE IDENTIFIER($ROLE);
CREATE OR REPLACE SCHEMA IDENTIFIER($SCHEMA);
```
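Before moving on, you can optionally sanity-check the setup by confirming the grants and your session context. This is a minimal check, not part of the required lab steps:

```sql
-- Confirm the role received the expected grants
SHOW GRANTS TO ROLE MSK_STREAMING_RL;

-- Switch to the lab context and verify it took effect
USE ROLE MSK_STREAMING_RL;
USE WAREHOUSE MSK_STREAMING_WH;
SELECT CURRENT_ROLE(), CURRENT_DATABASE(), CURRENT_WAREHOUSE();
```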
Run the following command to load sample flight data into Snowflake.
```sql
USE DATABASE MSK_STREAMING_DB;
USE SCHEMA MSK_STREAMING_SCHEMA;
USE WAREHOUSE MSK_STREAMING_WH;

CREATE OR REPLACE STAGE FLIGHTS
  URL='s3://jsnow-vhol-assets/amg/sample-flight-data/'
  FILE_FORMAT = (TYPE = 'parquet');

CREATE OR REPLACE TABLE FLIGHTS_TBL
  (RECORDID NUMBER AUTOINCREMENT, V VARIANT)
  CHANGE_TRACKING = TRUE;

COPY INTO FLIGHTS_TBL (V)
  FROM (SELECT $1 FROM @FLIGHTS/)
  FILE_FORMAT=(TYPE = 'parquet');

SELECT * FROM FLIGHTS_TBL LIMIT 10;
```
Now you should see that the raw data, in VARIANT type, has been copied into the FLIGHTS_TBL table.
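If you want to verify the load beyond eyeballing the first few rows, a quick row count plus a look at the copy history works well. This is an optional check, a sketch rather than a required lab step:

```sql
-- How many raw records landed in the table?
SELECT COUNT(*) FROM FLIGHTS_TBL;

-- Review the load results for the past hour
SELECT FILE_NAME, ROW_COUNT, STATUS
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'FLIGHTS_TBL',
  START_TIME => DATEADD(HOUR, -1, CURRENT_TIMESTAMP())));
```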
Next, we will run DDL to create a view that converts the raw VARIANT data into a multi-column format.
```sql
CREATE OR REPLACE VIEW FLIGHTS_O_VW AS
SELECT
  V:utc::TIMESTAMP_NTZ ts_utc_o,
  CONVERT_TIMEZONE('UTC','America/Los_Angeles', TS_UTC_O::TIMESTAMP_NTZ) AS ts_pt,
  V:alt::INTEGER alt,
  V:id::TEXT id,
  V:icao::TEXT icao,
  V:dest::TEXT dest,
  V:orig::TEXT orig,
  V:lat::FLOAT lat,
  V:lon::FLOAT lon,
  ST_GEOHASH(TO_GEOGRAPHY(ST_MAKEPOINT(LON, LAT)), 12) GEOHASH,
  YEAR(TS_UTC_O)::INTEGER yr,
  MONTH(TS_UTC_O)::INTEGER mo,
  DAY(TS_UTC_O)::INTEGER dd,
  HOUR(TS_UTC_O)::INTEGER hr
FROM FLIGHTS_TBL;

SELECT * FROM FLIGHTS_O_VW LIMIT 10;
```
You should see the view FLIGHTS_O_VW with multiple columns for timestamps, altitude, latitude, longitude, origin and destination airports, etc.
Because the sample data is from a point in the past, we now need to shift the old timestamps to the present to simulate a real-time pattern. Run the following commands to adjust the timestamps.
```sql
-- ADJUST TIME STAMPS
CREATE OR REPLACE VIEW FLIGHTS_VW AS
SELECT *,
  DATEADD(SECOND,
    (SELECT DATE_PART(EPOCH_SECOND, CURRENT_TIMESTAMP()) -
            DATE_PART(EPOCH_SECOND, MAX(TS_UTC_O))
     FROM FLIGHTS_O_VW),
    TS_UTC_O) AS TS_UTC
FROM FLIGHTS_O_VW;

SELECT TS_UTC_O, TS_UTC FROM FLIGHTS_VW ORDER BY TS_PT DESC;
```
Now you should see that the original timestamps (TS_UTC_O) are adjusted to the current time (TS_UTC).
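The view works by adding a constant offset, the number of seconds between now and the newest original timestamp, to every row. As an optional sanity check, you can confirm that the newest adjusted timestamp lines up with the current time:

```sql
-- The newest adjusted timestamp should be at or near the current time
SELECT MAX(TS_UTC) AS NEWEST_ADJUSTED_TS,
       CURRENT_TIMESTAMP() AS NOW
FROM FLIGHTS_VW;
```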
We need to run a background task every minute to keep the offset advancing and simulate an incoming stream of new data. Run the following commands:
```sql
-- CREATE A TASK TO UPDATE TIMESTAMP EVERY MINUTE
CREATE OR REPLACE TASK TASK1
  WAREHOUSE = MSK_STREAMING_WH
  SCHEDULE = '1 MINUTE'
AS
CREATE OR REPLACE VIEW FLIGHTS_VW AS
SELECT *,
  DATEADD(SECOND,
    (SELECT DATE_PART(EPOCH_SECOND, CURRENT_TIMESTAMP()) -
            DATE_PART(EPOCH_SECOND, MAX(TS_UTC_O))
     FROM FLIGHTS_O_VW),
    TS_UTC_O) AS TS_UTC
FROM FLIGHTS_O_VW;

USE ROLE ACCOUNTADMIN;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE MSK_STREAMING_RL;
USE ROLE MSK_STREAMING_RL;
ALTER TASK TASK1 RESUME;
SHOW TASKS;
```
The task is now scheduled to run every minute.
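To confirm the task is actually firing, you can query its execution history with the TASK_HISTORY table function. This optional check is not part of the required lab steps:

```sql
-- Show the most recent executions of TASK1
SELECT NAME, STATE, SCHEDULED_TIME, COMPLETED_TIME
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'TASK1'))
ORDER BY SCHEDULED_TIME DESC
LIMIT 5;
```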
Before we start, for simplicity, we will set the timezone in Snowflake to UTC. Run the following commands as the administrator to change the default timezone to UTC:
```sql
USE ROLE ACCOUNTADMIN;
ALTER ACCOUNT SET TIMEZONE = 'UTC';
```
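You can verify the change took effect with SHOW PARAMETERS (an optional check):

```sql
-- The TIMEZONE parameter should now report UTC at the account level
SHOW PARAMETERS LIKE 'TIMEZONE' IN ACCOUNT;
```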
First, click here to launch an AMG workspace. Note the default AWS region is us-west-2 (Oregon); feel free to select a region you prefer for the deployment.
Next, on the Create stack page, set the Stack name or modify the default value to customize it to your identity. See the sample screen capture below for reference.
Leave everything as default on the Configure stack options page and click Next. On the Review page, check I acknowledge that AWS CloudFormation might create IAM resources in the Capabilities section, then click Submit.
In about 10 minutes, the CloudFormation template provisions an AMG workspace.
Go to the AWS AMG workspace console in the same region where you deployed the CloudFormation template. The Snowflake plugin for Grafana is only available in [Grafana Enterprise](https://grafana.com/products/enterprise/), so we will need to upgrade our current workspace to the Enterprise version. Click the name of the workspace, then click Upgrade to Grafana Enterprise.
On the next page, check 30-day free trial and review the terms and conditions before confirming the upgrade. In a few minutes, the upgrade is complete. You now have 30 days to try out the Grafana Enterprise edition with the Snowflake plugin for free.
Go back to the AWS AMG workspace console and click the name of the workspace. We need to assign new user(s) for this workspace, including the Administrator of the Grafana UI. If you have previously set up AWS IAM Identity Center (successor to AWS SSO), you will be prompted to assign a new user or group: click Assign new user or group.
If you have not set up IAM Identity Center, you will be prompted to enable it. Click Enable, then click Assign new user or group. Note that you will probably need to add a new user in IAM Identity Center; please refer to this doc to complete that step. You will need to add at least one user, who will also be the Administrator of the Grafana UI.
Coming back to the page for assigning users, check the user you wish to assign to the workspace and click Assign users and groups.
Now we need to promote this user's role from Viewer to Administrator. Check the user and select
Make admin in the
Action drop-down menu.
Go back to the AWS AMG workspace console. Click the Grafana URL.
You should see the Grafana sign-in page. Click the
Sign in with AWS IAM Identity Center button.
At this point, you will be prompted to enter a username and password for authentication. Depending on how your IAM Identity Center is set up, you may also have to perform MFA (multi-factor authentication).
Once logged in, click the hamburger menu located in the top-left corner of the page and select Data sources from the Administration drop-down menu.
In the search box, type in Snowflake, and the Snowflake plugin appears. Select it.
Type in a name for your source, e.g., Snowflake, and the connection info such as your Snowflake account locator, its AWS region, username streaming_user, and password Test1234567. For Environment, type in the database msk_streaming_db and schema msk_streaming_schema you used in the quickstarts for Snowpipe Streaming and Amazon MSK. Leave the remaining options as default and click 'Save & test'. You will see the 'Data source is working' message when it is done. If you are not sure about the Snowflake account locator, run the following query in Snowflake to get it:
```sql
WITH HOSTLIST AS
  (SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$ALLOWLIST()))))
SELECT REPLACE(VALUE:host, '.snowflakecomputing.com', '') AS ACCOUNT_IDENTIFIER
FROM HOSTLIST
WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT';
```
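Alternatively, the CURRENT_ACCOUNT and CURRENT_REGION context functions return similar information in one line (a simpler option if you just need the locator and region):

```sql
-- Account locator and the cloud region hosting the account
SELECT CURRENT_ACCOUNT() AS ACCOUNT_LOCATOR, CURRENT_REGION() AS REGION;
```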
Now download a dashboard template and save it to your local computer.
In the UI, click the hamburger menu, click Dashboards, then select Import from the New drop-down menu. Drag and drop the downloaded template, select Snowflake as the data source, and click Import.
Now you should be able to see the sample dashboard that tracks the real-time flight data.
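If you later want to add your own panel, Grafana's time series visualization expects a time column plus one or more numeric columns. The query below is a sketch of what such a panel query against the FLIGHTS_VW view might look like; the one-minute bucketing with TIME_SLICE is an illustrative choice, not something mandated by the sample dashboard:

```sql
-- Number of position reports per minute over the last hour
SELECT TIME_SLICE(TS_UTC, 1, 'MINUTE') AS time,
       COUNT(*) AS position_reports
FROM FLIGHTS_VW
WHERE TS_UTC >= DATEADD(HOUR, -1, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```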
To learn more about Grafana's features, feel free to visit their documentation portal.
Developers with Python skills might want to quickly build a custom monitoring dashboard of their own. Streamlit is a great tool for this use case: it is open-source and can use all the supported Python graphics libraries to build sleek dashboards.
Here we are going to containerize a Streamlit Python app that retrieves real-time flight data from Snowflake and visualizes it in a similar fashion to Grafana. The container is orchestrated by Amazon ECS and front-ended by an Application Load Balancer (ALB).
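A dashboard like this typically needs the latest known position per aircraft. One way to express that in Snowflake is with QUALIFY and a window function; this is an illustrative sketch of such a query, not necessarily the exact query the sample app uses:

```sql
-- Latest position report for each aircraft in the view
SELECT ID, ICAO, LAT, LON, ALT, TS_UTC
FROM FLIGHTS_VW
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TS_UTC DESC) = 1;
```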
First, click here to deploy the environment with CloudFormation, then click Next.
On the next page, pick a stack name and ECS cluster name of your choice, and pick the VPC and subnets where you want to deploy the ECS cluster. For the Snowflake parameters, use the default values (already filled in); the Snowflake password should be Test1234567 from the Snowpipe Streaming and Amazon MSK workshop if you have not changed it. For SnowflakeAccount, run the following command in Snowflake to get the account identifier:
```sql
WITH HOSTLIST AS
  (SELECT * FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$ALLOWLIST()))))
SELECT REPLACE(VALUE:host, '.snowflakecomputing.com', '') AS ACCOUNT_IDENTIFIER
FROM HOSTLIST
WHERE VALUE:type = 'SNOWFLAKE_DEPLOYMENT_REGIONLESS';
```
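Note that this variant returns the regionless identifier (organization-account form) rather than the locator used earlier for Grafana. If your account supports the newer context functions, the following should return the same value; treat this as an assumption to verify on your account:

```sql
-- Regionless account identifier in orgname-accountname form
SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS ACCOUNT_IDENTIFIER;
```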
Leave everything else as default and click Next. On the Review page, check the acknowledgement in the Capabilities section and click Submit. In less than 10 minutes, the environment should deploy successfully. Navigate to the Outputs tab and click the URL for the ALB.
Now you should be able to see the sample dashboard built with Streamlit.
Here is the source code for Streamlit.
In this lab, we demonstrated how to use AMG and Streamlit to visualize and monitor live flight data. We also provided a sample Grafana dashboard and Streamlit code for you to get started quickly.