In this guide, you'll learn how to set up real-time event tracking and analytics, configure the Snowflake Streaming Loader for immediate data availability, and deploy Snowplow's Unified Digital dbt package for simplified data modeling and consumption. By the end of this tutorial, you'll have a fully functioning analytics pipeline, enabling you to make data-driven decisions swiftly and accurately.
With Snowplow, tracking capabilities are virtually limitless, offering a wide range of possibilities.
Basic tracking functionalities can be easily integrated into your app by following this guide, while more advanced features can be unlocked by utilizing Snowplow's full suite of products.
Here's a summary of the steps to implement a basic web tracking setup using Snowplow's Data Products and advanced Snowplow features:
For a complete setup guide/recipe, visit Snowplow's official documentation.
In this section, you'll learn how to set up the Snowflake Streaming Loader to ingest Snowplow event data into Snowflake in near real-time. By the end of this step, you'll have a streaming pipeline that delivers enriched event data with minimal latency.
Snowpipe Streaming is Snowflake's real-time data ingestion service, which allows for the immediate availability of data as it's ingested. This capability is crucial for businesses that rely on up-to-the-minute data to make timely, informed decisions.
The Snowflake Streaming Loader is an advanced data integration solution that continuously captures and delivers real-time information to Snowflake, enabling organizations to access up-to-the-minute insights and drive informed decision-making.
Go to your Snowplow BDP console, under infra in destinations, and select the Snowflake ( Streaming Option )
Complete your details
and then generate and run the script to set up everything that will be needed in Snowflake
Click the "Test and complete destination set up" button and reach out to Snowplow Support (support@snowplow.io) to finish the process
In this section, we'll explore how to deploy the Unified Digital dbt package in Snowflake to transform raw event data into modeled data and actionable insights. We will use the raw data that we previously ingested in Snowflake using the streaming loader
The unified data package is essential for consistent analytics across different data sources and platforms. It standardizes the structure and schema of your data, making it easier to query and derive insights.
We'll use Snowplow BDP to run the standard data models in Snowflake.
We can go to the dedicated screen per model and configure the model as we want
Steps:
In this section, we'll explore your processed behavioral derived data using Snowflake Notebooks in depth. We'll focus on the derived tables created by the Unified Digital package, such as derived.snowplow_unified_users
, derived.snowplow_unified_sessions
, and derived.snowplow_unified_views
. This hands-on session will guide you through querying these tables to extract valuable insights into user behavior, session metrics, and content performance.
Derived tables are the result of transforming raw event data into structured, analytical datasets. They aggregate and organize data to provide meaningful insights without the complexity of raw event logs.
derived.snowplow_unified_users
: Aggregated user-level data.derived.snowplow_unified_sessions
: Data summarizing user sessions.derived.snowplow_unified_views
: Information about page or screen views.snowplow.derived
).SHOW TABLES IN SCHEMA snowplow.derived;
This command lists all the derived tables available for analysis.Let's start by exploring the derived.snowplow_unified_users
table to gain insights into your user base.
Query 1: View Sample User Data
SELECT * FROM derived.snowplow_unified_users LIMIT 5;
Query 2: User Acquisition Over Time
SELECT
DATE_TRUNC('day', START_TSTAMP) AS first_session_date,
COUNT(*) AS new_users
FROM
derived.snowplow_unified_users
GROUP BY
first_session_date
ORDER BY
first_session_date;
Query 3: Top Engaged Users
SELECT
USER_ID,
SESSIONS,
ENGAGED_TIME_IN_S
FROM derived.snowplow_unified_users
ORDER BY
ENGAGED_TIME_IN_S DESC
LIMIT 10;
Next, we'll delve into the derived.snowplow_unified_sessions
table to understand user interaction patterns.
Query 1: Average Session Duration
SELECT
AVG(engaged_time_in_s) AS average_session_duration
FROM
derived.snowplow_unified_sessions;
Query 2: Sessions Over the Last 7 Days
SELECT
DATE_TRUNC('day', START_TSTAMP) AS session_date,
COUNT(*) AS session_count
FROM
derived.snowplow_unified_sessions
WHERE
START_TSTAMP >= DATEADD('day', -7, CURRENT_DATE)
GROUP BY
session_date
ORDER BY
session_date;
Query 3: Session Counts by User
SELECT
user_id,
COUNT(*) AS session_count
FROM derived.snowplow_unified_sessions
GROUP BY
user_id
ORDER BY
session_count DESC
LIMIT 10;
Now, let's explore the derived.snowplow_unified_views
table to assess content performance.
Query 1: Most Viewed Pages
SELECT
page_urlpath,
COUNT(*) AS view_count
FROM derived.snowplow_unified_views
GROUP BY
page_urlpath
ORDER BY
view_count DESC
LIMIT 10;
Query 2: Average Time Spent per Page
SELECT
page_urlpath,
AVG(ENGAGED_TIME_IN_S) AS average_time_spent
FROM derived.snowplow_unified_views
GROUP BY page_urlpath
ORDER BY average_time_spent DESC
LIMIT 10;
Query 3: Page Views by Device Type
SELECT
DEVICE_CATEGORY,
COUNT(*) AS view_count
FROM derived.snowplow_unified_views
GROUP BY DEVICE_CATEGORY
ORDER BY view_count DESC;
Analyzing geographic data helps tailor content and marketing strategies to your audience locations.
Query 1: Sessions by Country
SELECT
FIRST_GEO_COUNTRY,
COUNT(*) AS session_count
FROM derived.snowplow_unified_sessions
GROUP BY FIRST_GEO_COUNTRY
ORDER BY session_count DESC;
Query 2: New Users by Region
SELECT
FIRST_GEO_REGION_NAME,
COUNT(*) AS new_user_count
FROM derived.snowplow_unified_users
GROUP BY FIRST_GEO_REGION_NAME
ORDER BY new_user_count DESC;
User segmentation allows for personalized marketing and improved user experiences.
Query 1: High-Value User Segment
SELECT
user_id,
ENGAGED_TIME_IN_S
FROM derived.snowplow_unified_users
WHERE
ENGAGED_TIME_IN_S > (
SELECT AVG(ENGAGED_TIME_IN_S) FROM derived.snowplow_unified_users
)
ORDER BY ENGAGED_TIME_IN_S DESC;
Maximize the value of your analysis by collaborating within your team.
By systematically exploring the derived tables, you've uncovered:
With a solid understanding of your data, consider:
By focusing on the derived tables, you've efficiently navigated through high-level insights, enabling strategic decision-making without getting frustrated by raw data complexity. Continue to explore and customize your analyses to align with your business objectives.
derived.users
, derived.snowplow_unified_sessions
, derived.snowplow_unified_views
).At the end of this guide, you've successfully set up a near real-time data ingestion pipeline using Snowplow and Snowflake and deployed the Unified Digital dbt package for streamlined analytics. This powerful combination allows your business to harness real-time insights and make data-driven decisions efficiently.
Happy Streaming and Analyzing!