In this guide, you will learn how to move beyond slow, nightly batch jobs and stream data in real time from an operational database like SQL Server directly into Snowflake. Using Openflow, a cloud-native data movement platform, you will build a continuous Change Tracking pipeline that unlocks immediate access to your business data for faster, more accurate analytics.
By the end of this guide, you will know how to work with the following:
Openflow is a cloud-native data movement platform built on Apache NiFi, designed specifically for scalable, real-time streaming and Change Data Capture (CDC) pipelines. It provides a unified experience for building and monitoring data integration workflows, complete with built-in observability and governance.
Openflow is engineered for high-speed, continuous ingestion of all data types—from structured database records to unstructured text, images, and sensor data—making it ideal for feeding near real-time data into modern cloud platforms for AI and analytics.
Change tracking captures the fact that rows in a table were changed, but not the changed data itself. Applications can determine which rows have changed and obtain the latest row data directly from the user tables. Change tracking is therefore more limited than change data capture in the historical questions it can answer, but for applications that don't require that history, there is far less storage overhead because the changed data is not captured. A synchronous tracking mechanism records the changes and is designed to add minimal overhead to DML operations.
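To make this concrete, here is an illustrative T-SQL sketch of how an application reads changes with Change Tracking, assuming it is already enabled on the Northwind Orders table and that the application recorded the version number from its previous sync:
-- Read changes since the last sync and join back to the user table for the
-- latest row data (Change Tracking stores keys and operations, not row data)
DECLARE @last_sync_version BIGINT = 0;  -- version recorded at the previous sync
SELECT ct.OrderID,
       ct.SYS_CHANGE_OPERATION,   -- I (insert), U (update), or D (delete)
       o.OrderDate,
       o.ShippedDate
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct
LEFT JOIN dbo.Orders AS o
    ON o.OrderID = ct.OrderID;    -- NULL for deleted rows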
By the end of this quickstart guide, you will have built a real-time Change Tracking pipeline from SQL Server to Snowflake. To follow along, you will need:
A Snowflake account with Snowflake Openflow and Snowpark Container Services access.
NOTE
Openflow on SPCS is not available on Snowflake's free trial accounts. To work through this quickstart, add payment details to your trial account or use your own Snowflake account.
To set the stage for our real-time data streaming demonstration, we first need a source database. We will use SQL Server as the transactional database for this use case; it will serve as the live OLTP environment from which we stream data changes into Snowflake.
This is the primary, step-by-step guide from AWS for creating and connecting to a SQL Server instance on RDS. It covers everything from the initial setup in the AWS console to configuring the security groups.
This is a more general guide that provides context on all available settings when creating an RDS instance.
Once your RDS instance is running, this AWS document shows you exactly how to find your database endpoint and connect to it using SQL Server Management Studio (SSMS), the most common tool.
The installation script instawnd.sql for the Northwind database is provided by Microsoft. The link below is to the official Microsoft SQL Server samples repository on GitHub, which is the most reliable place to get the script. Once you download the instawnd.sql file from that repository, you can simply open it in SSMS (while connected to your RDS instance) and execute it to create and populate all the Northwind tables.
To configure change tracking, execute the console.sql script from this repository against the Northwind database.
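The exact contents of console.sql live in the repository, but the core statements needed to enable Change Tracking look roughly like this (the retention settings and table choice are assumptions for illustration):
-- Enable Change Tracking at the database level
ALTER DATABASE Northwind
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
-- Enable Change Tracking on each table to be replicated
ALTER TABLE dbo.Orders
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);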
The repository also provides an UPDATE statement to simulate a real-world transaction. It finds and modifies a set of recent orders related to a specific product, changing their order and shipping dates. Because Change Tracking is now active, this modification is captured immediately and will be picked up by the Change Tracking process.
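A minimal, hypothetical version of such an update (the product ID is chosen arbitrarily for illustration) could look like this:
-- Simulate a transaction: shift dates on recent orders for one product
UPDATE o
SET o.OrderDate   = GETDATE(),
    o.ShippedDate = DATEADD(day, 2, GETDATE())
FROM dbo.Orders AS o
JOIN dbo.[Order Details] AS od
  ON od.OrderID = o.OrderID
WHERE od.ProductID = 17;  -- arbitrary product for illustration
In the next section, we will configure the Snowflake Openflow connector and analyze real-time data from SQL Server to generate business insights.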
Openflow supports 19+ connectors, including the SQL Server connector used in this quickstart.
Before creating a deployment, you need to configure core Snowflake components, including the OPENFLOW_ADMIN role and a network rule. To do so, execute the setup_roles.sql script from this repository.
NOTE
For a detailed, step-by-step guide on these prerequisite configurations, please complete Step 2 of the following Snowflake Quickstart guide: Snowflake Configuration for Openflow.
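As a rough sketch, the script establishes something along the following lines. The exact statements, including the Openflow-specific account privilege, are defined in the repository script and the linked quickstart, so treat the final grant below as an assumption to verify there:
USE ROLE ACCOUNTADMIN;
-- Administrative role that owns and manages Openflow deployments
CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN;
GRANT ROLE OPENFLOW_ADMIN TO USER <your_user>;  -- replace with your username
-- Account-level privilege to create Openflow deployments (assumed; confirm in setup_roles.sql)
GRANT CREATE OPENFLOW DATA PLANE INTEGRATION ON ACCOUNT TO ROLE OPENFLOW_ADMIN;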
With the core Snowflake components configured, the next step is to create the Openflow deployment. This deployment provisions the necessary containerized environment within your Snowflake account where the Openflow service will execute.
Before proceeding, ensure your current active role in the Snowsight UI is set to OPENFLOW_ADMIN. You can verify and switch your role using the user context menu located in the top-left corner of the Snowsight interface. Failing to assume the correct role will result in permission errors during the deployment creation process. First, log in to the Snowflake UI.
Then create a new deployment, selecting Snowflake as the deployment type and CDC_QS_DEPLOYMENT as the name.
Create a runtime role that will be used by your Openflow runtime. This role needs access to databases, schemas, and warehouses for data ingestion.
-- Create runtime role
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS NORTHWIND_ROLE;
-- Create database for Openflow resources
CREATE DATABASE IF NOT EXISTS NORTHWIND_QS;
-- Create warehouse for data processing
CREATE WAREHOUSE IF NOT EXISTS NORTHWIND_WH
WAREHOUSE_SIZE = MEDIUM
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
-- Grant privileges to runtime role
GRANT USAGE ON DATABASE NORTHWIND_QS TO ROLE NORTHWIND_ROLE;
GRANT USAGE ON WAREHOUSE NORTHWIND_WH TO ROLE NORTHWIND_ROLE;
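-- The connector must also be able to create schemas and tables for the
-- replicated data; this additional grant is an assumption to verify against
-- the connector's documentation
GRANT CREATE SCHEMA ON DATABASE NORTHWIND_QS TO ROLE NORTHWIND_ROLE;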
-- Grant runtime role to Openflow admin
GRANT ROLE NORTHWIND_ROLE TO ROLE OPENFLOW_ADMIN;
External Access Integrations allow your runtime to connect to external data sources. This quickstart creates one integration with network rules for SQL Server.
USE COMPANION NOTEBOOKS:
For detailed External Access Integration setup for specific connectors, use the notebooks from the companion repository and look for the SQL Server notebook.
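For orientation, a minimal sketch of the two objects involved, using the names this guide references later and a placeholder RDS endpoint, might look like this (run it in a database and schema where you can create objects):
-- Network rule allowing egress to the SQL Server endpoint (placeholder host)
CREATE OR REPLACE NETWORK RULE SQL_SERVER_NETWORK_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('your-instance.xxxxxxxx.us-west-2.rds.amazonaws.com:1433');
-- External Access Integration that bundles the network rule for the runtime
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION EAI_SQL_SERVER_INTEGRATION
ALLOWED_NETWORK_RULES = (SQL_SERVER_NETWORK_RULE)
ENABLED = TRUE;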
The next step is to create a runtime associated with the previously created runtime role. A runtime is the execution environment for your Openflow connectors. Follow these steps to create your runtime:
* Navigate to Data → Ingestion → Openflow → Runtimes tab
* Select the CDC_QS_DEPLOYMENT deployment
* Name the runtime CDC_QS_RUNTIME
* Node size: M, Min nodes: 1, Max nodes: 1
* Select SQL_SERVER_NETWORK_RULE from the dropdown
* Select EAI_SQL_SERVER_INTEGRATION from the dropdown
NOTE
Runtime creation typically takes 3-5 minutes. The status will progress from CREATING → ACTIVE. Once your runtime is active, you can access the Openflow canvas to add and configure connectors. We will add and configure connectors in the next section.
This section details the final step of launching the Openflow connector.
Select the SQL Server connector, then choose CDC_QS_RUNTIME from the available runtimes drop-down list and select Add. Select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
You can configure the connector to replicate a set of tables in real time. Open the parameters of the Incremental Load process group (right-click the process group and select Parameters).
NOTE
The connector does not replicate any data until the tables to be replicated are explicitly added to its ingestion configuration.
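As an illustration, the table list parameter takes fully qualified source table names; the parameter name below is an assumption to verify in the connector's documentation:
Included Table Names: dbo.Orders, dbo.Customers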
Right-click the sqlserver-connector process group and select Start. The connector starts the data ingestion. Once the initial data replication from SQL Server to the NORTHWIND_QS database in Snowflake is complete, you can perform an exploratory analysis to validate the ingested data.
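Before opening the notebook, you can run a few quick checks in a Snowflake worksheet. The destination schema and table names below assume the connector mirrors the source dbo schema; adjust them to match your configuration:
-- Quick validation of the replicated data (destination names assumed)
USE ROLE NORTHWIND_ROLE;
USE WAREHOUSE NORTHWIND_WH;
SHOW TABLES IN DATABASE NORTHWIND_QS;
SELECT COUNT(*) AS order_count
FROM NORTHWIND_QS."dbo"."Orders";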
Download the northwind.ipynb Jupyter notebook from this git repository, then run the northwind.ipynb notebook within the Snowflake UI's notebook environment. This final procedure validates the end-to-end CDC pipeline by simulating new transactional data in the source SQL Server database and observing its immediate replication and availability in Snowflake.
To generate changes for the CDC process to capture, you will execute pre-defined SQL scripts from this repository against the source Northwind database.
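If you would rather craft a change by hand, a minimal, hypothetical insert against the Northwind schema would be picked up just the same:
-- Hypothetical change: add a new order for an existing customer
INSERT INTO dbo.Orders (CustomerID, EmployeeID, OrderDate, ShipCountry)
VALUES ('ALFKI', 1, GETDATE(), 'Germany');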
After the simulation scripts have been executed in SQL Server, the changes will be captured and streamed by Openflow. You can verify their arrival in Snowflake nearly instantly.
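One simple way to confirm, assuming the same destination naming as in the validation queries above, is to look for the newest rows in Snowflake:
-- Check that the latest source changes have landed (destination names assumed)
SELECT "OrderID", "CustomerID", "OrderDate"
FROM NORTHWIND_QS."dbo"."Orders"
ORDER BY "OrderID" DESC
LIMIT 5;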
You can also re-run the northwind.ipynb notebook that you previously ran within the Snowflake UI. You've successfully built a real-time Change Tracking pipeline to stream data from SQL Server to Snowflake using Openflow. This modern approach to data integration eliminates the delays of traditional batch jobs, enabling immediate data analysis and faster business decisions.