ML Lineage provides comprehensive tracing of data as it flows through your machine learning pipeline. This feature enables you to track the lineage between various data artifacts, including source tables/views/stages, feature views, datasets, registered models, and deployed model services. Additionally, ML Lineage captures the relationships between cloned artifacts and artifacts of similar types, ensuring a complete view of data transformations and dependencies within your pipeline. A possible pipeline is illustrated below:
This quickstart introduces ML Lineage by building an ML pipeline that records lineage at every step. You will see how source data, Feature Store, Datasets, and models are all connected via lineage for easy tracking.
Complete the following steps to set up your account:
USE ROLE ACCOUNTADMIN;
-- Using ACCOUNTADMIN, create a new role for this exercise and grant it to the applicable users
CREATE OR REPLACE ROLE ML_LINEAGE_ROLE;
GRANT ROLE ML_LINEAGE_ROLE TO USER <YOUR_USER>;
-- Grant the privilege required to view lineage
GRANT VIEW LINEAGE ON ACCOUNT TO ROLE ML_LINEAGE_ROLE;
-- Create the virtual warehouse for this exercise (suspends after 60 seconds of inactivity)
CREATE OR REPLACE WAREHOUSE ML_LINEAGE_WH AUTO_SUSPEND = 60;
GRANT ALL ON WAREHOUSE ML_LINEAGE_WH TO ROLE ML_LINEAGE_ROLE;
-- Next, create a new database and schema
CREATE OR REPLACE DATABASE ML_LINEAGE_DATABASE;
CREATE OR REPLACE SCHEMA ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA;
GRANT OWNERSHIP ON DATABASE ML_LINEAGE_DATABASE TO ROLE ML_LINEAGE_ROLE COPY CURRENT GRANTS;
GRANT OWNERSHIP ON ALL SCHEMAS IN DATABASE ML_LINEAGE_DATABASE TO ROLE ML_LINEAGE_ROLE COPY CURRENT GRANTS;
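Once the statements above have run, a short follow-up like the one below can switch the session to the new role and confirm that the grants took effect. This is a minimal sketch that reuses only the object names created above and assumes that <YOUR_USER> is the user running the session. The rows returned by SHOW GRANTS will vary by account.

```sql
-- Switch to the role, warehouse, and schema created for this exercise
USE ROLE ML_LINEAGE_ROLE;
USE WAREHOUSE ML_LINEAGE_WH;
USE SCHEMA ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA;

-- List the privileges held by the role; the output should include
-- VIEW LINEAGE on the account and ownership of the database and schema
SHOW GRANTS TO ROLE ML_LINEAGE_ROLE;
```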
ML Lineage lets you track and understand how data flows through your machine learning pipeline. By capturing relationships between data artifacts (such as source tables, feature views, datasets, models, and services), ML Lineage gives you visibility into data transformations and dependencies. In this quickstart, you saw how the different objects within Snowflake ML are connected via lineage.
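As a rough illustration of inspecting these relationships from SQL, the sketch below queries the SNOWFLAKE.CORE.GET_LINEAGE table function for objects downstream of a source table. The table name MY_SOURCE_TABLE is hypothetical and stands in for whichever source table your pipeline reads. The distance of 2 is an arbitrary choice for the example, and the query assumes the current role holds the VIEW LINEAGE privilege granted during setup.

```sql
-- MY_SOURCE_TABLE is a placeholder (assumption); substitute a table from your own pipeline.
-- 'DOWNSTREAM' follows the flow of data away from the table; use 'UPSTREAM' to trace back to sources.
SELECT *
FROM TABLE(
    SNOWFLAKE.CORE.GET_LINEAGE(
        'ML_LINEAGE_DATABASE.ML_LINEAGE_SCHEMA.MY_SOURCE_TABLE',  -- fully qualified object name
        'TABLE',                                                  -- object domain
        'DOWNSTREAM',                                             -- direction to traverse
        2                                                         -- number of levels to follow
    )
);
```

Swapping the domain and name for another object in the pipeline, and the direction for 'UPSTREAM', would trace back toward its sources instead.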
Start your journey now to unlock the full potential of ML Lineage!