In this quickstart, you will learn how to build your first data engineering pipeline using Snowflake Notebooks.
pandas on Snowflake lets you run your pandas code directly on your data in Snowflake, in a distributed, scalable, and secure manner. By changing just the import statement and a few lines of code, you get the same pandas-native experience you know and love, with the scalability and security benefits of Snowflake.
pandas is the go-to data processing library for millions of developers worldwide, including countless Snowflake users. However, pandas was never built to handle data at the scale organizations operate at today. Running pandas code requires transferring and loading all of the data into a single in-memory process, which becomes unwieldy on moderate-to-large data sets and breaks down completely on data sets that grow beyond what a single node can handle. With pandas on Snowflake, you can run the same pandas code, but with all the pandas processing pushed down to run in a distributed fashion in Snowflake. Your data never leaves Snowflake, and your pandas workflows can process it far more efficiently using Snowflake's elastic engine. This brings the power of Snowflake to pandas developers everywhere.
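As a minimal sketch of that import change (the table name MY_TABLE is a placeholder, not part of this quickstart):

```python
# Before: standard pandas runs single-node and in-memory
# import pandas as pd

# After: pandas on Snowflake via the Snowpark pandas API
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # registers the Snowflake backend for Modin

from snowflake.snowpark.context import get_active_session

# Inside a Snowflake notebook an authenticated session already exists
session = get_active_session()

# Read a Snowflake table into a DataFrame; processing is pushed down to Snowflake
df = pd.read_snowflake("MY_TABLE")  # MY_TABLE is a placeholder name
print(df.describe())
```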
Snowpark is the set of libraries and code execution environments that run Python and other programming languages next to your data in Snowflake. Snowpark can be used to build data pipelines, ML models, apps, and other data processing tasks.
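For a feel of the Snowpark DataFrame API, here is a brief sketch (the table name and filter condition are illustrative, not from this quickstart):

```python
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.functions import col

# In a Snowflake notebook, an authenticated session is already available
session = get_active_session()

# Build a lazy query plan; nothing executes in Snowflake until an action like show()
df = session.table("ORDER_HISTORY").filter(col("SHIPPING_COST") > 10)
df.show()
```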
This section covers the creation of the Snowflake objects and other setup needed to run this quickstart successfully.
1. Create a database and call it AVALANCHE_DB.
2. Within AVALANCHE_DB, select Create schema and call it AVALANCHE_SCHEMA.
3. Create a warehouse called DE_M. Select type as Standard and size as M. (These objects can also be created programmatically, as sketched after these steps.)
4. Download the DE_100.ipynb file. Use Import .ipynb file to load it as a notebook and call it AVALANCHE_ANALYTICS_NB. Select AVALANCHE_DB for the database, AVALANCHE_SCHEMA for the schema, and DE_M as the query warehouse, then create the notebook.
5. Download the order-history.csv and shipping-logs.csv files to your local machine.
6. In the notebook, click the + sign to load the order-history.csv and shipping-logs.csv files into your notebook workspace.

With this, we are ready to run our first data engineering pipeline in Snowflake using Python.
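If you prefer scripting to clicking through the UI, a sketch like the following creates the same objects from Snowpark Python (it assumes a connection is already configured in your environment; the UI steps above remain the source of truth):

```python
from snowflake.snowpark import Session

# Assumes a default Snowflake connection is configured (e.g., in connections.toml)
session = Session.builder.create()

# Create the objects described in the steps above
session.sql("CREATE DATABASE IF NOT EXISTS AVALANCHE_DB").collect()
session.sql("CREATE SCHEMA IF NOT EXISTS AVALANCHE_DB.AVALANCHE_SCHEMA").collect()
session.sql(
    "CREATE WAREHOUSE IF NOT EXISTS DE_M "
    "WAREHOUSE_TYPE = 'STANDARD' WAREHOUSE_SIZE = 'MEDIUM'"
).collect()
```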
During this step, you will learn how to use pandas on Snowflake to:
You will also learn how to use Snowpark Python to:
In addition to the ingestion and transformation steps above, you will learn how to:
Follow along and run each of the cells in the Notebook.
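The cells broadly follow the pattern sketched below; the join key order_id and the output table name are assumptions for illustration, not the notebook's exact code:

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # enables pandas on Snowflake

# Read the uploaded CSVs into Snowpark pandas DataFrames
orders = pd.read_csv("order-history.csv")
shipping = pd.read_csv("shipping-logs.csv")

# Join and transform; the work is pushed down and runs inside Snowflake
merged = orders.merge(shipping, on="order_id", how="left")  # order_id is an assumed key

# Persist the result as a Snowflake table (table name is illustrative)
merged.to_snowflake(
    "AVALANCHE_DB.AVALANCHE_SCHEMA.ORDER_SHIPPING",
    if_exists="replace",
    index=False,
)
```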
Congratulations, you have successfully completed this quickstart!