In this guide, we'll walk you through how to set up the integration between Feast and Snowflake, using Snowflake as the batch engine to perform push-down processing of data and build an offline feature store. We will also use Snowflake as the online feature store, although the Feast configuration can easily be updated to use any Feast-supported database for online feature serving.

For both offline and online feature generation, all operations are pushed down to Snowflake, where the source data lives. You can leverage Snowflake's scalable compute to process large volumes of data and serve features for various use cases without moving the data out of Snowflake, unless explicitly specified.
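To make that division of labour concrete, here is a minimal sketch of how both retrieval paths look through the Feast API once the feature store is configured against Snowflake. The feature view name, entity key, and repository path are illustrative rather than taken from this guide's repository, and the exact API surface varies slightly between Feast releases:

from datetime import datetime
import pandas as pd
from feast import FeatureStore

# Points at a repository whose feature_store.yaml declares Snowflake
# offline and online stores (repo path is illustrative).
store = FeatureStore(repo_path=".")

# Offline retrieval: Feast compiles the point-in-time join to SQL that
# runs inside Snowflake, so raw data never leaves the warehouse.
entity_df = pd.DataFrame({
    "CUSTOMER_ID": ["C-001", "C-002"],
    "event_timestamp": [datetime(2021, 6, 1), datetime(2021, 6, 1)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:TENUREMONTHS", "customer_features:MONTHLYCHARGES"],
).to_df()

# Online retrieval: after materialization, low-latency lookups are served
# from the configured online store (also Snowflake in this guide).
online = store.get_online_features(
    features=["customer_features:MONTHLYCHARGES"],
    entity_rows=[{"CUSTOMER_ID": "C-001"}],
).to_dict()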

In this guide, we will be:

  1. Setting up your Python environment to run Snowpark Python
  2. Loading and transforming data into Snowflake using Snowpark
  3. Setting up Feast with Snowflake
  4. Using Snowpark and Feast to walk through an end-to-end Machine Learning use case.

The source code for this quickstart is available on GitHub.


What You'll Build

You will build an end-to-end data science workflow: using Snowpark for Python to load, clean, and prepare data; using Feast, with Snowflake handling the required data processing, to create offline and online feature stores; building a model from the offline store; and finally deploying the trained model in Snowflake as a Python UDF for inference.

You are part of a team of data engineers and data scientists at a telecom company that has been tasked with reducing customer churn using a machine-learning-based solution.

To build this, you have access to customer demographic and billing data. Using Snowpark, we will ingest, analyse and transform this data to train a model that will then be deployed inside Snowflake to score new data.

With Snowflake, it is easy to make all relevant data instantly accessible to your machine learning models, whether for training or inference. In this guide we will do all of our data and feature engineering with Snowpark for Python, but you can also work with SQL or any of the other Snowpark-supported languages, including Java and Scala, without the need for separate environments.
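As a small taste of what that looks like, the sketch below derives a feature with Snowpark. The table and column names are placeholders rather than the exact ones used in the notebooks, and the credentials would normally come from config.py (see the setup steps that follow):

from snowflake.snowpark import Session
from snowflake.snowpark import functions as F

# Placeholder credentials; in the quickstart these live in config.py.
connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
}
session = Session.builder.configs(connection_parameters).create()

# The transformations below are compiled to SQL and executed in Snowflake,
# not on the client.
raw = session.table("RAW_CUSTOMER_BILLING")        # placeholder table name
features = (
    raw.filter(F.col("TOTALCHARGES").is_not_null())
       .with_column("TENURE_YEARS", F.col("TENUREMONTHS") / 12)
)
features.write.mode("overwrite").save_as_table("CUSTOMER_FEATURES")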

To streamline your path to production, you will learn how to bring trained models (whether trained inside Snowflake or in an external environment) to run directly inside Snowflake as a UDF, moving the model to where the data and data pipelines live, as sketched below.
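Conceptually, that deployment step looks like the following sketch: a fitted model is wrapped in a Python function and registered as a Snowpark UDF so inference runs next to the data. The function name, stage, and feature columns are illustrative, and session and model are assumed to already exist:

from snowflake.snowpark.types import FloatType

# Assumes `session` is an open Snowpark session and `model` is a fitted
# scikit-learn classifier; both are assumptions for this sketch.
def predict_churn(tenure_months: float, monthly_charges: float) -> float:
    return float(model.predict_proba([[tenure_months, monthly_charges]])[0][1])

session.add_packages("scikit-learn")
session.udf.register(
    func=predict_churn,
    name="PREDICT_CHURN",                  # illustrative UDF name
    return_type=FloatType(),
    input_types=[FloatType(), FloatType()],
    is_permanent=True,
    stage_location="@ml_models",           # illustrative stage
    replace=True,
)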

Let's set up the Python environment necessary to run this quickstart:

  1. First, clone the source code for this repo to your local environment:
git clone https://github.com/Snowflake-Labs/sfguide-getting-started-snowpark-python-feast.git
cd sfguide-getting-started-snowpark-python-feast/

Snowpark Python via Anaconda

  1. If you are using Anaconda on your local machine, create a conda env for this quickstart:
conda env create -f jupyter_env.yml
conda activate getting_started_snowpark_python

Conda will automatically install snowflake-snowpark-python and all other dependencies for you.
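If you want to confirm that the environment picked up Snowpark before continuing, a quick check from Python inside the activated environment will do (this is an optional convenience, not a step from the quickstart):

# Run from the activated getting_started_snowpark_python environment.
from importlib.metadata import version
print(version("snowflake-snowpark-python"))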

  2. Once Snowpark is installed, create a kernel for Jupyter:
python -m ipykernel install --user --name=getting_started_snowpark_python
  3. Now, launch Jupyter Notebook on your local machine:
jupyter notebook
  4. Open the config.py file located in the cloned git repository and modify it with your account, username, and password information:
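Before launching into the notebooks, it can be worth confirming the credentials work. The sketch below assumes config.py exposes a dictionary of connection parameters; the actual variable name in your copy of the file may differ:

from snowflake.snowpark import Session
from config import connection_parameters   # assumed variable name; check your config.py

session = Session.builder.configs(connection_parameters).create()
print(session.sql("SELECT CURRENT_ACCOUNT(), CURRENT_WAREHOUSE(), CURRENT_ROLE()").collect())
session.close()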

Now, you are ready to get started with the notebooks. For the first and third notebooks, make sure you select the getting_started_snowpark_python kernel: after launching each notebook, navigate to Kernel -> Change Kernel and select getting_started_snowpark_python.

Apple M1

There is a known issue with running Snowpark Python on Apple M1 chips due to memory handling in pyOpenSSL. Please refer to the Snowpark documentation to solve this issue: Issue with running Snowpark Python on Apple M1 chips

Persona: DBA/Platform Administrator/Data Engineer

What You'll Do:

Open up the 01-Load-Data-with-Snowpark Jupyter notebook and run each of the cells to explore loading and transforming data with Snowpark Python.
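For orientation, the general pattern the notebook follows to land local data in Snowflake with Snowpark is sketched below; the file, variable, and table names are hypothetical, and the notebook's own cells are the source of truth:

import pandas as pd
from snowflake.snowpark import Session
from config import connection_parameters   # assumed variable name

session = Session.builder.configs(connection_parameters).create()

# Read a local file and push it into a Snowflake table (names are hypothetical).
raw_pdf = pd.read_csv("data/raw_telco_data.csv")
session.create_dataframe(raw_pdf).write.mode("overwrite").save_as_table("RAW_TELCO_DATA")
print(session.table("RAW_TELCO_DATA").count())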

Persona: ML Engineer/Data Scientist

What You'll Do:

Open up the 02-Install-and-Setup-Feast-Feature-Store Jupyter notebook.

Note:

  1. This notebook contains required setup instructions that must be executed from a terminal CLI, so you will be toggling between the notebook and a terminal window. Do not skip any instructions, or the setup will not succeed.
  2. As part of this setup, a Feast repository used by this guide is installed on your machine (the sketch after this list shows the general shape of such a repository).
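For context, a Feast repository definition backed by Snowflake generally has the shape below. The entity, table, and feature names are illustrative, and the exact classes and parameters vary across Feast versions, so follow the notebook rather than this sketch when setting up:

from datetime import timedelta
from feast import Entity, FeatureView, Field, SnowflakeSource
from feast.types import Float32, Int64

customer = Entity(name="customer", join_keys=["CUSTOMER_ID"])

churn_source = SnowflakeSource(
    database="FEAST_DB",               # illustrative database
    schema="PUBLIC",
    table="CUSTOMER_FEATURES",         # illustrative feature table
    timestamp_field="EVENT_TIMESTAMP",
)

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="TENUREMONTHS", dtype=Int64),
        Field(name="MONTHLYCHARGES", dtype=Float32),
    ],
    source=churn_source,
)

Once definitions like these are registered with feast apply, feast materialize loads the online store so features can be served at low latency.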

Persona: Data Scientist/ML Engineer

What You'll Do:

Open up the 03-Snowpark-UDF-Deployment Jupyter notebook and run each of the cells to train a model and deploy it for in-Snowflake inference using Snowpark Python UDFs.
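Once the UDF is deployed, scoring new data stays inside Snowflake. As a hedged sketch (the UDF, table, and column names below are illustrative, and session is assumed to exist), the model can be invoked from Snowpark, and equally from plain SQL:

from snowflake.snowpark import functions as F

# Call the deployed UDF over a table of fresh customer records;
# the equivalent SQL would be:
#   SELECT PREDICT_CHURN(TENUREMONTHS, MONTHLYCHARGES) FROM CUSTOMER_FEATURES;
scored = session.table("CUSTOMER_FEATURES").with_column(
    "CHURN_SCORE",
    F.call_udf("PREDICT_CHURN", F.col("TENUREMONTHS"), F.col("MONTHLYCHARGES")),
)
scored.show()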

Through this Quickstart, you experienced how Feast can be set up to use Snowflake as the processing engine to create offline as well as online feature stores, with push-down processing for the generation of both. This is particularly useful when you are dealing with very large datasets. Here's what you were able to complete:

  1. Set up your Python environment to run Snowpark Python
  2. Loaded and transformed data into Snowflake using Snowpark
  3. Set up Feast with Snowflake as both the offline and online feature store
  4. Trained a model on offline features and deployed it in Snowflake as a Snowpark Python UDF for inference

For more information on Snowpark Python, and Machine Learning in Snowflake, check out the following resources:

For more information on Feast on Snowflake, check out the following resources: