Forecasting is a business process that predicts future events over time based on historical time-series data, and it has cross-industry applicability and utility in nearly every organization. Multiple business units can benefit from this modeling technique to more accurately predict performance, demand, or any other activity, leading to improved business reactivity and optimization.

The Forecast Model Builder accelerates time to value by offering modeling flexibility, a low-code environment, and automated model deployment through Snowflake Model Registry. This solution walks through a complete time series forecasting workflow, from Exploratory Data Analysis (EDA) to model training, and finally batch inference in Snowflake. By leveraging XGBoost, Snowpark, and Snowflake's Model Registry, we create a scalable and efficient forecasting solution.


Prerequisites

What You'll Learn

This guide will walk you through the process of:

What You'll Need

What You'll Build


This solution leverages several key Snowflake features:

To set up the Forecast Model Builder solution in Snowflake you will:

  1. Download the Forecast_Model_Builder_Deployment.ipynb notebook from Github and import it into Snowsight. (For instructions on creating a new Snowflake Notebook from an existing file, please see this documentation.)
  2. Follow the documented instructions in the Deployment notebook. Here are the instructions for the zipped-file-to-stage method:

     a. Go to the Emerging Solutions Toolbox Github Repository and download a zipped file of the repository.

     b. Go to the Forecast Model Builder Deployment Notebook and run the DEPLOYMENT cell. This cell will create a stage named FORECAST_MODEL_BUILDER.BASE.NOTEBOOK_TEMPLATES if it doesn't already exist.

     c. Upload a zipped copy of the Forecast Model Builder Github Repository to that stage.

     d. Re-run the DEPLOYMENT cell.

     e. You should see the success message "FORECAST_MODEL_BUILDER fully deployed!"

     f. Run the PROJECT_DEPLOY cell.

     g. Set your project name. Each project gets its own schema and set of notebooks, and each notebook will be prefixed with the project name. In this example the project is named "RBLUM".

     h. Click the Create button and go back to the main Notebooks page.
  3. The project name you provide will be used as the prefix for the notebooks and as the name of the schema created in this deployment.
  4. The solution's three notebooks (EDA, Modeling, and Inference) will be created within your newly named schema (<YOUR_PROJECT_NAME>).
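To make the deployment step concrete, the sketch below shows roughly what the DEPLOYMENT cell has to accomplish: create the stage and upload the zipped repository to it. This is an illustrative assumption, not the notebook's actual code; only the stage name comes from the guide, and a real run would execute these statements through a Snowpark session.

```python
# Hedged sketch of the deployment step (assumed, not the notebook's actual
# code): create the stage and PUT the zipped repo into it. Only the stage
# name below is from the guide; the zip filename is a placeholder.
STAGE = "FORECAST_MODEL_BUILDER.BASE.NOTEBOOK_TEMPLATES"

def deployment_statements(zip_path: str) -> list:
    """Return the SQL a Snowpark session would run to stage the repo zip."""
    return [
        f"CREATE STAGE IF NOT EXISTS {STAGE}",
        f"PUT file://{zip_path} @{STAGE} AUTO_COMPRESS=FALSE OVERWRITE=TRUE",
    ]

for stmt in deployment_statements("emerging-solutions-toolbox.zip"):
    print(stmt)
```

In the actual notebook, each statement would be run via `session.sql(...)` (or the upload via `session.file.put(...)`) inside your Snowflake account.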

Once setup is complete you should see the following database and associated objects:

  1. Sample time series data with a daily granularity has been provided in the table DAILY_PARTITIONED_SAMPLE_DATA.

Understanding the structure and patterns of time series data is crucial for building accurate forecasting models. This notebook walks through an Exploratory Data Analysis (EDA) of time series data, providing statistical summaries and visual insights to inform model development.

Key Highlights

By the end of this notebook, you will have a deep understanding of the dataset's characteristics, enabling informed decisions on feature engineering and model selection for accurate time series forecasting.
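As a flavor of the kinds of checks the EDA notebook performs, the stand-alone sketch below computes summary statistics, a crude trend estimate, and a lag-7 autocorrelation (a weekly-seasonality signal) on a synthetic daily series. The data and helper function are illustrative assumptions, not the notebook's actual code.

```python
# Illustrative EDA sketch on a synthetic daily series (not the notebook's
# actual code): summary stats, a crude trend check, and weekly seasonality
# via lag-7 autocorrelation.
import math
import statistics

# Synthetic daily demand: upward trend plus a weekly cycle.
series = [100 + 0.5 * day + 10 * math.sin(2 * math.pi * day / 7)
          for day in range(90)]

mean, stdev = statistics.mean(series), statistics.stdev(series)

# Crude trend check: compare the mean of the last 30 days to the first 30.
trend = statistics.mean(series[-30:]) - statistics.mean(series[:30])

def autocorr(xs, lag):
    """Autocorrelation at the given lag; near 1 at lag 7 suggests a weekly cycle."""
    mu = statistics.mean(xs)
    num = sum((xs[i] - mu) * (xs[i + lag] - mu) for i in range(len(xs) - lag))
    den = sum((x - mu) ** 2 for x in xs)
    return num / den

print(f"mean={mean:.1f} stdev={stdev:.1f} trend={trend:.1f} "
      f"lag7_autocorr={autocorr(series, 7):.2f}")
```

In the notebook itself, the equivalent analysis runs over your Snowflake table via Snowpark DataFrames rather than a Python list.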

To run the EDA Notebook:

  1. Go to Projects > Notebooks in Snowsight. You should see notebooks for EDA, MODELING & INFERENCE with your Project prefix.
  2. Open the notebook <YOUR_PROJECT_NAME>_EDA.
  3. In the upper-right corner of the UI, click the down arrow next to Start and select "Edit compute settings".
  4. Select "Run on container" and click Save. You will also do this for the Modeling and Inference notebooks. (Please note that you will need to use a role other than ACCOUNTADMIN, SECURITYADMIN, or ORGADMIN to run the notebook on a container.)
  5. Follow the instructions provided in each notebook cell.

In this notebook, we explore a partitioned time series modeling approach using XGBoost and Snowflake's Snowpark, enabling efficient and scalable forecasting for large datasets.

Key Highlights

By the end of this notebook, you'll have a structured approach to forecasting time series data using Snowflake & XGBoost, optimizing performance while maintaining flexibility across different datasets.
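The core of the partitioned approach is fitting one model per partition key so each series gets its own parameters. The sketch below demonstrates that pattern with a simple least-squares line as a stand-in for XGBoost (which the notebook actually uses); the partition loop, not the model, is the point. All data and names here are illustrative assumptions.

```python
# Partitioned-modeling pattern: group rows by partition key and fit an
# independent model per partition. A closed-form least-squares line stands
# in for XGBoost here; the notebook uses XGBoost for the real models.
from collections import defaultdict

# (partition, day, value) rows, mimicking a partitioned time series table.
rows = [("store_A", d, 50 + 2 * d) for d in range(10)] + \
       [("store_B", d, 200 - 3 * d) for d in range(10)]

def fit_line(points):
    """Ordinary least squares for y = a + b * x."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

by_partition = defaultdict(list)
for key, day, value in rows:
    by_partition[key].append((day, value))

models = {key: fit_line(points) for key, points in by_partition.items()}
for key, (a, b) in sorted(models.items()):
    print(f"{key}: value ~ {a:.1f} + {b:.1f} * day")
```

In Snowflake, this per-partition loop is what Snowpark parallelizes across the warehouse, so each partition's model trains independently and at scale.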

To run the Feature Engineering and Advanced Modeling Notebook:

  1. Go to Projects > Notebooks in Snowsight.
  2. Open the notebook <YOUR_PROJECT_NAME>_MODELING.
  3. Switch to Container Runtime.
  4. Follow the instructions provided in each notebook cell.

This notebook is designed to perform inference using the trained time series model from the modeling pipeline. It leverages Snowflake's Snowpark environment to efficiently make predictions on new data, ensuring seamless integration between model training and deployment.

Key Highlights

By the end of this notebook, you'll have a scalable and efficient pipeline for time series inference, enabling real-time and batch forecasting within Snowflake's powerful data ecosystem.
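Batch inference over partitioned models follows the same shape as training: look up each partition's trained model and apply it to the future rows for that partition. The sketch below illustrates the pattern with hypothetical linear models standing in for what the notebook actually loads from Snowflake's Model Registry; all parameters and names are illustrative assumptions.

```python
# Batch-inference pattern for partitioned models: apply each partition's
# trained model to its future rows. The (intercept, slope) parameters are
# illustrative stand-ins for models loaded from the Model Registry.
models = {"store_A": (50.0, 2.0), "store_B": (200.0, -3.0)}

def forecast(model, start_day, horizon):
    """Predict `horizon` future days for one partition's model."""
    a, b = model
    return [(day, a + b * day) for day in range(start_day, start_day + horizon)]

predictions = {key: forecast(m, start_day=10, horizon=7)
               for key, m in models.items()}
for key, preds in sorted(predictions.items()):
    first_day, first_val = preds[0]
    print(f"{key}: day {first_day} -> {first_val:.1f}")
```

In the notebook, the Model Registry handles the lookup-and-apply step inside Snowflake, so predictions land directly in a table for downstream use.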

To run the Inferencing Notebook:

  1. Go to Projects > Notebooks in Snowsight.
  2. Open the notebook <YOUR_PROJECT_NAME>_INFERENCE.
  3. Switch to Container Runtime.
  4. Follow the instructions provided in each notebook cell.

By following this structured workflow, businesses can build scalable, reliable, and high-performing forecasting models. Whether applied to retail traffic, sales predictions, or resource allocation, this pipeline ensures that forecasting models are accurate, interpretable, and easy to deploy in production.

What You Learned

After completing all the notebooks in this series, you will have gained a comprehensive understanding of time series forecasting and how to implement it efficiently using Snowflake, Snowpark, and XGBoost. Specifically, you have learned:

1. Exploring Time Series Data (EDA)

2. Feature Engineering for Time Series Forecasting

3. Building and Training a Forecast Model

4. Deploying and Running Forecast Inference

By following this workflow, you are now equipped with the knowledge to build, deploy, and scale time series forecasting models efficiently within Snowflake, enabling data-driven decision-making in real-world business scenarios. 🚀

Related Resources