Forecasting is a business process that predicts future events from historical time-series data. It applies across industries and is useful in nearly every organization. Multiple business units can use this modeling technique to predict performance, demand, or any other activity more accurately, which improves business responsiveness and optimization.

The Forecast Model Builder accelerates time to value by offering modeling flexibility, a low-code environment, and automated model deployment through Snowflake Model Registry. This solution walks through a complete time series forecasting workflow, from Exploratory Data Analysis (EDA) to model training and, finally, batch inference in Snowflake. By leveraging XGBoost, Snowpark, and Snowflake's Model Registry, we create a scalable and efficient forecasting solution.

Prerequisites

What You'll Learn

This guide will walk you through the process of:

What You'll Need

What You'll Build

This solution leverages several key Snowflake features:

To set up the Forecast Model Builder solution in Snowflake, you will:

  1. Download the Forecast_Model_Builder_Deployment.ipynb notebook from GitHub and import it into Snowsight. (For instructions, see the Snowflake documentation on creating a notebook from an existing file.)
  2. Follow the deployment instructions found in the Forecast Model Builder README to run the deployment notebook.

Understanding the structure and patterns of time series data is crucial for building accurate forecasting models. This notebook walks through an Exploratory Data Analysis (EDA) of time series data, providing statistical summaries and visual insights to inform model development.

Key Highlights

By the end of this notebook, you will have a deep understanding of the dataset's characteristics, enabling informed decisions on feature engineering and model selection for accurate time series forecasting.
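As a rough illustration of the kind of check an EDA pass typically performs, the sketch below summarizes each series and lists missing dates. It is plain Python, not the notebook's code: the function name `summarize_partitions`, the `(partition_key, date, value)` row layout, the daily frequency, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def summarize_partitions(rows):
    """Summarize daily series given (partition_key, date, value) tuples.

    Hypothetical helper: assumes daily frequency; if a date repeats
    within a partition, the last value seen wins.
    """
    by_key = defaultdict(dict)
    for key, day, value in rows:
        by_key[key][day] = value

    summary = {}
    for key, series in by_key.items():
        first, last = min(series), max(series)
        expected_days = (last - first).days + 1
        # Any calendar day between first and last with no observation is a gap.
        missing = [
            first + timedelta(days=i)
            for i in range(expected_days)
            if first + timedelta(days=i) not in series
        ]
        summary[key] = {
            "start": first,
            "end": last,
            "n_obs": len(series),
            "mean": mean(series.values()),
            "missing_dates": missing,
        }
    return summary


# Toy data (made up for illustration): store_a has no row for 2024-01-03.
rows = [
    ("store_a", date(2024, 1, 1), 10.0),
    ("store_a", date(2024, 1, 2), 12.0),
    ("store_a", date(2024, 1, 4), 14.0),
    ("store_b", date(2024, 1, 1), 5.0),
    ("store_b", date(2024, 1, 2), 7.0),
]
summary = summarize_partitions(rows)
```

Gaps found this way usually need a decision (impute, drop, or leave as-is) before lag-based features are built, because positional lags assume evenly spaced observations.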

To run the EDA Notebook:

  1. Go to Projects > Notebooks in Snowsight. You should see notebooks for EDA, MODELING, and INFERENCE with your project prefix.
  2. Open the notebook <YOUR_PROJECT_NAME>_EDA.
  3. In the upper-right corner of the UI, click the down arrow next to Start and select 'Edit compute settings'.
  4. Select 'Run on container' and click Save. You will also do this for the Modeling and Inference notebooks. (Note that you need a role other than ACCOUNTADMIN, SECURITYADMIN, or ORGADMIN to run the notebook on a container.)
  5. Follow the instructions provided in each notebook cell.

In this notebook, we explore a partitioned time series modeling approach using XGBoost and Snowflake's Snowpark, enabling efficient and scalable forecasting for large datasets.

Key Highlights

By the end of this notebook, you'll have a structured approach to forecasting time series data using Snowflake & XGBoost, optimizing performance while maintaining flexibility across different datasets.
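To make the partitioned approach more concrete, here is a minimal sketch of two data-preparation steps that commonly precede model training: building lag features within each series and holding out the most recent time steps for validation. It is plain Python rather than Snowpark or XGBoost code, and the names `add_lag_features` and `chronological_split`, the `series_id`/`t`/`y` fields, and the toy data are hypothetical. It also assumes evenly spaced observations with no gaps.

```python
from collections import defaultdict


def _group_sorted(rows):
    """Group rows by series_id and sort each group by time index t."""
    by_series = defaultdict(list)
    for row in rows:
        by_series[row["series_id"]].append(row)
    return {sid: sorted(s, key=lambda r: r["t"]) for sid, s in by_series.items()}


def add_lag_features(rows, lags=(1, 7)):
    """Add lag_k columns per series; rows without full lag history are dropped.

    Lags are positional, so this assumes evenly spaced data with no gaps.
    """
    out = []
    for series in _group_sorted(rows).values():
        for i, row in enumerate(series):
            if i < max(lags):
                continue  # not enough history for the longest lag
            features = dict(row)
            for k in lags:
                features[f"lag_{k}"] = series[i - k]["y"]
            out.append(features)
    return out


def chronological_split(rows, holdout):
    """Hold out the last `holdout` time steps of each series for validation."""
    if holdout <= 0:
        raise ValueError("holdout must be a positive number of time steps")
    train, valid = [], []
    for series in _group_sorted(rows).values():
        train.extend(series[:-holdout])
        valid.extend(series[-holdout:])
    return train, valid


# Toy data (made up): two series with ten time steps each.
rows = [
    {"series_id": sid, "t": t, "y": float(t * scale)}
    for sid, scale in (("a", 1), ("b", 2))
    for t in range(10)
]
featured = add_lag_features(rows, lags=(1, 2))
train, valid = chronological_split(featured, holdout=2)
```

Splitting by time rather than at random keeps future observations out of the training set, which is the usual reason for a chronological holdout in forecasting.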

To run the Feature Engineering and Advanced Modeling Notebook:

  1. Go to Projects > Notebooks in Snowsight.
  2. Open the notebook <YOUR_PROJECT_NAME>_MODELING.
  3. Switch to Container Runtime (Edit compute settings > Run on container, as for the EDA notebook).
  4. Follow the instructions provided in each notebook cell.

This notebook is designed to perform inference using the trained time series model from the modeling pipeline. It leverages Snowflake's Snowpark environment to efficiently make predictions on new data, ensuring seamless integration between model training and deployment.

Key Highlights

By the end of this notebook, you'll have a scalable and efficient pipeline for time series inference, enabling real-time and batch forecasting within Snowflake's powerful data ecosystem.
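The general shape of partitioned batch inference can be sketched as follows: group incoming rows by series, look up that series' trained model, and attach a forecast to each row. This is an illustrative sketch only. `MeanModel` is a made-up stand-in for a trained model (it is not XGBoost and not a Model Registry object), and `batch_predict`, the `series_id` field, and the sample rows are likewise hypothetical.

```python
from collections import defaultdict


class MeanModel:
    """Stand-in for a trained per-series model: predicts a stored constant."""

    def __init__(self, value):
        self.value = value

    def predict(self, feature_rows):
        return [self.value for _ in feature_rows]


def batch_predict(models, rows, key="series_id"):
    """Score rows in batches, one batch per series.

    `models` maps a series id to an object with a predict(rows) method.
    Series without a trained model get a forecast of None instead of a guess.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    predictions = []
    for series_id, group in grouped.items():
        model = models.get(series_id)
        if model is None:
            preds = [None] * len(group)
        else:
            preds = model.predict(group)
        for row, pred in zip(group, preds):
            predictions.append({**row, "forecast": pred})
    return predictions


# Toy example: series "c" has no trained model.
models = {"a": MeanModel(10.0), "b": MeanModel(3.0)}
new_rows = [
    {"series_id": "a", "t": 10},
    {"series_id": "b", "t": 10},
    {"series_id": "a", "t": 11},
    {"series_id": "c", "t": 10},
]
predictions = batch_predict(models, new_rows)
```

Returning None for unseen series makes missing models visible downstream; whether to fall back to a global model instead is a design choice this sketch leaves open.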

To run the Inferencing Notebook:

  1. Go to Projects > Notebooks in Snowsight.
  2. Open the notebook <YOUR_PROJECT_NAME>_INFERENCE.
  3. Switch to Container Runtime (Edit compute settings > Run on container, as for the EDA notebook).
  4. Follow the instructions provided in each notebook cell.

By following this structured workflow, businesses can build scalable, reliable, and high-performing forecasting models. Whether applied to retail traffic, sales predictions, or resource allocation, this pipeline ensures that forecasting models are accurate, interpretable, and easy to deploy in production.

What You Learned

After completing all the notebooks in this series, you have a comprehensive understanding of time series forecasting and how to implement it efficiently using Snowflake, Snowpark, and XGBoost. Specifically, you have learned:

1. Exploring Time Series Data (EDA)

2. Feature Engineering for Time Series Forecasting

3. Building and Training a Forecast Model

4. Deploying and Running Forecast Inference

By following this workflow, you are now equipped to build, deploy, and scale time series forecasting models efficiently within Snowflake, enabling data-driven decision-making in real-world business scenarios. 🚀

Related Resources