
In this Quickstart, you'll build a Snowflake Native Application that provides food truck owners with a machine learning model they can use to forecast sales. Here are some additional details behind the scenario:

Consumer

Food truck franchise owners ("franchisees") who want a sales forecast for their locations. In this Quickstart, the consumer-side roles are franchise_admin and franchise_viewer.

Provider

Tasty Bytes, which packages a pre-trained forecasting model and location enrichment data into a Snowflake Native Application. In this Quickstart, the provider-side role is sales_forecast_provider.

Prerequisites

To successfully complete this Quickstart, your Snowflake account must have the Tasty Bytes data set and Tasty Bytes machine learning models. Make sure you complete the following before attempting this Quickstart:

  1. Step 2 of An Introduction to Tasty Bytes
  2. Steps 2 and 3 of Tasty Bytes - Snowpark 101 for Data Science

If these prerequisites are not completed, you will not be able to complete this Quickstart.

What You'll Learn

  - How to create the consumer and provider roles, data, and objects needed by a Snowflake Native Application
  - How an application's manifest.yml and setup script work
  - How to create an application package, share provider data with it, and add a version
  - How to install the application and bind references to consumer objects

What You'll Need

  - A Snowflake account containing the Tasty Bytes data set and machine learning models (see Prerequisites)
  - The application source code from the Tasty Bytes - Native App repository

What You'll Build

  - A Snowflake Native Application that forecasts food truck shift sales using a pre-trained machine learning model and provider-supplied location data

To successfully complete this Quickstart, your Snowflake account must have the Tasty Bytes data set and Tasty Bytes machine learning models. Make sure you complete the following before attempting this Quickstart:

  1. Step 2 of An Introduction to Tasty Bytes
  2. Steps 2 and 3 of Tasty Bytes - Snowpark 101 for Data Science

If these prerequisites are not completed, you will not be able to complete this Quickstart.

In addition, clone or download the source code for the application you'll build in this Quickstart here: Tasty Bytes - Native App.

For convenience, we'll use a single Snowflake account to build the application, so the account will contain both consumer data and provider data. In practice, there would be two separate accounts: the provider would build the application package in their account and publish it to the Snowflake Marketplace, and consumers would discover the application there and install it into their own accounts.

Let's start by setting up the consumer objects and data in the account. In Snowflake, open a SQL worksheet and run the following SQL:

-- Create consumer roles and grant privileges
USE ROLE accountadmin;

CREATE ROLE IF NOT EXISTS franchise_admin;
CREATE ROLE IF NOT EXISTS franchise_viewer;

GRANT ROLE franchise_admin TO ROLE accountadmin;
GRANT ROLE franchise_viewer TO ROLE accountadmin;

GRANT CREATE DATABASE ON ACCOUNT TO ROLE franchise_admin;
GRANT CREATE APPLICATION ON ACCOUNT TO ROLE franchise_admin;
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE franchise_admin WITH GRANT OPTION;
GRANT ROLE franchise_viewer TO ROLE franchise_admin;

-- Create a warehouse to be used by the application consumer
CREATE WAREHOUSE IF NOT EXISTS franchise_wh;
GRANT USAGE ON WAREHOUSE franchise_wh TO ROLE franchise_admin;
GRANT USAGE ON WAREHOUSE franchise_wh TO ROLE franchise_viewer;

-- Grant usage on source data to consumer roles
GRANT USAGE ON DATABASE frostbyte_tasty_bytes_dev TO ROLE franchise_admin;
GRANT USAGE ON ALL SCHEMAS IN DATABASE frostbyte_tasty_bytes_dev TO ROLE franchise_admin;
GRANT SELECT ON TABLE frostbyte_tasty_bytes_dev.analytics.shift_sales_v TO ROLE franchise_admin;

The SQL above does the following:

  - Creates two consumer roles, franchise_admin and franchise_viewer, and grants them to accountadmin
  - Grants franchise_admin the privileges needed to create databases, create applications, and execute tasks
  - Creates a warehouse, franchise_wh, and grants usage on it to both consumer roles
  - Grants franchise_admin access to the Tasty Bytes source data (shift_sales_v in frostbyte_tasty_bytes_dev)
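
If you'd like to double-check the setup, an optional verification you can run in the same worksheet (output will vary by account):

-- Optional: verify the privileges granted to the consumer roles
SHOW GRANTS TO ROLE franchise_admin;
SHOW GRANTS TO ROLE franchise_viewer;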

With the base roles and privileges now defined, let's create the consumer data in the account. Open a SQL worksheet and run the following:

-- Create consumer data

USE ROLE franchise_admin;
USE WAREHOUSE franchise_wh;

-- Create consumer database
CREATE DATABASE IF NOT EXISTS franchise_consumer_data;
USE DATABASE franchise_consumer_data;
CREATE SCHEMA IF NOT EXISTS data;
USE SCHEMA data;

-- Create sales table for consumer
CREATE OR REPLACE TABLE shift_sales AS
(
    WITH row_cte 
    AS
    (
        SELECT
            location_id,
            shift_sales,
            city,
            ROW_NUMBER() OVER (ORDER BY shift_sales) as row_count
        FROM frostbyte_tasty_bytes_dev.analytics.shift_sales_v
    )
    SELECT 
        * EXCLUDE row_count 
    FROM row_cte
    WHERE city IN ('Toronto', 'Vancouver', 'Montreal')
    AND (row_count % 10) = 0 -- only copy 10% of sales
);

The SQL above creates a database called franchise_consumer_data that contains some sales data for the cities of Toronto, Vancouver, and Montreal.

[Image: consumer data]
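
As an optional sanity check, you can query the new table; exact counts will depend on your copy of the Tasty Bytes data set:

-- Optional: inspect the consumer sales data
SELECT city, COUNT(*) AS row_count, AVG(shift_sales) AS avg_shift_sales
FROM franchise_consumer_data.data.shift_sales
GROUP BY city;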

Let's now prep the account with the roles, grants, and objects that the application provider will need to create the application.

Open a SQL worksheet and run the following:

-- Create application provider roles and grant privileges
USE ROLE accountadmin;

CREATE ROLE IF NOT EXISTS sales_forecast_provider;
GRANT ROLE sales_forecast_provider TO ROLE accountadmin;
GRANT CREATE APPLICATION PACKAGE ON ACCOUNT TO ROLE sales_forecast_provider;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE sales_forecast_provider;

-- Create a warehouse (if it doesn't already exist) and grant usage to the provider role
CREATE WAREHOUSE IF NOT EXISTS franchise_wh;
GRANT USAGE ON WAREHOUSE franchise_wh TO ROLE sales_forecast_provider;

-- Grant usage on source data to provider role
GRANT USAGE ON DATABASE frostbyte_tasty_bytes TO ROLE sales_forecast_provider;
GRANT USAGE ON ALL SCHEMAS IN DATABASE frostbyte_tasty_bytes TO ROLE sales_forecast_provider;
GRANT SELECT ON TABLE frostbyte_tasty_bytes.raw_pos.location TO ROLE sales_forecast_provider;
GRANT SELECT ON TABLE frostbyte_tasty_bytes.raw_pos.country TO ROLE sales_forecast_provider;

The SQL above does the following:

  - Creates the sales_forecast_provider role and grants it to accountadmin
  - Grants the provider role the privileges needed to create application packages and databases
  - Ensures the franchise_wh warehouse exists and grants the provider role usage on it
  - Grants the provider role access to the Tasty Bytes location and country source tables

With the provider roles, grants, and objects now defined, let's create some provider data in the account. This is the data that will enrich the consumer data for use in the forecasting model.

Open a SQL worksheet and run the following:

-- Create provider data

USE ROLE sales_forecast_provider;
USE WAREHOUSE franchise_wh;

CREATE DATABASE IF NOT EXISTS provider_data;
CREATE SCHEMA IF NOT EXISTS provider_data.shared_data;

CREATE OR REPLACE TABLE provider_data.shared_data.locations_detail
AS
(
    WITH _location_point AS
    (SELECT 
        cpg.placekey,
        l.location_id,
        cpg.city,
        cpg.country,
        ST_MAKEPOINT(cpg.longitude, cpg.latitude) AS geo_point
    FROM frostbyte_safegraph.public.frostbyte_tb_safegraph_s cpg
    LEFT JOIN frostbyte_tasty_bytes.raw_pos.location l
        ON cpg.placekey = l.placekey),
    _distance_between_locations AS
    (SELECT 
        a.placekey,
        a.location_id,
        b.placekey AS placekey_2,
        b.location_id AS location_id_2,
        a.city,
        a.country,
        ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1609,2) AS geography_distance_miles
    FROM _location_point a
    JOIN _location_point b
        ON a.city = b.city
        AND a.country = b.country
        AND a.placekey <> b.placekey
    ),
    _locations_within_half_mile AS
    (
    SELECT
        dbl.placekey,
        dbl.location_id,
        COUNT(DISTINCT dbl.placekey_2) AS count_locations_within_half_mile
    FROM _distance_between_locations dbl
    WHERE dbl.geography_distance_miles <= 0.5
    GROUP BY dbl.location_id, dbl.placekey
    ),
    _locations_within_mile AS
    (
    SELECT
        dbl.placekey,
        dbl.location_id,
        COUNT(DISTINCT dbl.placekey_2) AS count_locations_within_mile
    FROM _distance_between_locations dbl
    WHERE dbl.geography_distance_miles <= 1
    GROUP BY dbl.location_id, dbl.placekey
    )
    SELECT
        l.location_id,
        ZEROIFNULL(hm.count_locations_within_half_mile) AS count_locations_within_half_mile,
        ZEROIFNULL(m.count_locations_within_mile) AS count_locations_within_mile,
        cpg.placekey,
        cpg.location_name,
        cpg.top_category,
        cpg.sub_category,
        cpg.naics_code,
        cpg.latitude,
        cpg.longitude,
        ST_MAKEPOINT(cpg.longitude, cpg.latitude) AS geo_point,
        cpg.street_address,
        cpg.city,
        c.city_population,
        cpg.region,
        cpg.country,
        cpg.polygon_wkt,
        cpg.geometry_type
    FROM frostbyte_tasty_bytes.raw_pos.location l
    JOIN frostbyte_safegraph.public.frostbyte_tb_safegraph_s cpg
        ON l.placekey = cpg.placekey
    JOIN frostbyte_tasty_bytes.raw_pos.country c
        ON l.city = c.city
    LEFT JOIN _locations_within_half_mile hm
        ON l.location_id = hm.location_id
    LEFT JOIN _locations_within_mile m
        ON l.location_id = m.location_id
);

The SQL above creates a table containing location data that will be used alongside the consumer data for forecasting and visualization within the application.

[Image: location data]
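
To spot-check the enrichment data (optional), you can run a quick aggregate; exact values will vary with your copy of the source data:

-- Optional: inspect the provider location data
SELECT city, COUNT(*) AS locations, AVG(count_locations_within_half_mile) AS avg_nearby_locations
FROM provider_data.shared_data.locations_detail
GROUP BY city
ORDER BY locations DESC;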

Before building the application, let's quickly take a look at the structure of the application.

tasty-bytes-native-app/
├─ python/
│  ├─ evaluate_model.py                 # UDF handler: loads the pre-trained model and runs inference
│  ├─ inference_pipeline.py             # Stored procedure handler: prepares features and calls the UDF
│  ├─ linreg_location_sales_model.sav   # Pre-trained linear regression model
├─ scripts/
│  ├─ setup.sql                         # Setup script executed when the application is installed
├─ streamlit/
│  ├─ environment.yml                   # Package dependencies for the Streamlit UI
│  ├─ forecast_ui.py                    # Streamlit user interface
├─ manifest.yml                         # Application manifest: version, artifacts, references
├─ README.md

That's a high level overview of the structure and components of the application. Let's dive into more details in subsequent sections.


In GitHub, or your preferred code editor, navigate to the streamlit/forecast_ui.py file. This file contains all of the code for the application's user interface (approximately 120 lines of code).

Broadly, this file defines the Streamlit user interface for the application: it collects the forecast inputs (city, shift, and date), invokes the application's inference logic, and displays the resulting sales forecast.

Let's now take a look at some of the logic powering the application, including the data preparation and the sales forecasting generated by our pre-trained linear regression model.

In GitHub, or your preferred code editor, navigate to the python/evaluate_model.py file.

# evaluate_model.py

import sys
import pandas as pd
from joblib import load
from _snowflake import vectorized

# vectorized decorator allows the UDF to bulk process rows during inference to speed up processing
@vectorized(input=pd.DataFrame)
def evaluate_model(X: pd.DataFrame) -> pd.Series:

    # import model from package stage
    import_dir = sys._xoptions.get("snowflake_import_directory")
    model_file = import_dir + 'linreg_location_sales_model.sav'
    model = load(model_file)

    # run inference
    sales = model.predict(X)
    return sales

The code in evaluate_model.py does the following:

  - Uses the vectorized decorator so the UDF receives batches of rows as a pandas DataFrame, which speeds up inference
  - Loads the pre-trained linear regression model (linreg_location_sales_model.sav) from the UDF's import directory
  - Runs the model's predict method on the input features and returns the predicted shift sales as a pandas Series
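
The setup script (covered in a later step) registers this handler as the code_schema.evaluate_model UDF. The application roles are only granted usage on the inference procedure, so consumers call the UDF indirectly, but for illustration, a call from inside the application's own code would look roughly like this (the feature values below are made up):

-- Hypothetical call to the UDF from application code (illustrative values only)
SELECT code_schema.evaluate_model(6, 2, 49.2827, -123.1207, 5, 675218, 1200.0, 1) AS predicted_shift_sales;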

In GitHub, or your preferred code editor, navigate to the python/inference_pipeline.py file.

# inference_pipeline.py

import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T
from snowflake.snowpark import Session

def inference(session: Session, city: T.StringType, shift: T.IntegerType, forecast_date: T.DateType):
    # create dimensional table for date and shifts
    dates_df = session.create_dataframe(
        [[forecast_date.month, forecast_date.weekday(), shift]], 
        schema = ["month", "day_of_week", "shift"]
    )

    # pull locations information from provider
    locations_df = session.sql('''
                        select * 
                        from package_shared.location_detail_v
                        ''')
    locations_df = locations_df.select([
        "city", 
        "location_name", 
        "location_id", 
        "count_locations_within_half_mile", 
        "latitude", "longitude", 
        "city_population",
    ])
    locations_df = locations_df.filter(F.col("city") == city)
    
    # cross join with dates to have all locations and date/shift combinations
    locations_df = locations_df.cross_join(dates_df)
    
    # get sales data from consumer
    sales_df = session.sql('''
                        select location_id, shift_sales 
                        from reference('shift_sales')
                        ''')
    sales_df = sales_df.group_by("location_id").agg(F.mean("shift_sales").alias("avg_location_shift_sales"))
    
    # join sales and location data for inference
    locations_df = locations_df.join(
        sales_df, locations_df.location_id == sales_df.location_id
    ).rename(
        locations_df.location_id, "location_id"
    ).drop(sales_df.location_id)

    # run inference by calling evaluate_model udf
    features = [
        'MONTH', 
        'DAY_OF_WEEK', 
        'LATITUDE', 
        'LONGITUDE', 
        'COUNT_LOCATIONS_WITHIN_HALF_MILE', 
        'CITY_POPULATION', 
        'AVG_LOCATION_SHIFT_SALES', 
        'SHIFT',
    ]
    df_results =  locations_df.select(
        F.col("city"),
        F.col("location_name"),
        F.col("location_id"),
        F.col("latitude"),
        F.col("longitude"),
        F.col("month"),
        F.col("day_of_week"),
        F.iff(F.col("shift") == 1, "AM", "PM").alias("shift"),
        F.call_udf("code_schema.evaluate_model", [F.col(c) for c in features]).alias("prediction")
    ).order_by(
        F.col("prediction").desc()
    ).limit(10)
    
    return df_results

The code in inference_pipeline.py does the following:

  - Builds a single-row DataFrame containing the forecast month, day of week, and shift
  - Reads location details from the provider's package_shared.location_detail_v view and filters them to the selected city
  - Cross joins the locations with the date/shift row so every location carries the forecast date features
  - Reads the consumer's sales data through the shift_sales reference and computes the average shift sales per location
  - Joins the sales aggregates to the location data, calls the code_schema.evaluate_model UDF on the feature columns, and returns the top 10 locations by predicted sales

Now that we've covered the application user interface and logic, let's move on to a couple of files required by the Snowflake Native Application framework.

In GitHub, or your preferred code editor, navigate to the manifest.yml file.

manifest_version: 1

version:
  name: v1
  label: "version 1"
  comment: "Initial version of the application"

artifacts:
  setup_script: scripts/setup.sql
  readme: readme.md
  default_streamlit: code_schema.forecast_streamlit_app  

references:
  - shift_sales:
      label: "Shift Sales"
      description: "Table that contains franchise sales data."
      privileges:
        - SELECT
      object_type: TABLE
      multi_valued: false
      register_callback: config.register_sales_data_callback

This file is required by the Snowflake Native Application framework – all application packages must have a manifest.yml file. This file contains important metadata about the application (version, name), as well as references to artifacts and objects required by the application.

In artifacts, we provide paths to the setup script setup.sql (which we'll cover in the next step), the README file, and the default Streamlit object (also covered in the next step).

In references, we describe the objects that the application will need access to. There's one entry, shift_sales, representing a table in the consumer's account that the application will require SELECT privileges on. The register_callback is set to a procedure that is intended to handle application references – we'll cover this in the next step.

In GitHub, or your preferred code editor, navigate to the setup.sql file.

-- Setup script for sales forecast application

CREATE APPLICATION ROLE app_public;
CREATE APPLICATION ROLE app_admin;

CREATE OR ALTER VERSIONED SCHEMA code_schema;
GRANT USAGE ON SCHEMA code_schema TO APPLICATION ROLE app_public;
GRANT USAGE ON SCHEMA code_schema TO APPLICATION ROLE app_admin;

CREATE OR REPLACE FUNCTION code_schema.evaluate_model(month float, day_of_week float, latitude float, longitude float, count_locations_within_half_mile float, city_population float, avg_location_shift_sales float, shift float)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION=3.8
IMPORTS = ('/python/evaluate_model.py', '/python/linreg_location_sales_model.sav')
PACKAGES = ('snowflake-snowpark-python', 'scikit-learn', 'joblib','pandas')
HANDLER='evaluate_model.evaluate_model';

CREATE OR REPLACE PROCEDURE code_schema.inference(city string, shift int, forecast_date date)
RETURNS TABLE()
LANGUAGE PYTHON
RUNTIME_VERSION=3.8
IMPORTS = ('/python/inference_pipeline.py')
PACKAGES = ('snowflake-snowpark-python', 'scikit-learn', 'joblib','pandas')
HANDLER='inference_pipeline.inference';

GRANT USAGE ON PROCEDURE code_schema.inference(STRING, INT, DATE) TO APPLICATION ROLE app_public;

CREATE OR ALTER VERSIONED SCHEMA config;
GRANT USAGE ON SCHEMA config TO APPLICATION ROLE app_admin;

CREATE PROCEDURE CONFIG.register_sales_data_callback(ref_name STRING, operation STRING, ref_or_alias STRING)
  RETURNS STRING
  LANGUAGE SQL
  AS $$
    BEGIN
      CASE (operation)
        WHEN 'ADD' THEN
          SELECT SYSTEM$SET_REFERENCE(:ref_name, :ref_or_alias);
        WHEN 'REMOVE' THEN
          SELECT SYSTEM$REMOVE_REFERENCE(:ref_name);
        WHEN 'CLEAR' THEN
          SELECT SYSTEM$REMOVE_REFERENCE(:ref_name);
      ELSE
        RETURN 'unknown operation: ' || operation;
      END CASE;
      RETURN NULL;
    END;
  $$;

GRANT USAGE ON PROCEDURE CONFIG.register_sales_data_callback(STRING, STRING, STRING) TO APPLICATION ROLE app_admin;

CREATE STREAMLIT code_schema.forecast_streamlit_app
  FROM '/streamlit'
  MAIN_FILE = '/forecast_ui.py';

GRANT USAGE ON STREAMLIT code_schema.forecast_streamlit_app TO APPLICATION ROLE app_admin;
GRANT USAGE ON STREAMLIT code_schema.forecast_streamlit_app TO APPLICATION ROLE app_public;

This file is required by the Snowflake Native Application framework – every application package must include a setup script, referenced in manifest.yml. The script executes when a consumer installs the application into their account.

Setup scripts typically define application roles, bind application logic to functions and stored procedures, and define the grants and privileges that the application roles will have.

We can summarize the majority of the script as follows:

  - Creates two application roles, app_public and app_admin
  - Creates a versioned schema, code_schema, and grants both application roles usage on it
  - Creates the code_schema.evaluate_model Python UDF, importing the handler code and the pre-trained model file
  - Creates the code_schema.inference Python stored procedure and grants usage on it to app_public

A couple of other things created in the setup script facilitate application references and expose the user interface:

  - A config schema containing the register_sales_data_callback procedure, which Snowflake invokes to set, remove, or clear the shift_sales reference defined in the manifest
  - A Streamlit object, code_schema.forecast_streamlit_app, built from the files in /streamlit, with usage granted to both application roles

Now that we've covered all of the components of the application's source code, let's begin creating the application package that will contain the source code.

Open a SQL worksheet and run the following:

-- Create application package
USE ROLE sales_forecast_provider;
USE WAREHOUSE franchise_wh;

CREATE APPLICATION PACKAGE sales_forecast_package;
USE APPLICATION PACKAGE sales_forecast_package;

-- Allow package to reference provider data
GRANT REFERENCE_USAGE ON DATABASE provider_data
  TO SHARE IN APPLICATION PACKAGE sales_forecast_package;

-- This schema will be accessible inside of our setup script
CREATE SCHEMA IF NOT EXISTS sales_forecast_package.package_shared;
USE SCHEMA sales_forecast_package.package_shared;

-- View for location details
CREATE VIEW IF NOT EXISTS package_shared.location_detail_v
  AS SELECT *
  FROM provider_data.shared_data.locations_detail;

-- These grants allow our setup script to refer to our views
GRANT USAGE ON SCHEMA package_shared
  TO SHARE IN APPLICATION PACKAGE sales_forecast_package;
GRANT SELECT ON VIEW package_shared.location_detail_v
  TO SHARE IN APPLICATION PACKAGE sales_forecast_package;

CREATE SCHEMA sales_forecast_package.stage_content;
CREATE OR REPLACE STAGE sales_forecast_package.stage_content.sales_forecast_package_snowflake_stage;

The SQL above creates the application package sales_forecast_package. This application package will contain the application source code.

We also share provider data with the consumer in a safe and secure way. In the SQL above, we grant reference usage on the provider's data to the application package. We then create a schema that holds a view constructed from the provider's data. We grant the application package privileges on the schema and view, allowing the consumer to "access" the provider data through the application when they run it in their account.

Let's now upload the application source code into the application package. We'll upload the source code into a stage in the package.

Set your role to SALES_FORECAST_PROVIDER. Navigate to the sales_forecast_package.stage_content.sales_forecast_package_snowflake_stage stage. Click on the + FILES button at the top right. Next, click Browse in the modal that appears.

To avoid breaking any references to objects needed by the app, the folder structure within the stage must reflect the folder structure in the tasty-bytes-native-app directory. Be sure to upload the source code exactly as follows:

  1. Ensure you're in the sfguide-tasty-bytes-native-app/tasty-bytes-native-app directory and start by uploading the following files: README.md and manifest.yml.
  2. Next, upload the set up script. Once again, click on the + FILES button. Note the "Specify the path to an existing folder or create a new one (optional)" field in the modal. In this field, enter scripts. The modal will automatically prepend a forward slash to your entry, so the value will read as /scripts.
  3. Click Browse, navigate to the local sfguide-tasty-bytes-native-app/tasty-bytes-native-app/scripts folder on your computer, select setup.sql and upload the file.
  4. Next, upload the application logic files. Click on the + FILES button, and use /python as the destination path in the stage.
  5. Click Browse, navigate to the local sfguide-tasty-bytes-native-app/tasty-bytes-native-app/python folder on your computer, select all files, and upload them.
  6. Finally, upload the user interface files. Click on the + FILES button, and use /streamlit as the destination path in the stage.
  7. Click Browse, navigate to the local sfguide-tasty-bytes-native-app/tasty-bytes-native-app/streamlit folder on your computer, select all files, and upload them.

The folder structure within the stage should now reflect the folder structure in your local application directory.
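
To confirm the upload (optional), you can list the stage contents from a SQL worksheet and compare the paths against the directory structure above:

-- Optional: verify the uploaded files and their paths
LIST @sales_forecast_package.stage_content.sales_forecast_package_snowflake_stage;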

[Image: stage contents]

With the application source code uploaded, let's create the first version of the application.

Open a SQL worksheet and run the following:

USE ROLE sales_forecast_provider;

-- Allow the franchisee to install our new package
GRANT INSTALL, DEVELOP ON APPLICATION PACKAGE sales_forecast_package TO ROLE franchise_admin;

ALTER APPLICATION PACKAGE sales_forecast_package
  ADD VERSION v1_0 USING '@sales_forecast_package.stage_content.sales_forecast_package_snowflake_stage';

Recall that we are using a single Snowflake account for the purposes of convenience and demonstration. As a result, we must grant install and develop privileges to the application consumer so that we can install the application in the account. In practice, this would not be necessary.

The final statement (ALTER APPLICATION PACKAGE ... ADD VERSION) creates the first version of the application, v1_0, from the uploaded source code.
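
Optionally, you can confirm the version was registered on the package:

-- Optional: list the versions defined in the application package
SHOW VERSIONS IN APPLICATION PACKAGE sales_forecast_package;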

Let's now install the application in the account. In practice, the consumer would be installing the application in their own account, so we'll assume the role of a user in the consumer account (in this case, franchise_admin) and install the application. In addition, running this SQL would not be necessary in practice. Instead, consumers would install the application into their account directly from the Snowflake Marketplace listing.

Open a SQL worksheet and run the following:

-- Install application in account

USE ROLE franchise_admin;
USE WAREHOUSE franchise_wh;

CREATE APPLICATION sales_forecast_app
FROM APPLICATION PACKAGE sales_forecast_package
USING VERSION v1_0;

GRANT APPLICATION ROLE sales_forecast_app.app_public
  TO ROLE franchise_viewer;

The SQL above creates the application from the application package and grants the app_public application role to the franchise_viewer role in the Snowflake account.

If all went as expected, the application is now installed in the account! Set your role to FRANCHISE_ADMIN, then navigate to the Apps section in your account. You should see an application titled SALES_FORECAST_APP installed. Click on the name of the app to run the app.

[Image: installed app]

When running the app for the first time, you'll be prompted to create bindings. The bindings link the references defined in the manifest file (shift_sales) to the corresponding objects in the consumer's account (the shift_sales table), ensuring the application can access the objects it needs.

[Image: reference bindings]
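
If you prefer to inspect the references outside the UI, you may be able to list them with SQL; this is an optional check and assumes your account exposes the command:

-- Optional: list the references requested by the application
SHOW REFERENCES IN APPLICATION sales_forecast_app;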

Once the bindings are properly configured, the application will start up. Play around in the app to get a feel for how it works!

[Image: running app]
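
You can also exercise the application's logic directly from SQL. The sketch below assumes the shift_sales reference has been bound and uses an arbitrary city, shift, and date; any role that holds the app_public application role (for example, franchise_viewer) should be able to run it:

-- Optional: call the forecasting procedure directly (shift 1 = AM)
USE ROLE franchise_viewer;
USE WAREHOUSE franchise_wh;
CALL sales_forecast_app.code_schema.inference('Vancouver', 1, CURRENT_DATE());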


To clean up, open a SQL worksheet and run the following:

-- Tear down

USE ROLE accountadmin;

DROP ROLE IF EXISTS sales_forecast_provider;
DROP ROLE IF EXISTS franchise_admin;
DROP ROLE IF EXISTS franchise_viewer;

DROP DATABASE IF EXISTS franchise_consumer_data;
DROP DATABASE IF EXISTS provider_data;

DROP APPLICATION IF EXISTS sales_forecast_app;
DROP APPLICATION PACKAGE IF EXISTS sales_forecast_package;

DROP WAREHOUSE IF EXISTS franchise_wh;

Congratulations! In just a few steps, you were able to build a Snowflake Native Application that uses a pre-trained machine learning model to make sales forecasts on consumer data. You played the role of the application provider, building an application that provides the consumer with data enrichment and application logic that performs model inference on their raw sales data to visualize sales forecasts.

What we've covered

  - Setting up consumer and provider roles, warehouses, and data in a single Snowflake account
  - How an application's manifest.yml and setup script define its artifacts, references, roles, and logic
  - Creating an application package, sharing provider data with it, and uploading the application source code to a stage
  - Adding a version to the package, installing the application, and binding references to consumer objects