The Coherent Spark Connector transforms business logic designed in Microsoft Excel spreadsheets into reusable SQL functions that call our Spark APIs from the Snowflake Data Cloud. Joint customers can save significant time on development and testing and bring their products to market faster, while the original dataset remains in the Snowflake environment. The entire workflow comes with enterprise-grade security and scalability. See the FAQs in Step 6 for additional information.

Benefits

Prerequisites

What You'll Learn

By the end of this guide, you'll learn:

What You'll Need

What You'll Build

Additional Technical Information

Naming convention

Make sure model names and Xparameter names meet the Snowflake identifier requirements; otherwise, the synchronization process might fail.

A name:

Supported/Unsupported Xservices

The table below lists the active Xservices and indicates whether each one is supported by the Snowflake native connector.

Spark models that use any Xservice marked as not supported below might fail to import as Snowflake UDFs during synchronization, or the imported Snowflake UDFs might fail to return correct results.

The connector aims to support all Xservices available in Spark, and this table will be updated as new releases come out. We recommend always updating the Snowflake native connector to the latest version.

Certain legacy Xservices are not included in this table, and they will not be supported in the Snowflake connector until further notice.

| Xservices                  | Supported |
| -------------------------- | --------- |
| xinput                     | YES       |
| xoutput                    | YES       |
| xparameter                 | YES       |
| subservices                | NO        |
| xcall (Insert requestbody) | YES       |
| xcall (Insert requestdata) | YES       |
| xCSVinput                  | NO        |
| xCSVoutput                 | NO        |
| ximage / ximageoriginal    | NO        |
| xjinput                    | NO        |
| xjoutput                   | NO        |
| xreport                    | NO        |
| xrichoutput                | NO        |
| xsolve                     | NO        |
| xvalidate                  | NO        |

Table last updated: 3rd June 2023.

Download from Private Sharing

  1. Sign into the Snowflake platform.
  2. Go to "Apps" on the left panel; "Spark Connector" should appear under "Shared with you" if the Snowflake account has been given access to the private application. Click the application widget to visit the full information page.
  3. Click "Get" to open the installation pop-up.
  4. Select your preferred installation options (database name, account role for access) and then click "Get" again. Spark Connector will be installed in the consumer platform.
  5. Once Spark Connector is successfully installed, click "Open" to begin.

install from apps menu

Download from Public Marketplace

  1. Go to our Marketplace listing or search for "Spark Connector" in the Marketplace.
  2. Click "Get" to open the installation pop-up.
  3. Select your preferred installation options (database name, account role for access) and then click "Get" again. Spark Connector will be installed in the consumer platform.
  4. Once Spark Connector is successfully installed, click "Open" to begin.

coherent spark connector marketplace listing

Review the installed application in the consumer platform

review installed app

1. Specify the active application for the session.

Replace the parameters in curly brackets { } in the SQL queries, then execute them in the Snowflake environment.

USE {APP_NAME};
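
For example, if the application was installed under the name SPARK_CONNECTOR (the name used in the FAQ example at the end of this guide), the statement would be:

USE SPARK_CONNECTOR;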

2. Create API Integration for external functions.

Create and configure the API integration according to the specific Snowflake native application you are operating.

{AWS_ARN}: Amazon Resource Name (ARN) of a cloud platform role.

{SPARK_PROXY}: Spark HTTPS proxy service endpoint.

Spark Connector (UATUS)

| Parameters    | Configuration for Spark Connector (UATUS)                     |
| ------------- | ------------------------------------------------------------- |
| {AWS_ARN}     | arn:aws:iam::533606394992:role/Snowflake-WASM-Server-Invoker  |
| {SPARK_PROXY} | https://ymeen1pkt6.execute-api.us-east-1.amazonaws.com         |

Spark Connector (PRODUS)

| Parameters    | Configuration for Spark Connector (PRODUS) |
| ------------- | ------------------------------------------ |
| {AWS_ARN}     | (Coming soon...)                           |
| {SPARK_PROXY} | (Coming soon...)                           |

CREATE OR REPLACE API INTEGRATION SPARK_INTEG
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = '{AWS_ARN}'
API_ALLOWED_PREFIXES = ('{SPARK_PROXY}')
ENABLED = TRUE;
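
For example, substituting the UATUS values from the table above produces the integration below. You can optionally verify it afterwards with DESCRIBE INTEGRATION.

CREATE OR REPLACE API INTEGRATION SPARK_INTEG
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = 'arn:aws:iam::533606394992:role/Snowflake-WASM-Server-Invoker'
API_ALLOWED_PREFIXES = ('https://ymeen1pkt6.execute-api.us-east-1.amazonaws.com')
ENABLED = TRUE;

DESCRIBE INTEGRATION SPARK_INTEG;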

3. Grant privileges to application/user access to allow service synchronization.

GRANT USAGE ON WAREHOUSE {WAREHOUSE_NAME} TO APPLICATION {APP_NAME};
GRANT USAGE ON INTEGRATION SPARK_INTEG TO APPLICATION {APP_NAME};
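
For example, with a hypothetical warehouse named COMPUTE_WH and the application installed as SPARK_CONNECTOR, the grants would be:

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO APPLICATION SPARK_CONNECTOR;
GRANT USAGE ON INTEGRATION SPARK_INTEG TO APPLICATION SPARK_CONNECTOR;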

4. Initialize external functions in the application.

CALL SPARK_PUBLIC.EXECUTE_EXTERNAL_FUNCTIONS('SPARK_INTEG');

5. Identify the Spark folder to be synchronized.

{SPARK_FOLDER}: The Spark folder (URL) that hosts the services.

{SPARK_KEY}: Enter the synthetic key, and specify the key type as 'SYNTHETICKEY' in the third parameter.

CALL SPARK_PUBLIC.SETUP('{SPARK_FOLDER}', '{SPARK_KEY}', 'SYNTHETICKEY', CURRENT_WAREHOUSE());

Synchronize the folder regularly to keep the synchronized services up to date.

CALL SPARK_PUBLIC.SYNC();
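
If you want the synchronization to run on a schedule, one option is a standard Snowflake task. This is only a sketch, not part of the connector itself; it assumes the task owner has the privileges to use the warehouse and call the procedure, and the task name is hypothetical.

CREATE OR REPLACE TASK SYNC_SPARK_FOLDER
WAREHOUSE = {WAREHOUSE_NAME}
SCHEDULE = 'USING CRON 0 6 * * * UTC' -- run once a day at 06:00 UTC
AS
CALL {APP_NAME}.SPARK_PUBLIC.SYNC();

ALTER TASK SYNC_SPARK_FOLDER RESUME;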

Synchronize multiple versions of a single Spark service.

{SERVICE_NAME}: The service name as presented in the Spark platform.

CALL SPARK_PUBLIC.SYNC('{SERVICE_NAME}');
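
For example, assuming the service is named BLACKSCHOLES in Spark (the name used in the query examples below):

CALL SPARK_PUBLIC.SYNC('BLACKSCHOLES');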

Spark user interface on the left showing three versions of a service. Snowflake interface on the right showing two SQL functions for each version in Spark

For each version of each service in a synchronized folder, you will see two database functions, one called {SERVICE_NAME} and one called {SERVICE_NAME}_VARIANT. You'll learn how to use these in the next step.
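
To check which functions have been created, you can list them from the application context. This assumes the functions are placed in the SPARK schema, as in the query examples below:

SHOW USER FUNCTIONS IN SCHEMA SPARK;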

The two database functions operate in different modes:

{SERVICE_NAME}(Parameters): Returns results in a tabular format.

Query a single set of parameters

SELECT Delta, Gamma, Rho, Theta, Vega FROM TABLE
(SPARK.BLACKSCHOLES(90::float,0.5::float,0.9::float,56::float,0.5::float));

results from a single query

Process a table of data.

SELECT input.id, input.excercisePrice, input.risklessRate, input.stdDevi, input.stockPrice, input.timeToExpiry, output.*
FROM {TABLE_NAME} input
JOIN TABLE(SPARK.BLACKSCHOLES(
    input.excercisePrice,
    input.risklessRate,
    input.stdDevi,
    input.stockPrice,
    input.timeToExpiry)) output;

Snowflake query returns table of results
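
For reference, a minimal input table matching the columns in the query above could be created as follows. The table name OPTION_INPUTS and the row values are hypothetical; substitute your own table for {TABLE_NAME}.

CREATE OR REPLACE TABLE OPTION_INPUTS (
    id INTEGER,
    excercisePrice FLOAT,
    risklessRate FLOAT,
    stdDevi FLOAT,
    stockPrice FLOAT,
    timeToExpiry FLOAT
);

INSERT INTO OPTION_INPUTS VALUES
    (1, 90, 0.5, 0.9, 56, 0.5),
    (2, 100, 0.03, 0.2, 105, 1.0);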

{SERVICE_NAME}_VARIANT(Parameters): Returns results in raw JSON format.

SELECT SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5);

Snowflake query returns JSON array of results
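
Because the _VARIANT function returns raw JSON, you can extract individual fields with Snowflake's VARIANT path syntax. The path below is illustrative only; the actual shape of the JSON depends on your service's outputs, so adjust the path (for example :Delta, [0]:Delta, or a nested path) to match what the query above returns.

SELECT SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5):Delta::float AS delta;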

Congratulations, you're now all set up to call your Spark services from your Snowflake environment! This means you can take business logic written in Excel, upload it to Spark, and immediately use it on all of your data in Snowflake via API, with no coding needed!

You can find out more about Coherent Spark on our homepage or read up about other features in our documentation.

What we've covered

FAQs

1. Can I execute Spark functions against the data from another database?

Functions imported in the Spark synchronization process are ready for cross-database access. Snowflake users can execute Spark functions against data from another database in the same cloud warehouse when the query uses the fully qualified name, for example database.schema.function-name or database.schema.procedure-name.

SELECT SPARK_CONNECTOR.SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5);

2. How is my Snowflake data kept safe when using the synchronized Spark functions?

The Spark connector relies on the Spark API endpoint, which means that data from Snowflake will be taken into the Spark server for processing. All UDFs generated from the Spark connector setup process follow Snowflake's advice on ensuring the entire data transition is implemented in the secured environment. The UDF owner must grant callers appropriate privilege(s) on the UDF access and usage. Snowflake users need to provide subscription information (API key) when calling the Spark proxy service. For more information, please refer to Snowflake's documentation on external function security.