The Coherent Spark Connector transforms business logic designed in Microsoft Excel spreadsheets into reusable SQL functions that call our Spark APIs from the Snowflake Data Cloud. Joint customers can save significant development and testing time and bring their products to market faster, while the original dataset remains in the Snowflake environment. The entire workflow comes with enterprise-grade security and scalability. Please see the FAQs in Step 6 for additional information.
By the end of this guide, you'll learn:
Make sure model names and Xparameter names meet the Snowflake identifier requirements; otherwise, the synchronization process may fail.
A name:
- Must begin with an alphabetic character or an underscore (_).
- After the first character, may contain alphabetic characters (A-Z, a-z), underscores (_), decimal digits (0-9), and dollar signs ($).
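For example (hypothetical names, for illustration only): a model named BLACKSCHOLES_V2 meets these requirements, while one named 2024 Pricing-Model would need to be renamed before synchronization, since it begins with a digit and contains a space and a hyphen.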
The table below indicates which active Xservices are and are not supported by the Snowflake native connector.
Spark models that use any Xservice marked "Not supported" below may fail to import as Snowflake UDFs during the synchronization process, or the imported Snowflake UDFs may return incorrect results.
This connector aims to support all Xservices available in Spark, and this table will be updated regularly as new versions are released. We recommend always updating the Snowflake native connector to the latest version.
Certain legacy Xservices are not included in this table and will not be supported in the Snowflake connector until further notice.
| Xservice | Supported |
| --- | --- |
|  | YES |
|  | YES |
|  | YES |
|  | NO |
|  | YES |
|  | YES |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
|  | NO |
Table last updated: 3rd June 2023.
Replace the parameters in curly brackets { } in the SQL queries, then execute them in the Snowflake environment.
{APP_NAME}: Application name ("SPARK_CONNECTOR" by default).

USE {APP_NAME};
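If you kept the default application name, this is simply:

USE SPARK_CONNECTOR;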
Create and configure the API integration based on the specific Snowflake native application you are operating.
{AWS_ARN}: Amazon Resource Name of a cloud platform role.

{SPARK_PROXY}: Spark HTTPS proxy service endpoint.

Spark Connector (UATUS)
| Parameters | Configuration for Spark Connector (UATUS) |
| --- | --- |
| {AWS_ARN} | arn:aws:iam::533606394992:role/Snowflake-WASM-Server-Invoker |
| {SPARK_PROXY} | https://ymeen1pkt6.execute-api.us-east-1.amazonaws.com |
Spark Connector (PRODUS)
| Parameters | Configuration for Spark Connector (PRODUS) |
| --- | --- |
| {AWS_ARN} | (Coming soon...) |
| {SPARK_PROXY} | (Coming soon...) |
CREATE OR REPLACE API INTEGRATION SPARK_INTEG
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = '{AWS_ARN}'
API_ALLOWED_PREFIXES = ('{SPARK_PROXY}')
ENABLED = TRUE;
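To verify the integration was created with the values you supplied, you can describe it (DESCRIBE INTEGRATION is standard Snowflake SQL):

DESCRIBE INTEGRATION SPARK_INTEG;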
GRANT USAGE ON WAREHOUSE {WAREHOUSE_NAME} TO APPLICATION {APP_NAME};
GRANT USAGE ON INTEGRATION SPARK_INTEG TO APPLICATION {APP_NAME};
CALL SPARK_PUBLIC.EXECUTE_EXTERNAL_FUNCTIONS('SPARK_INTEG');
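As a quick sanity check, Snowflake's standard SHOW command lists external functions visible to your role; note that objects internal to the native application may not all appear:

SHOW EXTERNAL FUNCTIONS;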
{SPARK_FOLDER}: The Spark folder (URL) that hosts the services.

{SPARK_KEY}: The synthetic key; specify the key type as 'SYNTHETICKEY' in the third parameter.
CALL SPARK_PUBLIC.SETUP('{SPARK_FOLDER}', '{SPARK_KEY}', 'SYNTHETICKEY', CURRENT_WAREHOUSE());
CALL SPARK_PUBLIC.SYNC();
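A filled-in example (the folder URL and key below are hypothetical placeholders, not real values):

CALL SPARK_PUBLIC.SETUP('https://spark.example.com/folders/pricing', 'a1b2c3d4-synthetic-key', 'SYNTHETICKEY', CURRENT_WAREHOUSE());
CALL SPARK_PUBLIC.SYNC();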
{SERVICE_NAME}: The service name as presented in the Spark platform.
CALL SPARK_PUBLIC.SYNC('{SERVICE_NAME}');
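For example, assuming the service behind the examples below is named BLACKSCHOLES in Spark:

CALL SPARK_PUBLIC.SYNC('BLACKSCHOLES');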
For each version of each service in a synchronized folder, you will see two database functions: one called {SERVICE_NAME} and one called {SERVICE_NAME}_VARIANT. You'll learn how to use these in the next step.
The two database functions operate in different modes:
{SERVICE_NAME}(Parameters): Returns results in a tabular format.

SELECT Delta, Gamma, Rho, Theta, Vega FROM TABLE
(SPARK.BLACKSCHOLES(90::float, 0.5::float, 0.9::float, 56::float, 0.5::float));
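You can also apply the tabular function row by row to an existing table, as in the following query; here {TABLE_NAME} is a placeholder for your input table, whose columns supply the service's parameters: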
SELECT input.id, input.excercisePrice, input.risklessRate, input.stdDevi, input.stockPrice, input.timeToExpiry, output.*
FROM {TABLE_NAME} input
JOIN TABLE(SPARK.BLACKSCHOLES(
    input.excercisePrice,
    input.risklessRate,
    input.stdDevi,
    input.stockPrice,
    input.timeToExpiry)) output;
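In this pattern, Snowflake evaluates the table function once per input row (a lateral table-function join), so the Spark model's outputs are appended to each row of {TABLE_NAME}.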
{SERVICE_NAME}_VARIANT(Parameters): Returns results in raw JSON format.

SELECT SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5);
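Because the result is a VARIANT, individual fields can be extracted with Snowflake's JSON path syntax. A minimal sketch, assuming the returned object exposes the same keys as the tabular output (e.g. Delta):

SELECT SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5):Delta::float AS delta;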
Congratulations, you're now all set up to call your Spark services from your Snowflake environment! This means you can take business logic written in Excel, upload it to Spark, and immediately use it on all of your data in Snowflake via API, no coding needed!
You can find out more about Coherent Spark on our homepage or read up about other features in our documentation.
Functions imported in the Spark synchronization process are ready for cross-database access. Snowflake users can execute Spark functions against data from another database in the same cloud warehouse when the query uses the fully qualified name, for example: database.schema.function-name (or database.schema.procedure-name).
SELECT SPARK_CONNECTOR.SPARK.BLACKSCHOLES_VARIANT(90, 0.5, 0.9, 56, 0.5);
The Spark connector relies on the Spark API endpoint, which means that data from Snowflake is sent to the Spark server for processing. All UDFs generated by the Spark connector setup process follow Snowflake's guidance for keeping data in transit secure. The UDF owner must grant callers the appropriate privileges to access and use each UDF. Snowflake users must provide subscription information (an API key) when calling the Spark proxy service. For more information, please refer to Snowflake's documentation on external function security.
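For example, a hypothetical grant allowing an ANALYST role to call the variant function from the earlier example (adjust the role, database, and argument signature to match your environment):

GRANT USAGE ON FUNCTION SPARK_CONNECTOR.SPARK.BLACKSCHOLES_VARIANT(FLOAT, FLOAT, FLOAT, FLOAT, FLOAT) TO ROLE ANALYST;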