Coalesce is a best-in-class Data Transformation solution for Snowflake. With Coalesce, you build directed acyclic graphs (DAG) made up of nodes that run on a schedule and produce tested, up-to-date datasets ready for your business users.
The Coalesce product is built around the concept of "metadata" - column and table level information that describes the structure and transformations inside your data warehouse. This metadata makes both designing and deploying data warehouses easier, especially at enterprise scale.
This guide will help you quickly get started with Coalesce, connecting your database and creating a small data mart for a hypothetical sales team. Using the Snowflake sample database, we will be implementing a Dimension node to track historical changes of their customers and a Fact node to record order transactions.
The following video provides a brief overview of (most of) what this guide will be covering.
When you first sign in on a new account, you'll be presented with the Projects Dashboard, where a default Project and Development Workspace will be already created. Feel free to disregard the warning about Version Control, as this guide will not be using that feature. Click the Launch button to open the workspace to continue.
Once you've done this you'll be presented with the Build interface. This is where you'll spend most of your time creating nodes, building a graph with them, and transforming your data. In Coalesce, each Node is a logical unit of work that represents a table, a view, or multiple Snowflake objects. The other interface is the Deploy interface, which will be used to push your pipeline to other environments (QA, Production, etc.), including your graph and other metadata.
You've now connected Coalesce to your Snowflake instance!
A storage location is a logical name you provide to represent a database and schema (in Snowflake) and you will need them to make use of the Workspace you configured earlier. Depending on when you signed up for Coalesce, your account may have one or two Storage Locations already created. One will be mapped to the Snowflake Sample Data (SRC or SAMPLE) and the other will be where we write new tables (WORK or TARGET). If your workspace already has 2 storage locations, you can skip to Step 3 below.
Now it's time to add Sources to the graph. The graph node view is where you'll configure Nodes that will transform your data. Below is an example of a graph with several nodes -
To add Source Nodes take the following steps:
Now that you have a few Source Nodes on your graph, it's time to add a Stage Node. Stage Nodes are intermediate nodes in the graph where you prepare the data by applying business logic.
NationSource Node → Add Node → Stage Node. Note that you can select multiple Source Nodes by Shift+clicking them and then add multiple Stage Nodes simultaneously.
N_NAMEcolumn. Try a simple transform like
LOWER()and the name of your column, or you can use the syntax
Congratulations! You've connected your database and applied a basic transformation to your data.
Now let's create a simple slowly changing dimension (Type 2) — an industry standard for tracking historical data by creating multiple records for a given natural key.
STG_CUSTOMERnode. By default, Coalesce creates a Type 1 Dimension. In this guide we will be making a Type 2 Dimension to track historical data of a column.
C_CUSTKEYas a business key by selecting it and clicking the arrow to move it to the right
C_PHONE, clicking the arrow to move them to the right
You have now finished creating a Type 2 Dimension table. Next we will make a Fact Table.
Now let's create a fact table.
STG_ORDERSnode and delete all the columns except for
DIM_CUSTOMERnode on the left side, then select
DIM_CUSTOMER_KEYand drag it into your
You have now made a fact table! You can run this query in Snowflake (adjusting your schema and databases from
MY_SCHEMA to the ones in your environment) to confirm.
select DIM.C_NAME CUSTOMER_NAME, sum(FCT.O_TOTALPRICE) TOTAL_PRICE from "MY_DB"."MY_SCHEMA"."FCT_ORDERS" FCT inner join "MY_DB"."MY_SCHEMA"."DIM_CUSTOMER" DIM on FCT.DIM_CUSTOMER_KEY = DIM.DIM_CUSTOMER_KEY group by DIM.C_NAME;
Now that you're familiar with some of Coalesce's basic functionality and how it can be used to automate many of your SQL workflows, you can: