In Coalesce, you can parse semi-structured data in Snowflake using a handful of clicks. Coalesce's parsers analyze the structure of entries within a variant column and automatically generate columns and a JOIN that captures the different attributes within them.
Read on for a step-by-step guide on how to parse JSON and XML data in Snowflake with Coalesce.
Prerequisites
What You'll Need
- A Snowflake account (either a trial account or access to an existing account)
- A Coalesce account (either a trial account created via Snowflake Partner Connect, or access to an existing account)
- Google Chrome browser (recommended)
What You'll Learn
- How to load raw data from Snowflake into Coalesce
- How to parse semi-structured data (JSON / XML) in a matter of clicks in Coalesce
What You'll Build
- Source Nodes
- Staging Nodes
- Navigate to your Snowflake account and create a new Worksheet. Rename it "JSON Coalesce Sample."
- Copy this code and paste it into your Worksheet:
- Select and run the code to create a new table
DAILY_WEATHER
in the COALESCE_SAMPLE_DATA
database and COALESCE_SAMPLE_DATA_JSON
schema.
- Switch over to your Coalesce account and open your Development Workspace in your default Project.
- Once in your Development Workspace, click the gear icon in the sidebar in the lower left-hand corner to open up your Build Settings.
- Click New Storage Location and name your new storage location JSON.
- Click the Create button to create your new Storage Location.
- Click on the pencil icon in the upper left-hand corner of your screen (next to the name of your Workspace). This will open your Workspace Settings.
- Under Storage Mappings, select the
COALESCE_SAMPLE_DATABASE
next to your JSON Storage Location. Select WEATHER
as your schema from the dropdown menu, then click "Save" and close your Workspace Settings.
- Click back to your Browser tab and click the Nodes icon in the left sidebar. Click the + icon that appears next to the Search bar. Click Add Sources from the dropdown menu.
- Select the JSON Storage Location and the
DAILY_14_TOTAL_SHORT
table, then click Add 1 source at the bottom right of the window.
- You'll see that a source node has been added to your graph. Click Fetch Data at the bottom right corner of the screen to preview the raw data.
- Right click on your source node and hover over Add Node to open the dropdown menu. Click on Stage to create a stage node on top of your source node.
- In your stage node, right click on the first column named "V" and hover over Derive Mappings in the dropdown menu. Click on From JSON.
- You will see your first single column parsed into multiple columns. By deriving JSON mappings, you have created a column in the mapping grid for every primitive type (string, number, boolean, and null) within the object, with the appropriate transform to parse that value. Deriving JSON Mappings also recursively flattens every JSON array using a table function in the Join Tab.
Click the Create button to create your stage node, and then click the Run button to populate it.
- Congratulations! You have easily parsed JSON in just a few clicks.
You can explore the DDL and DML statements used to create your stage node under Results:
Or you can preview the data within your stage node:
Coalesce offers XML parsing capabilities (currently in beta). This process functions similarly to the JSON example that is previously covered in this guide.
- Navigate back to your Snowflake account and create a new Worksheet. Name it Coalesce XML Sample. Copy the following SQL and paste it into your XML Worksheet in Snowflake, then highlight and run the code:
CREATE DATABASE IF NOT EXISTS COALESCE_SAMPLE_DATABASE;
CREATE SCHEMA IF NOT EXISTS COALESCE_SAMPLE_DATABASE.XML;
CREATE OR REPLACE TABLE COALESCE_SAMPLE_DATABASE.XML.MUSIC(src variant)
as
select parse_xml('<catalog issue="spring" date="2015-04-15">
<cd id="cd105">
<title>XML Music</title>
<genre>Emo</genre>
<artist>Microsoft</artist>
<publish_date>1111-11-11</publish_date>
<price>1.00</price>
<description>This music sux!</description>
</cd>
<cd id="cd106">
<title>Greg\'s Song</title>
<genre>Funk</genre>
<artist>Greg Henkhaus</artist>
<publish_date>2023-01-01</publish_date>
<price>200.00</price>
<description>BUY IT</description>
</cd>
</catalog>')
- You'll see that a table
MUSIC
was created:
- Navigate back to your Coalesce account and click on the gear icon in the sidebar in the lower left-hand corner to open up your Build Settings.
- Click on New Storage Location to create a location named XML. Then click the Create button.
- Click the pencil icon in the upper right corner to open up your Workspace Settings.
- Under Storage Mappings, set your XML Storage Mappings as
COALESCE_SAMPLE_DATABASE
and XML schema. Then click the Save button and close your Workspace Settings.
- Navigate back to your Browser tab and click the + icon. Click Add Sources and select the XML source and
MUSIC
table. Click Add 1 source to add the raw data to your graph as a source node.
- Right click your XML source node and hover over Add Node. Click Stage to create a new stage node.
- In your Stage node, right click the first column and hover over Derive Mappings. Click "From XML."
- You'll see the contents of your single column parsed into multiple columns. Click the Create and Run buttons to create and populate your stage node. Preview the data by clicking on Data:
Congratulations, you've completed this guide and experienced how easy it is to parse semi-structured data in Snowflake with Coalesce! To learn more about Coalesce's JSON parsing capabilities, please visit our documentation.
Happy transforming!
What We've Covered
- How to load raw data from Snowflake into Coalesce
- How to parse semi-structured data (JSON / XML) in a matter of clicks in Coalesce