Parsing Semi-Structured Data with Coalesce

In Coalesce, you can parse semi-structured data in Snowflake using a handful of clicks. Coalesce's parsers analyze the structure of entries within a variant column and automatically generate columns and a JOIN that captures the different attributes within them.

Read on for a step-by-step guide on how to parse JSON and XML data in Snowflake with Coalesce.

Prerequisites

Familiarity with Snowflake
Basic knowledge of SQL, database concepts, and objects
Completion of the foundational Coalesce Quickstart Accelerating Transformations with Coalesce and Snowflake

What You'll Need

A Snowflake account (either a trial account or access to an existing account)
A Coalesce account (either a trial account created via Snowflake Partner Connect, or access to an existing account)
Google Chrome browser (recommended)

What You'll Learn

How to load raw data from Snowflake into Coalesce
How to parse semi-structured data (JSON / XML) in a matter of clicks in Coalesce

What You'll Build

Source Nodes
Staging Nodes

Navigate to your Snowflake account and create a new Worksheet. Rename it "JSON Coalesce Sample."

snowflakeUI

Copy this code and paste it into your Worksheet:

snowflakecode

Select and run the code to create a new table DAILY_WEATHER in the COALESCE_SAMPLE_DATA database and COALESCE_SAMPLE_DATA_JSON schema.

tableweather

Switch over to your Coalesce account and open your Development Workspace in your default Project.

defaultproject

Once in your Development Workspace, click the gear icon in the sidebar in the lower left-hand corner to open up your Build Settings.

buildinterface

Click New Storage Location and name your new storage location JSON.

jsonlocation

Click the Create button to create your new Storage Location.

storagelocations

Click on the pencil icon in the upper left-hand corner of your screen (next to the name of your Workspace). This will open your Workspace Settings.

workspace

Under Storage Mappings, select the COALESCE_SAMPLE_DATABASE next to your JSON Storage Location. Select WEATHER as your schema from the dropdown menu, then click "Save" and close your Workspace Settings.

storage

Click back to your Browser tab and click the Nodes icon in the left sidebar. Click the + icon that appears next to the Search bar. Click Add Sources from the dropdown menu.

jsonsrc

Select the JSON Storage Location and the DAILY_14_TOTAL_SHORT table, then click Add 1 source at the bottom right of the window.

dailytotal

You'll see that a source node has been added to your graph. Click Fetch Data at the bottom right corner of the screen to preview the raw data.

fetchjson

Right click on your source node and hover over Add Node to open the dropdown menu. Click on Stage to create a stage node on top of your source node.

stgjson

In your stage node, right click on the first column named "V" and hover over Derive Mappings in the dropdown menu. Click on From JSON.

derivejson

You will see your first single column parsed into multiple columns. By deriving JSON mappings, you have created a column in the mapping grid for every primitive type (string, number, boolean, and null) within the object, with the appropriate transform to parse that value. Deriving JSON Mappings also recursively flattens every JSON array using a table function in the Join Tab.

Click the Create button to create your stage node, and then click the Run button to populate it.

runjson

Congratulations! You have easily parsed JSON in just a few clicks.

jsonparsed

You can explore the DDL and DML statements used to create your stage node under Results:

ddljson

Or you can preview the data within your stage node:

preview

Coalesce offers XML parsing capabilities (currently in beta). This process functions similarly to the JSON example that is previously covered in this guide.

Navigate back to your Snowflake account and create a new Worksheet. Name it Coalesce XML Sample. Copy the following SQL and paste it into your XML Worksheet in Snowflake, then highlight and run the code:

CREATE DATABASE IF NOT EXISTS COALESCE_SAMPLE_DATABASE;
CREATE SCHEMA IF NOT EXISTS COALESCE_SAMPLE_DATABASE.XML;
CREATE OR REPLACE TABLE COALESCE_SAMPLE_DATABASE.XML.MUSIC(src variant)
as
select parse_xml('<catalog issue="spring" date="2015-04-15">
<cd id="cd105">
<title>XML Music</title>
<genre>Emo</genre>
<artist>Microsoft</artist>
<publish_date>1111-11-11</publish_date>
<price>1.00</price>
<description>This music sux!</description>
</cd>
<cd id="cd106">
<title>Greg\'s Song</title>
<genre>Funk</genre>
<artist>Greg Henkhaus</artist>
<publish_date>2023-01-01</publish_date>
<price>200.00</price>
<description>BUY IT</description>
</cd>
</catalog>')

You'll see that a table MUSIC was created:

tablemusic

Navigate back to your Coalesce account and click on the gear icon in the sidebar in the lower left-hand corner to open up your Build Settings.

buildsettings

Click on New Storage Location to create a location named XML. Then click the Create button.

xmllocation

Click the pencil icon in the upper right corner to open up your Workspace Settings.

xml2

Under Storage Mappings, set your XML Storage Mappings as COALESCE_SAMPLE_DATABASE and XML schema. Then click the Save button and close your Workspace Settings.

xmlmapping

Navigate back to your Browser tab and click the + icon. Click Add Sources and select the XML source and MUSIC table. Click Add 1 source to add the raw data to your graph as a source node.

xmlsrc

Right click your XML source node and hover over Add Node. Click Stage to create a new stage node.

xmlstg

In your Stage node, right click the first column and hover over Derive Mappings. Click "From XML."

xmlderive

You'll see the contents of your single column parsed into multiple columns. Click the Create and Run buttons to create and populate your stage node. Preview the data by clicking on Data:

parsedxml

Congratulations, you've completed this guide and experienced how easy it is to parse semi-structured data in Snowflake with Coalesce! To learn more about Coalesce's JSON parsing capabilities, please visit our documentation.

Happy transforming!

What We've Covered

How to load raw data from Snowflake into Coalesce
How to parse semi-structured data (JSON / XML) in a matter of clicks in Coalesce