Apache Iceberg has become a popular open-source table format for data lakes because its flexibility gives organizations freedom of choice. Whether an organization wants to use best-of-breed services or has inherited multiple data platforms through mergers and acquisitions, adopting an open table format can be key to eliminating data silos.

Use Case

There is often no one-size-fits-all approach to tackling complex business challenges. Organizations often store their data in different places and use multiple tools and platforms to put that data to work. By uniting data across platforms and query engines using an open table format, organizations can serve a variety of business needs, including:

Prerequisites

What You'll Learn

What You'll Need

What You'll Build

In this section, you will create a table in Snowflake using the Iceberg format and also create a BigLake (external) table in BigQuery that points to the same Iceberg files.

Log in to Snowflake

USE ROLE accountadmin;

Create a warehouse to hold the data

CREATE OR REPLACE WAREHOUSE ICEBERG_WAREHOUSE WITH WAREHOUSE_SIZE='XSMALL';

Reference: https://docs.snowflake.com/en/sql-reference/sql/create-warehouse

Create a database to hold the Iceberg table

CREATE OR REPLACE DATABASE ICEBERG_DATABASE;

Create a bucket to hold your Snowflake-managed Iceberg table

Open: https://console.cloud.google.com/storage/browser

Create a link between Snowflake and GCS.

CREATE STORAGE INTEGRATION bigquery_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://bigquery-snowflake-sharing');

Reference: https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration

Get the service account that we will grant the Storage Object Admin role on our GCS bucket

DESC STORAGE INTEGRATION bigquery_integration;
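
The DESC output includes a STORAGE_GCP_SERVICE_ACCOUNT property; that value is the service account you grant access to in the next step. If you script this, a minimal sketch of pulling that value out of the DESC rows (the row shape matches what the Snowflake Python connector returns for DESC commands; the sample values below are illustrative, not real):

```python
def find_gcp_service_account(desc_rows):
    """Return the GCS service account from DESC STORAGE INTEGRATION rows.

    desc_rows is a list of (property, property_type, property_value, ...)
    tuples, the shape the Snowflake Python connector returns for DESC output.
    """
    for row in desc_rows:
        if row[0] == "STORAGE_GCP_SERVICE_ACCOUNT":
            return row[2]
    raise KeyError("STORAGE_GCP_SERVICE_ACCOUNT not found")

# Illustrative sample rows (your actual service account value will differ):
rows = [
    ("ENABLED", "Boolean", "true", "false"),
    ("STORAGE_GCP_SERVICE_ACCOUNT", "String",
     "abc123@gcpuscentral1-1dfa.iam.gserviceaccount.com", ""),
]
print(find_gcp_service_account(rows))
```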

Create a custom IAM role that has the permissions required to access the bucket and get objects.

Open the bucket you created

In Snowflake, create an external volume on GCS

CREATE EXTERNAL VOLUME snowflake_ext_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'us-central1'
        STORAGE_PROVIDER = 'GCS'
        STORAGE_BASE_URL = 'gcs://bigquery-snowflake-sharing/snowflake-volume/'
      )
    ),
    ALLOW_WRITES = TRUE;

Reference: https://docs.snowflake.com/en/sql-reference/sql/create-external-volume

Describe the volume

DESCRIBE EXTERNAL VOLUME snowflake_ext_volume;

Set the current database

USE ICEBERG_DATABASE;

Create a schema in Snowflake

CREATE SCHEMA iceberg_schema;

Make the schema active

USE SCHEMA iceberg_schema;

Create an Iceberg table using the Snowflake catalog

CREATE ICEBERG TABLE driver (driver_id int, driver_name string)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'snowflake_ext_volume'
  BASE_LOCATION = 'driver';

Reference: https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake

Show the table you just created

SHOW TABLES;

Insert new data

INSERT INTO driver (driver_id, driver_name) VALUES (1, 'Driver 001');
SELECT * FROM driver;

Retrieve the latest metadata JSON file that Snowflake points to; BigQuery must be pointed at the same file

SELECT REPLACE(JSON_EXTRACT_PATH_TEXT(
     SYSTEM$GET_ICEBERG_TABLE_INFORMATION('ICEBERG_DATABASE.iceberg_schema.driver'),
          'metadataLocation'), 'gcs://', 'gs://');
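
The same extraction can also be done client-side. A minimal sketch, assuming you already have the JSON string returned by SYSTEM$GET_ICEBERG_TABLE_INFORMATION:

```python
import json

def bq_metadata_uri(iceberg_info_json: str) -> str:
    """Convert Snowflake's metadataLocation (gcs://...) to the gs:// form BigQuery expects."""
    info = json.loads(iceberg_info_json)
    return info["metadataLocation"].replace("gcs://", "gs://", 1)

# Illustrative sample of the function's JSON output:
sample = ('{"metadataLocation": "gcs://bigquery-snowflake-sharing/'
          'snowflake-volume/driver/metadata/00001-abc.metadata.json",'
          ' "status": "success"}')
print(bq_metadata_uri(sample))
# gs://bigquery-snowflake-sharing/snowflake-volume/driver/metadata/00001-abc.metadata.json
```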

Open the bucket you created

Create a BigQuery Dataset

CREATE SCHEMA IF NOT EXISTS snowflake_dataset OPTIONS(location = 'us-central1');

Navigate to BigQuery

Expand your project in the left-hand panel

Open the bucket you created

The `uris` value must be the metadata file path returned by the Snowflake command you ran above

CREATE OR REPLACE EXTERNAL TABLE `snowflake_dataset.driver`
WITH CONNECTION `us-central1.snowflake-connection`
OPTIONS (
  format = "ICEBERG",
  uris = ["gs://bigquery-snowflake-sharing/snowflake-volume/driver/metadata/00001-2d763c77-df0a-4230-bd52-033877d02c40.metadata.json"]
);

View the data in BQ

SELECT * FROM `snowflake_dataset.driver`;

Update table metadata in BQ

BigQuery will not see updated data in Snowflake because the external table points at a specific metadata JSON file (a snapshot). You will have to update the table definition to point at the latest table metadata. Retrieve the latest metadata location in Snowflake with:

SYSTEM$GET_ICEBERG_TABLE_INFORMATION('<iceberg_table_name>')

Then re-create the BigQuery external table using the returned path.
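
One way to script the refresh is to fetch the latest metadata location from Snowflake and then re-issue the CREATE OR REPLACE EXTERNAL TABLE statement with the new URI. A sketch that only builds the DDL string (the table and connection names default to the ones used earlier in this guide; actually executing it would require the BigQuery client library and credentials):

```python
def build_refresh_ddl(metadata_uri: str,
                      table: str = "snowflake_dataset.driver",
                      connection: str = "us-central1.snowflake-connection") -> str:
    """Build the BigQuery DDL that re-points the external Iceberg table at a new metadata file."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        '  format = "ICEBERG",\n'
        f'  uris = ["{metadata_uri}"]\n'
        ");"
    )

# Illustrative metadata path; use the one returned by Snowflake.
ddl = build_refresh_ddl(
    "gs://bigquery-snowflake-sharing/snowflake-volume/driver/metadata/00002-latest.metadata.json")
print(ddl)
```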

In this section, you will create a table in BigQuery using the Iceberg format and also create an Iceberg (external) table in Snowflake that points to the same Iceberg files.

Create a bucket to hold your BigLake Managed Table

Navigate to BigQuery

Expand your project in the left hand panel


CREATE SCHEMA IF NOT EXISTS blmt_dataset OPTIONS(location = 'us');

Create a BigLake Managed table

CREATE OR REPLACE TABLE `blmt_dataset.driver`
(
 driver_id                 INT64,
 driver_name               STRING,
 driver_mobile_number      STRING,
 driver_license_number     STRING,
 driver_email_address      STRING,
 driver_dob                DATE,
 driver_ach_routing_number STRING,
 driver_ach_account_number STRING
)
CLUSTER BY driver_id
WITH CONNECTION `us-central1.blmt-connection`
OPTIONS (
 file_format = 'PARQUET',
 table_format = 'ICEBERG',
 storage_uri = 'gs://blmt-snowflake-sharing/driver'
);

Load the table with some data

LOAD DATA INTO `blmt_dataset.driver`
FROM FILES (
 format = 'parquet',
 uris = ['gs://data-analytics-golden-demo/biglake/v1-source/managed-table-source/driver/*.parquet']);

View the data

SELECT * FROM `blmt_dataset.driver` LIMIT 1000;

View the storage metadata

To export the most recent metadata, run this in BigQuery (a separate window is preferred):

EXPORT TABLE METADATA FROM blmt_dataset.driver;

In your storage window
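
Each export writes a new versioned metadata JSON file under the table's metadata/ prefix (the v<timestamp>.metadata.json naming here is an assumption based on the example path used later in this guide). If you script the lookup of the newest file from a listing of object names, a sketch might look like:

```python
import re

def latest_metadata_file(object_names):
    """Pick the metadata JSON with the highest version number from a list of object names."""
    pattern = re.compile(r"v(\d+)\.metadata\.json$")
    versioned = [(int(m.group(1)), name)
                 for name in object_names
                 if (m := pattern.search(name))]
    if not versioned:
        raise ValueError("no metadata files found")
    # max() compares the (version, name) tuples by version first
    return max(versioned)[1]

# Illustrative object listing; in practice this would come from a GCS list call.
names = [
    "driver/metadata/v1730762887.metadata.json",
    "driver/metadata/v1730769001.metadata.json",
    "driver/data/part-00000.parquet",
]
print(latest_metadata_file(names))
# driver/metadata/v1730769001.metadata.json
```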

Let's connect the data to Snowflake

USE ROLE accountadmin;

Create a warehouse to hold the data

CREATE OR REPLACE WAREHOUSE BLMT_WAREHOUSE WITH WAREHOUSE_SIZE='XSMALL';

Create a database in Snowflake

CREATE OR REPLACE DATABASE BLMT_DATABASE;

Select the database

USE DATABASE BLMT_DATABASE;

Create the schema to hold the table

CREATE SCHEMA IF NOT EXISTS BLMT_SCHEMA;

Select the schema

USE SCHEMA BLMT_SCHEMA;

Create our GCS volume integration. This will create a link between Snowflake and GCS.

CREATE STORAGE INTEGRATION gcs_storage_integration
 TYPE = EXTERNAL_STAGE
 STORAGE_PROVIDER = 'GCS'
 ENABLED = TRUE
 STORAGE_ALLOWED_LOCATIONS = ('gcs://blmt-snowflake-sharing/');

Get the service account that we will grant the Storage Object Admin role on our GCS bucket

DESC STORAGE INTEGRATION gcs_storage_integration;

Configure an external volume for Google Cloud Storage

Open the bucket you created

Create an external volume on GCS

Reference: https://docs.snowflake.com/en/sql-reference/sql/create-external-volume

CREATE OR REPLACE EXTERNAL VOLUME gcs_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'gcs_volume'
        STORAGE_PROVIDER = 'GCS'
        STORAGE_BASE_URL = 'gcs://blmt-snowflake-sharing/'
      )
    );

Create a catalog integration to manage Iceberg tables in the external volume

CREATE OR REPLACE CATALOG INTEGRATION catalog_integration
   CATALOG_SOURCE=OBJECT_STORE   -- Indicates that the catalog is backed by an object store
   TABLE_FORMAT=ICEBERG          -- Specifies the table format as Iceberg
   ENABLED=TRUE;                 -- Enables the catalog integration

Create the Iceberg table, pointing to the existing metadata file

CREATE OR REPLACE ICEBERG TABLE driver
   CATALOG='catalog_integration'   -- The catalog where the table will reside
   EXTERNAL_VOLUME='gcs_volume'    -- The external volume for table data
   BASE_LOCATION=''                -- Optional: Subdirectory within the storage location
   METADATA_FILE_PATH='driver/metadata/v1730762887.metadata.json'; -- Path to the existing metadata file

This will show the table just created

SHOW TABLES;

Query the Iceberg table

SELECT * FROM driver;
SELECT COUNT(*) FROM driver;

Now that the tables are linked, you can try the following. Insert a record into the BigQuery table:

INSERT INTO `blmt_dataset.driver`
(driver_id, driver_name, driver_mobile_number, driver_license_number, driver_email_address,
driver_dob, driver_ach_routing_number, driver_ach_account_number)
VALUES (0, 'New Driver', 'xxx-xxx-xxxx', 'driver_license_number', 'driver_email_address',
CURRENT_DATE(), 'driver_ach_routing_number','driver_ach_account_number');

Now query the record in Snowflake

SELECT * FROM driver WHERE driver_id = 0;

NOTE: You will not see the new record. You first need to tell BigQuery to export the latest metadata (see step "View the storage metadata") and then update the metadata used by Snowflake (see step "Create the Iceberg table, pointing to the existing metadata file") so that it points to the latest JSON file.
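
Rather than re-creating the Snowflake table each time, the table can be refreshed in place with ALTER ICEBERG TABLE ... REFRESH '<metadata-file-path>' (supported for Iceberg tables backed by an object-store catalog integration). A sketch that only builds that statement from a metadata path (running it would require a Snowflake connection):

```python
def build_snowflake_refresh(table: str, metadata_path: str) -> str:
    """Build the ALTER ICEBERG TABLE ... REFRESH statement that re-points an
    object-store-cataloged Iceberg table at a newer metadata file."""
    return f"ALTER ICEBERG TABLE {table} REFRESH '{metadata_path}';"

# Illustrative metadata path; use the latest file exported by BigQuery.
stmt = build_snowflake_refresh("driver", "driver/metadata/v1730769001.metadata.json")
print(stmt)
# ALTER ICEBERG TABLE driver REFRESH 'driver/metadata/v1730769001.metadata.json';
```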

In this section you will set up BigQuery to stay in sync with Iceberg tables hosted in Snowflake.

  1. So far, in this Quickstart, you've created an Iceberg table and exported Snowflake metadata to the Google Cloud Storage account upon each update. (Section 1 of this Quickstart)
  2. You've also created BigLake Iceberg tables in BigQuery and pointed to a specific metadata JSON file on storage. (Section 2 of this Quickstart). However, you are pointing at a specific version of the data and if changes are made to the Iceberg table in Snowflake, BigQuery will not see them.
  3. To sync Iceberg tables in Snowflake with BigQuery, BigQuery will have to read the Iceberg Catalog in Snowflake which will provide the latest metadata json file. Please see this GitHub repository for steps to enable this sync and automating the process:
  4. Finally, use the PATCH command to update BigQuery's Iceberg table to the latest metadata from Snowflake.

Technical Considerations

Technical considerations for the data sharing integration between BigQuery and Snowflake today include:

What you learned

By following this quickstart, you learned how to:

Resources

Google Cloud and Snowflake's commitment to Apache Iceberg's open-source table format empowers our customers to unite their data across query engines and platforms and use it to solve business problems with familiar features, tools, and services from both organizations. Learn how to build your first open data lakehouse with Google Cloud and Snowflake by exploring documentation from Google Cloud and Snowflake, watching a tutorial, then diving into the Quickstart Guide.