Apache Iceberg is an open table format for huge analytic datasets that brings ACID compliance and high-performance analytics to open data formats. Both Snowflake and AWS support Iceberg, enabling customers to dramatically improve data interoperability, implementation speed, and performance for integrated data lakes.
This guide will take you through the steps of converting existing Parquet data to Iceberg and using it to build open analytic environments with Snowflake and AWS Glue, with AWS Lake Formation providing fine-grained access controls and temporary access tokens for the Iceberg tables.
For this guide we will use a Financial Services use case in which insurance data is analyzed. The Quotes data is collected from source systems and stored as Parquet on S3, while Customer and Policy data is already available as internal Snowflake tables. We will try to identify customers who are likely to churn, or who are potential fraud risks, based on a high number of recent quote requests.
You will need:
1. A Snowflake account with ACCOUNTADMIN access in the US West (Oregon) region.
2. An AWS account with Administrator access.
In this step we will use CloudFormation to configure the objects in the AWS account needed for the quickstart. We will also cover basics of the AWS console and how to leverage browser tabs to multitask across the various AWS services.
We will now run a CloudFormation template to configure an S3 bucket and a role with specific policies in the AWS account.
1. Enter Glue-IRC-Int as the stack name, then click Next.
2. Check I acknowledge that AWS CloudFormation might create IAM resources, and then click Submit to create the stack.
It will take a few seconds to create the resources in your AWS account. In the Events screen you can click the refresh button to see updates.
In this step we will use AWS Glue to create an Iceberg table and convert Parquet data to Iceberg format. We will use Glue Studio.
1. Enter iceberg as the database name.
2. For Description (optional), add Database to store iceberg table.
3. Click Create database.
In this section, we will show you how to create an ETL job in the Glue Studio visual interface.
For the S3 source, enter the path s3://aws-data-analytics-workshops/aws_glue/glue-snowflake/raw-data/quotes/ . This is a public bucket where our raw quotes file resides. In the transform node, leave everything as is; don't change anything here. This is just to show that you have the option to change the schema with the help of this transform.
We can now run the Glue job to convert the raw Parquet files and save them as an Iceberg table.
You have successfully created an Iceberg table in your AWS account from Parquet files using Glue Studio.
As a bonus step you can open the browser tab with the S3 console and see the new folders and files where the data and metadata/manifest files are stored.
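For reference, the conversion the visual job performs is roughly equivalent to a CTAS in Spark SQL. The following is a minimal sketch, assuming a Spark session where an Iceberg catalog named glue_catalog has been configured against the Glue Data Catalog (the catalog name and session setup are assumptions, not output of the visual job):
-- read the raw Parquet files and write them out as an Iceberg table
-- registered in the Glue database "iceberg"
CREATE TABLE glue_catalog.iceberg.quotes
USING iceberg
AS SELECT * FROM parquet.`s3://aws-data-analytics-workshops/aws_glue/glue-snowflake/raw-data/quotes/`;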
In this step we will configure AWS Lake Formation to control access to the Iceberg tables. In Lake Formation, this means fine-grained access control to Data Catalog resources and Amazon S3 locations.
To configure your Lake Formation access controls, first set up the application integration:
1. Click Administration in the navigation pane.
2. Select Application integration settings.
3. Check Allow external engines to access data in Amazon S3 locations with full table access.
4. Click Save.
Next you can grant data locations to the role that was created:
1. Click Data locations under Permissions in the left pane.
2. Click Grant, located at the top-right corner.
3. Under IAM users and roles, select the role that was created.
4. Under Storage locations, click Browse and select the S3 bucket.
5. Check the Grantable box.
6. Click Grant.
Now you can register the data lake location:
1. Click Data lake locations under Administration in the left pane.
2. Click Register location, located at the top-right corner.
3. For Amazon S3 path, enter the S3 bucket path created earlier.
4. For IAM role, select the role that was created.
5. For Permission mode, select Lake Formation.
6. Click Register location.
Now you can grant permissions to the IAM role that was created earlier:
1. Click Data permissions in the navigation pane.
2. Click Grant.
3. Under Principals, select IAM users and roles and choose the role that was created.
4. Under Resources, select Named Data Catalog resources.
5. For Catalog, choose your AWS account ID.
6. For Database, choose iceberg.
7. For Database permissions, select SUPER.
8. For Grantable permissions, select SUPER, and leave everything else as default.
9. Click Grant.
Note: SUPER access is required for mounting the Iceberg table in Amazon S3 as a Snowflake table.
Now you are ready to set up the Snowflake account and configure the AWS integrations.
In this step we will configure the Snowflake account with internal tables to represent Customer and Policy data. We will also create Snowflake objects such as a database, warehouse, and role that are used in the quickstart. Then we will configure the external volume to allow Snowflake to read and write on S3. We will also create an integration with the Glue Data Catalog to allow Snowflake to retrieve the Iceberg catalog information directly from the Glue Catalog.
Log into your Snowflake account as a user with ACCOUNTADMIN privileges, open a new worksheet, and run the following:
USE ROLE SECURITYADMIN;
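-- create a dedicated role for this lab and grant it to SYSADMIN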
CREATE OR REPLACE ROLE HOL_ICE_RL COMMENT='Iceberg Role';
GRANT ROLE HOL_ICE_RL TO ROLE SYSADMIN;
USE ROLE ACCOUNTADMIN;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE HOL_ICE_RL;
GRANT CREATE EXTERNAL VOLUME ON ACCOUNT TO ROLE HOL_ICE_RL;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE HOL_ICE_RL;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE HOL_ICE_RL;
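-- switch to the lab role and create the database and warehouse used in the quickstart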
USE ROLE HOL_ICE_RL;
CREATE OR REPLACE DATABASE HOL_ICE_DB;
CREATE OR REPLACE WAREHOUSE HOL_ICE_WH
WITH WAREHOUSE_SIZE = 'XSMALL'
INITIALLY_SUSPENDED = TRUE;
USE ROLE HOL_ICE_RL;
USE HOL_ICE_DB.PUBLIC;
USE WAREHOUSE HOL_ICE_WH;
Next, create the external volume that allows Snowflake to read and write Iceberg data on S3. In the worksheet, replace the placeholders with the name of the S3 bucket created by CloudFormation and the ARN of your AWS Role that was created.
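The command will look something like the sketch below; the volume and location names here are illustrative, and the placeholders must be replaced with your own values:
CREATE OR REPLACE EXTERNAL VOLUME hol_ice_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'iceberg-s3-us-west-2'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<your S3 bucket>/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<your AWS account ID>:role/<your AWS role>'
    )
  );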
Now we will create the integration with the Glue Data Catalog. In the command below, replace CATALOG_NAME with your AWS account ID (you can simply copy and paste the values from the earlier external volume command), SIGV4_IAM_ROLE with the ARN of the AWS role that was created, and SIGV4_SIGNING_REGION with your AWS region name. It will look something like this:
CREATE OR REPLACE CATALOG INTEGRATION glue_catalog_irc_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'iceberg'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '6546xxxxxxxxx'
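    -- Lake Formation vends temporary, scoped-down credentials so Snowflake can access the data in S3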
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::6546xxxxxxxx:role/glue-snowflake-GluesnowflakedevdayLabRole-xxxxxxxxxxxx'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;
DESC CATALOG INTEGRATION glue_catalog_irc_int;
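You will need two values from this output for the IAM trust policy: API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID. As an optional convenience (not part of the original worksheet), you can filter the DESC output with RESULT_SCAN, run immediately after the DESC command:
SELECT "property", "property_value"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" IN ('API_AWS_IAM_USER_ARN', 'API_AWS_EXTERNAL_ID');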
In the AWS IAM console, open the trust policy of the role that was created. Select everything in the JSON policy and replace it with the JSON below.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<API_AWS_IAM_USER_ARN>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "enter value of <API_AWS_EXTERNAL_ID> from DESC CATALOG INTEGRATION"
          ]
        }
      }
    }
  ]
}
Take your time and be careful when pasting the values into the JSON policy in the next few steps.
From the output of DESC CATALOG INTEGRATION, copy the value of API_AWS_IAM_USER_ARN and replace <API_AWS_IAM_USER_ARN> with that value. Then copy the value of API_AWS_EXTERNAL_ID and replace the placeholder string in sts:ExternalId with it. Your IAM trust policy will look something like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::90541xxxxxx:user/vvyk0000-s"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "YJB50193_SFCRole=2_f1IsD5b8/DAFYPxxxxxxxxxxxx"
          ]
        }
      }
    }
  ]
}
This completes the integration with the Glue Data Catalog. Now go back to the worksheet and run this command to validate that the integration is working:
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_catalog_irc_int');
The result should say success.
In this step we will create a catalog-linked database in Snowflake using the Lake Formation and Glue catalog IRC integrations, giving us an Iceberg table object referencing the Quotes Iceberg table. That table will be used for analysis alongside the internal Customer and Policy tables. We will then create an aggregate table written in Iceberg format on S3 and use Snowflake to query it.
Execute line 67 in the worksheet:
CREATE OR REPLACE DATABASE iceberg_linked_db
  LINKED_CATALOG = (
    CATALOG = 'glue_catalog_irc_int',
    NAMESPACE_MODE = FLATTEN_NESTED_NAMESPACE,
    NAMESPACE_FLATTEN_DELIMITER = '-',
    ALLOWED_NAMESPACES = ('iceberg')
  );
Check sync status:
SELECT SYSTEM$CATALOG_LINK_STATUS('iceberg_linked_db');
The result should be something similar to this:
{"failureDetails":[],"executionState":"RUNNING","lastLinkAttemptStartTime":"2025-10-06T16:32:42.426Z"}
Now that the database is linked to the Glue catalog, we should be able to query the existing Iceberg table in Glue. If the status still shows RUNNING, the initial sync may be in progress; re-run the status check until it completes.
Let's read the Quotes data in Iceberg format from S3. Note that you need to use double quotes around the Glue objects such as databases, schemas, and tables: Snowflake treats unquoted identifiers as uppercase, while the Glue names are lowercase.
use database iceberg_linked_db;
use schema "iceberg";
SELECT * FROM "quotes" LIMIT 20;
We can now create an Iceberg table and register it with the Glue catalog. In the command below, replace the placeholder in BASE_LOCATION with the name of your S3 bucket, then run the SQL command in line 115.
CREATE OR REPLACE ICEBERG TABLE iceberg_linked_db."iceberg"."quote_analysis_ice" (
    "fullname" STRING,
    "postcode" STRING,
    "custid" STRING,
    "ipid" NUMBER(18,0),
    "productname" STRING,
    "quotecount" NUMBER(18,0),
    "policyno" STRING,
    "quotedate" DATE,
    "quote_product" STRING,
    "originalpremium" NUMBER(28,2),
    "totalpremiumpayable" NUMBER(28,2),
    "createddate" DATE,
    "brand" STRING,
    "branchcode" STRING,
    "policy_status_desc" STRING,
    "typeofcover_desc" STRING,
    "insurer_name" STRING,
    "inceptiondate" DATE,
    "renewaldate" DATE
)
BASE_LOCATION = 's3://<Enter your S3 bucket name that was created>/iceberg/quote-analysis-iceberg';
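To confirm the new table is registered and visible through the linked catalog, you can optionally list the Iceberg tables in the schema:
SHOW ICEBERG TABLES IN SCHEMA iceberg_linked_db."iceberg";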
We can now combine the internal Snowflake tables CUSTOMER and POLICIES with the existing quotes Iceberg table and insert the result into the new Iceberg table quote_analysis_ice.
insert into iceberg_linked_db."iceberg"."quote_analysis_ice"
select
    c.fullname, c.postcode, c.custid, c.ipid, c.productname, c.quotecount,
    q."policyno", q."quotedate", q."quote_product", q."originalpremium", q."totalpremiumpayable",
    p.createddate, p.brand, p.branchcode, p.policy_status_desc,
    p.typeofcover_desc, p.insurer_name, p.inceptiondate, p.renewaldate
from hol_ice_db.public.customer c
    join "quotes" q
        on c.fullname = q."fullname"
        and c.postcode = q."postcode"
    join hol_ice_db.public.policies p
        on c.custid = p.custid
where c.quotecount > 5;
Because this aggregate data is written in Iceberg format back to S3, it can also be consumed by other services and engines.
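For example, assuming Amazon Athena in the same account points at this Glue Data Catalog and the appropriate Lake Formation permissions have been granted (an illustration, not part of this guide's steps), the same table can be queried directly:
SELECT productname, totalpremiumpayable
FROM iceberg.quote_analysis_ice
LIMIT 10;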
Back in Snowflake, we can query the newly created Iceberg table in the Glue catalog.
select * from iceberg_linked_db."iceberg"."quote_analysis_ice" limit 10;
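As an illustrative extension of the churn and fraud use case (not part of the original worksheet), you can summarize the at-risk customers by product:
SELECT "productname",
       COUNT(DISTINCT "custid") AS at_risk_customers,
       AVG("quotecount") AS avg_quote_requests
FROM iceberg_linked_db."iceberg"."quote_analysis_ice"
GROUP BY "productname"
ORDER BY at_risk_customers DESC;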
This completes the creation of Iceberg tables in Snowflake using the Glue Data Catalog IRC API and Lake Formation credential vending. You also combined the Iceberg table with internal tables to perform analysis on insurance customers, quotes, and policies. This demonstrates a powerful integration that provides Snowflake with read and write access to Glue.
Follow the steps below to ensure the deployed resources are cleaned up.
Snowflake:
DROP DATABASE HOL_ICE_DB;
DROP TABLE iceberg_linked_db."iceberg"."quote_analysis_ice";
DROP DATABASE iceberg_linked_db;
DROP CATALOG INTEGRATION glue_catalog_irc_int;
AWS:
Delete the iceberg database (and its tables) in the Glue console, empty the S3 bucket that was created for this guide, and then delete the Glue-IRC-Int CloudFormation stack to remove the role, policies, and bucket.
You've successfully converted Parquet data to Iceberg format using AWS Glue, integrated Snowflake with S3 and the Glue Data Catalog Iceberg REST API, combined Iceberg table data with internal Snowflake data for analytics, written aggregate data in Iceberg format to S3, and finally used Snowflake to analyze the Iceberg data.