This guide demonstrates how to build a declarative incremental pipeline and establish a multi-layered lakehouse architecture in Snowflake using Dynamic Iceberg Tables that read from and write back to an external AWS Glue Catalog-managed Iceberg data lake.
By the end of this guide, you will know how to work with catalog integrations, catalog-linked databases, external volumes, and Dynamic Iceberg Tables in Snowflake.
You will build a three-tiered lakehouse architecture in Snowflake:
Bronze Layer: Connects Snowflake to the AWS Glue managed Iceberg tables using a Snowflake Catalog Integration and a Catalog Linked Database.
Silver Layer: Creates dynamic Iceberg tables that apply cleaning and standardization to the Bronze data, with results stored as Snowflake-managed Iceberg tables back in S3.
Gold Layer: Creates final, denormalized, and aggregated Snowflake dynamic Iceberg tables optimized for analytical reporting by joining the Silver-layer entities.
Creating Glue-managed Iceberg tables in the AWS Glue Catalog is the first step in setting up the Bronze layer for the Snowflake Iceberg integration. You can create the tables through the AWS Management Console (via AWS Glue or Lake Formation) or with SQL via Amazon Athena. In this guide, we use Athena to define the schema and table properties for Iceberg.
Before creating the tables, ensure these components are ready:
AWS Glue Database: You must have the bronze_analytics_db database created in your AWS Glue Catalog.
S3 Location: An S3 path (e.g., s3://<s3_bucket>/glue-iceberg/) where the Iceberg metadata and data files will reside. This location must correspond to the STORAGE_BASE_URL defined in your Snowflake EXTERNAL VOLUME.
Permissions: You need the appropriate AWS IAM permissions to create tables in the Glue Catalog and read/write to the S3 location (often configured via AWS Lake Formation if it is enabled).
Amazon Athena provides a simple SQL interface to create and manage Glue-managed Iceberg tables.
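If the bronze_analytics_db database does not already exist in your Glue Catalog, you can create it from Athena before defining any tables. A minimal sketch, assuming you want the database's default location to match the S3 path used by the external volume later (the LOCATION clause is optional):
-- Run in the Athena query editor
CREATE DATABASE IF NOT EXISTS bronze_analytics_db
LOCATION 's3://<s3_bucket>/glue-iceberg/';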
de_orders Table Creation. This table contains the core transactional order information.
CREATE TABLE IF NOT EXISTS bronze_analytics_db.de_orders (
billing_address STRING,
created_at TIMESTAMP,
currency STRING,
customer_id BIGINT,
delivery_date TIMESTAMP,
discount_amount DOUBLE,
notes STRING,
order_date TIMESTAMP,
order_id BIGINT,
order_status STRING,
order_uuid STRING,
payment_method STRING,
payment_status STRING,
shipping_address STRING,
shipping_cost DOUBLE,
shipping_date TIMESTAMP,
shipping_method STRING,
subtotal DOUBLE,
tax_amount DOUBLE,
total_amount DOUBLE,
updated_at TIMESTAMP)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_orders'
TBLPROPERTIES (
'table_type'='iceberg',
'write_compression'='snappy'
);
de_order_items Table Creation. This table holds the line-item details for each order.
CREATE TABLE IF NOT EXISTS bronze_analytics_db.de_order_items (
created_at TIMESTAMP,
discount_percent BIGINT,
line_total DOUBLE,
order_id BIGINT,
order_item_id BIGINT,
product_id BIGINT,
quantity BIGINT,
tax_rate DOUBLE,
total_price DOUBLE,
unit_price DOUBLE,
updated_at TIMESTAMP
)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_order_items'
TBLPROPERTIES (
'table_type'='iceberg',
'write_compression'='snappy'
);
de_products Table Creation. This table stores the product dimension data.
CREATE TABLE IF NOT EXISTS bronze_analytics_db.de_products (
barcode STRING,
brand STRING,
category STRING,
color STRING,
cost_price DOUBLE,
created_at TIMESTAMP,
description STRING,
dimensions_cm STRING,
is_active BOOLEAN,
launch_date TIMESTAMP,
material STRING,
product_id BIGINT,
product_name STRING,
product_uuid STRING,
reorder_level BIGINT,
size STRING,
sku STRING,
stock_quantity BIGINT,
subcategory STRING,
supplier_id BIGINT,
unit_price DOUBLE,
updated_at TIMESTAMP,
weight_kg DOUBLE)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_products'
TBLPROPERTIES (
'table_type'='iceberg',
'write_compression'='snappy'
);
de_customers Table Creation. This table stores the customer dimension data.
CREATE TABLE IF NOT EXISTS bronze_analytics_db.de_customers (
address_line1 STRING,
address_line2 STRING,
city STRING,
country STRING,
created_at TIMESTAMP,
customer_id BIGINT,
customer_segment STRING,
customer_uuid STRING,
date_of_birth TIMESTAMP,
email STRING,
first_name STRING,
gender STRING,
is_active BOOLEAN,
last_login_date TIMESTAMP,
last_name STRING,
phone STRING,
postal_code STRING,
registration_date TIMESTAMP,
state STRING,
total_orders BIGINT,
total_spent DOUBLE,
updated_at TIMESTAMP)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_customers'
TBLPROPERTIES (
'table_type'='iceberg',
'write_compression'='snappy'
);
Note: An AWS Glue job is also provided to populate the Glue-managed Iceberg tables with mock data, and the script can be reused for your own tables.
Download the iceberg_dt.py file from this git repository and run it from Glue.
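Once the Glue job has populated the tables, you can sanity-check them directly in Athena. As a hedged example, Athena also exposes Iceberg metadata tables such as $history and $files (availability may depend on your Athena engine version):
-- Row count smoke test
SELECT COUNT(*) FROM bronze_analytics_db.de_orders;
-- Inspect Iceberg snapshot history and data files
SELECT * FROM "bronze_analytics_db"."de_orders$history";
SELECT * FROM "bronze_analytics_db"."de_orders$files" LIMIT 10;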
This step creates a Snowflake Catalog Integration to connect to the AWS Glue Catalog, which acts as the metadata store for the Iceberg tables. A catalog integration is a named, account-level Snowflake object that stores information about how your table metadata is organized when the Iceberg catalog is managed outside of Snowflake.
CREATE OR REPLACE CATALOG INTEGRATION glue_catalog_integration
CATALOG_SOURCE = ICEBERG_REST -- Connects to the Iceberg REST Catalog endpoint
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = 'bronze_analytics_db'
REST_CONFIG = (
CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg'
CATALOG_API_TYPE = AWS_GLUE
CATALOG_NAME = '<Your AWS Account ID>'
)
REST_AUTHENTICATION = (
TYPE = SIGV4
SIGV4_IAM_ROLE = 'arn:aws:iam::<Your AWS Account ID>:role/<role_name>'
SIGV4_SIGNING_REGION = 'us-east-1'
)
ENABLED = TRUE;
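Before Snowflake can sign requests to the Glue Iceberg REST endpoint, the IAM role's trust policy must reference the IAM user ARN and external ID that Snowflake generates for the integration. You can retrieve them by describing the integration; the exact property names (e.g., API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID) are an assumption based on typical SigV4 integrations:
-- Copy the IAM user ARN and external ID into the Glue role's trust policy
DESCRIBE CATALOG INTEGRATION glue_catalog_integration;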
An External Volume defines the S3 location where the actual Iceberg data files reside and specifies the IAM role Snowflake uses to read and write that data.
CREATE OR REPLACE EXTERNAL VOLUME de_external_volume
STORAGE_LOCATIONS =
(
(
NAME = 'de_external_vol'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://<bucket_name>/glue-iceberg/' -- S3 path for Iceberg data files
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/<role_name>' -- IAM role for S3 read/write
ENCRYPTION = ( TYPE = 'AWS_SSE_S3' )
)
)
ALLOW_WRITES = TRUE;
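As with the catalog integration, describe the external volume to retrieve the values (such as STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID) that belong in the S3 IAM role's trust policy:
-- Copy the IAM user ARN and external ID into the S3 role's trust policy
DESC EXTERNAL VOLUME de_external_volume;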
A catalog-linked database is a Snowflake database connected to an external Iceberg REST catalog. Snowflake automatically syncs with the external catalog to detect namespaces and Iceberg tables, and registers the remote tables to the catalog-linked database. Catalog-linked databases also support creating and dropping schemas or Iceberg tables.
CREATE OR REPLACE DATABASE glue_catalog_linked_db
LINKED_CATALOG = (
CATALOG = 'glue_catalog_integration',
ALLOWED_NAMESPACES = ('bronze_analytics_db'), -- Limit scope to the Bronze database
NAMESPACE_MODE = FLATTEN_NESTED_NAMESPACE,
NAMESPACE_FLATTEN_DELIMITER = '-',
SYNC_INTERVAL_SECONDS = 60 -- Snowflake checks for metadata updates every 60 seconds
),
EXTERNAL_VOLUME = 'de_external_volume';
Verify the setup and explore the discovered Bronze tables.
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_catalog_integration');
-- List schemas/namespaces available
SELECT SYSTEM$LIST_NAMESPACES_FROM_CATALOG('glue_catalog_integration', '', 0);
-- Check the status of the new linked database
SELECT SYSTEM$CATALOG_LINK_STATUS('glue_catalog_linked_db');
-- List the Iceberg tables discovered
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('glue_catalog_integration');
-- Show the linked schemas/namespaces
SHOW SCHEMAS IN DATABASE glue_catalog_linked_db;
USE SCHEMA "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db";
SHOW ICEBERG TABLES;
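As a quick smoke test (assuming the Glue job has already populated the Bronze tables), query the linked tables directly from Snowflake:
SELECT COUNT(*) FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders";
SELECT * FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items" LIMIT 10;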
The Silver layer applies cleaning, standardization, and enrichment, using Dynamic Iceberg Tables for automated refresh. The resulting tables are stored as Iceberg tables back in S3 via the External Volume.
CREATE DATABASE IF NOT EXISTS silver_analytics_db;
USE DATABASE silver_analytics_db;
Creates a dynamic Iceberg table for cleaned and enhanced product data.
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_products_cleaned"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'de_products_cleaned'
AS
SELECT
"product_id",
"product_uuid",
TRIM(UPPER("product_name")) as "product_name_clean",
TRIM("description") as "product_description",
UPPER("category") as "category_standardized",
UPPER("subcategory") as "subcategory_standardized",
UPPER("brand") as "brand_standardized",
"unit_price" as "price",
COALESCE("cost_price", 0) as "cost",
ROUND("unit_price" - COALESCE("cost_price", 0), 2) as "profit_margin",
CASE
WHEN "unit_price" - COALESCE("cost_price", 0) <= 0 THEN 'No Profit'
WHEN ("unit_price" - COALESCE("cost_price", 0)) / NULLIF("unit_price", 0) >= 0.5 THEN 'High Margin'
WHEN ("unit_price" - COALESCE("cost_price", 0)) / NULLIF("unit_price", 0) >= 0.3 THEN 'Medium Margin'
ELSE 'Low Margin'
END as "margin_tier",
"weight_kg" as "weight",
"dimensions_cm" as "dimensions",
UPPER("color") as "color_standardized",
UPPER("size") as "size_standardized",
"material",
"supplier_id",
"stock_quantity",
CASE
WHEN "stock_quantity" > 100 THEN 'High Stock'
WHEN "stock_quantity" > 50 THEN 'Medium Stock'
WHEN "stock_quantity" > 0 THEN 'Low Stock'
ELSE 'Out of Stock'
END as "stock_status",
"reorder_level",
"is_active",
"sku",
"barcode",
"launch_date",
"created_at",
"updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_products"
WHERE "product_id" IS NOT NULL
AND "product_name" IS NOT NULL;
Creates a dynamic Iceberg table for cleaned and enriched customer data.
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_customers_cleaned"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'de_customers_cleaned'
AS
SELECT
"customer_id",
"customer_uuid",
TRIM(UPPER("first_name")) as "first_name",
TRIM(UPPER("last_name")) as "last_name",
LOWER(TRIM("email")) as "email",
REGEXP_REPLACE("phone", '[^0-9]', '') as "phone_clean",
"phone",
"date_of_birth",
CASE
WHEN "gender" IN ('M', 'Male', 'MALE', 'm') THEN 'M'
WHEN "gender" IN ('F', 'Female', 'FEMALE', 'f') THEN 'F'
ELSE 'O'
END as "gender_standardized",
"gender",
TRIM("address_line1") as "address_line1",
TRIM("address_line2") as "address_line2",
TRIM(UPPER("city")) as "city",
TRIM(UPPER("state")) as "state",
REGEXP_REPLACE("postal_code", '[^0-9]', '') as "postal_code_clean",
"postal_code",
UPPER("country") as "country",
"customer_segment",
"registration_date",
"last_login_date",
COALESCE("total_orders", 0) as "total_orders",
COALESCE("total_spent", 0) as "total_spent",
"is_active",
YEAR("updated_at") - YEAR("date_of_birth") as "age_approx",
DATEDIFF('day', "registration_date", "updated_at") as "days_since_registration_approx",
DATEDIFF('day', "last_login_date", "updated_at") as "days_since_last_login_approx",
CASE
WHEN "total_spent" >= 1000 THEN 'High Value'
WHEN "total_spent" >= 500 THEN 'Medium Value'
ELSE 'Low Value'
END as "value_tier",
"created_at",
"updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_customers"
WHERE "email" IS NOT NULL
AND "email" LIKE '%@%'
AND "customer_id" IS NOT NULL;
Orders and Order Items Processing. Creates dynamic Iceberg tables for cleaned orders and enriched order items.
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_orders_cleaned"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'de_orders_cleaned'
AS
SELECT
"order_id",
"order_uuid",
"customer_id",
"order_date",
"order_status",
CASE
WHEN "order_status" IN ('delivered', 'completed', 'shipped') THEN 'Fulfilled'
WHEN "order_status" IN ('pending', 'processing', 'confirmed') THEN 'In Progress'
WHEN "order_status" IN ('cancelled', 'refunded') THEN 'Cancelled'
ELSE 'Other'
END as "order_status_category",
"total_amount",
CASE
WHEN "total_amount" >= 500 THEN 'High Value'
WHEN "total_amount" >= 200 THEN 'Medium Value'
WHEN "total_amount" >= 50 THEN 'Low Value'
ELSE 'Minimal Value'
END as "order_value_tier",
"discount_amount",
"tax_amount",
"shipping_cost" as "shipping_amount",
"payment_method",
"payment_status",
"shipping_address",
"billing_address",
"shipping_method",
"currency",
"subtotal",
"delivery_date" as "estimated_delivery_date",
"shipping_date" as "actual_shipping_date",
CASE
WHEN "delivery_date" IS NOT NULL AND "shipping_date" IS NOT NULL
THEN DATEDIFF('day', "shipping_date", "delivery_date")
ELSE NULL
END as "estimated_delivery_days",
CASE
WHEN "delivery_date" IS NOT NULL AND "shipping_date" IS NOT NULL
THEN CASE
WHEN DATEDIFF('day', "shipping_date", "delivery_date") <= 3 THEN 'Fast Delivery'
WHEN DATEDIFF('day', "shipping_date", "delivery_date") <= 7 THEN 'Standard Delivery'
ELSE 'Slow Delivery'
END
ELSE 'Unknown'
END as "delivery_speed_category",
"notes",
"created_at",
"updated_at",
EXTRACT(YEAR FROM "order_date") as "order_year",
EXTRACT(MONTH FROM "order_date") as "order_month",
EXTRACT(QUARTER FROM "order_date") as "order_quarter",
DAYNAME("order_date") as "order_day_of_week"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders"
WHERE "order_id" IS NOT NULL
AND "customer_id" IS NOT NULL
AND "order_date" IS NOT NULL;
-- Silver Table 3: ORDER ITEMS ENRICHED
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_order_items_enriched"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'de_order_items_enriched'
REFRESH_MODE = INCREMENTAL
AS
SELECT
"order_item_id",
"order_id",
"product_id",
"quantity",
"unit_price",
"line_total",
"total_price",
"discount_percent",
"tax_rate",
("unit_price" * "quantity") as "gross_amount",
CASE
WHEN "discount_percent" > 0
THEN ("unit_price" * "quantity" * "discount_percent" / 100)
ELSE 0
END as "discount_amount",
"created_at",
"updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items"
WHERE "order_item_id" IS NOT NULL
AND "order_id" IS NOT NULL
AND "product_id" IS NOT NULL;
Customer Order Summary. Creates a summary table by joining Bronze-layer customer and order data, calculating the sequential order number for each customer.
-- Silver Table 4: Customer Order Summary
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_customer_order_summary"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'de_customer_order_summary'
AS
SELECT
c."customer_id",
c."customer_uuid",
c."first_name",
c."last_name",
c."email",
c."customer_segment",
c."registration_date",
c."total_orders",
c."total_spent",
c."is_active",
c."last_login_date",
-- Order aggregations
o."order_id",
o."order_date",
o."order_status",
o."total_amount",
o."payment_method",
o."payment_status",
o."currency",
-- Customer demographics
YEAR(o."order_date") - YEAR(c."date_of_birth") as "customer_age_at_order",
c."gender" as "customer_gender",
c."city",
c."state",
c."country",
c."address_line1",
c."address_line2",
c."postal_code",
c."phone",
DATEDIFF('day', c."registration_date", o."order_date") as "days_since_registration_to_order",
ROW_NUMBER() OVER (PARTITION BY c."customer_id" ORDER BY o."order_date") as "order_sequence",
c."created_at" as "customer_created_at",
o."created_at" as "order_created_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_customers" c
INNER JOIN "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders" o
ON c."customer_id" = o."customer_id"
WHERE c."customer_id" IS NOT NULL
AND o."order_id" IS NOT NULL;
The Gold layer creates final, denormalized tables optimized for analytical reporting.
CREATE DATABASE IF NOT EXISTS gold_analytics_db;
USE DATABASE gold_analytics_db;
Order Summary. A pass-through of the cleaned orders with data-type standardization.
CREATE OR REPLACE DYNAMIC ICEBERG TABLE gold_analytics_db.public."order_summary"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'order_summary'
REFRESH_MODE = INCREMENTAL
AS
SELECT
"order_id",
"order_uuid",
"customer_id",
"order_date",
"order_status",
"order_status_category",
CAST("total_amount" AS DECIMAL(18,2)) as "total_amount",
"order_value_tier",
CAST("discount_amount" AS DECIMAL(18,2)) as "discount_amount",
CAST("tax_amount" AS DECIMAL(18,2)) as "tax_amount",
CAST("shipping_amount" AS DECIMAL(18,2)) as "shipping_amount",
"payment_method",
"payment_status",
"currency",
"order_year",
"order_month",
"order_quarter",
"order_day_of_week",
"created_at",
"updated_at"
FROM silver_analytics_db.public."de_orders_cleaned"
WHERE "order_id" IS NOT NULL
AND "customer_id" IS NOT NULL
AND "order_date" IS NOT NULL;
Sales Summary Trends (Denormalized Fact Table). The primary analytical table, which joins all cleaned Silver entities (Orders, Items, Customers, Products, and Customer Order Summary) into a single, comprehensive view.
CREATE OR REPLACE DYNAMIC ICEBERG TABLE gold_analytics_db.public."sales_summary_trends"
TARGET_LAG = '1 minute'
WAREHOUSE = HOL_ICE_WH
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'de_external_volume'
BASE_LOCATION = 'sales_summary_trends'
REFRESH_MODE = INCREMENTAL
AS
SELECT
-- Order information
o."order_id",
o."customer_id",
o."order_date",
o."order_status",
o."order_status_category",
CAST(o."total_amount" AS DECIMAL(18,2)) as "order_total_amount",
CAST(o."discount_amount" AS DECIMAL(18,2)) as "order_discount_amount",
o."payment_method",
o."payment_status",
o."currency",
o."order_year",
o."order_month",
o."order_quarter",
o."order_day_of_week",
-- Order item information
oi."order_item_id",
oi."product_id",
oi."quantity",
CAST(oi."unit_price" AS DECIMAL(18,2)) as "unit_price",
CAST(oi."total_price" AS DECIMAL(18,2)) as "item_total_price",
CAST(oi."gross_amount" AS DECIMAL(18,2)) as "gross_amount",
CAST(oi."discount_amount" AS DECIMAL(18,2)) as "item_discount_amount",
-- Customer information
c."customer_uuid",
c."customer_segment",
c."gender_standardized" as "customer_gender",
c."city" as "customer_city",
c."state" as "customer_state",
c."country" as "customer_country",
c."value_tier" as "customer_value_tier",
CAST(c."total_spent" AS DECIMAL(18,2)) as "customer_total_spent",
c."total_orders" as "customer_total_orders",
-- Product information
p."product_name_clean" as "product_name",
p."category_standardized" as "product_category",
p."subcategory_standardized" as "product_subcategory",
p."brand_standardized" as "product_brand",
CAST(p."price" AS DECIMAL(18,2)) as "product_price",
CAST(p."cost" AS DECIMAL(18,2)) as "product_cost",
p."margin_tier" as "product_margin_tier",
p."stock_status" as "product_stock_status",
-- Simple calculated fields (no aggregations)
CAST((oi."total_price" - (oi."quantity" * COALESCE(p."cost", 0))) AS DECIMAL(18,2)) as "item_profit",
-- Customer order sequence
cos."order_sequence",
CASE
WHEN cos."order_sequence" = 1 THEN 'New Customer Order'
ELSE 'Repeat Customer Order'
END as "customer_order_type",
-- Date dimensions
CAST(DATE_TRUNC('month', o."order_date") AS DATE) as "order_month_date",
CAST(DATE_TRUNC('week', o."order_date") AS DATE) as "order_week_date",
CAST(DATE_TRUNC('day', o."order_date") AS DATE) as "order_day_date",
-- Status flags
CASE WHEN o."order_status_category" = 'Fulfilled' THEN 1 ELSE 0 END as "is_fulfilled",
CASE WHEN o."order_status_category" = 'Cancelled' THEN 1 ELSE 0 END as "is_cancelled",
CASE WHEN o."payment_method" = 'credit_card' THEN 1 ELSE 0 END as "is_credit_card",
CASE WHEN cos."order_sequence" = 1 THEN 1 ELSE 0 END as "is_new_customer",
CASE WHEN c."value_tier" = 'High Value' THEN 1 ELSE 0 END as "is_high_value_customer",
-- Timestamps
o."created_at" as "order_created_at",
oi."created_at" as "item_created_at",
c."created_at" as "customer_created_at",
p."created_at" as "product_created_at"
FROM silver_analytics_db.public."de_orders_cleaned" o
INNER JOIN silver_analytics_db.public."de_order_items_enriched" oi
ON o."order_id" = oi."order_id"
LEFT JOIN silver_analytics_db.public."de_customers_cleaned" c
ON o."customer_id" = c."customer_id"
LEFT JOIN silver_analytics_db.public."de_products_cleaned" p
ON oi."product_id" = p."product_id"
LEFT JOIN silver_analytics_db.public."de_customer_order_summary" cos
ON o."order_id" = cos."order_id"
WHERE o."order_date" IS NOT NULL
AND oi."order_item_id" IS NOT NULL;
Gold Layer Verification
USE DATABASE GOLD_ANALYTICS_DB;
SHOW DYNAMIC TABLES;
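To observe the refresh cadence, you can also query the dynamic table refresh history table function. A sketch; the NAME argument format for a quoted lowercase table name is an assumption:
SELECT name, state, refresh_start_time, refresh_end_time
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
NAME => 'gold_analytics_db.public."sales_summary_trends"'))
ORDER BY refresh_start_time DESC;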
This section illustrates how a write to the Bronze layer in AWS automatically propagates through the Silver and Gold layers via Dynamic Tables in Snowflake.
Note: In a real scenario, an external process (such as an ETL job, a CDC tool, or a streaming job) would write these records to S3 and update the Glue Catalog. For demonstration, we simulate this with an INSERT into the AWS Glue managed Iceberg tables.
-- Insert 5 rows for de_orders table
INSERT INTO "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders"
("billing_address", "created_at", "currency", "customer_id", "delivery_date", "discount_amount", "notes", "order_date", "order_id", "order_status", "order_uuid", "payment_method", "payment_status", "shipping_address", "shipping_cost", "shipping_date", "shipping_method", "subtotal", "tax_amount", "total_amount", "updated_at")
VALUES
('123 Main St, City, State 12345', TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 'USD', 5001, TO_TIMESTAMP_LTZ('2024-12-25 14:00:00'), 0.00, 'Standard delivery', TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 1500048, 'confirmed', 'ORD-2024-1500048', 'credit_card', 'paid', '123 Main St, City, State 12345', 15.99, TO_TIMESTAMP_LTZ('2024-12-22 10:00:00'), 'standard', 125.99, 10.08, 151.06, TO_TIMESTAMP_LTZ('2024-12-20 09:00:00')),
('456 Oak Ave, Town, State 23456', TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 'USD', 5002, TO_TIMESTAMP_LTZ('2024-12-25 15:30:00'), 25.00, 'Express delivery requested', TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 1500049, 'confirmed', 'ORD-2024-1500049', 'paypal', 'paid', '456 Oak Ave, Town, State 23456', 29.99, TO_TIMESTAMP_LTZ('2024-12-21 11:00:00'), 'express', 249.99, 20.00, 274.98, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
('789 Pine Rd, Village, State 34567', TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 'USD', 5003, TO_TIMESTAMP_LTZ('2024-12-26 16:15:00'), 0.00, 'Gift wrapping included', TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 1500050, 'processing', 'ORD-2024-1500050', 'credit_card', 'paid', '789 Pine Rd, Village, State 34567', 12.99, TO_TIMESTAMP_LTZ('2024-12-23 12:00:00'), 'standard', 89.99, 7.20, 109.18, TO_TIMESTAMP_LTZ('2024-12-20 11:15:00')),
('321 Elm Dr, County, State 45678', TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 'USD', 5004, TO_TIMESTAMP_LTZ('2024-12-27 17:45:00'), 15.00, 'Holiday special discount', TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 1500051, 'confirmed', 'ORD-2024-1500051', 'debit_card', 'paid', '321 Elm Dr, County, State 45678', 18.99, TO_TIMESTAMP_LTZ('2024-12-22 13:30:00'), 'standard', 199.99, 16.00, 219.98, TO_TIMESTAMP_LTZ('2024-12-20 12:45:00')),
('654 Maple Ln, District, State 56789', TO_TIMESTAMP_LTZ('2024-12-20 14:20:00'), 'USD', 5005, TO_TIMESTAMP_LTZ('2024-12-28 19:20:00'), 0.00, 'Signature required', TO_TIMESTAMP_LTZ('2024-12-20 14:20:00'), 1500052, 'shipped', 'ORD-2024-1500052', 'credit_card', 'paid', '654 Maple Ln, District, State 56789', 22.99, TO_TIMESTAMP_LTZ('2024-12-21 15:00:00'), 'express', 349.99, 28.00, 400.98, TO_TIMESTAMP_LTZ('2024-12-20 14:20:00'));
-- Insert 5 rows for de_order_items table--
INSERT INTO "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items"
("created_at", "discount_percent", "line_total", "order_id", "order_item_id", "product_id", "quantity", "tax_rate", "total_price", "unit_price", "updated_at")
VALUES
(TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 0, 125.99, 1500048, 2970625, 3051, 1, 0.08, 125.99, 125.99, TO_TIMESTAMP_LTZ('2024-12-20 09:00:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 10, 249.99, 1500049, 2970626, 3052, 1, 0.08, 224.99, 249.99, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 0, 25.00, 1500049, 2970627, 3053, 1, 0.08, 25.00, 25.00, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 0, 89.99, 1500050, 2970628, 3054, 1, 0.08, 89.99, 89.99, TO_TIMESTAMP_LTZ('2024-12-20 11:15:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 10, 199.99, 1500051, 2970629, 3055, 1, 0.08, 179.99, 199.99, TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'));
After inserting the new data into the Bronze layer, wait for the Dynamic Tables to refresh automatically (they are configured with a TARGET_LAG of '1 minute'). Then, check the updated row counts in the downstream layers.
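If you prefer not to wait for the TARGET_LAG interval, a manual refresh can be triggered on demand; refreshing a downstream table is assumed here to pull current data from its upstream sources:
-- On-demand refresh instead of waiting for TARGET_LAG
ALTER DYNAMIC TABLE silver_analytics_db.public."de_orders_cleaned" REFRESH;
ALTER DYNAMIC TABLE gold_analytics_db.public."sales_summary_trends" REFRESH;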
-- Check the updated counts in the Silver layer after the dynamic Iceberg tables refresh
SELECT COUNT(*) FROM silver_analytics_db.public."de_order_items_enriched";
SELECT COUNT(*) FROM silver_analytics_db.public."de_orders_cleaned";
-- Check the updated counts in the Gold layer after the dynamic Iceberg tables refresh
SELECT COUNT(*) FROM gold_analytics_db.public."sales_summary_trends";
This implementation successfully demonstrates a multi-layered lakehouse architecture using Snowflake Dynamic Iceberg Tables integrated with the AWS Glue Catalog.
The key takeaways from this exercise include:
Decoupled Metadata and Data: The Bronze layer connects Snowflake to the external Iceberg data in S3 using a Snowflake Catalog Integration and a catalog-linked database, keeping the data lake open and accessible to other engines.
Automated Data Pipeline: The Silver and Gold layers are powered by Snowflake Dynamic Iceberg Tables. With a TARGET_LAG of '1 minute' and the linked Bronze tables as sources, the entire pipeline automatically processes, cleans, and transforms data as soon as new records land in the data lake.
Incremental Efficiency: The REFRESH_MODE = INCREMENTAL setting on key Silver and Gold tables (de_order_items_enriched, order_summary, sales_summary_trends) ensures that only new or modified data is processed, leading to a highly efficient, low-latency data flow.
Full CRUD/ACID Capabilities: The ability to INSERT data directly into the Glue-managed Iceberg tables from Snowflake (as demonstrated in the incremental test) confirms the transactional capabilities necessary for a true lakehouse architecture.
By leveraging Snowflake Dynamic Iceberg Tables, you have built a near real-time data platform that provides low-latency analytics while maintaining the flexibility and scalability of an open Iceberg data lake.
So far, you learned to work with catalog integrations, catalog-linked databases, external volumes, and Dynamic Iceberg Tables to build an automated Bronze-Silver-Gold lakehouse pipeline on an open Iceberg data lake.