This guide demonstrates how to build a declarative incremental pipeline and establish a multi-layered lakehouse architecture in Snowflake using Dynamic Iceberg Tables that read from and write back to an external AWS Glue Catalog-managed Iceberg data lake.

What You Will Learn

By the end of this guide, you will learn to work with:

- Snowflake Catalog Integrations for the AWS Glue Iceberg REST catalog
- Catalog Linked Databases that automatically sync external namespaces and Iceberg tables
- External Volumes for reading and writing Iceberg data files in S3
- Dynamic Iceberg Tables with TARGET_LAG and REFRESH_MODE = INCREMENTAL for declarative, incremental pipelines
- A Bronze/Silver/Gold (medallion) lakehouse architecture built on an open Iceberg data lake

What You'll Build

You will build a three-tiered lakehouse architecture in Snowflake:

- Bronze Layer: Connects Snowflake to the AWS Glue managed Iceberg tables using a Snowflake Catalog Integration and a Catalog Linked Database.
- Silver Layer: Creates dynamic Iceberg tables that apply cleaning and standardization to the Bronze data, with results stored as Snowflake-managed Iceberg tables back in S3.
- Gold Layer: Creates final, denormalized, and aggregated Snowflake dynamic Iceberg tables optimized for analytical reporting by joining the Silver-layer entities.

Create Glue Managed Iceberg Tables

Creating Glue managed Iceberg tables in the AWS Glue Catalog is the key step in setting up the Bronze layer for the Snowflake Iceberg integration. You can create them through the AWS Management Console (via AWS Glue or Lake Formation) or with SQL via Amazon Athena. This guide uses Athena to define the schema and table properties for Iceberg.

Prerequisites in AWS

Before creating the tables, ensure these components are ready:

- AWS Glue Database: You must have the bronze_analytics_db database created in your AWS Glue Catalog.
- S3 Location: An S3 path (e.g., s3://<s3_bucket>/glue-iceberg/) where the Iceberg metadata and data files will reside. This location must correspond to the STORAGE_BASE_URL defined in your Snowflake EXTERNAL VOLUME.
- Permissions: You need the appropriate AWS IAM permissions to create tables in the Glue Catalog and read/write to the S3 location (often configured via AWS Lake Formation if it is enabled).

Create Iceberg Tables using Amazon Athena

Amazon Athena provides a simple SQL interface to create and manage Glue-managed Iceberg tables.

de_orders Table Creation. This table contains the core transactional order information.

CREATE TABLE bronze_analytics_db.de_orders (
    billing_address STRING,
    created_at TIMESTAMP,
    currency STRING,
    customer_id BIGINT,
    delivery_date TIMESTAMP,
    discount_amount DOUBLE,
    notes STRING,
    order_date TIMESTAMP,
    order_id BIGINT,
    order_status STRING,
    order_uuid STRING,
    payment_method STRING,
    payment_status STRING,
    shipping_address STRING,
    shipping_cost DOUBLE,
    shipping_date TIMESTAMP,
    shipping_method STRING,
    subtotal DOUBLE,
    tax_amount DOUBLE,
    total_amount DOUBLE,
    updated_at TIMESTAMP)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_orders'
TBLPROPERTIES (
  'table_type'='iceberg',
  'write_compression'='snappy'
);

de_order_items Table Creation. This table holds the line-item details for each order.

CREATE TABLE IF NOT EXISTS bronze_analytics_db.de_order_items (
    created_at TIMESTAMP,
    discount_percent BIGINT,
    line_total DOUBLE,
    order_id BIGINT,
    order_item_id BIGINT,
    product_id BIGINT,
    quantity BIGINT,
    tax_rate DOUBLE,
    total_price DOUBLE,
    unit_price DOUBLE,
    updated_at TIMESTAMP
)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_order_items'
TBLPROPERTIES (
  'table_type'='iceberg',
  'write_compression'='snappy'
);

de_products Table Creation. This table stores the product dimension data.

CREATE TABLE bronze_analytics_db.de_products (
    barcode STRING,
    brand STRING,
    category STRING,
    color STRING,
    cost_price DOUBLE,
    created_at TIMESTAMP,
    description STRING,
    dimensions_cm STRING,
    is_active BOOLEAN,
    launch_date TIMESTAMP,
    material STRING,
    product_id BIGINT,
    product_name STRING,
    product_uuid STRING,
    reorder_level BIGINT,
    size STRING,
    sku STRING,
    stock_quantity BIGINT,
    subcategory STRING,
    supplier_id BIGINT,
    unit_price DOUBLE,
    updated_at TIMESTAMP,
    weight_kg DOUBLE)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_products'
TBLPROPERTIES (
  'table_type'='iceberg',
  'write_compression'='snappy'
);

de_customers Table Creation. This table stores the customer dimension data.

CREATE TABLE bronze_analytics_db.de_customers (
    address_line1 STRING,
    address_line2 STRING,
    city STRING,
    country STRING,
    created_at TIMESTAMP,
    customer_id BIGINT,
    customer_segment STRING,
    customer_uuid STRING,
    date_of_birth TIMESTAMP,
    email STRING,
    first_name STRING,
    gender STRING,
    is_active BOOLEAN,
    last_login_date TIMESTAMP,
    last_name STRING,
    phone STRING,
    postal_code STRING,
    registration_date TIMESTAMP,
    state STRING,
    total_orders BIGINT,
    total_spent DOUBLE,
    updated_at TIMESTAMP)
LOCATION 's3://<s3_bucket>/glue-iceberg/de_customers'
TBLPROPERTIES (
  'table_type'='iceberg',
  'write_compression'='snappy'
);

Verify that the tables were created in the AWS Glue console.
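
As an alternative to the console check, you can list the registered tables directly from Athena (a minimal verification, using the database name defined above):

-- Athena: list the tables registered in the Glue database
SHOW TABLES IN bronze_analytics_db;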

Note: An AWS Glue job is also provided to populate the Glue managed Iceberg tables with test data; the script can be reused to load mock data into the tables.

Download the iceberg_dt.py file from this Git repository and run it as a Glue job.
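
After the Glue job completes, a quick row-count check in Athena confirms the tables were populated (a sketch against the four tables created above):

SELECT COUNT(*) FROM bronze_analytics_db.de_orders;
SELECT COUNT(*) FROM bronze_analytics_db.de_order_items;
SELECT COUNT(*) FROM bronze_analytics_db.de_products;
SELECT COUNT(*) FROM bronze_analytics_db.de_customers;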

Catalog Integration and Catalog Linked Database (CLD)

Snowflake Catalog Integration

This step creates a Snowflake Catalog Integration to connect to the AWS Glue Catalog, which acts as the metadata store for the Iceberg tables. A catalog integration is a named, account-level Snowflake object that stores information about how your table metadata is organized in an external catalog.

CREATE OR REPLACE CATALOG INTEGRATION glue_catalog_integration
  CATALOG_SOURCE = ICEBERG_REST -- Connects to the Iceberg REST Catalog endpoint
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'bronze_analytics_db'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-east-1.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '<Your AWS Account ID>'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<Your AWS Account ID>:role/<role_name>'
    SIGV4_SIGNING_REGION = 'us-east-1'
  )
  ENABLED = TRUE;

External Volume Creation

An External Volume is created to define the S3 location where the actual Iceberg data files reside and specifies the IAM role for Snowflake to read/write the data.

CREATE OR REPLACE EXTERNAL VOLUME de_external_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'de_external_vol'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://<bucket_name>/glue-iceberg/' -- S3 path for Iceberg data files 
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/<role_name>' -- IAM role for S3 read/write
        ENCRYPTION = ( TYPE = 'AWS_SSE_S3' )
      )
    )
  ALLOW_WRITES = TRUE;
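
After creating the volume, describe it to retrieve the AWS IAM user and external ID that Snowflake uses, which you need when configuring the trust policy on the IAM role:

DESC EXTERNAL VOLUME de_external_volume;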

Catalog Linked Database

A catalog-linked database is a Snowflake database connected to an external Iceberg REST catalog. Snowflake automatically syncs with the external catalog to detect namespaces and Iceberg tables, and registers the remote tables to the catalog-linked database. Catalog-linked databases also support creating and dropping schemas or Iceberg tables.

CREATE OR REPLACE DATABASE glue_catalog_linked_db
  LINKED_CATALOG = (
    CATALOG = 'glue_catalog_integration',
    ALLOWED_NAMESPACES = ('bronze_analytics_db'), -- Limit scope to the Bronze database
    NAMESPACE_MODE = FLATTEN_NESTED_NAMESPACE,
    NAMESPACE_FLATTEN_DELIMITER = '-',
    SYNC_INTERVAL_SECONDS = 60 -- Snowflake checks for metadata updates every 60 seconds
  ),
  EXTERNAL_VOLUME = 'de_external_volume';
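
Because the database is catalog-linked, schema DDL issued from Snowflake is pushed to the Glue catalog as well. A minimal sketch, using a hypothetical scratch_ns namespace purely for illustration:

-- Create a new namespace in Glue through the catalog-linked database, then remove it
CREATE SCHEMA glue_catalog_linked_db."scratch_ns";
DROP SCHEMA glue_catalog_linked_db."scratch_ns";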

Verification and Exploration

Verify the setup and explore the discovered Bronze tables.

-- Verify the catalog integration
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_catalog_integration');

-- List schemas/namespaces available
SELECT SYSTEM$LIST_NAMESPACES_FROM_CATALOG('glue_catalog_integration', '', 0);

-- Check the status of the new linked database
SELECT SYSTEM$CATALOG_LINK_STATUS('glue_catalog_linked_db');

-- List the Iceberg tables discovered
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('glue_catalog_integration');

-- Show the linked schemas/namespaces
SHOW SCHEMAS IN DATABASE glue_catalog_linked_db;
USE SCHEMA "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db";
SHOW ICEBERG TABLES;
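
To confirm end-to-end read access, query one of the discovered Bronze tables directly (a quick sanity check):

SELECT * FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders" LIMIT 10;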

Silver Layer Creation: Dynamic Iceberg Tables

The Silver layer applies cleaning, standardization, and enrichment, using Dynamic Iceberg Tables for automated refresh. The resulting tables are stored as Iceberg tables back in S3 via the External Volume.

CREATE DATABASE IF NOT EXISTS silver_analytics_db;
USE DATABASE silver_analytics_db;

Products Cleaning and Standardization

Creates a dynamic Iceberg table for cleaned and enhanced product data.

CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_products_cleaned"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'de_products_cleaned'
  AS
  SELECT 
    "product_id",
    "product_uuid",
    TRIM(UPPER("product_name")) as "product_name_clean",
    TRIM("description") as "product_description",
    UPPER("category") as "category_standardized",
    UPPER("subcategory") as "subcategory_standardized",
    UPPER("brand") as "brand_standardized",
    "unit_price" as "price",
    COALESCE("cost_price", 0) as "cost",
    ROUND("unit_price" - COALESCE("cost_price", 0), 2) as "profit_margin",
    CASE 
        WHEN "unit_price" - COALESCE("cost_price", 0) <= 0 THEN 'No Profit'
        WHEN ("unit_price" - COALESCE("cost_price", 0)) / NULLIF("unit_price", 0) >= 0.5 THEN 'High Margin'
        WHEN ("unit_price" - COALESCE("cost_price", 0)) / NULLIF("unit_price", 0) >= 0.3 THEN 'Medium Margin'
        ELSE 'Low Margin'
    END as "margin_tier",
    "weight_kg" as "weight",
    "dimensions_cm" as "dimensions",
    UPPER("color") as "color_standardized",
    UPPER("size") as "size_standardized",
    "material",
    "supplier_id",
    "stock_quantity",
    CASE 
        WHEN "stock_quantity" > 100 THEN 'High Stock'
        WHEN "stock_quantity" > 50 THEN 'Medium Stock'
        WHEN "stock_quantity" > 0 THEN 'Low Stock'
        ELSE 'Out of Stock'
    END as "stock_status",
    "reorder_level",
    "is_active",
    "sku",
    "barcode",
    "launch_date",
    "created_at",
    "updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_products"
WHERE "product_id" IS NOT NULL 
  AND "product_name" IS NOT NULL;

Customers Cleaning and Standardization

Creates a dynamic Iceberg table for cleaned and enriched customer data.

CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_customers_cleaned"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'de_customers_cleaned'
  AS
  SELECT 
    "customer_id",
    "customer_uuid",
    TRIM(UPPER("first_name")) as "first_name",
    TRIM(UPPER("last_name")) as "last_name",
    LOWER(TRIM("email")) as "email",
    REGEXP_REPLACE("phone", '[^0-9]', '') as "phone_clean",
    "phone",
    "date_of_birth",
    CASE 
        WHEN "gender" IN ('M', 'Male', 'MALE', 'm') THEN 'M'
        WHEN "gender" IN ('F', 'Female', 'FEMALE', 'f') THEN 'F'
        ELSE 'O'
    END as "gender_standardized",
    "gender",
    TRIM("address_line1") as "address_line1",
    TRIM("address_line2") as "address_line2",
    TRIM(UPPER("city")) as "city",
    TRIM(UPPER("state")) as "state",
    REGEXP_REPLACE("postal_code", '[^0-9]', '') as "postal_code_clean",
    "postal_code",
    UPPER("country") as "country",
    "customer_segment",
    "registration_date",
    "last_login_date",
    COALESCE("total_orders", 0) as "total_orders",
    COALESCE("total_spent", 0) as "total_spent",
    "is_active",
    YEAR("updated_at") - YEAR("date_of_birth") as "age_approx",
    DATEDIFF('day', "registration_date", "updated_at") as "days_since_registration_approx",
    DATEDIFF('day', "last_login_date", "updated_at") as "days_since_last_login_approx",
    CASE 
        WHEN "total_spent" >= 1000 THEN 'High Value'
        WHEN "total_spent" >= 500 THEN 'Medium Value'
        ELSE 'Low Value'
    END as "value_tier",
    "created_at",
    "updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_customers"
WHERE "email" IS NOT NULL 
  AND "email" LIKE '%@%'
  AND "customer_id" IS NOT NULL;

Orders and Order Items Processing. Creates dynamic Iceberg tables for cleaned orders and enriched order items.

CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_orders_cleaned"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'de_orders_cleaned'
  AS
  SELECT 
    "order_id",
    "order_uuid",
    "customer_id",
    "order_date",
    "order_status",
    CASE 
        WHEN "order_status" IN ('delivered', 'completed', 'shipped') THEN 'Fulfilled'
        WHEN "order_status" IN ('pending', 'processing', 'confirmed') THEN 'In Progress'
        WHEN "order_status" IN ('cancelled', 'refunded') THEN 'Cancelled'
        ELSE 'Other'
    END as "order_status_category",
    "total_amount",
    CASE 
        WHEN "total_amount" >= 500 THEN 'High Value'
        WHEN "total_amount" >= 200 THEN 'Medium Value'
        WHEN "total_amount" >= 50 THEN 'Low Value'
        ELSE 'Minimal Value'
    END as "order_value_tier",
    "discount_amount",
    "tax_amount",
    "shipping_cost" as "shipping_amount",
    "payment_method",
    "payment_status",
    "shipping_address",
    "billing_address",
    "shipping_method",
    "currency",
    "subtotal",
    "delivery_date" as "estimated_delivery_date",
    "shipping_date" as "actual_shipping_date",
    CASE 
        WHEN "delivery_date" IS NOT NULL AND "shipping_date" IS NOT NULL
        THEN DATEDIFF('day', "shipping_date", "delivery_date")
        ELSE NULL
    END as "estimated_delivery_days",
    CASE 
        WHEN "delivery_date" IS NOT NULL AND "shipping_date" IS NOT NULL
        THEN CASE 
            WHEN DATEDIFF('day', "shipping_date", "delivery_date") <= 3 THEN 'Fast Delivery'
            WHEN DATEDIFF('day', "shipping_date", "delivery_date") <= 7 THEN 'Standard Delivery'
            ELSE 'Slow Delivery'
        END
        ELSE 'Unknown'
    END as "delivery_speed_category",
    "notes",
    "created_at",
    "updated_at",
    EXTRACT(YEAR FROM "order_date") as "order_year",
    EXTRACT(MONTH FROM "order_date") as "order_month",
    EXTRACT(QUARTER FROM "order_date") as "order_quarter",
    DAYNAME("order_date") as "order_day_of_week"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders"
WHERE "order_id" IS NOT NULL 
  AND "customer_id" IS NOT NULL
  AND "order_date" IS NOT NULL;

-- Silver Table 3: ORDER ITEMS ENRICHED
  
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_order_items_enriched"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'de_order_items_enriched'
  REFRESH_MODE = INCREMENTAL
  AS
  SELECT 
    "order_item_id",
    "order_id",
    "product_id",
    "quantity",
    "unit_price",
    "line_total",
    "total_price",
    "discount_percent",
    "tax_rate",
    ("unit_price" * "quantity") as "gross_amount",
    CASE 
        WHEN "discount_percent" > 0 
        THEN ("unit_price" * "quantity" * "discount_percent" / 100)
        ELSE 0
    END as "discount_amount",
    "created_at",
    "updated_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items"
WHERE "order_item_id" IS NOT NULL
  AND "order_id" IS NOT NULL
  AND "product_id" IS NOT NULL;

Customer Order Summary. Creates a summary table by joining Bronze-layer customer and order data, calculating the sequential order number for each customer.

-- Silver Table 4: Customer Order Summary 
CREATE OR REPLACE DYNAMIC ICEBERG TABLE silver_analytics_db.public."de_customer_order_summary"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'de_customer_order_summary'
  AS
  SELECT 
    c."customer_id",
    c."customer_uuid",
    c."first_name",
    c."last_name",
    c."email",
    c."customer_segment",
    c."registration_date",
    c."total_orders",
    c."total_spent",
    c."is_active",
    c."last_login_date",
    -- Order aggregations
    o."order_id",
    o."order_date",
    o."order_status",
    o."total_amount",
    o."payment_method",
    o."payment_status",
    o."currency",
    -- Customer demographics  
    YEAR(o."order_date") - YEAR(c."date_of_birth") as "customer_age_at_order",
    c."gender" as "customer_gender",
    c."city",
    c."state",
    c."country",
    c."address_line1",
    c."address_line2",
    c."postal_code",
    c."phone",
    DATEDIFF('day', c."registration_date", o."order_date") as "days_since_registration_to_order",
    ROW_NUMBER() OVER (PARTITION BY c."customer_id" ORDER BY o."order_date") as "order_sequence",
    c."created_at" as "customer_created_at",
    o."created_at" as "order_created_at"
FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_customers" c
INNER JOIN "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders" o 
    ON c."customer_id" = o."customer_id"
WHERE c."customer_id" IS NOT NULL 
  AND o."order_id" IS NOT NULL;

Gold Layer Creation: Analytical Aggregations

The Gold layer creates final, denormalized tables optimized for analytical reporting.

CREATE DATABASE IF NOT EXISTS gold_analytics_db;
USE DATABASE gold_analytics_db;

Order Summary. A pass-through of cleaned orders with data type standardization.

CREATE OR REPLACE DYNAMIC ICEBERG TABLE gold_analytics_db.public."order_summary"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'order_summary'
  REFRESH_MODE = INCREMENTAL
  AS
  SELECT 
    "order_id",
    "order_uuid",
    "customer_id",
    "order_date",
    "order_status",
    "order_status_category",
    CAST("total_amount" AS DECIMAL(18,2)) as "total_amount",
    "order_value_tier",
    CAST("discount_amount" AS DECIMAL(18,2)) as "discount_amount",
    CAST("tax_amount" AS DECIMAL(18,2)) as "tax_amount",
    CAST("shipping_amount" AS DECIMAL(18,2)) as "shipping_amount",
    "payment_method",
    "payment_status",
    "currency",
    "order_year",
    "order_month",
    "order_quarter",
    "order_day_of_week",
    "created_at",
    "updated_at"
FROM silver_analytics_db.public."de_orders_cleaned"
WHERE "order_id" IS NOT NULL 
  AND "customer_id" IS NOT NULL
  AND "order_date" IS NOT NULL;

Sales Summary Trends (Denormalized Fact Table). The primary analytical table, which joins all cleaned Silver entities (Orders, Items, Customers, Products, and Customer Order Summary) into a single, comprehensive view.

CREATE OR REPLACE DYNAMIC ICEBERG TABLE gold_analytics_db.public."sales_summary_trends"
  TARGET_LAG = '1 minute'
  WAREHOUSE = HOL_ICE_WH
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'de_external_volume'
  BASE_LOCATION = 'sales_summary_trends'
  REFRESH_MODE = INCREMENTAL
  AS
  SELECT 
    -- Order information
    o."order_id",
    o."customer_id",
    o."order_date",
    o."order_status",
    o."order_status_category",
    CAST(o."total_amount" AS DECIMAL(18,2)) as "order_total_amount",
    CAST(o."discount_amount" AS DECIMAL(18,2)) as "order_discount_amount",
    o."payment_method",
    o."payment_status",
    o."currency",
    o."order_year",
    o."order_month",
    o."order_quarter",
    o."order_day_of_week",
    
    -- Order item information
    oi."order_item_id",
    oi."product_id",
    oi."quantity",
    CAST(oi."unit_price" AS DECIMAL(18,2)) as "unit_price",
    CAST(oi."total_price" AS DECIMAL(18,2)) as "item_total_price",
    CAST(oi."gross_amount" AS DECIMAL(18,2)) as "gross_amount",
    CAST(oi."discount_amount" AS DECIMAL(18,2)) as "item_discount_amount",
    
    -- Customer information
    c."customer_uuid",
    c."customer_segment",
    c."gender_standardized" as "customer_gender",
    c."city" as "customer_city",
    c."state" as "customer_state",
    c."country" as "customer_country",
    c."value_tier" as "customer_value_tier",
    CAST(c."total_spent" AS DECIMAL(18,2)) as "customer_total_spent",
    c."total_orders" as "customer_total_orders",
    
    -- Product information
    p."product_name_clean" as "product_name",
    p."category_standardized" as "product_category",
    p."subcategory_standardized" as "product_subcategory",
    p."brand_standardized" as "product_brand",
    CAST(p."price" AS DECIMAL(18,2)) as "product_price",
    CAST(p."cost" AS DECIMAL(18,2)) as "product_cost",
    p."margin_tier" as "product_margin_tier",
    p."stock_status" as "product_stock_status",
    
    -- Simple calculated fields (no aggregations)
    CAST((oi."total_price" - (oi."quantity" * COALESCE(p."cost", 0))) AS DECIMAL(18,2)) as "item_profit",
    
    -- Customer order sequence
    cos."order_sequence",
    CASE 
        WHEN cos."order_sequence" = 1 THEN 'New Customer Order'
        ELSE 'Repeat Customer Order'
    END as "customer_order_type",
    
    -- Date dimensions
    CAST(DATE_TRUNC('month', o."order_date") AS DATE) as "order_month_date",
    CAST(DATE_TRUNC('week', o."order_date") AS DATE) as "order_week_date",
    CAST(DATE_TRUNC('day', o."order_date") AS DATE) as "order_day_date",
    
    -- Status flags
    CASE WHEN o."order_status_category" = 'Fulfilled' THEN 1 ELSE 0 END as "is_fulfilled",
    CASE WHEN o."order_status_category" = 'Cancelled' THEN 1 ELSE 0 END as "is_cancelled",
    CASE WHEN o."payment_method" = 'credit_card' THEN 1 ELSE 0 END as "is_credit_card",
    CASE WHEN cos."order_sequence" = 1 THEN 1 ELSE 0 END as "is_new_customer",
    CASE WHEN c."value_tier" = 'High Value' THEN 1 ELSE 0 END as "is_high_value_customer",
    
    -- Timestamps
    o."created_at" as "order_created_at",
    oi."created_at" as "item_created_at",
    c."created_at" as "customer_created_at",
    p."created_at" as "product_created_at"
    
FROM silver_analytics_db.public."de_orders_cleaned" o
INNER JOIN silver_analytics_db.public."de_order_items_enriched" oi 
    ON o."order_id" = oi."order_id"
LEFT JOIN silver_analytics_db.public."de_customers_cleaned" c 
    ON o."customer_id" = c."customer_id"
LEFT JOIN silver_analytics_db.public."de_products_cleaned" p 
    ON oi."product_id" = p."product_id"
LEFT JOIN silver_analytics_db.public."de_customer_order_summary" cos 
    ON o."order_id" = cos."order_id"
WHERE o."order_date" IS NOT NULL
  AND oi."order_item_id" IS NOT NULL;

Gold Layer Verification

USE DATABASE GOLD_ANALYTICS_DB;
SHOW DYNAMIC TABLES;
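
To see when and how each dynamic table last refreshed, you can also query the DYNAMIC_TABLE_REFRESH_HISTORY table function in INFORMATION_SCHEMA (shown as a sketch; see the Snowflake documentation for the full set of arguments and output columns):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY refresh_start_time DESC;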

Incremental Load Demonstration and Verification

This section illustrates how a write to the Bronze layer in AWS automatically propagates through the Silver and Gold layers via Dynamic Tables in Snowflake.

Mimic Incremental Insert into Bronze

Note: In a real scenario, an external process (such as an ETL job, CDC tool, or streaming job) would write these records to S3 and update the Glue Catalog. For demonstration, we issue the INSERT from Snowflake, which writes directly to the AWS Glue managed Iceberg tables through the catalog-linked database.
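
Before running the insert, it helps to capture baseline row counts so the incremental change is easy to confirm afterwards:

-- Baseline counts in the Bronze layer prior to the incremental insert
SELECT COUNT(*) FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders";
SELECT COUNT(*) FROM "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items";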

-- Insert 5 rows for de_orders table

INSERT INTO "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_orders"
("billing_address", "created_at", "currency", "customer_id", "delivery_date", "discount_amount", "notes", "order_date", "order_id", "order_status", "order_uuid", "payment_method", "payment_status", "shipping_address", "shipping_cost", "shipping_date", "shipping_method", "subtotal", "tax_amount", "total_amount", "updated_at")
VALUES
('123 Main St, City, State 12345', TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 'USD', 5001, TO_TIMESTAMP_LTZ('2024-12-25 14:00:00'), 0.00, 'Standard delivery', TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 1500048, 'confirmed', 'ORD-2024-1500048', 'credit_card', 'paid', '123 Main St, City, State 12345', 15.99, TO_TIMESTAMP_LTZ('2024-12-22 10:00:00'), 'standard', 125.99, 10.08, 151.06, TO_TIMESTAMP_LTZ('2024-12-20 09:00:00')),
('456 Oak Ave, Town, State 23456', TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 'USD', 5002, TO_TIMESTAMP_LTZ('2024-12-25 15:30:00'), 25.00, 'Express delivery requested', TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 1500049, 'confirmed', 'ORD-2024-1500049', 'paypal', 'paid', '456 Oak Ave, Town, State 23456', 29.99, TO_TIMESTAMP_LTZ('2024-12-21 11:00:00'), 'express', 249.99, 20.00, 274.98, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
('789 Pine Rd, Village, State 34567', TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 'USD', 5003, TO_TIMESTAMP_LTZ('2024-12-26 16:15:00'), 0.00, 'Gift wrapping included', TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 1500050, 'processing', 'ORD-2024-1500050', 'credit_card', 'paid', '789 Pine Rd, Village, State 34567', 12.99, TO_TIMESTAMP_LTZ('2024-12-23 12:00:00'), 'standard', 89.99, 7.20, 109.18, TO_TIMESTAMP_LTZ('2024-12-20 11:15:00')),
('321 Elm Dr, County, State 45678', TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 'USD', 5004, TO_TIMESTAMP_LTZ('2024-12-27 17:45:00'), 15.00, 'Holiday special discount', TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 1500051, 'confirmed', 'ORD-2024-1500051', 'debit_card', 'paid', '321 Elm Dr, County, State 45678', 18.99, TO_TIMESTAMP_LTZ('2024-12-22 13:30:00'), 'standard', 199.99, 16.00, 219.98, TO_TIMESTAMP_LTZ('2024-12-20 12:45:00')),
('654 Maple Ln, District, State 56789', TO_TIMESTAMP_LTZ('2024-12-20 14:20:00'), 'USD', 5005, TO_TIMESTAMP_LTZ('2024-12-28 19:20:00'), 0.00, 'Signature required', TO_TIMESTAMP_LTZ('2024-12-20 14:20:00'), 1500052, 'shipped', 'ORD-2024-1500052', 'credit_card', 'paid', '654 Maple Ln, District, State 56789', 22.99, TO_TIMESTAMP_LTZ('2024-12-21 15:00:00'), 'express', 349.99, 28.00, 400.98, TO_TIMESTAMP_LTZ('2024-12-20 14:20:00')); 
-- Insert 5 rows for de_order_items table

INSERT INTO "GLUE_CATALOG_LINKED_DB"."bronze_analytics_db"."de_order_items"
("created_at", "discount_percent", "line_total", "order_id", "order_item_id", "product_id", "quantity", "tax_rate", "total_price", "unit_price", "updated_at")
VALUES
(TO_TIMESTAMP_LTZ('2024-12-20 09:00:00'), 0, 125.99, 1500048, 2970625, 3051, 1, 0.08, 125.99, 125.99, TO_TIMESTAMP_LTZ('2024-12-20 09:00:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 10, 249.99, 1500049, 2970626, 3052, 1, 0.08, 224.99, 249.99, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 10:30:00'), 0, 25.00, 1500049, 2970627, 3053, 1, 0.08, 25.00, 25.00, TO_TIMESTAMP_LTZ('2024-12-20 10:30:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 11:15:00'), 0, 89.99, 1500050, 2970628, 3054, 1, 0.08, 89.99, 89.99, TO_TIMESTAMP_LTZ('2024-12-20 11:15:00')),
(TO_TIMESTAMP_LTZ('2024-12-20 12:45:00'), 10, 199.99, 1500051, 2970629, 3055, 1, 0.08, 179.99, 199.99, TO_TIMESTAMP_LTZ('2024-12-20 12:45:00')); 

Updated Layer Counts (After Dynamic Table Refresh)

After inserting the new data into the Bronze layer, wait for the Dynamic Tables to refresh automatically (they are configured with a TARGET_LAG of '1 minute'). Then check the updated row counts in the downstream layers.
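
If you prefer not to wait for the lag window, a manual refresh can be triggered on any of the dynamic tables (optional):

-- Force an immediate refresh instead of waiting for TARGET_LAG
ALTER DYNAMIC TABLE silver_analytics_db.public."de_orders_cleaned" REFRESH;
ALTER DYNAMIC TABLE silver_analytics_db.public."de_order_items_enriched" REFRESH;
ALTER DYNAMIC TABLE gold_analytics_db.public."sales_summary_trends" REFRESH;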

-- Check the updated counts in the SILVER layer after the dynamic Iceberg table refresh
SELECT COUNT(*) FROM silver_analytics_db.public."de_order_items_enriched";
SELECT COUNT(*) FROM silver_analytics_db.public."de_orders_cleaned";

-- Check the updated counts in the GOLD layer after the dynamic Iceberg table refresh
SELECT COUNT(*) FROM gold_analytics_db.public."sales_summary_trends";
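
Because the demo rows used order IDs starting at 1500048, you can also confirm that those specific orders propagated all the way to the Gold layer:

-- Confirm the newly inserted orders reached the Gold fact table
SELECT "order_id", "order_date", "order_total_amount", "customer_order_type"
FROM gold_analytics_db.public."sales_summary_trends"
WHERE "order_id" >= 1500048
ORDER BY "order_id";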

This implementation successfully demonstrates a multi-layered lakehouse architecture using Snowflake Dynamic Iceberg Tables integrated with the AWS Glue Catalog.

The key takeaways from this exercise include:

- Decoupled Metadata and Data: The Bronze layer connects Snowflake to the external Iceberg data in S3 using a Snowflake Catalog Integration and a Catalog Linked Database, keeping the data lake open and accessible to other engines.
- Automated Data Pipeline: The Silver and Gold layers are powered by Snowflake Dynamic Iceberg Tables. By setting a TARGET_LAG of '1 minute' and sourcing data from the linked Bronze tables, the entire pipeline automatically processes, cleans, and transforms data as soon as new records land in the data lake.
- Incremental Efficiency: The REFRESH_MODE = INCREMENTAL setting on key Silver and Gold tables (de_order_items_enriched, order_summary, sales_summary_trends) ensures that only new or modified data is processed, yielding an efficient, low-latency data flow.
- Full CRUD/ACID Capabilities: The ability to INSERT data directly into the Glue-managed Iceberg tables from Snowflake (as demonstrated in the incremental test) confirms the transactional capabilities necessary for a true lakehouse architecture.

By leveraging Snowflake Dynamic Iceberg Tables, you have built a near real-time data platform that provides low-latency analytics while maintaining the flexibility and scalability of an open Iceberg data lake.

What You Learned

So far, you learned to work with:

- Snowflake Catalog Integrations connecting to the AWS Glue Iceberg REST catalog
- Catalog Linked Databases that sync external Iceberg namespaces and tables into Snowflake
- External Volumes for reading and writing Iceberg data in S3
- Dynamic Iceberg Tables with TARGET_LAG and REFRESH_MODE = INCREMENTAL
- A Bronze/Silver/Gold lakehouse architecture with automated, incremental refresh
