AI-powered Business Intelligence (BI) and conversational analytics hold immense promise for data-driven decision-making. However, directly applying these technologies to complex enterprise schemas often leads to inaccurate or inconsistent results, commonly referred to as "hallucinations." This challenge arises because AI and BI systems may struggle to interpret raw data without a clear understanding of the organizational context and definitions.

Snowflake addresses these challenges by introducing Semantic Views, a new schema-level object that stores all semantic model information natively within the database. Semantic views capture and store semantic model information directly within the database, including business-relevant concepts such as metrics (e.g., total sales), dimensions (e.g., product category), and facts.

What You'll Learn

What You'll Build

You will build a foundational understanding and practical setup of a Snowflake semantic view, complete with data views and a defined semantic model, enabling simplified and consistent data querying for BI and AI applications using TPC-DS sample data.

What You'll Need

What is a Semantic View?

A semantic view acts as a translator between your raw data and how humans or AI interpret it. It defines key business concepts like "metrics" (e.g., total sales) and "dimensions" (e.g., customer location) by referencing your underlying tables and their relationships.

Why Semantic Views Matter

Semantic views are crucial for several reasons:

Basic Elements of a Semantic View Definition

Every semantic view definition requires essential elements:

Download the Notebook

Firstly, to follow along with this quickstart, you can click on getting-started-with-snowflake-semantic-view.ipynb to download the Notebook from GitHub.

Snowflake Notebooks come pre-installed with common Python libraries for data science and machine learning, such as numpy, pandas, matplotlib, and more! If you are looking to use other packages, click on the Packages dropdown on the top right to add additional packages to your notebook.

Setup your Database and Schema

First, we'll create a new database named SAMPLE_DATA and a schema named TPCDS_SF10TCL to organize our data. We will then set the context to use this new schema.

-- Create a new test database named SAMPLE_DATA
CREATE DATABASE SAMPLE_DATA;

-- Use the newly created database
USE DATABASE SAMPLE_DATA;

-- Create a new schema named TPCDS_SF10TCL within SAMPLE_DATA
CREATE SCHEMA TPCDS_SF10TCL;

-- Set the context to use the new schema
USE SCHEMA TPCDS_SF10TCL;

Create Views from Sample Data

Next, we'll create views for the tables we want to analyze. These views will be based on the SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL dataset, allowing us to work with a subset of the data without modifying the original tables.

-- Create or replace views for the tables from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL

CREATE OR REPLACE VIEW CUSTOMER AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.CUSTOMER;

CREATE OR REPLACE VIEW CUSTOMER_DEMOGRAPHICS AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.CUSTOMER_DEMOGRAPHICS;

CREATE OR REPLACE VIEW DATE_DIM AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.DATE_DIM;

CREATE OR REPLACE VIEW ITEM AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.ITEM;

CREATE OR REPLACE VIEW STORE AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE;

CREATE OR REPLACE VIEW STORE_SALES AS
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_SALES;

Verify your Environment Setup

Before proceeding, let's ensure our warehouse, database, and schema are correctly set, and then list the views we just created.

-- Select the warehouse, database, and schema
USE WAREHOUSE COMPUTE_WH;
USE DATABASE SAMPLE_DATA;
USE SCHEMA TPCDS_SF10TCL;

-- Show all views in the current schema to verify creation
SHOW VIEWS;

image

Define the Semantic View

We'll start by switching to the ACCOUNTADMIN role:

-- Switch to ACCOUNTADMIN role to create the semantic view
USE ROLE ACCOUNTADMIN;

Now, we'll define our TPCDS_SEMANTIC_VIEW_SM semantic view. This view will establish relationships between our tables, define facts (measures), and dimensions (attributes), making it easier to query and analyze our data without complex joins.

-- Create or replace the semantic view named TPCDS_SEMANTIC_VIEW_SM
CREATE OR REPLACE SEMANTIC VIEW TPCDS_SEMANTIC_VIEW_SM
	tables (
		CUSTOMER primary key (C_CUSTOMER_SK),
		DATE as DATE_DIM primary key (D_DATE_SK),
		DEMO as CUSTOMER_DEMOGRAPHICS primary key (CD_DEMO_SK),
		ITEM primary key (I_ITEM_SK),
		STORE primary key (S_STORE_SK),
		STORESALES as STORE_SALES
        primary key (SS_SOLD_DATE_SK,SS_CDEMO_SK,SS_ITEM_SK,SS_STORE_SK,SS_CUSTOMER_SK)
	)
	relationships (
		SALESTOCUSTOMER as STORESALES(SS_CUSTOMER_SK) references CUSTOMER(C_CUSTOMER_SK),
		SALESTODATE as STORESALES(SS_SOLD_DATE_SK) references DATE(D_DATE_SK),
		SALESTODEMO as STORESALES(SS_CDEMO_SK) references DEMO(CD_DEMO_SK),
		SALESTOITEM as STORESALES(SS_ITEM_SK) references ITEM(I_ITEM_SK),
		SALETOSTORE as STORESALES(SS_STORE_SK) references STORE(S_STORE_SK)
	)
	facts (
		ITEM.COST as i_wholesale_cost,
		ITEM.PRICE as i_current_price,
		STORE.TAX_RATE as S_TAX_PRECENTAGE
	)
	dimensions (
		CUSTOMER.BIRTHYEAR as C_BIRTH_YEAR,
		CUSTOMER.COUNTRY as C_BIRTH_COUNTRY,
		CUSTOMER.C_CUSTOMER_SK as c_customer_sk,
		DATE.DATE as D_DATE,
		DATE.D_DATE_SK as d_date_sk,
		DATE.MONTH as D_MOY,
		DATE.WEEK as D_WEEK_SEQ,
		DATE.YEAR as D_YEAR,
		DEMO.CD_DEMO_SK as cd_demo_sk,
		DEMO.CREDIT_RATING as CD_CREDIT_RATING,
		DEMO.MARITAL_STATUS as CD_MARITAL_STATUS,
		ITEM.BRAND as I_BRAND,
		ITEM.CATEGORY as I_CATEGORY,
		ITEM.CLASS as I_CLASS,
		ITEM.I_ITEM_SK as i_item_sk,
		STORE.MARKET as S_MARKET_ID,
		STORE.SQUAREFOOTAGE as S_FLOOR_SPACE,
		STORE.STATE as S_STATE,
		STORE.STORECOUNTRY as S_COUNTRY,
		STORE.S_STORE_SK as s_store_sk,
		STORESALES.SS_CDEMO_SK as ss_cdemo_sk,
		STORESALES.SS_CUSTOMER_SK as ss_customer_sk,
		STORESALES.SS_ITEM_SK as ss_item_sk,
		STORESALES.SS_SOLD_DATE_SK as ss_sold_date_sk,
		STORESALES.SS_STORE_SK as ss_store_sk
	)
	metrics (
		STORESALES.TOTALCOST as SUM(item.cost),
		STORESALES.TOTALSALESPRICE as SUM(SS_SALES_PRICE),
		STORESALES.TOTALSALESQUANTITY as SUM(SS_QUANTITY)
            WITH SYNONYMS = ( 'total sales quantity', 'total sales amount')
	)
;

Verify the Semantic View Creation

Let's confirm that our semantic view has been successfully created by listing all semantic views in the current database.

-- Lists semantic views in the database that is currently in use
SHOW SEMANTIC VIEWS;

image

Describe the Semantic View

To understand the structure and components of our newly created semantic view, we can use the DESC SEMANTIC VIEW command. This will provide details about its tables, relationships, facts, and dimensions.

-- Describes the semantic view named TPCDS_SEMANTIC_VIEW_SM, and as a special bonus uses our new flow operator to filter and project only the metric and dimension names

DESC SEMANTIC VIEW TPCDS_SEMANTIC_VIEW_SM
    ->> SELECT "object_kind","property_value" as "parent_object","object_name" FROM $1
        WHERE "object_kind" IN ('METRIC','DIMENSION') AND "property" IN ('TABLE')
;

image

Natural Language Querying

Snowflake's Cortex Analyst allows you to interact with your semantic views using natural language. This powerful feature transforms how users can explore data without needing to know SQL syntax.

Access Cortex Analyst

You can access Cortex Analyst through the Snowflake interface.

Let's dynamically generate a link to Cortex Analyst so that you can access the semantic view.

First, we'll create a SQL cell named sql_step7:

SELECT 'https://app.snowflake.com/' || CURRENT_ORGANIZATION_NAME() || '/' || CURRENT_ACCOUNT_NAME() || '/#/studio/analyst/databases/SAMPLE_DATA/schemas/TPCDS_SF10TCL/semanticView/TPCDS_SEMANTIC_VIEW_SM/edit' AS RESULT;

Next, we'll call it from the following Python code cell that we called py_link:

import streamlit as st

link = sql_step7.to_pandas()['RESULT'].iloc[0]

st.link_button("Go to Cortex Analyst", link)

image

Example Natural Language Query

Once in Cortex Analyst, you can ask questions like:

Show me the top selling brands in by total sales quantity in the state ‘TX' in the ‘Books' category in the year 2003

Cortex Analyst will automatically translate this natural language question into the appropriate Semantic SQL query and return the results.

image

Basic Query Structure

To query a semantic view, you'll use this special query structure:

SELECT * FROM SEMANTIC_VIEW (view_name
    METRICS metric1, metric2, ...
    DIMENSIONS dimension1, dimension2, ...
);

Let's break down each part:

Add Filters and Sorting

You can enhance your queries with standard SQL clauses:

Example Query: Top Selling Brands

Now that our semantic view is defined, we can easily query it to retrieve aggregated data. The following query demonstrates how to find the top-selling brands in a specific state and category for a given year and month:

-- Query the semantic view to find top selling brands
SELECT * FROM SEMANTIC_VIEW
(
 TPCDS_SEMANTIC_VIEW_SM
    DIMENSIONS
            Item.Brand,
            Item.Category,
            Date.Year,
            Date.Month,
            Store.State
    METRICS
        StoreSales.TotalSalesQuantity
)
WHERE Year = '2002' AND Month = '12' AND State ='TX' AND Category = 'Books'
ORDER BY TotalSalesQuantity DESC LIMIT 10;

image

In this step, we'll build 2 simple interactive data apps:

  1. Interactive data visualization app
  2. Simple interactive dashboard

Query data

Firstly, we'll modify the SQL query to show data for month 12.

-- Query the semantic view for month 12
SELECT * FROM SEMANTIC_VIEW
( 
 TPCDS_SEMANTIC_VIEW_SM
    DIMENSIONS 
            Item.Brand,
            Item.Category,            
            Date.Year,
            Date.Month,
            Store.State
    METRICS 
        StoreSales.TotalSalesQuantity
    WHERE
        Date.Year = '2002' AND Date.Month = '12' AND Item.Category = 'Books'
) 
ORDER BY TotalSalesQuantity DESC;

Next, in a Python cell named df, we'll convert the SQL table to a Pandas DataFrame:

cell1.to_pandas()

image

App 1: Interactive data visualization app

Here the user can interactively explore the sales data:

import streamlit as st
import pandas as pd

st.title("📊 Sales Data Interactive Visualization")

# Create selectbox for grouping option
group_by = st.selectbox(
    "Select grouping option:",
    options=['BRAND', 'STATE'],
    index=0
)

# Group the data based on selection
if group_by == 'BRAND':
    grouped_data = df.groupby('BRAND')['TOTALSALESQUANTITY'].sum().reset_index()
    grouped_data = grouped_data.set_index('BRAND')
    chart_title = "Total Sales Quantity by Brand"
else:  # group_by == 'STATE'
    grouped_data = df.groupby('STATE')['TOTALSALESQUANTITY'].sum().reset_index()
    grouped_data = grouped_data.set_index('STATE')
    chart_title = "Total Sales Quantity by State"

# Display the chart
st.subheader(chart_title)
st.bar_chart(grouped_data['TOTALSALESQUANTITY'])

# Optional: Display the data table
if st.checkbox("Show data table"):
    st.subheader("Grouped Data")
    st.dataframe(grouped_data)

image

App 2: Simple interactive dashboard

Here's a simple dashboard we're we've included a row of metrics:

import streamlit as st
import pandas as pd

st.title("📊 Sales Data Dashboard")

# Create selectbox for grouping option
group_by = st.selectbox(
    "Select grouping option:",
    options=['BRAND', 'STATE'],
    index=0
)

# Group the data based on selection
if group_by == 'BRAND':
    grouped_data = df.groupby('BRAND')['TOTALSALESQUANTITY'].sum().reset_index()
    grouped_data = grouped_data.set_index('BRAND')
    chart_title = "Total Sales Quantity by Brand"
else:  # group_by == 'STATE'
    grouped_data = df.groupby('STATE')['TOTALSALESQUANTITY'].sum().reset_index()
    grouped_data = grouped_data.set_index('STATE')
    chart_title = "Total Sales Quantity by State"

# Calculate KPIs based on current grouping
total_sales = df['TOTALSALESQUANTITY'].sum()
avg_sales = df['TOTALSALESQUANTITY'].mean()
top_performer = grouped_data['TOTALSALESQUANTITY'].max()
top_performer_name = grouped_data['TOTALSALESQUANTITY'].idxmax()

# Display KPI metrics in 3 columns
col1, col2, col3 = st.columns(3)

with col1:
    st.metric(
        label="Total Sales Quantity", 
        value=f"{total_sales:,.0f}",
        delta=None
    )

with col2:
    if group_by == 'BRAND':
        st.metric(
            label="Average Sales per Brand", 
            value=f"{avg_sales:,.0f}",
            delta=f"{((avg_sales/total_sales)*100):.3f}% of total"
        )
    else:
        st.metric(
            label="Average Sales per State", 
            value=f"{avg_sales:,.0f}",
            delta=f"{len(df['STATE'].unique())} state(s)"
        )

with col3:
    st.metric(
        label=f"Top {group_by.title()}", 
        value=f"{top_performer:,.0f}",
        delta=f"{top_performer_name}"
    )

# Display the chart
st.subheader(chart_title)
st.bar_chart(grouped_data['TOTALSALESQUANTITY'])

# Optional: Display the data table
if st.checkbox("Show data table"):
    st.subheader("Grouped Data")
    st.dataframe(grouped_data)

image

Congratulations! You've successfully created your first Snowflake semantic view. By implementing this semantic layer directly within Snowflake, you've established a foundation for consistent, AI-ready analytics that bridges the gap between raw data and meaningful business insights.

The semantic view you built not only simplifies complex data relationships but also enables natural language querying through Cortex Analyst, democratizing data access across your organization. As you continue to develop your semantic models, remember that this approach scales to support enterprise-wide analytics initiatives, ensuring that all stakeholders — from data analysts to business users — can trust and understand the data that they're working with.

What You Learned

Related Resources

Articles:

Documentation: