Through this quickstart guide, you will explore how to get started with Cortex Analyst, which is a fully managed service in Snowflake that provides a conversational interface to interact with structured data in Snowflake.
Cortex Analyst is a fully managed service in Cortex AI that provides a conversational interface to interact with structured data in Snowflake. It streamlines the development of intuitive, self-service analytics applications for business users, while providing industry-leading accuracy. To deliver high text-to-SQL accuracy, Cortex Analyst uses an agentic AI setup powered by state-of-the-art LLMs. Available as a convenient REST API, Cortex Analyst can seamlessly integrate into any application. This empowers developers to customize how and where business users interact with results, while still benefiting from Snowflake's integrated security and governance features, including role-based access controls (RBAC), to protect valuable data.
Historically, business users have primarily relied on BI dashboards and reports to answer their data questions. However, these resources often lack the flexibility needed, leaving users dependent on overburdened data analysts for updates or answers, which can take days. Cortex Analyst disrupts this cycle by providing a natural language interface with high text-to-SQL accuracy. With Cortex Analyst organizations can streamline the development of intuitive, conversational applications that can enable business users to ask questions using natural language and receive more accurate answers in near real time
This quickstart will focus on getting started with Cortex Analyst, teaching the mechanics of how to interact with the Cortex Analyst service and how to define the Semantic Model definitions that enhance the precision of results from this conversational interface over your Snowflake data.
Open up the create_snowflake_objects.sql file in a SQL worksheet in Snowsight.
Run the following SQL commands in a SQL worksheet to create the warehouse, database and schema.
/*--
• Database, schema, warehouse, and stage creation
--*/
USE ROLE SECURITYADMIN;
CREATE ROLE cortex_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;
GRANT ROLE cortex_user_role TO USER <user>;
USE ROLE sysadmin;
-- Create demo database
CREATE OR REPLACE DATABASE cortex_analyst_demo;
-- Create schema
CREATE OR REPLACE SCHEMA cortex_analyst_demo.revenue_timeseries;
-- Create warehouse
CREATE OR REPLACE WAREHOUSE cortex_analyst_wh
WAREHOUSE_SIZE = 'large'
WAREHOUSE_TYPE = 'standard'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Warehouse for Cortex Analyst demo';
GRANT USAGE ON WAREHOUSE cortex_analyst_wh TO ROLE cortex_user_role;
GRANT OPERATE ON WAREHOUSE cortex_analyst_wh TO ROLE cortex_user_role;
GRANT OWNERSHIP ON SCHEMA cortex_analyst_demo.revenue_timeseries TO ROLE cortex_user_role;
GRANT OWNERSHIP ON DATABASE cortex_analyst_demo TO ROLE cortex_user_role;
USE ROLE cortex_user_role;
-- Use the created warehouse
USE WAREHOUSE cortex_analyst_wh;
USE DATABASE cortex_analyst_demo;
USE SCHEMA cortex_analyst_demo.revenue_timeseries;
-- Create stage for raw data
CREATE OR REPLACE STAGE raw_data DIRECTORY = (ENABLE = TRUE);
/*--
• Fact and Dimension Table Creation
--*/
-- Fact table: daily_revenue
CREATE OR REPLACE TABLE cortex_analyst_demo.revenue_timeseries.daily_revenue (
date DATE,
revenue FLOAT,
cogs FLOAT,
forecasted_revenue FLOAT,
product_id INT,
region_id INT
);
-- Dimension table: product_dim
CREATE OR REPLACE TABLE cortex_analyst_demo.revenue_timeseries.product_dim (
product_id INT,
product_line VARCHAR(16777216)
);
-- Dimension table: region_dim
CREATE OR REPLACE TABLE cortex_analyst_demo.revenue_timeseries.region_dim (
region_id INT,
sales_region VARCHAR(16777216),
state VARCHAR(16777216)
);
These can also be found in the create_snowflake_objects.sql file.
There are three data files and one YAML file included in the Git Repo that you should have cloned:
You will now upload these files to your Snowflake account and ingest the data files into the tables created in the previous step.
To upload the data files:
Let's go check that the files were successfully uploaded to the stage. In the Snowsight UI:
You should see the four files listed in the stage:
Now, let's load the raw CSV data into the tables. Go back to your Snowflake SQL worksheet and run the following load_data.sql code to load data into the tables:
/*--
• looad data into tables
--*/
USE ROLE CORTEX_USER_ROLE;
USE DATABASE CORTEX_ANALYST_DEMO;
USE SCHEMA CORTEX_ANALYST_DEMO.REVENUE_TIMESERIES;
USE WAREHOUSE CORTEX_ANALYST_WH;
COPY INTO CORTEX_ANALYST_DEMO.REVENUE_TIMESERIES.DAILY_REVENUE
FROM @raw_data
FILES = ('daily_revenue.csv')
FILE_FORMAT = (
TYPE=CSV,
SKIP_HEADER=1,
FIELD_DELIMITER=',',
TRIM_SPACE=FALSE,
FIELD_OPTIONALLY_ENCLOSED_BY=NONE,
REPLACE_INVALID_CHARACTERS=TRUE,
DATE_FORMAT=AUTO,
TIME_FORMAT=AUTO,
TIMESTAMP_FORMAT=AUTO
EMPTY_FIELD_AS_NULL = FALSE
error_on_column_count_mismatch=false
)
ON_ERROR=CONTINUE
FORCE = TRUE ;
COPY INTO CORTEX_ANALYST_DEMO.REVENUE_TIMESERIES.PRODUCT_DIM
FROM @raw_data
FILES = ('product.csv')
FILE_FORMAT = (
TYPE=CSV,
SKIP_HEADER=1,
FIELD_DELIMITER=',',
TRIM_SPACE=FALSE,
FIELD_OPTIONALLY_ENCLOSED_BY=NONE,
REPLACE_INVALID_CHARACTERS=TRUE,
DATE_FORMAT=AUTO,
TIME_FORMAT=AUTO,
TIMESTAMP_FORMAT=AUTO
EMPTY_FIELD_AS_NULL = FALSE
error_on_column_count_mismatch=false
)
ON_ERROR=CONTINUE
FORCE = TRUE ;
COPY INTO CORTEX_ANALYST_DEMO.REVENUE_TIMESERIES.REGION_DIM
FROM @raw_data
FILES = ('region.csv')
FILE_FORMAT = (
TYPE=CSV,
SKIP_HEADER=1,
FIELD_DELIMITER=',',
TRIM_SPACE=FALSE,
FIELD_OPTIONALLY_ENCLOSED_BY=NONE,
REPLACE_INVALID_CHARACTERS=TRUE,
DATE_FORMAT=AUTO,
TIME_FORMAT=AUTO,
TIMESTAMP_FORMAT=AUTO
EMPTY_FIELD_AS_NULL = FALSE
error_on_column_count_mismatch=false
)
ON_ERROR=CONTINUE
FORCE = TRUE ;
Now, you will integrate Cortex Search as a way to improve literal string searches to help Cortex Analyst generate more accurate SQL queries. Writing the correct SQL query to answer a question sometimes requires knowing exact literal values to filter on. Since those values can't always be extracted directly from the question, a search of some kind may be needed.
Go back to your Snowflake SQL worksheet and run the following cortex_search_create.sql code to load data into the tables:
USE DATABASE cortex_analyst_demo;
USE SCHEMA revenue_timeseries;
use ROLE cortex_user_role;
CREATE OR REPLACE CORTEX SEARCH SERVICE product_line_search_service
ON product_dimension
WAREHOUSE = cortex_analyst_wh
TARGET_LAG = '1 hour'
AS (
SELECT DISTINCT product_line AS product_dimension FROM product_dim
);
Now, you will create a demo chat application to call the Cortex Analyst API and ask natural-language questions over our structured revenue datasets. To create the Streamlit in Snowflake application:
+ Streamlit App
, and fill it in with the below details and click create: Run
and begin asking questions!Take note of the get_analyst_response
function that is defined in this Python code. This is the function that takes our chat input prompt and history, packages it up as a JSON object, and sends it to the Cortex Analyst API (with the specified revenue_timeseries.yaml
Semantic Model).
def get_analyst_response(messages: List[Dict]) -> Tuple[Dict, Optional[str]]:
"""
Send chat history to the Cortex Analyst API and return the response.
Args:
messages (List[Dict]): The conversation history.
Returns:
Optional[Dict]: The response from the Cortex Analyst API.
"""
# Prepare the request body with the user's prompt
request_body = {
"messages": messages,
"semantic_model_file": f"@{st.session_state.selected_semantic_model_path}",
}
# Send a POST request to the Cortex Analyst API endpoint
# Adjusted to use positional arguments as per the API's requirement
resp = _snowflake.send_snow_api_request(
"POST", # method
API_ENDPOINT, # path
{}, # headers
{}, # params
request_body, # body
None, # request_guid
API_TIMEOUT, # timeout in milliseconds
)
# Content is a string with serialized JSON object
parsed_content = json.loads(resp["content"])
# Check if the response is successful
if resp["status"] < 400:
# Return the content of the response as a JSON object
return parsed_content, None
else:
# Craft readable error message
error_msg = f"""
🚨 An Analyst API error has occurred 🚨
* response code: `{resp['status']}`
* request-id: `{parsed_content['request_id']}`
* error code: `{parsed_content['error_code']}`
Message: ```{parsed_content['message']}```
"""
return parsed_content, error_msg
You can now begin asking natural language questions about the revenue data in the chat interface (e.g. "What questions can I ask?")
The semantic model file revenue_timeseries.yaml
is the key that unlocks Cortex Analyst's power. This YAML file dictates the tables, columns, etc. that Analyst can use in order to run queries that answer natural-language questions Let's talk a little about the details of this file:
The Semantic Model is composed of a number of different fields that help Cortex Analyst understand the specifics of your data:
dimensions
, time_dimensions
, or measures
Logical Tables are relatively straightforward- these are tables or views within a database. That's it! Pretty simple
Logical Columns get a bit more complicated; a logical column can reference an underlying physical column in a table, or it can be a expression containing one or more physical columns. So, for example, in the revenue_timeseries.yaml
, we have a simple logical column daily_revenue
that is a physical column. In the daily_revenue
measure definition, you'll notice that we provide a description, as well as synonyms, data_type, and a default_aggregation, but no expr
parameter. This is because revenue
is simply a physical column in the daily_revenue
table:
measures:
- name: daily_revenue
expr: revenue
description: total revenue for the given day
synonyms: ["sales", "income"]
default_aggregation: sum
data_type: number
In contrast, we define a different measure daily_profit
which is not in fact a physical column, but rather an expression of the difference between the revenue
and cogs
physical columns:
- name: daily_profit
description: profit is the difference between revenue and expenses.
expr: revenue - cogs
data_type: number
In the semantic model, time_dimensions
specifically capture temporal features of the data, and dimensions
are not quantitative fields (e.g. quantitative fields are measures
, while categorical fields are dimensions
).
An example time_dimension
:
time_dimensions:
- name: date
expr: date
description: date with measures of revenue, COGS, and forecasted revenue for each product line
unique: false
data_type: date
An example dimension
:
dimensions:
- name: product_line
expr: product_line
description: product line associated with it's own slice of revenue
unique: false
data_type: varchar
sample_values:
- Electronics
- Clothing
- Home Appliances
- Toys
- Books
An example relationship
:
relationships:
- name: revenue_to_product
left_table: daily_revenue
right_table: product
relationship_columns:
- left_column: product_id
right_column: product_id
join_type: left_outer
relationship_type: many_to_one
Here are some tips on building your own semantic model to use with Cortex Analyst:
When generating the semantic model, think from the end user perspective:
Some additional items that'll significantly improve model performance:
For more information about the semantic model, please refer to the documentation.
In addition to the previously discussed Semantic Model information, the Cortex Analyst Verified Query Repository (VQR) can help improve accuracy and trustworthiness of results by providing a collection of questions and corresponding SQL queries to answer them. Cortex Analyst will then use these verified queries when answering similar types of questions in the future.
Verified queries ultimately are specified in the verified_queries
section of the semantic model, e.g.:
verified_queries:
name: "lowest revenue each month"
question: "For each month, what was the lowest daily revenue and on what date did that lowest revenue occur?"
sql: "WITH monthly_min_revenue AS (
SELECT
DATE_TRUNC('MONTH', date) AS month,
MIN(daily_revenue) AS min_revenue
FROM daily_revenue
GROUP BY
DATE_TRUNC('MONTH', date)
)
SELECT
mmr.month,
mmr.min_revenue,
dr.date AS min_revenue_date
FROM monthly_min_revenue AS mmr JOIN daily_revenue AS dr
ON mmr.month = DATE_TRUNC('MONTH', dr.date) AND mmr.min_revenue = dr.daily_revenue
ORDER BY mmr.month DESC NULLS LAST"
verified_at: 1715187400
verified_by: Jane
While verified queries can be added directly to the Semantic Model, Snowflake also provides an OSS Streamlit application to help add verified queries to your model.
To install and use this app:
Modify your SiS application code to point at the new Semantic Model YAML file location, and use Cortex Analyst as before!
Congratulations, you have successfully completed this quickstart! Through this quickstart, we were able to showcase how Cortex Analyst allows business users to ask natural-language questions over their structured data to perform analysis and receive trusted answers to business questions.
For more information, check out the resources below: