Operational Excellence in the Snowflake AI Data Cloud is the practice of running and monitoring systems to deliver business value and continuously improve supporting processes and procedures. It focuses on maximizing automation, gaining deep observability into workloads, and establishing a culture of iterative improvement. This empowers your organization to innovate faster with data engineering, analytics, AI, applications, and collaboration, all while managing risk and optimizing for cost and performance.

This pillar emphasizes aligning technology with business outcomes to support the transformations Snowflake enables. Each Operational Excellence principle follows a phase-based structure to reflect an iterative approach:

Ensure operational readiness & performance

Proactively define performance targets (SLOs), test and validate capacity, and continuously optimize compute engines to ensure workloads meet business expectations.

Automate infrastructure & maintenance

Eliminate manual operational tasks by codifying all infrastructure, configuration, and data pipelines, leveraging Snowflake's built-in automation for scaling and maintenance.

Enhance observability & issue resolution

Gain deep, end-to-end visibility into the platform by capturing and analyzing telemetry, logs, and traces to rapidly diagnose and resolve issues.

Manage incidents & problems

Minimize the impact of incidents using AI-driven diagnostics, immutable backups for rapid recovery, and automated governance controls.

Enable collaboration & secure sharing

Foster a collaborative data culture by establishing a secure, governed internal marketplace for sharing data, applications, and models.

Manage the AI/ML software development lifecycle

Implement a governed, end-to-end MLOps framework to manage models and features, from experimentation and fine-tuning to deployment and monitoring, directly within the data cloud.

Continuously improve performance & practices

Establish a culture of continuous improvement by measuring outcomes, analyzing performance and cost trends, and iteratively refining practices, architecture, and team skills.

Overview

Ensuring operational readiness and performance in the Snowflake AI Data Cloud is about creating a stable, efficient, and scalable environment that consistently meets your business objectives. This involves proactively planning for capacity, monitoring system health, optimizing query performance, and implementing robust processes for management and support. A well-performing and operationally sound platform builds trust, drives user adoption, and maximizes the return on your data investment. It ensures that your data engineering pipelines run on schedule, analytical queries return quickly, AI models are trained and deployed efficiently, and data applications deliver a seamless user experience.

Focus areas

To achieve peak performance and operational excellence in Snowflake, concentrate on four key areas that directly impact your workloads.

Phase-based activities

Prepare

In this initial phase, the focus is on planning and design to build a foundation for operational excellence.

Implement

During implementation, you will build and configure the Snowflake environment based on the designs from the Prepare phase.

Operate

The Operate phase focuses on the day-to-day management and maintenance of the Snowflake environment.

Improve

This phase is about continuous improvement through analysis, learning, and refinement.

Recommendations

Here are the key recommendations focused specifically on ensuring operational readiness and performance for your Snowflake environment.

Isolate workloads to guarantee performance

Your top priority for predictable performance is to prevent different jobs from competing for the same resources. A heavy data science task should never slow down a critical business dashboard.

Workload isolation ensures that each process gets the compute it needs without interference.

For additional information, review the best practices in Virtual Warehouse Considerations.
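As a minimal sketch of this pattern (warehouse names and sizes are illustrative, not prescriptive), each workload gets its own warehouse so it scales and suspends independently:

```sql
-- Dedicated compute per workload: dashboards never queue behind heavy jobs.
CREATE WAREHOUSE IF NOT EXISTS bi_dashboards_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 60        -- suspend after 60 idle seconds to save credits
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS data_science_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Point each service account at its own compute.
ALTER USER dashboard_service SET DEFAULT_WAREHOUSE = bi_dashboards_wh;
ALTER USER ds_batch_service  SET DEFAULT_WAREHOUSE = data_science_wh;
```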

Continuously tune warehouse size for optimal performance

Choosing the right warehouse size is a balancing act. Too small, and queries will run slowly or fail; too large, and you're wasting resources. The key is to use data, not guesswork, to find the sweet spot.

Learn to diagnose bottlenecks using the Query Profile and monitor load with the views in Monitoring Warehouse Load.
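One practical starting point (the query shape is ours; thresholds should match your own baselines) is to scan recent history for queries that spilled to disk, a classic symptom of an undersized warehouse:

```sql
-- Recent queries that spilled to local or remote storage.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,   -- milliseconds to seconds
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND (bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0)
ORDER BY bytes_spilled_to_remote_storage DESC
LIMIT 20;
```

Remote spilling is the stronger signal: it usually means the workload needs a larger warehouse or a rewritten query.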

Automate monitoring and alerting

You can't achieve operational readiness by manually checking dashboards. You need an automated system that alerts you to problems before your users report them.

Build your automated alert system by following the guide Introduction to Tasks.
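As one possible shape for this (Snowflake alerts run on the same serverless scheduling machinery as tasks; the warehouse, email integration, and recipient below are placeholders), a scheduled alert can page the on-call channel when a query runs too long:

```sql
-- Assumes an email notification integration named my_email_int exists.
CREATE OR REPLACE ALERT long_running_query_alert
  WAREHOUSE = monitoring_wh
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    WHERE execution_status = 'RUNNING'
      AND start_time < DATEADD('minute', -30, CURRENT_TIMESTAMP())
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
    'my_email_int',
    'oncall@example.com',
    'Snowflake: long-running query detected',
    'At least one query has been running for more than 30 minutes.');

-- Alerts are created in a suspended state.
ALTER ALERT long_running_query_alert RESUME;
```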

Validate your recovery plan with regular drills

An untested business continuity and disaster recovery (BCDR) plan is just a theory. Operational readiness means having a proven, practiced process to restore service after a major incident.

Find the specific commands and procedures in the guide for Database Replication and Failover/Failback.
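A drill typically exercises commands along these lines (organization and account names are placeholders; database failover requires the appropriate edition, and both replication and failover must be enabled before the drill):

```sql
-- On the primary account: allow the DR account to replicate and fail over.
ALTER DATABASE prod_db ENABLE REPLICATION TO ACCOUNTS myorg.dr_account;
ALTER DATABASE prod_db ENABLE FAILOVER    TO ACCOUNTS myorg.dr_account;

-- On the DR account: create the secondary once, then refresh on a schedule.
CREATE DATABASE prod_db AS REPLICA OF myorg.primary_account.prod_db;
ALTER DATABASE prod_db REFRESH;

-- During the drill, promote the secondary and verify workloads against it.
ALTER DATABASE prod_db PRIMARY;
```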

Persona responsibilities (RACI chart)

The table below outlines the roles and responsibilities for ensuring operational readiness and performance.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed


Overview

Automating infrastructure and maintenance is crucial for achieving efficiency, consistency, and scalability in the Snowflake AI Data Cloud. Because Snowflake is a fully managed service, automation efforts focus less on provisioning underlying servers and more on managing the configuration, workloads, and ecosystem surrounding your data. This framework provides principles and best practices for automating the setup, deployment, and operation of your Snowflake environment to support data engineering, analytics, AI, and application workloads reliably and at scale.

Focus areas

To effectively automate your Snowflake environment, concentrate on four key areas. These areas provide a structured approach to managing your data ecosystem programmatically, reducing manual effort and minimizing human error.

Phase-based activities

Adopting automation is a journey. The following activities are organized by phase to provide a clear roadmap from initial preparation to continuous improvement, aligned with our defined focus areas.

Prepare

The Prepare phase is about planning and laying the foundation for successful automation.

| Focus area | Activities |
| --- | --- |
| Infrastructure as Code (IaC) | Evaluate and select an IaC tool (e.g., Terraform, schemachange). • Define and document naming conventions and standards for all Snowflake objects. • Establish a Git repository structure for managing your IaC configurations. |
| CI/CD for data & applications | Choose CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI) that integrate with your code repositories. • Define a branching and deployment strategy (e.g., GitFlow) for promoting changes from development to production. |
| Observability & monitoring | Identify key metrics for cost, performance, and security that need to be tracked. • Evaluate tools for collecting and visualizing data from the Snowflake ACCOUNT_USAGE schema. • Define initial alert thresholds for critical events like high credit usage or long-running queries. |
| Automated governance | Define your RBAC model and map business roles to Snowflake roles. • Document your data classification standards and corresponding security controls (e.g., masking policies for PII). |

Implement

The Implement phase involves the initial build-out and rollout of your automation scripts and pipelines.

| Focus area | Activities |
| --- | --- |
| Infrastructure as Code (IaC) | Develop initial IaC modules to manage core objects: roles, users, warehouses, and databases. • Create a sandbox environment provisioned entirely through your IaC scripts to validate the process. |
| CI/CD for data & applications | Build a starter CI/CD pipeline for a single data engineering (e.g., dbt) or Snowpark project that automates code linting, unit testing, and deployment to a development environment. |
| Observability & monitoring | Develop scripts or configure tools to automatically pull data from ACCOUNT_USAGE into a monitoring dashboard. • Configure basic automated alerts for budget overruns (via resource monitors) and warehouse contention. |
| Automated governance | Write scripts to provision your defined RBAC model in Snowflake. • Implement initial dynamic data masking policies on a non-production table containing sensitive data. |
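Of the activities above, resource monitors are the quickest to codify because they are plain SQL. A minimal sketch (quota and warehouse name are illustrative):

```sql
CREATE RESOURCE MONITOR IF NOT EXISTS monthly_etl_budget
  WITH CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80  PERCENT DO NOTIFY    -- warn the account admins
    ON 100 PERCENT DO SUSPEND;  -- stop new queries once the budget is spent

ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = monthly_etl_budget;
```

Checking this DDL into your IaC repository keeps the budget guardrail versioned alongside the warehouse it protects.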

Operate

The Operate phase focuses on using and managing your automated systems for day-to-day activities.

| Focus area | Activities |
| --- | --- |
| Infrastructure as Code (IaC) | Use your IaC repository and pull-request workflow as the sole method for making environment changes. • Run periodic checks to detect any manual changes ("drift") that deviate from the code-defined state. |
| CI/CD for data & applications | Deploy all code changes for data pipelines, AI models, and applications to production via the automated CI/CD pipeline. • Use automated testing gates to prevent regressions from reaching production. • Implement fix-forward or rollback procedures for defects. |
| Observability & monitoring | Regularly review automated cost and performance dashboards. • Integrate automated alerts with your team's communication channels (e.g., Slack, PagerDuty). |
| Automated governance | Run automated quarterly access reviews and entitlement reports. • Automate granting and revoking access based on requests from your identity provider (e.g., Okta, Azure AD) via SCIM. |
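The quarterly entitlement report mentioned above can start as a single query over the account usage views (these views lag real time by up to a couple of hours):

```sql
-- Current role grants per user, for owner certification.
SELECT grantee_name AS user_name,
       role         AS granted_role,
       granted_by,
       created_on
FROM snowflake.account_usage.grants_to_users
WHERE deleted_on IS NULL
ORDER BY user_name, granted_role;
```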

Improve

The Improve phase is about refining and optimizing your automation to increase efficiency and capability.

| Focus area | Activities |
| --- | --- |
| Infrastructure as Code (IaC) | Refactor IaC modules for greater reusability and simplicity. • Implement automated validation and policy-as-code checks (e.g., ensuring all warehouses have auto-suspend enabled) before applying changes. |
| CI/CD for data & applications | Optimize pipeline performance to reduce deployment times. • Introduce more sophisticated testing, such as data quality tests (e.g., using dbt tests) and integration tests within the pipeline. • Explore zero-downtime deployment strategies for applications and stored procedures. |
| Observability & monitoring | Implement automated cost optimization actions, such as automatically resizing warehouses based on historical usage patterns. • Use machine learning to forecast future credit usage and detect performance anomalies. |
| Automated governance | Automate the tagging of data objects based on their contents to streamline governance. • Develop automated routines to scan for and mask newly discovered sensitive data, ensuring continuous compliance. |
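The auto-suspend check called out above can be scripted without external tooling; SHOW plus RESULT_SCAN keeps it in pure SQL (the 600-second ceiling is an example policy, not a Snowflake default):

```sql
SHOW WAREHOUSES;

-- Flag warehouses that violate the auto-suspend policy.
SELECT "name", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" IS NULL     -- auto-suspend disabled entirely
   OR "auto_suspend" > 600;      -- or looser than the 10-minute policy
```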

Recommendations

Persona responsibilities (RACI chart)

This RACI (Responsible, Accountable, Consulted, Informed) matrix clarifies the roles and responsibilities for automation activities across different personas.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed

| Activity | C-Level (CIO/CDO) | Chief Architect | Engineering / SRE | Data Science | Security |
| --- | --- | --- | --- | --- | --- |
| Defining automation strategy & tooling | A | R | C | C | C |
| Developing IaC modules & scripts | I | C | R | I | C |
| Building CI/CD pipelines | I | C | R | C | C |
| Managing environments via IaC | I | A | R | I | I |
| Deploying workloads via CI/CD | I | I | R | R | I |
| Defining & implementing monitoring alerts | I | A | R | C | C |
| Automating governance & access controls | A | C | R | I | R |
| Reviewing automated cost/usage reports | A | I | C | C | I |

Overview

Observability in the Snowflake AI Data Cloud is about gaining deep, actionable insights into your platform's health, performance, cost, and security. It goes beyond simple monitoring by providing the context needed to understand why something is happening, enabling you to move from reactive problem-fixing to proactive optimization. Effective observability ensures your data engineering pipelines are reliable, your analytics are fast and accurate, your AI models are performant, and your applications are secure. This framework provides a structured approach to building a comprehensive observability strategy that delivers trust and maximizes the value of your Snowflake investment for all stakeholders, from engineers to the C-suite.

Focus areas

Organize your observability strategy around four key focus areas. These pillars ensure a holistic view of your Snowflake environment, covering everything from cost efficiency to data integrity.

Phase-based activities

A successful observability strategy is implemented incrementally. The following phases provide a roadmap from initial preparation to continuous improvement.

Prepare

This foundational phase is about defining what "good" looks like by establishing goals, metrics, and ownership before implementing any tools.

| Focus area | Activities |
| --- | --- |
| Cost & performance intelligence | Define a cost allocation strategy: establish a consistent tagging methodology for users, roles, and warehouses to enable accurate chargeback. • Establish performance baselines: identify key queries and workloads (e.g., critical dashboard refreshes, ETL jobs) and document their expected runtimes and credit consumption. • Select tooling: evaluate whether to use native Snowflake features (Snowsight, ACCOUNT_USAGE views), third-party observability platforms, or a combination. |
| Workload health & reliability | Define key Service Level Objectives (SLOs): for each workload, define measurable reliability targets, for example Snowpipe data freshness within 5 minutes, or critical data transformation (dbt) jobs completing by 6 AM. • Map critical data paths: document the key data flows for your most important analytics, applications, and AI models. |
| Security & access analytics | Define sensitive data & roles: classify sensitive data objects and map the roles and users that should have access. • Establish alerting policies: define what constitutes a security incident (e.g., unauthorized access attempts, privilege escalation, data exfiltration patterns) that requires an immediate alert. |
| Data integrity & lineage | Identify Critical Data Elements (CDEs): pinpoint the most vital datasets that power executive dashboards, financial reporting, or production AI models. • Define data quality rules: for CDEs, define rules for key metrics like freshness, completeness, and validity (e.g., `order_date` cannot be in the future). |
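The tagging strategy defined here pays off later because tags are queryable. A sketch of the mechanics (tag names and values are illustrative):

```sql
-- Create the tag and attach it to compute.
CREATE TAG IF NOT EXISTS cost_center ALLOWED_VALUES 'finance', 'marketing', 'data_science';
ALTER WAREHOUSE finance_wh SET TAG cost_center = 'finance';

-- Chargeback: join hourly metering to tag assignments.
SELECT t.tag_value AS cost_center,
       SUM(m.credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history AS m
JOIN snowflake.account_usage.tag_references AS t
  ON  t.object_name = m.warehouse_name
  AND t.domain      = 'WAREHOUSE'
  AND t.tag_name    = 'COST_CENTER'
GROUP BY t.tag_value
ORDER BY credits DESC;
```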

Implement

In this phase, you'll configure the tools and processes defined during preparation to start collecting and visualizing observability data.

| Focus area | Activities |
| --- | --- |
| Cost & performance intelligence | Configure resource monitors: set up warehouse-level monitors to prevent budget overruns by suspending warehouses or sending notifications at defined credit thresholds. • Build foundational dashboards: create Snowsight dashboards to visualize credit usage by warehouse/tag, identify long-running queries (QUERY_HISTORY), and monitor warehouse queuing. |
| Workload health & reliability | Implement error notifications: configure notifications for failed tasks (SYSTEM$SEND_EMAIL) or Snowpipe COPY errors to immediately alert the responsible teams. • Monitor data ingestion: use the COPY_HISTORY and PIPE_USAGE_HISTORY views to track the latency and health of data loading processes. |
| Security & access analytics | Enable access monitoring: build dashboards on top of the ACCESS_HISTORY and LOGIN_HISTORY views to visualize user login patterns, query activity on sensitive tables, and privilege grants. • Set up security alerts: implement Snowflake alerts to trigger notifications for defined security events, such as a user being granted the ACCOUNTADMIN role. |
| Data integrity & lineage | Deploy data quality tests: implement data quality checks as part of your data transformation pipeline (e.g., using dbt tests) that run on a schedule. • Utilize object tagging for lineage: apply tags to tables and columns to create a basic, searchable framework for tracking data lineage. |
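The ACCOUNTADMIN-grant alert mentioned above might look like the following sketch (integration and recipients are placeholders; ACCOUNT_USAGE views lag by up to a couple of hours, so the lookback window is deliberately generous):

```sql
CREATE OR REPLACE ALERT accountadmin_grant_alert
  WAREHOUSE = monitoring_wh
  SCHEDULE = '60 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM snowflake.account_usage.grants_to_users
    WHERE role = 'ACCOUNTADMIN'
      AND deleted_on IS NULL
      AND created_on > DATEADD('hour', -3, CURRENT_TIMESTAMP())
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
    'my_email_int',
    'security@example.com',
    'ACCOUNTADMIN granted',
    'A new ACCOUNTADMIN grant was detected; review recent grants.');

ALTER ALERT accountadmin_grant_alert RESUME;
```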

Operate

This phase focuses on the day-to-day use of the implemented observability systems to monitor health and resolve issues.

| Focus area | Activities |
| --- | --- |
| Cost & performance intelligence | Conduct regular cost reviews: hold weekly or bi-weekly reviews with engineering and finance teams to analyze spending trends and identify optimization opportunities. • Triage performance issues: use query history and query profiles to investigate and troubleshoot slow-running queries, identifying bottlenecks like disk spilling or inefficient joins. |
| Workload health & reliability | Respond to workload alerts: triage and resolve alerts for failed tasks, data loading errors, or SLO breaches. • Manage incidents: follow a defined incident management process for critical failures, including communication, root cause analysis (RCA), and post-mortems. |
| Security & access analytics | Review access logs: periodically audit access to sensitive data, investigate anomalous queries, and ensure access patterns align with business needs. • Investigate security alerts: when an alert is triggered, follow a security runbook to investigate the potential threat, determine its impact, and remediate as needed. |
| Data integrity & lineage | Investigate data quality failures: when a data quality test fails, use lineage information to trace the issue back to its source and notify the data producers. • Communicate data incidents: proactively inform data consumers when a known data quality issue impacts their dashboards or applications. |
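For the cost reviews above, a simple trend query is often enough to anchor the conversation (the window and grouping are a starting point):

```sql
-- Weekly credits by warehouse over the last quarter.
SELECT DATE_TRUNC('week', start_time) AS week,
       warehouse_name,
       SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time > DATEADD('month', -3, CURRENT_TIMESTAMP())
GROUP BY week, warehouse_name
ORDER BY week, credits DESC;
```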

Improve

This final phase is about moving from reactive to proactive operations by analyzing trends, automating responses, and continuously refining your observability strategy.

| Focus area | Activities |
| --- | --- |
| Cost & performance intelligence | Automate warehouse scaling: use historical workload patterns to right-size warehouses or implement a more dynamic scaling strategy for spiky workloads. • Optimize high-cost queries: proactively identify the top credit-consuming queries each month and assign them to engineering teams for performance tuning or rewriting. |
| Workload health & reliability | Perform trend analysis: analyze historical task and pipe error rates to identify systemic issues in data pipelines and prioritize fixes. • Refine SLOs and alerts: adjust SLO thresholds based on historical performance and business needs, and tune alerts to reduce noise and false positives. |
| Security & access analytics | Automate access reviews: develop automated workflows that periodically require business owners to certify who has access to their data, reducing manual toil for security teams. • Enhance threat detection models: use historical access data to build simple anomaly detection models (e.g., using Snowpark) to identify suspicious behavior that deviates from a user's normal baseline. |
| Data integrity & lineage | Implement automated lineage: adopt tools that automatically parse SQL from QUERY_HISTORY to generate column-level lineage, dramatically speeding up impact analysis and root cause identification. • Expand data quality coverage: use insights from data incidents to expand data quality monitoring to more datasets across the platform. |
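To seed the monthly list of expensive queries, elapsed time on a warehouse is a workable first proxy for credit consumption (true per-query attribution needs additional views or tooling, so treat this as triage, not chargeback):

```sql
SELECT query_id,
       user_name,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_s,
       query_text
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('month', -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 25;
```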

Recommendations

The following recommendations provide actionable steps for implementing a robust observability and issue resolution strategy on Snowflake. They are designed to guide interactions between teams and leverage specific platform features.

Establish a centralized observability data model

Instead of allowing teams to query raw metadata independently, create a governed, centralized foundation for all observability data. This ensures consistency and simplifies access control.

For Enterprise Architects & SREs:

For Engineering & Data Science teams:

Enrich observability data with contextual tagging

Raw metrics like "credit usage" or "query runtime" are not actionable without context. A consistent tagging strategy is crucial for quickly isolating the source of any issue.

For Chief Architects & Engineering Leads:

For SREs & on-call engineers:

Implement persona-driven observability dashboards

A single dashboard cannot serve everyone. Build a tiered set of dashboards in Snowsight that provides the right level of detail for each persona, enabling them to answer their specific questions quickly.

For SREs & platform owners:

For Engineering & Data Science leads:

For C-Level stakeholders (CIO/CFO):

Automate detection and response with alerts & tasks

Move from passive monitoring to active, automated observability. Use Snowflake's native features to not only detect issues but also to notify the right people and, where appropriate, trigger corrective actions.

For Security & SRE teams:

Persona responsibilities (RACI chart)

This RACI (Responsible, Accountable, Consulted, Informed) matrix clarifies the roles and responsibilities for key observability activities across different teams.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed


Overview

Effective incident and problem management is the cornerstone of a reliable data platform. In the Snowflake AI Data Cloud, where critical data pipelines, analytics, AI workloads, and applications depend on high availability and performance, a structured approach to handling disruptions is essential. The primary goal is to maintain the trust of your stakeholders by ensuring that data services are available, performant, and deliver accurate results.

This framework addresses two distinct but related disciplines: incident management, which focuses on rapidly restoring service when a disruption occurs, and problem management, which identifies and eliminates the underlying root causes so that incidents do not recur.

This section provides a structured approach for managing the entire incident lifecycle within the Snowflake AI Data Cloud—from proactive detection and rapid response to thorough root cause analysis. Adopting these practices builds confidence in your data platform, ensuring it remains a dependable foundation for critical business decisions and innovation.

Focus areas

Detection & alerting

This is the "smoke alarm" for your data platform. The goal is to identify deviations from normal behavior as quickly as possible, often before users are impacted. This involves instrumenting key Snowflake metrics, such as query execution time, warehouse load, and data latency, and setting up automated alerts to notify the on-call team of potential issues.

Response & Triage

When an alert fires, this is the initial assessment and firefighting phase. The focus is on understanding the impact ("who is affected?"), determining the severity ("how bad is it?"), and executing immediate actions to stabilize the service. In Snowflake, this could involve canceling a runaway query, resizing a virtual warehouse, or communicating the initial findings to stakeholders.

Resolution & recovery

This area focuses on fully resolving the incident and returning the service to a healthy state. It involves deeper diagnosis to find a fix or a workaround that restores functionality for all affected users. This phase concludes when the immediate impact is over and the service is operating under normal conditions again.

Root cause analysis (RCA)

Once the immediate fire is out, the problem management process begins. RCA is the systematic investigation to uncover the fundamental cause of an incident, not just the symptoms. The objective is to ask "why" repeatedly until the core issue is identified, such as an inefficient query pattern, a flawed data model, or an inadequate warehouse configuration.

Readiness & preparation

This is the proactive area focused on learning from past incidents and improving future responses. It involves creating and refining runbooks for common failure scenarios, conducting incident response drills, and ensuring roles and communication plans are clearly defined. A well-prepared team can significantly reduce the time it takes to resolve future incidents.

Phase-based activities

Prepare

This phase includes all the proactive work your teams do before an incident occurs to ensure they are ready to respond effectively.

Implement

In this phase, you build and configure the systems, tools, and documentation needed to manage incidents efficiently.

Operate

This phase covers the real-time activities your team performs during an active incident.

Improve

This phase is about learning from past incidents after they are resolved to build a more resilient system and process.

Recommendations

Effective incident management in Snowflake relies on leveraging its unique architectural strengths—separating compute from storage and providing rich operational metadata. These recommendations provide actionable steps for detection, response, and improvement.

Establish a single source of truth for triage

During an incident, speed and accuracy are paramount. Centralize your initial investigation using Snowflake's comprehensive metadata logs to shorten the time from detection to diagnosis.

Centralize incident detection with ACCOUNT_USAGE views
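A sketch of a triage query over these views (ACCOUNT_USAGE can lag by roughly 45 minutes; the INFORMATION_SCHEMA.QUERY_HISTORY table function is the real-time counterpart for live triage):

```sql
-- Failed statements in the last 24 hours, newest first.
SELECT start_time,
       user_name,
       warehouse_name,
       error_code,
       error_message,
       query_text
FROM snowflake.account_usage.query_history
WHERE execution_status = 'FAIL'
  AND start_time > DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
```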

Isolate and mitigate performance incidents immediately

Snowflake's architecture provides powerful levers to contain and resolve performance degradation without affecting the entire platform. Your response should focus on isolating the problematic workload and adjusting compute resources dynamically.

Terminate runaway queries without affecting the warehouse
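Because compute is isolated per warehouse, a single offending statement can be cancelled surgically (the query ID below is a placeholder):

```sql
-- Cancel one statement; everything else on the warehouse keeps running.
SELECT SYSTEM$CANCEL_QUERY('01b2c3d4-0000-1234-0000-000000000000');
```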

Dynamically scale compute to resolve resource contention
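Typical mitigation levers look like this (the warehouse name is a placeholder, and multi-cluster scaling requires an edition that supports it): scale up when individual queries are memory-bound, scale out when many small queries are queuing.

```sql
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';  -- scale up
ALTER WAREHOUSE analytics_wh SET MAX_CLUSTER_COUNT = 4;      -- scale out
```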

Automate disaster recovery and high-availability responses

For major incidents like a regional outage, your response should be swift, tested, and reliable. This depends on preparation and leveraging Snowflake's built-in business continuity features.

Execute pre-defined runbooks for cross-region failover
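If you use failover groups, the core promotion step of such a runbook is a single command run on the target account (names are placeholders; the group and connection must be configured in advance):

```sql
-- Promote the secondary failover group to primary...
ALTER FAILOVER GROUP my_failover_group PRIMARY;

-- ...and redirect clients that connect through a connection URL.
ALTER CONNECTION my_connection PRIMARY;
```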

Perform data-driven post-mortems for continuous improvement

Every incident is a learning opportunity. Use Snowflake's detailed query execution data to move beyond symptoms and identify the precise root cause, leading to permanent fixes.

Analyze the query profile to pinpoint inefficiencies
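Beyond the Snowsight UI, operator-level statistics are queryable, which makes post-mortems scriptable (the query ID is a placeholder):

```sql
-- Per-operator stats for a finished query: look for exploding joins,
-- full scans, and spilling operators.
SELECT operator_id,
       operator_type,
       operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS('01b2c3d4-0000-1234-0000-000000000000'));
```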

Persona responsibilities (RACI chart)

A RACI (Responsible, Accountable, Consulted, Informed) matrix defines the roles and responsibilities for incident and problem management:

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed


Overview

Enabling secure, real-time collaboration across your organization, with customers, and with business partners is a foundational pillar of the Snowflake AI Data Cloud. Unlike traditional methods that involve risky and inefficient data duplication and FTP transfers, Snowflake's architecture allows you to share live, governed data without moving or copying it. This unlocks new opportunities for data-driven insights, AI development, and monetization while maintaining a strong security and governance posture.

This framework provides principles and best practices to help you build a robust strategy for sharing and collaboration. By implementing these guidelines, you can break down data silos, accelerate innovation, and create new value streams, all while ensuring your data remains protected.

Focus areas

To effectively enable collaboration and secure sharing in Snowflake, concentrate on four key areas. These areas provide a structured approach to designing, implementing, and managing your data sharing ecosystem.

Phase-based activities

A successful data sharing strategy is implemented progressively. The following phases outline the journey from initial planning to continuous improvement.

Prepare

The Prepare phase is about establishing the strategy, governance framework, and organizational alignment needed for secure data sharing.

Focus area

Activities

Secure data sharing architecture

Identify and prioritize datasets suitable for sharing. Define potential data consumers (internal teams, external partners) and their access requirements. Design a hub-and-spoke or federated sharing model.

Granular governance and control

Define a data sharing policy that outlines acceptable use, security requirements, and the approval process. Establish a data classification framework to tag sensitive data (e.g., PII, confidential).

Unified collaboration for workloads

Identify key collaboration use cases, such as joint AI/ML model development or building a shared analytics dashboard.

Comprehensive auditing and monitoring

Define key metrics for success and risk, such as the number of data consumers, query volume on shares, and types of sensitive data being accessed. Plan your auditing strategy.

Implement

The Implement phase involves the hands-on configuration of the Snowflake platform to bring your sharing strategy to life.

Focus area

Activities

Secure data sharing architecture

Create SHARE objects and grant them access to specific database objects (tables, secure views). For external sharing without a Snowflake account, provision Reader Accounts. For broad distribution, create listings on the Snowflake Marketplace.

Granular governance and control

Implement RBAC roles for data sharing administration and consumption. Apply dynamic data masking policies to sensitive columns and row-access policies to tables before adding them to a share. Use object tags to automate policy application.

Unified collaboration for workloads

Develop and deploy Snowpark applications that can be shared via listings. Build shared data engineering pipelines using Streams and Tasks. Package and publish Snowflake Native Apps to offer data and application logic together.

Comprehensive auditing and monitoring

Configure alerts on QUERY_HISTORY and ACCESS_HISTORY to monitor for unusual access patterns on shared objects. Set up monitoring dashboards to track share consumption and performance.

Operate

The Operate phase focuses on the day-to-day management of your data sharing environment, ensuring it runs smoothly and securely.

Focus area

Activities

Secure data sharing architecture

Manage the lifecycle of data consumers, including approving requests, providing support, and revoking access when necessary. Regularly update data in shares to ensure consumers have the freshest information.

Granular governance and control

Conduct periodic reviews of access controls and sharing policies to ensure they remain aligned with business needs and compliance requirements.

Unified collaboration for workloads

Provide support for shared assets like Snowpark applications and data pipelines. Gather feedback from users to identify areas for improvement.

Comprehensive auditing and monitoring

Regularly review audit logs and monitoring dashboards. Investigate any security alerts or performance degradation related to data sharing activities.

Improve

The Improve phase is about optimizing and evolving your data sharing capabilities based on feedback, usage data, and new business requirements.

Focus area

Activities

Secure data sharing architecture

Actively solicit feedback from data consumers to enhance datasets and create new data products. Analyze Marketplace usage to optimize listings and pricing. Automate the consumer onboarding process.

Granular governance and control

Refine and automate the application of governance policies using tag-based masking and access controls. Update the data sharing policy based on lessons learned and evolving regulations.

Unified collaboration for workloads

Enhance Snowflake Native Apps with new features based on consumer feedback. Explore new collaboration patterns using emerging Snowflake features.

Comprehensive auditing and Monitoring

Fine-tune monitoring alerts to reduce false positives. Develop more sophisticated usage analytics to better understand the value derived from shared data and identify new sharing opportunities.

Recommendations

To activate your data sharing and collaboration strategy, your teams should take specific, coordinated actions. The following recommendations provide a prescriptive guide for stakeholders, detailing the exact tools to use and referencing official Snowflake documentation for further detail.

Mandate the use of secure views for all shares

Instead of sharing raw tables, always use SECURE VIEWS as the interface for your data consumers. This creates a durable, controlled contract that decouples consumers from your underlying physical data model and embeds fine-grained security logic directly into the shared object.

Action Plan:

  1. The Chief Enterprise Architect and Security team will mandate a policy stating that no TABLE object can be directly added to a SHARE.
  2. For a new sharing request, the Data Science or business user defines the specific columns and row-level filtering criteria needed by the consumer.
  3. The Data Engineer translates these requirements into a CREATE SECURE VIEW statement. Within the WHERE clause of the view, they use functions like CURRENT_ROLE() or IS_ROLE_IN_SESSION() to implement logic that filters data based on the consumer's role (a sketch follows this list).
  4. The Security team reviews and approves the view's DDL to ensure it doesn't expose sensitive data.
  5. Finally, the Data Engineer grants SELECT on the secure view to the SHARE.
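A minimal sketch of steps 3 and 5 (object and role names are illustrative):

```sql
-- Step 3: expose only approved columns, filtered by the consumer's role.
CREATE OR REPLACE SECURE VIEW share_schema.customer_orders_v AS
SELECT order_id,
       order_date,
       region,
       order_total
FROM prod.sales.orders
WHERE region = CASE
                 WHEN IS_ROLE_IN_SESSION('PARTNER_EMEA') THEN 'EMEA'
                 WHEN IS_ROLE_IN_SESSION('PARTNER_AMER') THEN 'AMER'
               END;

-- Step 5: the share sees the view, never the base table.
GRANT SELECT ON VIEW share_schema.customer_orders_v TO SHARE partner_share;
```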

Operationalize a "data as a product" mindset with in-platform documentation

Treat every shared dataset as a product. A product requires clear documentation that allows consumers to discover, understand, and trust it. Use Snowflake's native features to build and share this documentation alongside the data itself.

Action Plan:

  1. The Data Governance team defines a standard for documentation, including mandatory descriptions for all shared tables, views, and columns.
  2. During development, the Data Engineer adds descriptive COMMENT metadata to every object and column using COMMENT = '...' in their DDL or COMMENT ON ... IS '...' statements (a sketch follows this list).
  3. The Chief Enterprise Architect designs a "Data Dictionary" view built on the SNOWFLAKE.ACCOUNT_USAGE.COLUMNS view. This view should be shared with all internal data consumers, allowing them to query and explore available datasets and their documented business context.
  4. For external consumers on the Marketplace, the Engineering team adds rich descriptions and sample queries directly into the listing's UI in Snowsight.
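A sketch of steps 2 and 3 (object names are illustrative):

```sql
-- Step 2: documentation lives with the objects.
COMMENT ON TABLE prod.sales.orders IS
  'One row per customer order; loaded hourly from the order service.';
COMMENT ON COLUMN prod.sales.orders.order_total IS
  'Order value in USD, including tax.';

-- Step 3: a simple data-dictionary query for internal consumers.
SELECT table_catalog, table_name, column_name, comment
FROM snowflake.account_usage.columns
WHERE deleted IS NULL
  AND comment IS NOT NULL
ORDER BY table_catalog, table_name, column_name;
```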

Automate governance at scale with tag-based policies

Manually applying security policies to hundreds of tables is not scalable and is prone to error. Instead, implement a tag-based governance framework where security policies (like masking) automatically attach to data based on its classification.

Action Plan:

  1. The Security team, in consultation with the CIO/CDO, defines a data classification taxonomy and creates the corresponding object tags in Snowflake (e.g., CREATE TAG pii_level ALLOWED_VALUES 'HIGH', 'LOW'; a consolidated sketch follows this list).
  2. Security then creates generic masking policies. For example, a policy named mask_pii_high that redacts data, and another named mask_email that shows only the email domain.
  3. Security then associates these policies with the tags (e.g., ALTER TAG pii_level SET MASKING POLICY mask_pii_high).
  4. As part of their CI/CD process, Data Engineers are responsible for setting the appropriate tags on tables and columns as they are created (e.g., ALTER TABLE ... MODIFY COLUMN email SET TAG pii_level = 'HIGH').
  5. Snowflake automatically applies the correct masking policy to the email column by virtue of the tag, ensuring governance is enforced without manual intervention before the data is ever added to a share.
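Pulled together, the whole flow is a handful of statements (names, values, and the masking logic are illustrative):

```sql
-- Step 1: the classification taxonomy as a tag.
CREATE TAG IF NOT EXISTS pii_level ALLOWED_VALUES 'HIGH', 'LOW';

-- Step 2: a generic masking policy.
CREATE MASKING POLICY IF NOT EXISTS mask_pii_high AS (val STRING)
  RETURNS STRING ->
  CASE WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
       ELSE '***MASKED***'
  END;

-- Step 3: bind the policy to the tag once.
ALTER TAG pii_level SET MASKING POLICY mask_pii_high;

-- Step 4: engineers classify columns; masking follows automatically.
ALTER TABLE prod.crm.customers MODIFY COLUMN email SET TAG pii_level = 'HIGH';
```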

Distribute application logic securely with Snowflake Native Apps

When you need to share more than just data—such as a proprietary algorithm, a machine learning model, or a complete interactive application—use the Snowflake Native App Framework. This allows consumers to run your logic on their own data without the code or data ever leaving their secure Snowflake environment.

Action Plan:

  1. A Data Science team develops a predictive model using Snowpark for Python and saves it as a User-Defined Function (UDF).
  2. An Application Developer (within Engineering) builds a user interface using the Streamlit in Snowflake integration that allows users to input data and see the model's prediction.
  3. The developer packages the Snowpark UDF, the Streamlit UI, and any necessary stored procedures into an APPLICATION PACKAGE. They define the components in a manifest.yml file and a setup.sql script (a fragment of which is sketched after this list).
  4. The CDO and Engineering lead decide to list the application on the Marketplace. The engineer uses Snowsight to create a listing from the application package, adding pricing and usage terms.
  5. A consumer can now "install" this application, running the provider's proprietary model against their own private customer table, with the provider having zero access to the consumer's data.
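For orientation, a setup.sql fragment for such a package might begin like this sketch (the application role, schema, and grants are illustrative; the packaged UDF and Streamlit objects are created by the real script):

```sql
-- Runs in the consumer's account when the app is installed.
CREATE APPLICATION ROLE IF NOT EXISTS app_user;
CREATE OR ALTER VERSIONED SCHEMA core;
GRANT USAGE ON SCHEMA core TO APPLICATION ROLE app_user;

-- The packaged Snowpark UDF and Streamlit UI would be created here, then
-- exposed to the consumer, e.g.:
--   GRANT USAGE ON FUNCTION core.predict_churn(VARCHAR)
--     TO APPLICATION ROLE app_user;
--   GRANT USAGE ON STREAMLIT core.app_ui TO APPLICATION ROLE app_user;
```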

Persona responsibilities (RACI chart)

Clarifying roles and responsibilities is crucial for a well-governed data sharing program. The following RACI (Responsible, Accountable, Consulted, Informed) matrix outlines the typical duties for each persona.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed

| Activity | CIO / CDO | Chief Enterprise Architect | Security | Engineering / SRE | Data Science |
| --- | --- | --- | --- | --- | --- |
| Define Data Sharing & Monetization Strategy | A | C | C | I | C |
| Establish Governance & Sharing Policies | A | C | R | I | I |
| Design the Sharing Architecture (e.g., Shares, Views) | I | A | C | R | C |
| Implement and Apply Security Controls (Masking/Row Access) | I | I | A | R | I |
| Publish and Manage Marketplace Listings | A | I | C | R | C |
| Approve and Onboard Data Consumers | I | I | C | R | A |
| Monitor and Audit Sharing Usage | I | I | A | R | I |
| Develop Collaborative Snowpark/Native App Assets | I | C | C | R | A |

Overview

A well-defined Software Development Lifecycle (SDLC) in Snowflake enables teams to innovate faster while maintaining stability and governance. It transforms development from an ad-hoc process into a predictable, repeatable, and automated workflow. By applying software engineering best practices like version control, automated testing, and CI/CD to your data projects, you can significantly reduce manual errors, improve collaboration, and increase the trustworthiness of your data assets. This is essential for all key workloads, whether you are building scalable data pipelines, developing complex machine learning models with Snowpark, or deploying native applications.

Focus areas

To build a robust SDLC, we recommend concentrating on five key focus areas. These areas provide the foundation for a mature and scalable development process on Snowflake.

Phase-based activities

Managing the SDLC in a well-architected way can be broken down into four distinct phases. Here's how the focus areas apply to each phase.

Prepare

This phase is about planning and setting up the foundational components for your project.

Implement

This phase involves the core development and building of your data asset or application.

Operate

This phase focuses on deploying, managing, and monitoring the solution in production.

Improve

This final phase is about iterating on the solution and the process itself based on operational feedback.

Recommendations

To implement a mature SDLC for the Snowflake AI Data Cloud, your teams should adopt specific practices and tools. These recommendations provide actionable guidance for each stakeholder to build a reliable, automated, and governable development lifecycle.

Standardize your core SDLC toolchain

Action: The Chief Enterprise Architect and Engineering Leads should define and enforce a single, approved toolchain for source control, CI/CD, and infrastructure management. This prevents fragmentation and ensures consistency.

These CI/CD pipelines will use tools like the SnowSQL CLI or the Snowflake Connector for Python to execute scripts and deploy objects against Snowflake environments.

Adopt GitOps for all Snowflake object management

Action: Manage all changes to Snowflake database objects (schemas, tables, views, roles, grants) declaratively as code in Git. Direct CREATE OR REPLACE commands against production environments should be prohibited.

This practice provides a robust audit trail for all DDL, DML, and DCL statements executed against Snowflake, improving governance and simplifying troubleshooting.
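With a migration-based tool such as schemachange, each change is a plain SQL file whose name carries the version (the file below, say V1.2.0__add_orders_table.sql, is a hypothetical example):

```sql
-- V1.2.0__add_orders_table.sql: applied exactly once, in version order.
CREATE TABLE IF NOT EXISTS analytics.public.orders (
    order_id    NUMBER       NOT NULL,
    customer_id NUMBER       NOT NULL,
    order_date  DATE         NOT NULL,
    order_total NUMBER(12,2)
);

-- Grants live in the same file, so access control is reviewed in the
-- same pull request as the DDL.
GRANT SELECT ON TABLE analytics.public.orders TO ROLE reporting_role;
```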

Automate quality gates with comprehensive testing

Action: Enforce automated testing as a mandatory step in your CI pipeline. Pull requests that do not pass all tests must be blocked from merging.

Testing can be done against temporary schemas or databases created using Zero-Copy Cloning, providing a production-like environment that incurs no additional storage cost until data is modified and no performance impact on production.
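One lightweight pattern for such a gate is a check query that returns violations, run against a clone of production; the CI job fails the build when the count is non-zero (the database name and rule are illustrative):

```sql
-- Data quality gate: order dates must never be in the future.
SELECT COUNT(*) AS violations
FROM ci_clone.sales.orders
WHERE order_date > CURRENT_DATE();
```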

Isolate workflows with on-demand cloned environments

Action: Empower your development teams with self-service, isolated environments using Snowflake's Zero-Copy Cloning feature. This eliminates development bottlenecks and ensures high-fidelity testing.

This directly leverages one of Snowflake's most powerful features. Cloning is an instantaneous metadata operation, meaning environments are ready in seconds, not hours, and consume no additional storage until changes are made.
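The environment workflow itself is two statements at the start and one at the end (names are placeholders):

```sql
-- Spin up: an instant, storage-free copy of production for one branch.
CREATE DATABASE dev_feature_1234 CLONE prod_db;
GRANT OWNERSHIP ON DATABASE dev_feature_1234 TO ROLE dev_team_role;

-- ...develop and test against the clone...

-- Tear down when the branch merges.
DROP DATABASE dev_feature_1234;
```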

Embed cost and performance monitoring into the workflow

Action: Make cost and performance explicit responsibilities of the development team, not just an operational afterthought. Integrate monitoring directly into the SDLC.

This leverages Snowflake's rich metadata and governance features. Using QUERY_TAG allows you to precisely attribute credit consumption to specific features or changes, enabling true cost visibility.
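A sketch of the mechanics (the tag format is a convention you would define, not a Snowflake requirement):

```sql
-- Every statement the pipeline runs carries an attribution tag.
ALTER SESSION SET QUERY_TAG = 'team=growth;job=daily_orders_load';

-- Roll up activity by tag to see which jobs dominate runtime.
SELECT query_tag,
       COUNT(*) AS queries,
       SUM(total_elapsed_time) / 1000 AS total_elapsed_s
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -30, CURRENT_TIMESTAMP())
  AND query_tag <> ''
GROUP BY query_tag
ORDER BY total_elapsed_s DESC;
```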

Persona responsibilities (RACI chart)

This RACI (Responsible, Accountable, Consulted, Informed) matrix outlines the typical roles and responsibilities across the SDLC lifecycle.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed


Overview

Continuously improving performance and operational practices is essential for maximizing the value, efficiency, and innovation you get from the Snowflake AI Data Cloud. It's not a one-time task but an ongoing cycle of measurement, analysis, and optimization that ensures your platform evolves with your business needs. This approach helps you control costs, enhance user experience, and maintain a robust, scalable data environment.

Focus areas

To structure your improvement efforts, concentrate on these four key areas. They provide a comprehensive framework for optimizing every aspect of your Snowflake usage, from query execution to team expertise.

Phase-based activities

Continuous improvement is a journey. By breaking it down into the four distinct iterative phases of the Operational Excellence pillar, you can apply focused effort at each stage of your project lifecycle.

Prepare

In this phase, you lay the groundwork for success by defining goals, standards, and metrics before a project begins.

Implement

During the development and deployment phase, you turn plans into action, with a focus on building efficient and manageable solutions.

Operate

Once a solution is live, the focus shifts to monitoring, maintenance, and real-time optimization.

Improve

This proactive phase involves looking for opportunities to refine and enhance your existing solutions and practices.

Recommendations

Proactively manage workload performance with observability tools

Don't wait for performance issues to arise. Empower your teams to use Snowflake's built-in observability tools to find and fix inefficiencies before they impact the business.

Action for Engineering & Data Science:

Action for SREs & Architects:

Implement a continuous compute optimization cycle

Virtual warehouse configuration is not a "set it and forget it" task. Create a formal, data-driven process to ensure your compute resources are always perfectly matched to your workloads.

Action for Engineering Leads & SREs:

Action for Engineering & Finance (FinOps):
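Whichever team owns the cycle, a review query along these lines can drive it (the 14-day window and queue threshold are starting points to tune against your SLOs):

```sql
-- Warehouses with sustained queuing: candidates for scaling out or up.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS day,
       AVG(avg_running)     AS avg_running,
       AVG(avg_queued_load) AS avg_queued
FROM snowflake.account_usage.warehouse_load_history
WHERE start_time > DATEADD('day', -14, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, day
HAVING AVG(avg_queued_load) > 0.5
ORDER BY avg_queued DESC;
```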

Automate performance guardrails and operational tasks

Reduce manual effort and human error by embedding performance best practices and operational duties directly into your automated workflows and data pipelines.

Action for SREs & DevOps Engineers:

Action for Data Engineers:
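Whatever the pipeline tooling, two warehouse-level parameters make useful default guardrails (the values are examples to adapt per workload):

```sql
-- Kill anything that runs longer than an hour on this warehouse.
ALTER WAREHOUSE etl_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;

-- Fail fast instead of letting statements queue indefinitely.
ALTER WAREHOUSE etl_wh SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 600;
```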

Foster a culture of continuous learning and excellence

Your team's expertise is the most critical factor in achieving sustained performance. Invest in a structured program to keep skills sharp and align everyone on best practices.

Action for C-Level & Chief Architects:

Action for Team Leads & Engineers:

Persona responsibilities (RACI chart)

Clarifying roles ensures that everyone understands their part in the continuous improvement process. The matrix below outlines typical responsibilities.

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed