This demo showcases how to transform Google Drive business documents into actionable strategic intelligence using Snowflake's unstructured data processing capabilities.
You'll work with a realistic Festival Operations business document collection that includes:
By completing this guide, you will be able to build an end-to-end unstructured data pipeline that ingests documents from Google Drive, processes them through Openflow, and enables intelligent search and analysis using Snowflake Intelligence.
Here's a summary of what you'll learn at each step of this quickstart:
Openflow is Snowflake's managed service for building and running data pipelines in Snowpark Container Services (SPCS). It provides pre-built connectors and processing capabilities that make it easy to ingest, transform, and analyze data from various sources including unstructured documents.
Key Benefits:
Learn more about Openflow.
Snowflake Intelligence is an integrated AI capability that enables natural language interactions with your data. It combines large language models with your business context to provide intelligent search, analysis, and insights.
Core Components:
This quickstart will focus on:
Before starting, ensure you have:
First, clone the repository to get access to sample documents and SQL scripts:
git clone https://github.com/Snowflake-Labs/sfguide-getting-started-openflow-unstructured-data-pipeline.git
cd sfguide-getting-started-openflow-unstructured-data-pipeline
Repository Contents:
- `sample-data/google-drive-docs/` - 15 Festival Operations documents in various formats (PDF, DOCX, PPTX, JPG)
- `sql/` - Reusable SQL scripts for setup, checks, and verification
- `Taskfile.yml` - Automation tasks for building additional documents

For executing SQL scripts directly in Snowsight, you can import this repository into Snowflake Workspaces:
https://github.com/Snowflake-Labs/sfguide-getting-started-openflow-unstructured-data-pipeline
Learn more about integrating Workspaces with Git.
Log into Snowsight using your credentials to create the necessary database objects.
Open Snowflake Workspaces and run the following SQL commands to create the warehouse, database, schema, and role.
-- Create role and warehouse
USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS FESTIVAL_DEMO_ROLE;
CREATE WAREHOUSE IF NOT EXISTS FESTIVAL_DEMO_S
WAREHOUSE_SIZE = SMALL
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
GRANT USAGE ON WAREHOUSE FESTIVAL_DEMO_S TO ROLE FESTIVAL_DEMO_ROLE;
-- Create database and grant ownership
CREATE DATABASE IF NOT EXISTS OPENFLOW_FESTIVAL_DEMO;
GRANT OWNERSHIP ON DATABASE OPENFLOW_FESTIVAL_DEMO TO ROLE FESTIVAL_DEMO_ROLE;
-- Grant role to current user
SET CURR_USER=(SELECT CURRENT_USER());
GRANT ROLE FESTIVAL_DEMO_ROLE TO USER IDENTIFIER($CURR_USER);
-- Switch to demo role and create schema
USE ROLE FESTIVAL_DEMO_ROLE;
USE DATABASE OPENFLOW_FESTIVAL_DEMO;
CREATE SCHEMA IF NOT EXISTS FESTIVAL_OPS;
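As an optional sanity check, you can confirm the session is now using the objects you just created before moving on:

```sql
-- Optional sanity check: confirm the session context matches the demo objects
USE SCHEMA FESTIVAL_OPS;

SELECT CURRENT_ROLE()     AS role,      -- expect FESTIVAL_DEMO_ROLE
       CURRENT_DATABASE() AS database,  -- expect OPENFLOW_FESTIVAL_DEMO
       CURRENT_SCHEMA()   AS schema;    -- expect FESTIVAL_OPS

-- The new schema should appear in the listing
SHOW SCHEMAS IN DATABASE OPENFLOW_FESTIVAL_DEMO;
```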
Cortex Search and Snowflake Intelligence are available by default in most regions.
For accounts in other regions, you may need to enable cross-region Cortex access:
-- Check current Cortex cross-region setting (requires ORGADMIN role)
SHOW PARAMETERS LIKE 'CORTEX_ENABLED_CROSS_REGION' IN ACCOUNT;
-- Enable cross-region Cortex access if needed (requires ORGADMIN role)
-- This allows your account to use Cortex services from us-west-2
ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';
-- Verify the setting was applied
SHOW PARAMETERS LIKE 'CORTEX_ENABLED_CROSS_REGION' IN ACCOUNT;
Configure external access for Google APIs to allow Openflow to connect to Google Drive.
First, create a schema for network configuration (or use an existing one):
-- Create schema for network rules
USE ROLE ACCOUNTADMIN;
USE DATABASE OPENFLOW_FESTIVAL_DEMO;
CREATE SCHEMA IF NOT EXISTS NETWORKS;
Now create the network rules and external access integration:
-- Create network rule for Google APIs
CREATE OR REPLACE NETWORK RULE google_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = (
'admin.googleapis.com',
'oauth2.googleapis.com',
'www.googleapis.com',
'google.com'
);
-- Verify the network rule was created successfully
DESC NETWORK RULE google_network_rule;
If you need to access resources from your specific Google Workspace domain, create an additional network rule:
-- Create network rule for your Google Workspace domain
-- Replace 'your-domain.com' with your actual Google Workspace domain
CREATE OR REPLACE NETWORK RULE your_workspace_domain_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('your-domain.com');
-- Example: For a domain like kamesh.dev
-- CREATE OR REPLACE NETWORK RULE kameshs_dev_network_rule
-- MODE = EGRESS
-- TYPE = HOST_PORT
-- VALUE_LIST = ('kameshs.dev');
-- Verify the domain network rule was created successfully
DESC NETWORK RULE your_workspace_domain_network_rule;
Now combine the network rules into an external access integration:
-- Create external access integration with Google API access
-- If you created a workspace domain rule, add it to ALLOWED_NETWORK_RULES
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION festival_ops_access_integration
ALLOWED_NETWORK_RULES = (
OPENFLOW_FESTIVAL_DEMO.NETWORKS.google_network_rule
-- Add your workspace domain rule if created:
-- , OPENFLOW_FESTIVAL_DEMO.NETWORKS.your_workspace_domain_network_rule
)
ENABLED = TRUE
COMMENT = 'Used for Openflow SPCS runtime to access Google Drive';
-- Verify the external access integration was created with correct settings
DESC EXTERNAL ACCESS INTEGRATION festival_ops_access_integration;
Grant necessary permissions to the Openflow admin role:
-- Grant access to the external access integration
GRANT USAGE ON DATABASE OPENFLOW_FESTIVAL_DEMO TO ROLE OPENFLOW_ADMIN;
GRANT USAGE ON SCHEMA OPENFLOW_FESTIVAL_DEMO.NETWORKS TO ROLE OPENFLOW_ADMIN;
GRANT USAGE ON INTEGRATION festival_ops_access_integration TO ROLE OPENFLOW_ADMIN;
-- Verify grants
SHOW GRANTS TO ROLE OPENFLOW_ADMIN;
This section guides you through setting up Openflow SPCS infrastructure and creating a runtime for the Festival Operations document pipeline.
Follow the comprehensive Getting Started with Openflow SPCS quickstart to set up your infrastructure:
Quickstart Guide: Getting Started with Openflow SPCS
This 25-minute setup includes:
| Step | Task | Duration | What You'll Create |
|------|------|----------|--------------------|
| 1 | Setup Core Snowflake | 10 min | |
| 2 | Create Deployment | 5 min | Openflow SPCS deployment with optional event logging |
| 3 | Create Runtime Role | 5 min | Runtime role with external access integrations |
| 4 | Create Runtime | 5 min | Active runtime environment ready for connectors |
Key Components You'll Set Up:
- `OPENFLOW_ADMIN` role: Administrative role with deployment and integration privileges
- `FESTIVAL_DEMO_ROLE` (created in Setup Environment) with database, schema, warehouse access, and external access integrations

After Setup: Once you complete the quickstart, you'll have a production-ready Openflow environment. You can then proceed with adding the Google Drive connector for this Festival Operations demo.
If you already have Openflow SPCS set up in your account, you can reuse your existing infrastructure. However, for this quickstart, we recommend using `FESTIVAL_DEMO_ROLE` to keep naming consistent:

- Ensure `FESTIVAL_DEMO_ROLE` has access to `festival_ops_access_integration` (created in the previous "Setup Environment" section)
- Grant the integration to `FESTIVAL_DEMO_ROLE`:

USE ROLE ACCOUNTADMIN;
GRANT USAGE ON INTEGRATION festival_ops_access_integration TO ROLE FESTIVAL_DEMO_ROLE;
After completing the Openflow SPCS setup, access the Openflow interface to configure your runtime:
Now that you have Openflow open, configure your runtime for the Festival Operations pipeline:
Create a dedicated runtime for this demo:
- Runtime name: `FESTIVAL_DOC_INTELLIGENCE`
- Runtime role: `FESTIVAL_DEMO_ROLE` (created in Setup Environment)
- External access integration: `festival_ops_access_integration` (created in the previous step)
- Database: `OPENFLOW_FESTIVAL_DEMO`
- Schema: `FESTIVAL_OPS`
- Warehouse: `FESTIVAL_DEMO_S`
After creating the runtime, your Openflow interface will look like this:
If you already have a runtime from the Openflow SPCS quickstart (e.g., `QUICKSTART_RUNTIME`):
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON INTEGRATION festival_ops_access_integration TO ROLE YOUR_RUNTIME_ROLE;
Once the runtime is active, add the Google Drive connector (Overview tab in Openflow Home page):
The connector will be automatically added to your canvas:
Before configuring the connector, set up your Google Drive location:
Now configure the Google Drive connector with the following parameters:
- Google Workspace user: `hi@kameshs.dev` (your Google Workspace user with drive access)
- Destination database: `OPENFLOW_FESTIVAL_DEMO`
- Destination schema: `FESTIVAL_OPS`
- Snowflake authentication strategy: `SNOWFLAKE_SESSION_TOKEN`
- Snowflake role: `FESTIVAL_DEMO_ROLE`
- Snowflake warehouse: `FESTIVAL_DEMO_S`
Navigate to Parameter Contexts from Runtime Canvas:
Turn Off Parameter Inheritance (for clarity):
Click the checkbox to disable inherited parameters and show only ingestion-specific settings:
Configure the Ingestion Parameters:
Now configure only the ingestion-specific parameters:
- File extensions to ingest: `pdf,txt,docx,xlsx,pptx,html,jpg`
- Google Workspace domain: `[YOUR WORKSPACE DOMAIN]`
- Shared drive ID: `[Your shared drive ID]`
- Folder: `Festival Operations` (the folder path in your Google Shared Drive)
- Extraction mode: `LAYOUT` (preserves document structure during text extraction)
- Snowflake role: `FESTIVAL_DEMO_ROLE`
After configuring all parameters, you need to enable and start the pipeline by right-clicking on the canvas:
Once started, you should see the connector running with active processors:
The pipeline will automatically:
Before running the pipeline, you need to prepare the Festival Operations sample documents in your Google Drive.
The repository includes 15 business documents across multiple formats in the `sample-data/google-drive-docs/` directory:
sample-data/google-drive-docs/
├── Analysis/
│ └── Post-Event-Analysis-Summer-2024.pptx
├── Compliance/
│ └── Health-Safety-Policy.pdf
├── Executive Meetings/
│ └── Board-Meeting-Minutes-Q4-2024.docx
├── Financial Reports/
│ └── Q3-2024-Financial-Analysis.pdf
├── Operations/
│ ├── Venue-Setup-Operations-Manual-0.jpg
│ ├── Venue-Setup-Operations-Manual-1.jpg
│ ├── Venue-Setup-Operations-Manual-2.jpg
│ └── Venue-Setup-Operations-Manual-3.jpg
├── Projects/
│ └── Sound-System-Modernization-Project-Charter.docx
├── Strategic Planning/
│ ├── 2025-Festival-Expansion-Strategy-0.jpg
│ ├── 2025-Festival-Expansion-Strategy-1.jpg
│ ├── 2025-Festival-Expansion-Strategy-2.jpg
│ ├── 2025-Festival-Expansion-Strategy-3.jpg
│ └── 2025-Festival-Expansion-Strategy-4.jpg
├── Training/
│ └── Customer-Service-Training-Guide.pptx
└── Vendors/
└── Audio-Equipment-Service-Agreement.pdf
Document Formats: PDF, DOCX, PPTX, JPG - demonstrating true multi-format document intelligence
Complete the document preparation in your Google Drive:

1. Create a folder structure in your Google Shared Drive mirroring the `sample-data/google-drive-docs/` structure
2. Upload the documents from the `sample-data/google-drive-docs/` directory into the corresponding folders

After uploading, verify your Google Drive "Festival Operations" folder contains all 15 documents across multiple formats:
| Folder | Document | Format |
|--------|----------|--------|
| Strategic Planning | 2025-Festival-Expansion-Strategy (5 images) | JPG |
| Operations | Venue-Setup-Operations-Manual (4 images) | JPG |
| Projects | Sound-System-Modernization-Project-Charter | DOCX |
| Financial Reports | Q3-2024-Financial-Analysis | PDF |
| Compliance | Health-Safety-Policy | PDF |
| Vendors | Audio-Equipment-Service-Agreement | PDF |
| Analysis | Post-Event-Analysis-Summer-2024 | PPTX |
| Training | Customer-Service-Training-Guide | PPTX |
Format Summary:
Once the Google Drive Connector starts, you can monitor the pipeline execution directly from the canvas. The processor group displays real-time statistics:
The animation demonstrates:
Check that documents have been successfully ingested using the verification queries:
-- Switch to the correct role and database
USE ROLE FESTIVAL_DEMO_ROLE;
USE WAREHOUSE FESTIVAL_DEMO_S;
USE DATABASE OPENFLOW_FESTIVAL_DEMO;
USE SCHEMA FESTIVAL_OPS;
-- Show all tables created by Openflow connector
SHOW TABLES;
-- Show all stages created by Openflow connector
SHOW STAGES;
The Openflow connector automatically creates several tables for document management:
-- Describe the auto-created tables
DESC TABLE docs_chunks; -- Document content chunks
DESC TABLE docs_groups; -- Document groupings
DESC TABLE docs_perms; -- Document permissions
DESC TABLE doc_group_perms; -- Group permissions
DESC TABLE file_hashes; -- File tracking and metadata
DESC TABLE perms_groups; -- Permission groups
-- View file tracking information
SELECT * FROM file_hashes;
Query the document chunks to see ingested content:
-- View document chunks
SELECT * FROM docs_chunks LIMIT 10;
-- Get distinct document IDs and filenames
SELECT DISTINCT
METADATA:id::string as id,
METADATA:fullName::string as filename
FROM docs_chunks;
-- Check specific document categories
SELECT COUNT(DOC_ID)
FROM file_hashes
WHERE LOWER(DOC_ID) LIKE '%strategy%';
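To see how the connector split each file, you can also aggregate chunk counts per document (this uses the same `METADATA:fullName` field queried above):

```sql
-- Count chunks per ingested document, largest first
SELECT
    METADATA:fullName::string AS filename,
    COUNT(*) AS chunk_count
FROM docs_chunks
GROUP BY filename
ORDER BY chunk_count DESC;
```

Longer documents (like the multi-page operations manual) should show noticeably more chunks than single-page files.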
Verify all documents are ingested across demo categories:
-- Comprehensive document verification by demo category
SELECT
COUNT(*) as total_docs,
-- Strategic Planning Documents (Expected: 7)
COUNT(CASE WHEN
LOWER(DOC_ID) LIKE '%strategy%' OR
LOWER(DOC_ID) LIKE '%board%meeting%' OR
LOWER(DOC_ID) LIKE '%meeting%minutes%' OR
LOWER(DOC_ID) LIKE '%financial%analysis%' OR
LOWER(DOC_ID) LIKE '%q3%2024%financial%'
THEN 1 END) as strategic_docs,
-- Operations Excellence Documents (Expected: 5)
COUNT(CASE WHEN
(LOWER(DOC_ID) LIKE '%operation%manual%' OR LOWER(DOC_ID) LIKE '%venue%setup%') OR
(LOWER(DOC_ID) LIKE '%sound%system%' AND LOWER(DOC_ID) LIKE '%project%') OR
(LOWER(DOC_ID) LIKE '%post%event%analysis%')
THEN 1 END) as operations_docs,
-- Compliance & Risk Documents (Expected: 3)
COUNT(CASE WHEN
(LOWER(DOC_ID) LIKE '%health%safety%' OR LOWER(DOC_ID) LIKE '%safety%policy%') OR
(LOWER(DOC_ID) LIKE '%service%agreement%' OR LOWER(DOC_ID) LIKE '%audio%equipment%') OR
(LOWER(DOC_ID) LIKE '%post%event%analysis%')
THEN 1 END) as compliance_docs,
-- Knowledge Management Documents (Expected: 1)
COUNT(CASE WHEN
LOWER(DOC_ID) LIKE '%training%guide%' OR
LOWER(DOC_ID) LIKE '%customer%service%training%'
THEN 1 END) as training_docs,
-- Document format breakdown
COUNT(CASE WHEN LOWER(DOC_ID) LIKE '%.jpg' THEN 1 END) as jpg_files,
COUNT(CASE WHEN LOWER(DOC_ID) LIKE '%.pdf' THEN 1 END) as pdf_files,
COUNT(CASE WHEN LOWER(DOC_ID) LIKE '%.docx' THEN 1 END) as docx_files,
COUNT(CASE WHEN LOWER(DOC_ID) LIKE '%.pptx' THEN 1 END) as pptx_files
FROM file_hashes;
Expected Results:
Metric | Count |
Total Documents | 15 |
Strategic Planning Documents | 7 |
Operations Excellence Documents | 5 |
Compliance & Risk Documents | 3 |
Training Documents | 1 |
JPG Files | 9 |
PDF Files | 3 |
DOCX Files | 1 |
PPTX Files | 2 |
Verify the documents stage created by the connector:
-- List files in the documents stage
LS @documents;
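If you want to narrow the listing to a single format, `LIST` accepts a regex `PATTERN` argument, for example:

```sql
-- List only the PDF files in the documents stage
LIST @documents PATTERN = '.*\\.pdf';
```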
The pipeline should have ingested the Festival Operations business document collection:
Document Categories:
Document Formats:
Total: 15 business documents demonstrating multi-format document intelligence
Great news! The Cortex Search service is automatically created by the Openflow Google Drive connector. No manual SQL required!
Automatic Features:

- Embeddings generated with the `snowflake-arctic-embed-m-v1.5` model
-- Check for the auto-created service
SHOW CORTEX SEARCH SERVICES;
-- The service will be named: CORTEX_SEARCH_SERVICE (default name)
DESC CORTEX SEARCH SERVICE CORTEX_SEARCH_SERVICE;
Test the automatically created search service with Festival Operations queries:
-- Search for strategic planning documents
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'CORTEX_SEARCH_SERVICE',
'{"query": "2025 expansion plans target markets strategic planning", "limit": 5}'
)
)['results'] as strategic_documents;
-- Search for technology modernization projects
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'CORTEX_SEARCH_SERVICE',
'{"query": "technology modernization sound system upgrade budget 2.8M", "limit": 5}'
)
)['results'] as technology_projects;
-- Search for health and safety policies
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'CORTEX_SEARCH_SERVICE',
'{"query": "health safety policies emergency protocols compliance", "limit": 5}'
)
)['results'] as safety_policies;
The auto-created service includes both content and metadata search capabilities. You can search across:
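Metadata search can be combined with the text query via the optional `filter` key in the `SEARCH_PREVIEW` request. Here is a sketch; the attribute name (`DOC_ID`) and exact value are assumptions, so adjust them to the columns your auto-created service actually exposes:

```sql
-- Sketch: restrict results to a specific document via a metadata filter
-- (assumes the service exposes a DOC_ID attribute; replace the value
--  with an actual DOC_ID from your file_hashes table)
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'CORTEX_SEARCH_SERVICE',
    '{"query": "expansion strategy", "filter": {"@eq": {"DOC_ID": "<exact-doc-id>"}}, "limit": 3}'
  )
)['results'] as filtered_results;
```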
Based on the Festival Operations dataset, here are sample questions organized by business function:
"What are our 2025 expansion plans and target markets?"
"Show me all financial analysis and revenue projections"
"What decisions were made in the latest board meeting?"
"Find all budget allocations and investment strategies"
"Find all technology modernization projects and their budgets"
"What is our $2.8M sound system upgrade timeline?"
"Show me all equipment management protocols"
"What post-event analysis recommendations exist?"
"What health and safety policies are currently in effect?"
"Show me all vendor contracts and service agreements"
"Find emergency response procedures"
"What regulatory compliance requirements exist?"
"Find all training materials and staff development programs"
"What customer service standards are documented?"
"Show me onboarding procedures for new staff"
"What training frameworks are currently in use?"
"What are our 2025 expansion plans across all document formats - show me visual charts, meeting decisions, and financial projections"
"Find all technology modernization projects with their business cases, budgets, and visual diagrams"
"What health and safety policies are in effect across all formats - show me formal policies, vendor agreements, and visual guides"
Snowflake Intelligence enables you to create AI agents that can query and analyze your unstructured data using natural language. This section shows how to connect Snowflake Intelligence to the Cortex Search service created by your Openflow pipeline.
Before setting up Snowflake Intelligence, ensure you have:

- A role with agent creation permissions (the `CREATE AGENT` privilege)

Create the required database and schema structure:
-- Use ACCOUNTADMIN role for setup
USE ROLE ACCOUNTADMIN;
-- Create database for Snowflake Intelligence
CREATE DATABASE IF NOT EXISTS snowflake_intelligence;
GRANT USAGE ON DATABASE snowflake_intelligence TO ROLE PUBLIC;
-- Create agents schema
CREATE SCHEMA IF NOT EXISTS snowflake_intelligence.agents;
GRANT USAGE ON SCHEMA snowflake_intelligence.agents TO ROLE PUBLIC;
-- Grant agent creation privileges to your role
GRANT CREATE AGENT ON SCHEMA snowflake_intelligence.agents TO ROLE FESTIVAL_DEMO_ROLE;
Switch to `FESTIVAL_DEMO_ROLE` using the role selector in the top-right corner.

Platform Integration:
Agent Details:

- Agent name: `FESTIVAL_DOC_INTELLIGENCE`
- Display name: `Festival Document Intelligence`
After creating the agent, you need to configure its details:

Click the agent (`FESTIVAL_DOC_INTELLIGENCE`) in the agent list to open it.

Now configure the agent basics in the "About" section:
Query and analyze business documents using natural language, powered by festival operations data processed via Openflow pipeline.
Example Questions (Add these to help users get started):
"What are our 2025 expansion plans and target markets?"
"Find all technology modernization projects and their budgets"
"What health and safety policies are currently in effect?"
"Find all training materials and staff development programs"
"Which documents have the most collaboration and strategic importance?"
Configure the Search Service:

- Name: `FESTIVAL_OPS_INTELLIGENCE`
- Search service: `OPENFLOW_FESTIVAL_DEMO.FESTIVAL_OPS.CORTEX_SEARCH_SERVICE`
- Description: Query and analyze business documents using natural language, powered by festival operations data processed via Openflow pipeline.
- Model: `auto` (recommended - lets Snowflake choose the optimal model)

Orchestration Instructions:
Whenever you can answer visually with a chart, always choose to generate a chart even if the user didn't specify to. Respond in the same language as the question wherever possible.
Response Instructions: (Optional)
Always provide specific document references when citing information.
Focus on actionable insights and business value in your responses.
Example Role Configuration:

- Role: `FESTIVAL_DEMO_ROLE`
- Privilege: `OWNERSHIP`

Select the agent (`FESTIVAL_DOC_INTELLIGENCE`) from the dropdown.

Start with the Example Questions you configured - these are specifically tailored to your festival operations data.
Strategic Planning:
"What are our 2025 expansion plans and target markets?"
"What strategic initiatives are mentioned in board meeting minutes?"
Operations Excellence:
"Find all technology modernization projects and their budgets"
"What are the key takeaways from the post-event analysis?"
Compliance & Risk:
"What health and safety policies are currently in effect?"
"What are the terms and conditions in our vendor service agreements?"
Knowledge Management:
"Find all training materials and staff development programs"
"What customer service training resources are available?"
Use the agent for complex analysis across multiple documents:
"Compare our Q3 2024 financial performance with the strategic goals outlined in our 2025 expansion plan. What gaps exist and what actions are recommended?"
This type of query demonstrates the agent's ability to:
Identify patterns and trends across time-based documents:
"What trends do you see in customer complaints and incident reports over the past year? What preventive measures have been implemented?"
Find hidden insights and connections:
"What vendor performance issues are mentioned across different documents, and how do they relate to our operational challenges?"
Generate comprehensive briefings:
"Prepare an executive summary of key issues and decisions from our Q4 2024 board meeting, including action items and their current status based on other documents."
Automated compliance checking:
"Review all our safety policies and incident reports to identify any compliance gaps or policy updates needed."
Create enhanced views using AI_COMPLETE for intelligent document summaries. These views can power custom Cortex Search services tailored to specific use cases.
-- View with AI-generated document summaries
-- Uses AI_COMPLETE to generate concise, intelligent summaries
CREATE OR REPLACE VIEW document_summaries AS
SELECT
DOC_ID,
METADATA:fullName::string as full_name,
METADATA:webUrl::string as web_url,
METADATA:lastModifiedDateTime::timestamp as last_modified_date_time,
chunk as original_chunk,
-- AI-powered summary generation
SNOWFLAKE.CORTEX.AI_COMPLETE(
'mistral-large2',
CONCAT(
'Summarize this document content in 2-3 sentences, focusing on key information: ',
chunk
)
) as ai_summary,
user_emails,
user_ids
FROM docs_chunks;
Once you have the custom view, create a dedicated Cortex Search service:
-- Create a custom Cortex Search service using the AI-enhanced view
CREATE OR REPLACE CORTEX SEARCH SERVICE festival_ai_summaries_search
ON ai_summary
ATTRIBUTES full_name, web_url, last_modified_date_time, user_emails
WAREHOUSE = FESTIVAL_DEMO_S
TARGET_LAG = '1 day'
AS (
SELECT
DOC_ID,
ai_summary,
full_name,
web_url,
last_modified_date_time,
user_emails
FROM document_summaries
);
-- Verify the new service
SHOW CORTEX SEARCH SERVICES;
DESC CORTEX SEARCH SERVICE festival_ai_summaries_search;
Query the custom search service for more focused, summarized results:
-- Search using AI-generated summaries
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'festival_ai_summaries_search',
'{"query": "strategic planning expansion markets", "limit": 5}'
)
)['results'] as summarized_strategic_docs;
-- Compare with original search service
SELECT PARSE_JSON(
SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
'CORTEX_SEARCH_SERVICE',
'{"query": "strategic planning expansion markets", "limit": 5}'
)
)['results'] as original_strategic_docs;
Benefits of AI-Enhanced Views:
The repository includes a `Taskfile.yml` that automates document format conversion, allowing you to create additional documents for testing.
# macOS
brew install go-task pandoc
# Ubuntu/Debian
sudo snap install task --classic
sudo apt-get install pandoc
# Windows (via Chocolatey)
choco install go-task pandoc
Convert to PDF (Formal documents):
task convert-to-pdf
Converts markdown files to PDF format for policies, financial reports, and vendor agreements.
Convert to PPTX (Presentations):
task convert-to-pptx
Creates PowerPoint presentations from markdown for training materials and analysis reports.
Convert to DOCX (Word documents):
task convert-to-docx
Generates Word documents from markdown for meeting minutes and project charters.
Convert to JPG (Images):
task convert-to-jpg
Exports presentations to image format for strategic planning documents and operational guides.
Convert All Formats:
task convert-all-docs
Runs all conversion tasks to generate documents in all supported formats (PDF, DOCX, PPTX, JPG).
Converted documents are written to the `sample-data/google-drive-docs/` directory.
When you're finished with the demo, follow these steps to clean up resources.
If you created an agent and no longer need it:
-- Switch to the Snowflake Intelligence database
USE DATABASE snowflake_intelligence;
USE SCHEMA agents;
-- Drop the agent
DROP AGENT IF EXISTS FESTIVAL_DOC_INTELLIGENCE;
To completely remove all data and resources:
-- Switch to ACCOUNTADMIN role
USE ROLE ACCOUNTADMIN;
-- Drop the entire demo database (includes all tables, stages, and search services)
DROP DATABASE IF EXISTS OPENFLOW_FESTIVAL_DEMO;
-- Drop the demo warehouse
DROP WAREHOUSE IF EXISTS FESTIVAL_DEMO_S;
-- Drop the demo role
DROP ROLE IF EXISTS FESTIVAL_DEMO_ROLE;
Congratulations! You've successfully built an end-to-end unstructured data pipeline using Openflow and Snowflake Intelligence. You can now:
Quickstarts:
Documentation:
We would love your feedback on this QuickStart Guide! Please submit your feedback using the GitHub issues link at the top of this guide.