In this quickstart, you'll learn how to build an end-to-end application that analyzes audio files for emotional tone and sentiment using Snowflake Notebooks on Container Runtime. The application combines audio processing, speech recognition, and sentiment analysis to create comprehensive insights from audio data.
Snowflake Notebooks on Container Runtime enable advanced data science and machine learning workflows directly within Snowflake. Powered by Snowpark Container Services, the runtime provides a flexible environment for building and operationalizing workloads, especially those that require Python packages from multiple sources and powerful compute resources, including CPUs and GPUs. With this Snowflake-native experience, you can process audio, perform speech recognition, and run sentiment analysis while seamlessly executing SQL queries. NOTE: This feature is currently in Public Preview.
Learn more about Container Runtime.
Wav2vec2 is a state-of-the-art framework for self-supervised learning of speech representations. Developed by Facebook AI, it's specifically designed for speech recognition tasks but has been adapted for various audio analysis tasks including emotion recognition. The model we use in this guide, "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition", is fine-tuned for detecting emotions in speech, capable of identifying emotions like happiness, sadness, anger, and neutral tones from audio input.
Learn more about wav2vec2.
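As a sketch of how this checkpoint can be used, the Hugging Face `transformers` audio-classification pipeline loads it directly. The `detect_emotion` and `top_emotion` helpers below are illustrative names, not code from the notebook:

```python
def top_emotion(scores: list) -> tuple:
    """Pick the highest-confidence label from pipeline output
    (a list of {"label": ..., "score": ...} dictionaries)."""
    top = max(scores, key=lambda s: s["score"])
    return top["label"], round(top["score"], 3)

def detect_emotion(audio_path: str) -> tuple:
    """Classify one audio file with the emotion checkpoint named in this guide."""
    from transformers import pipeline  # heavy import kept local to the sketch
    classifier = pipeline(
        "audio-classification",
        model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
    )
    return top_emotion(classifier(audio_path))
```

The pipeline returns a list of label/score dictionaries, so picking the top entry mirrors the Emotion and Emotion_Score columns shown in the example output later in this guide.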
Snowflake Cortex is a suite of AI features that use large language models (LLMs) to understand unstructured data, answer freeform questions, and provide intelligent assistance. In this guide, we use Cortex's sentiment analysis capabilities to analyze the emotional content of transcribed speech.
Learn more about Snowflake Cortex.
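A minimal sketch of calling Cortex sentiment from Python, assuming an open Snowpark session. The helper names and the bucketing threshold in `sentiment_label` are illustrative choices, not part of the Cortex API or the notebook's code:

```python
def sentiment_score(session, text: str) -> float:
    """Score text with the server-side SNOWFLAKE.CORTEX.SENTIMENT function.

    `session` is an open snowflake.snowpark.Session; the model runs inside
    Snowflake, so nothing is downloaded locally. Scores range from -1
    (most negative) to 1 (most positive).
    """
    row = session.sql(
        "SELECT SNOWFLAKE.CORTEX.SENTIMENT(?) AS score", params=[text]
    ).collect()[0]
    return float(row["SCORE"])

def sentiment_label(score: float, threshold: float = 0.3) -> str:
    """Bucket a sentiment score for reporting (the 0.3 threshold is an
    illustrative assumption, not a Cortex convention)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```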
OpenAI's Whisper is an open-source automatic speech recognition (ASR) model designed for high-quality transcription and translation of spoken language. Trained on diverse multilingual data, it handles various languages, accents, and challenging audio conditions like background noise. Whisper supports transcription, language detection, and translation to English, making it versatile for applications such as subtitles, accessibility tools, and voice interfaces.
Learn more about Whisper.
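As a sketch, transcription with the open-source `openai-whisper` package takes only a few lines; `clean_transcript` is a hypothetical helper for tidying the output before it goes into a results table:

```python
def clean_transcript(text: str) -> str:
    """Collapse whitespace in Whisper output for tidy table rows."""
    return " ".join(text.split())

def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe one file with OpenAI Whisper (pip package `openai-whisper`)."""
    import whisper  # local import: load_model pulls model weights on first use
    model = whisper.load_model(model_size)  # "base" trades accuracy for speed
    return clean_transcript(model.transcribe(audio_path)["text"])
```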
A full-stack application that analyzes audio files end to end: detecting emotional tone with wav2vec2, transcribing speech with Whisper, and scoring sentiment with Snowflake Cortex.
Step 1. In Snowsight, create a SQL Worksheet, open setup.sql, and execute all statements in order from top to bottom.
Step 2. In Snowsight, switch your user role to AUDIO_CONTAINER_RUNTIME_ROLE.
Step 3. Click on sfguide_getting_started_with_audio_sentiment_analysis_using_snowflake_notebooks.ipynb to download the Notebook from GitHub. (NOTE: Do NOT right-click to download.)
Step 4. In Snowsight, import the downloaded notebook and configure it as follows:
- Notebook location: AUDIO_SENTIMENT_DB database and AUDIO_SCHEMA schema
- Warehouse: AUDIO_WH_S
- Python environment: Run on container, with the GPU Runtime and the GPU_POOL compute pool
Step 5. Open the Notebook, then under Notebook settings » External access, enable the external access integrations the notebook needs.
For best results, your audio files should have:
For optimal analysis:
This system works well for analyzing:
Here's a walkthrough of the notebook cells and their functions:
Cell 1: Package Installation

- torch: for deep learning and neural network operations
- librosa: for loading and manipulating audio files
- transformers: for accessing pre-trained models

Cell 2: Environment Setup
Cell 3: Audio Processing Configuration

Key components:

- ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition: the fine-tuned wav2vec2 model for emotion detection
- whisper: the ASR model used for speech-to-text transcription
- Snowflake Cortex: sentiment analysis of the transcribed text
Cell 4: Process Files
Example output:
| File | Emotion | Emotion_Score | Transcript | Sentiment_Score | Tone_Sentiment_Match |
|------|---------|---------------|------------|-----------------|----------------------|
| customer_call1.wav | happy | 0.892 | Thank you so much for your help today! You've... | 0.8 | Match |
| customer_call2.wav | angry | 0.945 | I've been waiting for hours and nobody has... | -0.7 | Match |
| customer_call3.wav | neutral | 0.756 | I would like to inquire about the status... | 0.1 | Unknown |
| customer_call4.wav | happy | 0.834 | This service has exceeded my expectations... | 0.9 | Match |
The notebook outputs results showing the file name, detected emotion, emotion confidence score, transcript, sentiment score, and whether the tone matches the sentiment analysis.
The analysis provides multiple metrics that work together to give a comprehensive view of the audio:
The system compares emotional tone with sentiment scores to verify consistency:
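The comparison can be sketched as a small rule: positive vocal emotions should pair with clearly positive sentiment scores, and negative emotions with clearly negative ones. The emotion sets and the 0.3 threshold below are illustrative assumptions, not the notebook's exact logic:

```python
def tone_sentiment_match(emotion: str, sentiment: float,
                         threshold: float = 0.3) -> str:
    """Compare the detected vocal emotion with the Cortex sentiment score.

    Emotion groupings and threshold are illustrative: positive tones are
    expected to pair with clearly positive scores, negative tones with
    clearly negative scores, and neutral tones are reported as "Unknown".
    """
    positive = {"happy", "surprised"}
    negative = {"angry", "sad", "disgust", "fearful"}
    if emotion in positive:
        return "Match" if sentiment >= threshold else "Mismatch"
    if emotion in negative:
        return "Match" if sentiment <= -threshold else "Mismatch"
    return "Unknown"  # neutral/calm tones have no clear expected polarity
```

Applied to the sample output above, this rule reproduces the Match rows for the happy and angry calls and the Unknown row for the neutral one.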
The "Tone_Sentiment_Match" field in the results indicates whether the detected vocal tone is consistent with the text sentiment (for example, Match or Unknown in the sample output above).
Congratulations! You've successfully built an end-to-end audio analysis application in Snowflake that combines emotional tone detection, speech recognition, and sentiment analysis using Container Runtime for ML.
Webpages:
Documentation:
Sample Code & Guides:
Related Quickstarts: