In this quickstart, you'll learn how to build an end-to-end application that analyzes audio files for emotional tone and sentiment using Snowflake Notebooks on Container Runtime. The application combines audio processing, speech recognition, and sentiment analysis to create comprehensive insights from audio data.

What is Container Runtime?

Snowflake Notebooks on Container Runtime enable advanced data science and machine learning workflows directly within Snowflake. Powered by Snowpark Container Services, Container Runtime provides a flexible environment for building and operationalizing a wide range of workloads, especially those that require Python packages from multiple sources and powerful compute resources, including CPUs and GPUs. With this Snowflake-native experience, you can process audio, perform speech recognition, and run sentiment analysis while seamlessly executing SQL queries. NOTE: This feature is currently in Public Preview.

Learn more about Container Runtime.

What is wav2vec2?

Wav2vec2 is a state-of-the-art framework for self-supervised learning of speech representations. Developed by Facebook AI, it's specifically designed for speech recognition tasks but has been adapted for various audio analysis tasks including emotion recognition. The model we use in this guide, "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition", is fine-tuned for detecting emotions in speech, capable of identifying emotions like happiness, sadness, anger, and neutral tones from audio input.
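As a rough sketch of how such a model can be used, the Hugging Face `transformers` library exposes fine-tuned wav2vec2 checkpoints through its `audio-classification` pipeline, which returns a list of `{"label": ..., "score": ...}` predictions. The pipeline call downloads model weights, so it is kept inside an uncalled `demo()` function here; the filename `customer_call1.wav` is a hypothetical local file.

```python
from typing import Dict, List, Tuple


def top_emotion(predictions: List[Dict]) -> Tuple[str, float]:
    """Pick the highest-confidence label from audio-classification output.

    `predictions` is a list of {"label": ..., "score": ...} dicts, the
    shape returned by a Hugging Face audio-classification pipeline.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def demo():
    # Not called here: loading the checkpoint downloads model weights
    # and requires the transformers and torch packages.
    from transformers import pipeline

    classifier = pipeline(
        "audio-classification",
        model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
    )
    preds = classifier("customer_call1.wav")  # hypothetical local file
    label, score = top_emotion(preds)
    print(f"{label}: {score:.3f}")
```

The `top_emotion` helper is plain Python, so the same post-processing works whether the classifier runs locally or inside a Container Runtime notebook.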

Learn more about wav2vec2.

What is Snowflake Cortex?

Snowflake Cortex is a suite of AI features that use large language models (LLMs) to understand unstructured data, answer freeform questions, and provide intelligent assistance. In this guide, we use Cortex's sentiment analysis capabilities to analyze the emotional content of transcribed speech.
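As a minimal sketch, Cortex sentiment scoring is exposed as the SQL function SNOWFLAKE.CORTEX.SENTIMENT, which returns a score from -1 (negative) to 1 (positive). The column and table names below (`transcript`, `audio_results`) are hypothetical placeholders; running the query requires an active Snowpark session, so that part sits in an uncalled `demo()` function.

```python
def sentiment_sql(text_column: str, table: str) -> str:
    """Build a query that scores each row's text with Cortex sentiment.

    SNOWFLAKE.CORTEX.SENTIMENT returns a float from -1 (negative)
    to 1 (positive).
    """
    return (
        f"SELECT {text_column}, "
        f"SNOWFLAKE.CORTEX.SENTIMENT({text_column}) AS sentiment_score "
        f"FROM {table}"
    )


def demo():
    # Not called here: get_active_session() only works inside Snowflake.
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()
    session.sql(sentiment_sql("transcript", "audio_results")).show()
```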

Learn more about Snowflake Cortex.

What is Whisper?

OpenAI's Whisper is an open-source automatic speech recognition (ASR) model designed for high-quality transcription and translation of spoken language. Trained on diverse multilingual data, it handles various languages, accents, and challenging audio conditions like background noise. Whisper supports transcription, language detection, and translation to English, making it versatile for applications such as subtitles, accessibility tools, and voice interfaces.
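A transcription call with the open-source `openai-whisper` package looks roughly like the sketch below: `transcribe()` returns a dict with the full `"text"` plus timestamped `"segments"`. Model loading downloads weights, so it lives in an uncalled `demo()` function; `join_segments` is a small pure helper for rebuilding a transcript from the segment list, and `customer_call1.wav` is a hypothetical file.

```python
from typing import Iterable, Mapping


def join_segments(segments: Iterable[Mapping]) -> str:
    """Stitch Whisper's per-segment texts into one clean transcript."""
    return " ".join(seg["text"].strip() for seg in segments).strip()


def demo():
    # Not called here: requires `pip install openai-whisper` and a
    # one-time model weight download.
    import whisper

    model = whisper.load_model("base")               # small multilingual model
    result = model.transcribe("customer_call1.wav")  # hypothetical file
    print(result["text"])                            # full transcript
    print(join_segments(result["segments"]))         # rebuilt from segments
```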

Learn more about Whisper.

What You'll Learn

What You'll Build

A full-stack application that enables users to:

Prerequisites

Step 1. In Snowsight, create a SQL Worksheet and open setup.sql to execute all statements in order from top to bottom.

Step 2. In Snowsight, switch your user role to AUDIO_CONTAINER_RUNTIME_ROLE.

Step 3. Click on sfguide_getting_started_with_audio_sentiment_analysis_using_snowflake_notebooks.ipynb to download the Notebook from GitHub. (NOTE: Do NOT right-click to download.)

Step 4. In Snowsight:

Step 5. Open Notebook

Supported Audio Formats

Audio Specifications

For best results, your audio files should have:
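As a quick pre-flight check, you can validate WAV files with Python's standard-library `wave` module. The sketch below assumes the common target format for wav2vec2-style models, mono 16-bit PCM at 16 kHz; adjust the expectations to match your own pipeline.

```python
import wave


def check_wav(path: str, expected_rate: int = 16_000) -> list:
    """Report mismatches against an assumed target format
    (mono, 16-bit PCM, 16 kHz). Returns a list of problems; empty = OK."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {wf.getsampwidth() * 8}-bit")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
    return problems
```

Running this over a folder before kicking off the notebook catches format problems early, instead of surfacing them as model errors mid-run.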

Best Practices

For optimal analysis:

Common Use Cases

This system works well for analyzing:

Here's a walkthrough of the notebook cells and their functions:

Cell 1: Package Installation

Cell 2: Environment Setup
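A typical environment-setup cell imports the required libraries and grabs the Snowpark session that the notebook already has open. The sketch below is a hedged example, not the notebook's exact code; `get_active_session()` only works inside Snowflake, so it is wrapped in a function rather than run at import time.

```python
def setup_session():
    """Grab the active Snowpark session inside a Snowflake Notebook.

    Not called here: get_active_session() raises outside Snowflake.
    """
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()
    # Sanity check: confirm the role and warehouse the notebook is using,
    # e.g. AUDIO_CONTAINER_RUNTIME_ROLE from the setup script.
    print(session.get_current_role(), session.get_current_warehouse())
    return session
```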

Cell 3: Audio Processing Configuration

Key components:

Cell 4: Process Files
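Conceptually, this cell iterates over the staged audio files and runs each one through the emotion model, Whisper, and sentiment scoring. The sketch below shows that shape under stated assumptions: the extension set is an assumption, and `classify_emotion`, `transcribe`, and `score_sentiment` in the uncalled `demo()` stand in for the per-file steps defined in earlier cells.

```python
from pathlib import Path

# Assumption: adjust this extension set to whatever your stage accepts.
SUPPORTED = {".wav", ".mp3", ".flac", ".m4a"}


def audio_files(folder: str) -> list:
    """Return supported audio files in a folder, sorted for stable output."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in SUPPORTED
    )


def demo():
    # Not called here: classify_emotion, transcribe, and score_sentiment
    # are hypothetical names for the steps built in earlier cells.
    rows = []
    for path in audio_files("audio/"):
        emotion, e_score = classify_emotion(path)
        transcript = transcribe(path)
        s_score = score_sentiment(transcript)
        rows.append((path.name, emotion, e_score, transcript, s_score))
    return rows
```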

Example output:

                 File  Emotion  Emotion_Score                                         Transcript  Sentiment_Score Tone_Sentiment_Match
0  customer_call1.wav    happy          0.892  Thank you so much for your help today! You've...              0.8                Match
1  customer_call2.wav    angry          0.945     I've been waiting for hours and nobody has...             -0.7                Match
2  customer_call3.wav  neutral          0.756       I would like to inquire about the status...              0.1              Unknown
3  customer_call4.wav    happy          0.834      This service has exceeded my expectations...              0.9                Match

The notebook outputs results showing the file name, detected emotion, emotion confidence score, transcript, sentiment score, and whether the tone matches the sentiment analysis.

The analysis provides multiple metrics that work together to give a comprehensive view of the audio:

Sentiment Analysis

Emotional Classification

Tone-Sentiment Matching

The system compares emotional tone with sentiment scores to verify consistency:

The 'Tone_Sentiment_Match' field in the results indicates:
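One way to implement this comparison is to map each detected emotion to a polarity and check it against the sign of the sentiment score. The sketch below is an assumption-laden example consistent with the sample output above (happy/0.8 → Match, angry/-0.7 → Match, neutral → Unknown); the label sets and the 0.2 neutrality threshold are illustrative choices, not the notebook's exact values.

```python
# Assumption: label sets depend on the emotion model's output vocabulary.
POSITIVE = {"happy", "surprised"}
NEGATIVE = {"angry", "sad", "fearful", "disgust"}


def tone_sentiment_match(emotion: str, sentiment: float,
                         threshold: float = 0.2) -> str:
    """Compare detected vocal emotion with the transcript's sentiment score.

    Returns "Match" when both agree in polarity, "Mismatch" when they
    conflict, and "Unknown" when either signal is neutral or ambiguous.
    """
    if emotion in POSITIVE:
        tone = 1
    elif emotion in NEGATIVE:
        tone = -1
    else:
        return "Unknown"      # e.g. "neutral" or an unrecognized label
    if abs(sentiment) < threshold:
        return "Unknown"      # sentiment too weak to compare confidently
    text_polarity = 1 if sentiment > 0 else -1
    return "Match" if tone == text_polarity else "Mismatch"
```

A "Mismatch" row (say, a happy-sounding voice with a strongly negative transcript) is often the most interesting output, since it flags calls where how something was said diverges from what was said.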

Congratulations! You've successfully built an end-to-end audio analysis application in Snowflake that combines emotional tone detection, speech recognition, and sentiment analysis using Container Runtime for ML.

What You Learned

Related Resources

Webpages:

Documentation:

Sample Code & Guides:

Related Quickstarts: