In this quickstart, we will build a Streamlit app within Snowflake that lets you easily analyze years of NVIDIA 10-K filings using Jamba-Instruct in Cortex. With Jamba's 256K context window, there is no need to build a RAG pipeline that chunks the filings into segments small enough to fit a shorter context window.

What is Streamlit?

Streamlit enables Snowflake users to combine Streamlit's component-rich, open-source Python library with the scale, performance, and security of the Snowflake platform. https://www.snowflake.com/en/data-cloud/overview/streamlit-in-snowflake/

What is Snowflake Cortex?

Snowflake Cortex AI provides access to top-tier LLMs within your Snowflake environment. Build GenAI applications with fully managed LLMs and chat-with-your-data services. https://www.snowflake.com/en/data-cloud/cortex/
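Cortex models can be called directly from SQL. As a hedged sketch using the COMPLETE function (the jamba-instruct model name is assumed to be available in your account's region):

```sql
-- Call Jamba-Instruct through Cortex from plain SQL.
-- Model availability varies by region; jamba-instruct is assumed here.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'jamba-instruct',
    'Summarize the key risk factors described in this filing: ...'
) AS response;
```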

What is Jamba-Instruct?

AI21's Jamba-Instruct is the first commercial LLM to successfully leverage a hybrid SSM-Transformer architecture. This hybrid approach delivers strong quality and performance at a competitive cost across a 256K context window. https://www.ai21.com/jamba

Prerequisites

What You'll Learn

What You'll Build

This Streamlit application within Snowflake illustrates Jamba-Instruct's text analysis capabilities using Cybersyn's SEC Filings database. While 10K filings serve as our example, Jamba-Instruct on Cortex can be applied to various text-intensive scenarios such as processing internal documentation, analyzing customer feedback, or tackling any large-scale text analysis task specific to your industry.

Step 1: Accessing the data in Snowflake Marketplace

Get Cybersyn's SEC Filings Data

After logging into your Snowflake account, access Cybersyn's SEC Filings in the Marketplace.


Step 2: Create a database and schema where the Streamlit app will run
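This step can be done from a Snowsight SQL worksheet; a minimal sketch (the database and schema names below are illustrative, not prescribed by the quickstart):

```sql
-- Create a dedicated database and schema for the app (names are examples).
CREATE DATABASE IF NOT EXISTS JAMBA_DEMO_DB;
CREATE SCHEMA IF NOT EXISTS JAMBA_DEMO_DB.JAMBA_DEMO_SCHEMA;
```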

Step 3: Create the Streamlit app in Snowflake
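The app can be created through the Snowsight UI (Projects » Streamlit » + Streamlit App), selecting the database and schema from the previous step. If you prefer SQL, a hedged sketch using the CREATE STREAMLIT command (the stage, file, and warehouse names are illustrative):

```sql
-- Creating the app via SQL instead of the Snowsight UI (names are examples).
CREATE STREAMLIT JAMBA_DEMO_DB.JAMBA_DEMO_SCHEMA.JAMBA_10K_DECODER
  ROOT_LOCATION = '@JAMBA_DEMO_DB.JAMBA_DEMO_SCHEMA.APP_STAGE'
  MAIN_FILE = '/streamlit_app.py'
  QUERY_WAREHOUSE = 'COMPUTE_WH';
```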

Step 4: Copy the Streamlit app code from GitHub

Streamlit GitHub Code

Step 5: Run the Jamba Streamlit Application

You've successfully built the Jamba-Instruct 10K Decoder Streamlit app in Snowflake!

AI21's Jamba-Instruct is a powerful language model with a 256K context window, enough to handle roughly 800 pages of text in a single prompt. This makes it well suited to tasks such as summarizing lengthy documents, analyzing call transcripts, and extracting information from large bodies of text.
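As a rough sanity check on the ~800-page figure, assuming ~0.75 words per token and ~240 words per page (both common heuristics, not properties of the model):

```python
# Back-of-the-envelope arithmetic behind the "~800 pages" claim.
# Assumed heuristics: ~0.75 words per token, ~240 words per page.
CONTEXT_TOKENS = 256_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 240

pages = CONTEXT_TOKENS * WORDS_PER_TOKEN / WORDS_PER_PAGE
print(round(pages))  # 800
```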

Another major advantage of a long context window is that it makes advanced use cases more accessible, often eliminating the need for complex configurations. Retrieval-Augmented Generation (RAG) is frequently employed to ground language models in a company's curated data, but Jamba-Instruct's long context window allows accurate results with or without RAG, simplifying the architecture and improving performance across a range of applications.

10K-Decoder Usage

This application uses Cybersyn's SEC_FILINGS database, which provides real SEC filing documents for many companies and illustrates Jamba-Instruct's effectiveness at processing large volumes of text in a single prompt. The demonstration uses the last few years of NVIDIA 10-K filings, more than 200K tokens of text; with Jamba-Instruct, all of it (plus your questions about it) fits into a single prompt!
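One way to assemble such a prompt, as a hypothetical sketch (the ~4-characters-per-token estimate and the helper names are illustrative assumptions, not part of the quickstart code):

```python
# Concatenate several filings plus a question into one prompt, and
# sanity-check the rough token count against Jamba's 256K window.
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return len(text) // 4

def build_prompt(filings: list[str], question: str, limit: int = 256_000) -> str:
    body = "\n\n---\n\n".join(filings)
    prompt = f"{body}\n\nQuestion: {question}"
    if estimate_tokens(prompt) > limit:
        raise ValueError("Prompt likely exceeds the model's context window")
    return prompt

prompt = build_prompt(["<2023 10-K text>", "<2024 10-K text>"],
                      "How did NVIDIA's revenue change year over year?")
```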


Prompting Tips
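For example, with very long inputs a common pattern is to place the document text first and the instruction last; a hypothetical template (not taken from the quickstart code):

```python
# Hypothetical prompt template: document text first, instruction last.
TEMPLATE = (
    "You are a financial analyst. Below are one or more 10-K filings.\n\n"
    "{filings}\n\n"
    "Using only the filings above, answer: {question}"
)

prompt = TEMPLATE.format(
    filings="<filing text here>",
    question="What were the main risk factors?",
)
```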

Congratulations! You've successfully built a GenAI Streamlit app using Jamba-Instruct on Snowflake Cortex AI. With this fully managed approach, GenAI can be harnessed without any data ever needing to leave Snowflake's secure walls.

Jamba's 256K context window delivers strong performance across key long-context use cases such as document summarization, transcript analysis, and information extraction from large datasets.

What You Learned

Related Resources