By completing this guide, you will be able to go from raw data to a machine learning model that can help predict house prices.
Here is a summary of what you will learn in each step of this quickstart:
In case you are new to some of the technologies mentioned above, here's a quick summary with links to documentation.
The Snowpark API provides an intuitive library for querying and processing data at scale in Snowflake. Using the library for any of the three supported languages, you can build applications that process data in Snowflake without moving it to the system where your application code runs, so the processing scales with the elastic, serverless Snowflake engine.
Snowflake currently provides Snowpark libraries for three languages: Java, Python, and Scala.
Learn more about Snowpark.
scikit-learn is one of the most popular open source machine learning libraries for Python. It is also pre-installed and available to developers in Snowpark for Python via the Snowflake Anaconda channel. This means you can use it in Snowpark for Python User-Defined Functions and Stored Procedures without having to install it manually or manage its dependencies.
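The following is a minimal sketch of the kind of scikit-learn code you might run inside a stored procedure. It fits a linear regression on hypothetical toy data (square footage versus price).

To run a function like this in Snowflake, you would register it as a stored procedure, for example with Snowpark's sproc decorator, and list scikit-learn in its packages argument. That registration step is not shown here, so the snippet runs locally with numpy and scikit-learn installed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def train_price_model(features, prices):
    """Fit a linear model that maps house features to sale price."""
    return LinearRegression().fit(features, prices)


# Hypothetical toy data: square footage -> price (exactly linear, for illustration).
sqft = np.array([[1000], [1500], [2000], [2500]])
price = np.array([200000.0, 275000.0, 350000.0, 425000.0])

model = train_price_model(sqft, price)
predicted = model.predict(np.array([[3000]]))[0]
print(round(predicted))
```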
This section covers cloning the GitHub repository and creating a Python 3.10 environment.
Create a file named environment.yml and paste in the following config:

name: snowpark_scikit_learn
channels:
- https://repo.anaconda.com/pkgs/snowflake/
- nodefaults
dependencies:
- python=3.10
- pip
- snowflake-snowpark-python==1.23.0
- snowflake-ml-python==1.6.4
- snowflake==1.0.0
- ipykernel
- matplotlib
- seaborn

Then create and activate the conda environment:

conda env create -f environment.yml
conda activate snowpark_scikit_learn
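After activating the environment, you can confirm that the pinned packages resolved as expected. The following is a minimal sketch using only the standard library's importlib.metadata.

The expected versions are copied from environment.yml above. The sketch assumes that conda-installed packages expose standard Python package metadata, which they normally do.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from environment.yml; adjust if you change the file.
EXPECTED_PACKAGES = {
    "snowflake-snowpark-python": "1.23.0",
    "snowflake-ml-python": "1.6.4",
    "snowflake": "1.0.0",
}


def find_mismatches(expected):
    """Return {package: problem} for packages that are missing or pinned differently."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # environment.yml pins Python 3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, expected 3.10")
    for name, problem in find_mismatches(EXPECTED_PACKAGES).items():
        print(f"{name}: {problem}")
```

An empty report means every pinned package is present at the expected version.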

To avoid pyarrow-related issues:
- If you have the pyarrow library already installed, uninstall it before installing Snowpark.
- If you do not have pyarrow installed, you do not need to install it yourself; installing Snowpark automatically installs the appropriate version.
- Do not reinstall a different version of pyarrow after installing Snowpark.

The Notebook linked below covers the following data ingestion tasks.
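A typical ingestion step reads raw data into pandas and writes it to a Snowflake table. The following is a minimal sketch under these assumptions:
- The CSV content, column names and the HOUSING table name are hypothetical.
- The write uses Snowpark's Session.write_pandas with auto_create_table and overwrite.

Column names are upper-cased first. write_pandas quotes identifiers by default, so lowercase names would otherwise become case-sensitive in Snowflake.

```python
import io

import pandas as pd

# Hypothetical CSV extract standing in for the lab's data file.
RAW_CSV = """sqft,bedrooms,price
1200,2,250000
1800,3,340000
2600,4,510000
"""


def prepare_for_snowflake(df: pd.DataFrame) -> pd.DataFrame:
    """Upper-case column names so they match Snowflake's unquoted identifiers."""
    out = df.copy()
    out.columns = [str(c).upper() for c in out.columns]
    return out


def load_table(session, df: pd.DataFrame, table_name: str):
    """Write df to a Snowflake table, creating it if needed and overwriting existing data.

    session is an already-connected snowflake.snowpark.Session.
    """
    return session.write_pandas(df, table_name, auto_create_table=True, overwrite=True)


housing = prepare_for_snowflake(pd.read_csv(io.StringIO(RAW_CSV)))
print(housing.columns.tolist())
# With a live session (hypothetical table name): load_table(session, housing, "HOUSING")
```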
To get started, follow these steps:
Run jupyter notebook at the command line. (You may also use other tools and IDEs, such as Visual Studio Code.)

The Notebook linked below covers the following data exploration tasks.
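A common first exploration step is to compute summary statistics and check how strongly each feature correlates with price. The following minimal sketch does this in pandas on a hypothetical sample.

With real data, you would typically pull a sample from Snowflake first, for example by calling to_pandas() on a Snowpark DataFrame. You could then plot the results with matplotlib or seaborn, both of which are included in the environment.

```python
import pandas as pd

# Hypothetical sample; with real data this could come from a Snowpark DataFrame's to_pandas().
houses = pd.DataFrame({
    "SQFT": [1000, 1500, 2000, 2500],
    "BEDROOMS": [2, 3, 3, 4],
    "PRICE": [200000, 275000, 350000, 425000],
})

# Count, mean, std, min, quartiles and max for every numeric column.
summary = houses.describe()

# Pearson correlation of each feature with PRICE, strongest first.
corr_with_price = houses.corr()["PRICE"].drop("PRICE").sort_values(ascending=False)

print(summary.loc["mean"])
print(corr_with_price)
```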
To get started, follow these steps:
Run jupyter notebook at the command line. (You may also use other tools and IDEs, such as Visual Studio Code.)

The Notebook linked below covers the following machine learning tasks.
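The following is a minimal sketch of a standard scikit-learn training workflow:
1. Split the data into training and test sets.
2. Fit a preprocessing-plus-model pipeline.
3. Evaluate it on the held-out rows.

The data is synthetic and the pricing rule is hypothetical. The sketch uses plain scikit-learn rather than any Snowflake-specific modeling API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200
sqft = rng.uniform(800, 3500, size=n)
bedrooms = rng.integers(1, 6, size=n)
# Hypothetical pricing rule plus Gaussian noise, for illustration only.
price = 50_000 + 150 * sqft + 10_000 * bedrooms + rng.normal(0, 5_000, size=n)

X = np.column_stack([sqft, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=0)

# Scale features, then fit ordinary least squares.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Mean absolute error on rows the model never saw during training.
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:,.0f}")
```

Scaling does not change a plain linear regression's predictions. Keeping it in the pipeline makes it easy to swap in a regularized model, where feature scale does matter.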
To get started, follow these steps:
Run jupyter notebook at the command line. (You may also use other tools and IDEs, such as Visual Studio Code.)

Congratulations! You've successfully completed the lab using Snowpark for Python and scikit-learn.