Overview

Welcome to the Powered by Tasty Bytes - Zero to Snowflake Quickstart focused on Geospatial Analysis!

Within this Quickstart we will conduct in-depth Geospatial analysis leveraging powerful Snowflake functionality coupled with a SafeGraph POI listing from the Snowflake Marketplace.

Prerequisites

What You Will Learn

What You Will Build

Overview

Within this Quickstart we will follow a Tasty Bytes themed story via a Snowsight SQL Worksheet with this page serving as a side by side guide complete with additional commentary, images and documentation links.

This section will walk you through logging into Snowflake, Creating a New Worksheet, Renaming the Worksheet, Copying SQL from GitHub, and Pasting the SQL we will be leveraging within this Quickstart.

Step 1 - Accessing Snowflake via URL

Step 2 - Logging into Snowflake

Step 3 - Navigating to Worksheets

Step 4 - Creating a Worksheet

Step 5 - Renaming a Worksheet

Step 6 - Accessing Quickstart SQL in GitHub

Step 7 - Copying Setup SQL from GitHub

Step 8 - Pasting Setup SQL from GitHub into your Snowflake Worksheet

Step 9 - Click Next –>

Overview

Tasty Bytes operates Food Trucks in numerous cities and countries across the globe with each truck having the ability to choose two different selling locations per day.

One important item that our Executives are interested in is to learn more about how these locations relate to each other as well as if there are any locations we currently serve that are potentially too far away from hot selling city centers.

Unfortunately what we have seen so far is our first-party data does not give us the building blocks required to complete this sort of Geospatial analysis. Thankfully, the Snowflake Marketplace has great listings from SafeGraph that can assist us here.

Step 1 - Acquiring SafeGraph POI Data from the Snowflake Marketplace

To begin, acquire the SafeGraph listing by following the steps below within Snowsight:

Step 2 - Evaluating SafeGraph POI Data

Now that we have the tb_safegraph database live in our account, let's run the next three queries to set our Worksheet Role, Warehouse and Database context and see what data is available for Paris.

USE ROLE sysadmin;
USE WAREHOUSE tb_de_wh;
USE DATABASE tb_101;

SELECT 
    cpg.placekey,
    cpg.location_name,
    cpg.longitude,
    cpg.latitude,
    cpg.street_address,
    cpg.city,
    cpg.country,
    cpg.polygon_wkt
FROM tb_safegraph.public.frostbyte_tb_safegraph_s cpg
WHERE 1=1
    AND cpg.top_category = 'Museums, Historical Sites, and Similar Institutions'
    AND cpg.sub_category = 'Museums'
    AND cpg.city = 'Paris'
    AND cpg.country = 'France';

Amazing! These sort of POI metrics will be immensely valuable in conducting Geospatial analysis to assist our Executives in making data driven location decisions.

Let's get this harmonized with our Sales data in the next section.

Step 3 - Click Next –>

Overview

To make our Geospatial analysis seamless, let's make sure to get SafeGraph POI data included in the analytics.orders_v so all of our downstream users can also access it.

Step 1 - Enriching our Analytics View

Please execute this steps query which will enrich our analytics.orders_v by adding all available, new SafeGraph POI metrics.

Within this CREATE VIEW statement, you will see COMMENT, DATE and SELECT * EXCLUDE .. functionality leveraged.

To highlight what our EXCLUDE is doing we are using it to simplify the SQL required to SELECT all of the available SafeGraph columns but not the ones we already have available in the harmonized.orders_v.

This query will provide a View ORDERS_V successfully created. result.

CREATE OR REPLACE VIEW analytics.orders_v
COMMENT = 'Tasty Bytes Order Detail View'
    AS
SELECT 
    DATE(o.order_ts) AS date,
    o.* ,
    cpg.* EXCLUDE (location_id, region, phone_number, country)
FROM tb_101.harmonized.orders_v o
JOIN tb_safegraph.public.frostbyte_tb_safegraph_s cpg
    ON o.location_id = cpg.location_id;

Step 2 - Click Next –>

Overview

With Point of Interest metrics now readily available from the Snowflake Marketplace without any ETL required, let's start on our Geospatial analysis journey.

Step 1 - Creating a Geography Point

Latitude and Longitude are two building block POI metrics we now have access to thanks to SafeGraph.

Please execute the next two queries to re-assume to tb_data_engineer role and create our Geography Point leveraging the ST_MAKEPOINT/ST_POINT function.

USE ROLE tb_data_engineer;

SELECT TOP 10
    location_id,
    ST_MAKEPOINT(longitude, latitude) AS geo_point,
    SUM(price) AS total_sales_usd
FROM analytics.orders_v
WHERE primary_city = 'Paris'
GROUP BY location_id, latitude, longitude
ORDER BY total_sales_usd DESC;

The results we see in the geo_point column will be essential for everything else we do moving forward.

One thing to note here is despite typically hearing Latitude then Longitude this function requires these values to be input in the Longitude, Latitude order.

Step 2 - Click Next –>

Overview

Starting with our Geographic Point, we can now begin to dive into some of the powerful Geospatial functions Snowflake offers natively.

Step 1 - ST_DISTANCE

Let's first start with calculating the distances in Miles and Kilometers between those top selling locations we generated previously by executing our next query which leverages the ST_DISTANCE function.

Within this query we also leverage a Common Table Expression or CTE, QUALIFY and LAG.

WITH _top_10_locations AS
(
    SELECT TOP 10
        location_id,
        ST_MAKEPOINT(longitude, latitude) AS geo_point,
        SUM(price) AS total_sales_usd
    FROM analytics.orders_v
    WHERE primary_city = 'Paris'
    GROUP BY location_id, latitude, longitude
    ORDER BY total_sales_usd DESC
)
SELECT
    a.location_id,
    b.location_id,
    ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1609,2) AS geography_distance_miles,
    ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1000,2) AS geography_distance_kilometers
FROM _top_10_locations a
JOIN _top_10_locations b
    ON a.location_id <> b.location_id -- avoid calculating the distance between the point itself
QUALIFY a.location_id <> LAG(b.location_id) OVER (ORDER BY geography_distance_miles) -- avoid duplicate: a to b, b to a distances
ORDER BY geography_distance_miles;

Wow! Look at how far we have already came in so little time. Let's keep diving deeper into what Geospatial insights we can derive in our next section.

Step 2 - Click Next –>

Overview

Now that we understand how to create points, and calculate distance, we will now pile on a large set additional Snowflake Geospatial functionality to further our analysis.

Step 1 - Create a Geographic Collection of Points and build our Minimum Bounding Polygon

Please execute the next query which will take the building blocks from the previous section and now collect all of our Geography Points and build a Minimum Bounding Polygon from them.

Within this query we will see the usage of ST_NPOINTS, ST_COLLECT, ST_ENVELOPE and ST_AREA.

WITH _top_10_locations AS
(
    SELECT TOP 10
        location_id,
        ST_MAKEPOINT(longitude, latitude) AS geo_point,
        SUM(price) AS total_sales_usd
    FROM analytics.orders_v
    WHERE primary_city = 'Paris'
    GROUP BY location_id, latitude, longitude
    ORDER BY total_sales_usd DESC
)
SELECT
    ST_NPOINTS(ST_COLLECT( geo_point)) AS count_points_in_collection,
    ST_COLLECT(geo_point) AS collection_of_points,
    ST_ENVELOPE(collection_of_points) AS minimum_bounding_polygon,
    ROUND(ST_AREA(minimum_bounding_polygon)/1000000,2) AS area_in_sq_kilometers
FROM _top_10_locations;

Look at how much we accomplished within a single SQL query, but we can't stop yet, let's keep going to see what else we can accomplish.

Step 2 - Finding our Top Selling Locations Center Point

Now that we have collected geographic information about where our Trucks sell the most in Paris. Please execute the next query which will utilize ST_CENTROID to find the top selling center point.

WITH _top_10_locations AS
(
    SELECT TOP 10
        o.location_id,
        ST_MAKEPOINT(o.longitude, o.latitude) AS geo_point,
        SUM(o.price) AS total_sales_usd
    FROM analytics.orders_v o
    WHERE primary_city = 'Paris'
    GROUP BY o.location_id, o.latitude, o.longitude
    ORDER BY total_sales_usd DESC
)
SELECT
    ST_COLLECT(tl.geo_point) AS collect_points,
    ST_CENTROID(collect_points) AS geometric_center_point
FROM _top_10_locations tl;

Step 3 - Storing our Center Point in a Variable

So we can use this Center Point of our Top Selling locations in a future query, please execute the query below to SET our center_point variable equal to our geometric_center_point result from above.

SET center_point = '{ "coordinates": [ 2.323399542234561e+00, 4.885933767767676e+01 ], "type": "Point" }';

Once pasted in, please execute the query which will yield a Statement executed succesfully. message. Now we can use this center_point Variable to help conclude our Geospatial analysis.

Step 4 - Click Next –>

Overview

As mentioned earlier, our Tasty Bytes Executives are interested in seeing what locations we may want to stop having our Food Trucks visit in their weekly schedules. Thankfully, every step we have taken so far has now enabled us to deliver on this exact request.

Step 1 - Top 50 Locations furthest away from our Top Selling Center Point in Paris

Executing the next query we can identify which Paris locations we may want to take off our schedules. Within the query we will see a new function we haven't used yet, TO_GEOGRAPHY.

WITH _paris_locations AS
(
    SELECT DISTINCT
        location_id,
        location_name,
        ST_MAKEPOINT(longitude, latitude) AS geo_point
    FROM analytics.orders_v
    WHERE primary_city = 'Paris'
)
SELECT TOP 50
    location_id,
    location_name,
    ROUND(ST_DISTANCE(geo_point, TO_GEOGRAPHY($center_point))/1000,2) AS kilometer_from_top_selling_center
FROM _paris_locations
ORDER BY kilometer_from_top_selling_center DESC;

Fantastic work! We have now delivered on the exact as from our Executive Team through a simple process by leveraging SafeGraph from the Snowflake Marketplace.

Step 2 - Click Next –>

Overview

H3 is a way of dividing the Earth's surface into hexagonal shapes, organizing them into levels of resolution, and assigning unique codes to each hexagon for easy reference.

This system was created by Uber for tasks like mapping, location-based services, and market analysis. A lower resolution corresponds to larger hexagons covering broader areas, while a higher resolution means smaller hexagons representing more specific locations.

As Snowflake offers H3 functionality, the Tasty Bytes Leadership Team has tasked our Data Engineer with generating H3 Codes and finding the Top Selling Hexagons.

Step 1 - Locate H3 Codes in Integer and String form

The next query will locate H3 Codes for the Louvre in both Integer and String form at varying resolutions usings the H3_LATLNG_TO_CELL and H3_LATLNG_TO_CELL_STRING functions.

SELECT DISTINCT
    location_id,
    location_name,
    H3_LATLNG_TO_CELL(latitude, longitude, 4) AS h3_integer_resolution_4, 
    H3_LATLNG_TO_CELL_STRING(latitude, longitude, 4) AS h3_hex_resolution_4,
    H3_LATLNG_TO_CELL(latitude, longitude, 8) AS h3_integer_resolution_8, 
    H3_LATLNG_TO_CELL_STRING(latitude, longitude, 8) AS h3_hex_resolution_8,
    H3_LATLNG_TO_CELL(latitude, longitude, 12) AS h3_integer_resolution_12,
    H3_LATLNG_TO_CELL_STRING(latitude, longitude, 12) AS h3_hex_resolution_12
FROM analytics.orders_v
WHERE location_name = 'Musee du Louvre';

Step 2 - Parent and Children Cell ID's

Execute the next query to locate the Parent Cell Id and an array containing all of the associated Children Id's for the Louvre by leveraging H3_CELL_TO_PARENT, H3_INT_TO_STRING and H3_CELL_TO_CHILDREN.

SELECT DISTINCT
    location_id,
    location_name,
    H3_LATLNG_TO_CELL(latitude, longitude, 8) AS h3_int_resolution_8,
    H3_INT_TO_STRING(h3_int_resolution_8) AS h3_hex_resolution_8, -- convert above to hexadecimal format
    H3_CELL_TO_PARENT(h3_int_resolution_8, 5) AS h3_int_parent_resolution_5, -- locate resolution 5 parent of our H3 cell
    H3_CELL_TO_CHILDREN(h3_int_resolution_8, 10) AS h3_int_children_resolution_10 -- locate all children at resolution 10
FROM analytics.orders_v
WHERE location_name = 'Musee du Louvre';

Step 3 - Locations within Top Selling Hexagons

Using the H3 Cell IDs from the previous step, let's now produce a list of all locations within these Top Hexagons and the Hexagons Total Sales.

These results will help the business locate areas and locations that we should ensure are always occupied by our trucks.

WITH _top_50_locations AS
(
    SELECT TOP 50
        location_id,
        ARRAY_SIZE(ARRAY_UNIQUE_AGG(customer_id)) AS customer_loyalty_visitor_count,
        H3_LATLNG_TO_CELL(latitude, longitude, 7) AS h3_integer_resolution_6,
        H3_LATLNG_TO_CELL_STRING(latitude, longitude, 7) AS h3_hex_resolution_6,
        SUM(price) AS total_sales_usd
    FROM analytics.orders_v
    WHERE primary_city = 'Paris'
    GROUP BY ALL
    ORDER BY total_sales_usd DESC
)
SELECT
    h3_hex_resolution_6,
    COUNT(DISTINCT location_id) AS number_of_top_50_locations,
    SUM(customer_loyalty_visitor_count) AS customer_loyalty_visitor_count,
    SUM(total_sales_usd) AS total_sales_usd
FROM _top_50_locations
GROUP BY ALL
ORDER BY total_sales_usd DESC;

Step 4 - H3_GRID_DISTANCE

To conclude, in the next query we will see if our two Top Selling Resolution 6 Hexagons border each other using H3_GRID_DISTANCE(https://docs.snowflake.com/en/sql-reference/functions/h3_grid_distance).

SELECT H3_GRID_DISTANCE('871fb4671ffffff', '871fb4670ffffff') AS cell_distance;

Step 5 - Click Next –>

Conclusion

Fantastic work! You have successfully completed the Tasty Bytes - Zero to Snowflake - Geospatial Quickstart.

By doing so you have now:

If you would like to re-run this Quickstart please leverage the Reset scripts in the bottom of your associated Worksheet.

Next Steps

To continue your journey in the Snowflake AI Data Cloud, please now visit the link below to see all other Powered by Tasty Bytes - Quickstarts available to you.