Lab Scenario

In this lab, you will use Fivetran to ingest mock SAP data into Snowflake, and use Snowflake Marketplace to install an app that will let you quickly identify duplicate payments. Let's get started!

What You'll Need For the Lab

What You'll Learn

Log in to your Snowflake trial account.

If this is your first time logging into Snowflake, you can get a detailed UI walkthrough here.

  1. Navigate to Admin → Partner Connect, then click on the "Fivetran" tile. If you don't see the Admin option in the left panel, make sure you are assuming the ACCOUNTADMIN role in the top left:

Partner Connect

  2. You should see a window pop up that says Connect to Fivetran. Click the blue Connect button at the bottom right:

Connect to Fivetran

  3. Once the partner account has been created, click the blue Activate button:

Connect to Fivetran

  4. Congrats! Just like that, you have launched a trial Fivetran account through Snowflake, pre-configured to talk to your Snowflake account as a target.

Fivetran supports syncing data from file systems, applications, event systems, and databases. For SAP, we support connecting to many different sources; our newest is the SAP HANA ERP source. This fully managed SaaS connector is the simplest, fastest way to get started extracting SAP data.

As we do not have access to an SAP HANA instance, for this lab we have replicated the SAP tables and some mock data to a PostgreSQL instance. We will use Fivetran's PostgreSQL connector to sync that data and load it into your Snowflake account.

  1. Go to Fivetran and click on "Connectors" in the left gray navigation pane and then click the blue "Add Connector" button in the upper right corner:

Add Connector

  2. As this trial with Fivetran was created via Snowflake Partner Connect, your destination will default to SnowflakePC:

DestinationSnowflake

  3. Let's check how many connectors Fivetran has available "out of the box". Click on "All Sources" underneath your Selected Destination:

BrowseAllSources

  4. To get an idea of the range of connectors that Fivetran has for various sources, type oracle into the search box and you will be presented with a range of Oracle source connectors:

OracleConnectors

  5. Do the same thing for SAP as a source. Type sap into the search box and take a quick look at the SAP connectors that are available:

SAPConnectors

  6. Getting back to today's lab, type postgres into the search box, hover over the Postgres connector named "Google Cloud PostgreSQL", and click the blue "Setup" button on the right side of the selection:

PostgresConnectors

  7. As with all Fivetran connectors, you are presented with a detailed setup guide in the right gray navigation pane. You can name the Destination Schema (it does not have to be created ahead of time in the destination); for this lab, let's use lastname_firstname_sap_ntt, like this: kelly_tony_sap_ntt:

PostgresForm

  8. To quickly connect the Google Cloud PostgreSQL source database to Snowflake, here's the rest of the information that you'll need to provide. Be sure to use the credentials for the row corresponding to the first letter of your last name, so that everyone is not accessing the same source database. Copy and paste these fields individually into the connector setup form in Fivetran:
    • Destination schema prefix: lastname_firstname_sap_ntt
    • Host IP Address: Select from the table below, using the row that matches the first letter of your last name:

      | Last name begins with | Host IP Address |
      | --------------------- | --------------- |
      | A-C                   | 34.94.122.157   |
      | D-F                   | 34.94.151.28    |
      | G-I                   | 34.94.33.194    |
      | J-L                   | 34.102.51.15    |
      | M-O                   | 35.236.3.13     |
      | P-R                   | 34.94.1.83      |
      | S-U                   | 34.94.165.240   |
      | V-Y                   | 34.94.107.163   |

    • Port: 5432
    • User: fivetran
    • Password: 2PcnxqFrHh64WKbfsYDU
    • Database: sap_accounts_payable
    • Connection Method: Connect directly
    • Update Method: Detect Changes via XMIN
    NOTE: The eight PostgreSQL instances above will be available until the end of the day the lab is run; seven of them will be shut down after the lab. If you wish to access the lab at any time during the remainder of the 14-day Fivetran trial, please use the first instance only (34.94.122.157). That instance will not be available after the 14-day Fivetran trial expires.
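For context on the "Detect Changes via XMIN" update method: xmin is a hidden PostgreSQL system column that records the ID of the transaction that last modified each row, which lets Fivetran find changed rows between syncs. If you're curious, you can inspect it on any PostgreSQL table you can query (the table name here is just a placeholder):

-- xmin is a PostgreSQL system column: it is not returned by SELECT *
-- but can be selected explicitly. "some_table" is illustrative only.
SELECT xmin, * FROM some_table LIMIT 5;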
  9. Click the blue "Save & Test" button after you have completed the form:

SaveForm

  10. Fivetran will begin running connection tests to the Google Cloud PostgreSQL source database:

ConnectorTests

  11. You'll be presented with a request to "Confirm TLS Certificate". Here's why: Fivetran uses Transport Layer Security (TLS) to secure data in motion, providing both encryption and authentication of the connections that Fivetran makes to the PostgreSQL data source. Some additional detail on security: Fivetran follows industry-standard practices to encrypt data in transit (TLS) and at rest (AES-256), provides robust security, privacy, and governance from source to destination, and is in compliance with GDPR, HIPAA, ISO, PCI, and SOC 2.
    • AES256 encryption throughout the entire data replication pipeline, using industry best practices including a wallet and unique encryption certificates
    • Support for OS-level and LDAP authentication, as well as custom plugin-based authentication
    • 2-Step verification support with unique certificates, avoiding a so-called man-in-the-middle attack
    • Pre-defined authorizations with full rights to make modifications, execute-only privileges, and read-only access
    • Auditing/logging of all operations that may affect data replication
    Click the top layer certificate and then click the blue "Confirm" button:

ConfirmCertificate

  12. Fivetran then finishes running the connection tests. Once you see the green box stating that "All connection tests have passed", click "Continue":

FinishTests

  13. Fivetran will then connect to and fetch the database source schema, tables, and columns that you have access to in the PostgreSQL database:

SchemaFetch

  14. You will then see that you can "Select Data to Sync". For this SAP database, we have a schema available to us named sap_accounts_payable_sapabap1, and we are going to sync all 12 of its tables for this lab. If we had multiple schemas and wanted to block some from moving into Snowflake, we could do that here; similarly, we could block any tables in any schema. Fivetran's data-blocking feature allows you to exclude selected source schemas, tables, or columns from being loaded to your destination.

SelectTables

  15. Click the checkbox to the left of "12 tables" to select all tables for syncing:

Unhashed

SaveDataToSync

  16. Fivetran then needs to know how you want to handle incremental changes to the PostgreSQL source data, since the schema may change in the future. For the lab, keep "Allow all" selected (this allows all new schemas, tables, and columns to sync) and just click the "Continue" button.

HandleSchema

  17. You are now ready to start syncing your PostgreSQL data to Snowflake! Click the "Start/Resume Initial Sync" button.

StartInitialSync

  18. You'll see the green "Active" indicator, and you'll be notified that this is the "First time syncing data for this connector". This is your initial, or historical, sync for that dataset.

ActiveDashboard

Let's leave the connector running its initial historical sync - we'll check back here in a few minutes. For now, let's move on to Section 4.

In this section, we will further explore the information that is available in the Fivetran UI associated with the connector that you just created.

  1. Click on "Logs" to see what Fivetran is doing in the background as a pure SaaS service to build, test, deploy, automate, and maintain the PostgreSQL to Snowflake connector.

ViewLogs

  1. Click on "Schema" to see the schemas, tables, and columns that were selected from the PostgreSQL source. You have the option to select or deselect tables and columns here and block them in future syncs. Just like when we first set up this connector, if you hover over a column you'll see that you can hash individual columns if additional data privacy is required. I can also update the table sync setting mode here if desired. Fivetran's default is "soft delete" and you have the option for "history mode" as well. Do NOT change anything on the schema page for this lab.

ViewSchema
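For context on soft delete: rather than removing deleted source rows from the destination, Fivetran marks them using its standard _FIVETRAN_DELETED metadata column. Once your sync completes, a query along these lines (the schema name is whatever Fivetran created from your prefix) would return only rows that still exist in the source:

-- _FIVETRAN_DELETED is Fivetran's soft-delete marker column;
-- replace <your_schema> with the schema Fivetran created from your prefix.
SELECT *
FROM PC_FIVETRAN_DB.<your_schema>.BSAD
WHERE _FIVETRAN_DELETED = FALSE;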

  1. Click on "Usage" and you'll see the Usage Details for your PostgreSQL to Snowflake connector. Once we complete the initial sync and perform incremental syncs for 7 days, then usage stats will be available.On this tab, you can:
    • View paid and free MAR: The Incremental MAR bar chart displays your paid and free MAR for each day of the selected month so that you can see trends in your consumption.
    • Track daily MAR for the connector and its tables: Hover over each bar to see your overall daily MAR and the daily MAR for your top three tables.
    • View past usage data: By default, the Usage tab shows data from the current month. To see past usage data, use the drop-down menu at the top of the Usage tab to select the month you'd like to view.
    • Track MAR consumption per table.
    • Analyze the percent change in MAR from the previous month.

NOTE: We calculate the percent change by comparing this month's MAR to date to the MAR from the equivalent period last month (for example, we compare May 1-10 to April 1-10).

ViewUsage

  1. Click on "Setup" and you'll see the Connection Details along with the default settings for your PostgreSQL connector. Notice that the Incremental Sync Frequency drop-down box allows you to select an incremental sync (CDC) frequency from 5 minutes to 24 hours - Fivetran sets up CDC automatically for each connector. Fivetran's default incremental sync frequency for this connector is 6 hours so your next sync will run in 6 hours and Fivetran will sync any changes that have taken place in your PostgreSQL database.

Now that you have seen the Fivetran UI / dashboard for this connector, let's check back to see how the initial historical sync is progressing. Click back to the status dashboard, change the view to "1 hour" then hover over the green graph to view statistics on the volume of data and time to sync:

ViewStatistics

Next, let's switch to Snowflake to preview the data that has synced. First, select "Databases" on the menu on the left and then expand your database name and the tables:

ViewDatabases

After expanding the list of tables, select the BSAD table and then click "Data Preview" to view the data in the table:

ViewBSAD
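Equivalently, you can preview the table from a SQL worksheet. Assuming the Partner Connect default database PC_FIVETRAN_DB and the schema created from your own destination prefix, a quick query might look like:

-- Replace <your_schema> with the schema Fivetran created from your prefix.
SELECT *
FROM PC_FIVETRAN_DB.<your_schema>.BSAD
LIMIT 10;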

As mentioned, during this section we used a replicated SAP data set from a PostgreSQL database. If you would like to try our new SAP HANA ERP Connector, please search for SAP and use this connector:

SAPConnectors

Follow this link if you would like to view a quick recording of this SAP connector in action.

This concludes the Fivetran data ingestion portion of the Quickstart Lab. When you are ready, click the "Next" button to move on to Section 5 - Install NTT DATA's Native App from Snowflake Marketplace.

  1. Back in your Snowflake account, using the ACCOUNTADMIN role, click on the Marketplace tab to view the list of native apps available on Snowflake Marketplace.

NTTMarketplace

  2. Search for "Account Payable Analytics" in the search bar, and you should see the "Account Payable Analytics" app by NTT DATA:

NTTAPSearch

  3. You will be presented with the app's details. Read through the app information and click "Get":

NTTGetApp

  4. Provide an Application Name (the default is "Account_Payable_Analytics"), select a warehouse, and click the blue "Get" button:

NTTGetAppConfirm

NTTGetAppEmail

  5. Once the app is installed, an email will be sent to your registered email address. Do not launch the app yet - it needs a few permissions before we can start using it.
  6. Copy and paste the SQL commands below into a SQL worksheet. Replace:
    • <APP_NAME> with the Application Name you provided in Step #4 above (the default name being "Account_Payable_Analytics")
    • <AP_Database> with the database name where the AP data was loaded. In this lab, we ingested the data via Fivetran into the PC_FIVETRAN_DB database
    • <AP_Schema> with the schema name where the AP data was loaded. Depending on the destination prefix you assigned in Fivetran (Section 3, Step 7), the schema name will vary for you.
    • Then execute all statements (select all, Ctrl/Cmd+Enter)
-- Let the app read the shared SNOWFLAKE database (account usage views)
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO APPLICATION <APP_NAME>;
-- Expose the app's application role to your ACCOUNTADMIN role
GRANT APPLICATION ROLE SAP_APP TO ROLE ACCOUNTADMIN;
-- Let the app run its background tasks
GRANT EXECUTE TASK ON ACCOUNT TO APPLICATION <APP_NAME>;
GRANT EXECUTE MANAGED TASK ON ACCOUNT TO APPLICATION <APP_NAME>;
-- Let the app read the AP data that Fivetran loaded
GRANT USAGE ON DATABASE <AP_Database> TO APPLICATION <APP_NAME>;
GRANT USAGE ON SCHEMA <AP_Database>.<AP_Schema> TO APPLICATION <APP_NAME>;
GRANT SELECT ON ALL TABLES IN SCHEMA <AP_Database>.<AP_Schema> TO APPLICATION <APP_NAME>;
GRANT SELECT ON ALL VIEWS IN SCHEMA <AP_Database>.<AP_Schema> TO APPLICATION <APP_NAME>;
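For example, with the lab defaults filled in (Fivetran typically names the destination schema <prefix>_<source schema>, so the example prefix from Section 3 would yield the schema shown here; yours will differ), two of the statements would look like:

-- Illustrative only: substitute your own application and schema names.
GRANT USAGE ON SCHEMA PC_FIVETRAN_DB.kelly_tony_sap_ntt_sap_accounts_payable_sapabap1 TO APPLICATION Account_Payable_Analytics;
GRANT SELECT ON ALL TABLES IN SCHEMA PC_FIVETRAN_DB.kelly_tony_sap_ntt_sap_accounts_payable_sapabap1 TO APPLICATION Account_Payable_Analytics;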
  7. Once the grants are in place, launch the app by going to the Apps tab in the left panel, then clicking on the app name:

NTTClickApp

  8. When you launch the app for the very first time, you should see a message like the one below. The app runs a task in the background to fetch database and table details from the Snowflake views for faster processing:

NTTAppRunning

  9. When you refresh the page after a couple of seconds, you will be presented with a page like the one below. Here you select the "Audit Start Date" as the start of the audit period - that is, how far back in time the app should look for potential duplicate payment records. Typically this is 3 years, but you can choose a value to suit your use case.
  10. Next, choose the database and its corresponding schema where the SAP Accounts Payable data is located. Choose the database (PC_FIVETRAN_DB) and schema we created above and hit "Fetch Tables":

NTTAuditStart

  11. This will automatically populate all the tables based on their descriptions, as below. Depending on your environment, you may be using different table names; you can choose the corresponding tables from the drop-down, based on the description. Please note that the table column names and data types must match what has been published in the readme file. If not, you can create views on top of the data to match the expected column names and data types (a sketch follows this step):

NTTChooseTables
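If your column names or types differ from the readme, a simple view can adapt them. A hypothetical sketch (the source table and column names here are invented for illustration; LIFNR and BELNR are the SAP-style names a readme like this would expect):

-- Hypothetical example: expose the expected column names
-- on top of a source table whose columns are named differently.
CREATE OR REPLACE VIEW ap_data.bsak AS
SELECT
    vendor_number   AS lifnr,  -- vendor number
    document_number AS belnr   -- accounting document number
FROM ap_data.vendor_invoices_raw;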

  12. Once the table names are selected, hit the "Submit" button. This sends the data for processing. You should see a confirmation message as follows:

NTTPotentialDups

  13. Once the processing is complete, you should see a confirmation message as follows:

NTTConfMsg

  14. Click on the "Potential Duplicates" tab to view the duplicate claims and related metrics.
  15. Here you will first see metrics for the total number of potential duplicate claims, the potential amount that can be recovered once these claims are processed, and both of these figures as a percentage of the total number of records processed:

NTTMetrics

  16. We also see a quick distribution of vendors by their total claim amounts. This helps audit teams identify vendors with a higher percentage of potential claims and streamline the process for these vendors to avoid potential duplicates in the future:

NTTGraph

  17. We can also see the distribution of claims by invoice date, which can help audit teams identify the possible cause of these erroneous transactions based on the date range:

NTTLineGraph

  18. Finally, we see the actual potential claim details by claim type here:

NTTClaims

  19. The audit teams can work with these potential claim details for further recovery processing.
  20. All the catalog details can be refreshed at a frequency of your choosing to ensure optimal performance. Use cron conventions for scheduling (an illustration follows below):

NTTCron
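For reference, cron schedules use five fields: minute, hour, day of month, month, and day of week. Snowflake tasks express this as USING CRON plus a time zone; a generic illustration (not the app's actual task) of a schedule that runs daily at 02:00 UTC:

-- Illustration of the cron convention only; the app manages its own task.
-- Fields: minute hour day-of-month month day-of-week
CREATE OR REPLACE TASK demo_refresh_task
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  SELECT 1;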

  21. To save on Snowflake credits, it's recommended that you suspend the refresh task if you do not intend to keep changing your source database and schemas.

The app showcases only a couple of the permutations and combinations of ways to identify duplicate payments. There are at least a dozen more, and we can help include them in the app depending on the P2P (procure-to-pay) cycle that your organization runs.

Reach out to snowflakemarketplace@nttdata.com if you would like all the permutations and combinations implemented for your organization.

If you have any questions about any Fivetran connectors, please check out the Fivetran website.

Additional resources