Getting started with Snowflake

Stambia Data Integration allows to work with Snowflake Databases, offering the possibility to produce fully customized Integration Processes.

In this article, you'll learn to reverse your first Metadata, optionally configure it to use external storages such as Microsoft Azure Storage or Amazon S3 and to produce Mappings or Processes.

The idea in our examples will be to load data from the demo HSQL Database, and from delimited files, to a Snowflake Database.

Refer to the presentation article for the download section and the list of supported features.

Prerequisites:

Generic Templates

Snowflake Connector

Optionally Azure Connector

Optionally Amazon Connector

Installation

The installation procedure is the following:

Download and install Generic Templates in your workspace
Download and install Snowflake Connector
Optionally install the Azure Connector
Optionally install the Amazon Connector
Restart the Designer and Runtime

Metadata Configuration

Snowflake Database Metadata

The first step is to create the Snowflake Database Metadata.

For this, simply create a new Metadata for Snowflake as you would usually do for any other database:

metadata creation

Here is an example of configuration:

metadata wizard

That's all, you can now connect and reverse your schemas and tables as usual.

If the wizard informs you that the JDBC driver class cannot be loaded, you must click on the "Add/Edit Drivers", find or create Snowflake Driver entry in the new window displayed and configure it as explained in this article.

The Snowflake Driver Class is "net.snowflake.client.jdbc.SnowflakeDriver".

Here is an example of reversed Metadata:

metadata example

About Internal and External Storage

For optimizing data loading into Snowflake Stambia Templates goes through internal or external storage to temporarily store data as file before loading it.

The idea is to first extract source data to be loaded into Snowflake in file, store it on a storage Snowflake can access, and finally load data into Snowflake from the storage.

This also work for loading flat files into Snowflake.

Actually, the Templates support the following storage providers:

Internal Snowflake Stage
External Storage
- Microsoft Azure Storage
- Amazon S3

Defining Storage in Metadata

Storage information are defined in Metadata, which allows to specify the default storage type to be used when working with this Metadata, and also necessary information about the external storage which should be used.

On Metadata server node, go into 'Storage' tab and define here the 'Default Storage Type' and 'External Storage' attributes:

metadata storage type default

Below is some information about those attributes:

Attribute	Description
Storage Method	Storage Method used by default in Mappings loading data into Snowflake. internalStage: temporary files will be stored inside a Snowflake internal stage. externalStorage: temporary files will be stored in an external storage system which must be defined using 'External Storage' attribute. Note that this will be default method for all schemas and tables of the Metadata.
External Storage	Metadata link of the external storage container to be used when using external storage. Drag and drop here a Microsoft Azure Storage Container or Amazon S3 Folder. This is not necessary if you only plan to use internal stage storage type. Note that this will be the default for all the schemas and tables of the Metadata.
Compression Type	Defines the compression type used when loading files (source files or temporarly generated files) into this database. It can be overridden on schemas and tables if needed.
Compression Strategy	Defines how compression behavior when loading files (source files or temporarly generated files) into this database. It can be overridden on schemas and tables if needed. - compress at execution: use this option when you want Stambia to compress files which are sent to Snowflake during flows automatically during execution. Template will use correct Snowflake options to specify compression type, compress the files using a Gzip Process Action, and then send compressed file to Snowflake. - delegate compression to driver: use this option when you want the compression of files which are sent to Snowflake during flows to be performed automatically by the Snowflake driver, using the 'AUTO_COMPRESS' option supported on the 'PUT' statement. Note that this is supported only when using Snowflake internal stage (see 'Storage Method' parameter') - assume already compressed: use this option when source files have already been compressed before the execution of Mappings. Templates will use the correct Snowflake options to specify compression type and simply send them without compressing them as they are already compressed.

Those attributes can be overridden per schema and per table easily: for this simply go on the desired schema or table node, then go inside the 'Storage' tab where you'll find the same attributes, which will override the value set on the parent nodes.

The storage method and compression behavior can also be overridden directly in Mappings through parameters on Stambia's Snowflake Load Templates, allowing to manually define a different behavior than specified in Metadata for each Mapping.

'External Storage' is not linked to 'Default Storage Type'. You can decide to use 'internalStage' storage type by default but provide an External Storage link for specific cases where you'll override the default storage type. (For instance, forcing externalStorage on a specific Table or in a specific Mapping.)