

You can configure your DataSync Agent to share data from ServiceNow to a Snowflake database by making some additional configurations in your agent.xml file.

The DataSync Agent uses the Snowflake COPY command to bulk load data from the local file system where the Agent is running in order to optimize performance. Because Snowflake is a cloud-based database, standard SQL operations (i.e. INSERT, UPDATE, etc.) performed record by record would be slow due to network latency, so the COPY command is used instead to improve performance.
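The exact statements the Agent issues are internal to the Agent, but the staged bulk-load pattern it relies on can be sketched in Snowflake SQL as follows, with hypothetical table, stage, and file names:

```sql
-- Upload a locally written CSV file to an (illustrative) internal stage
PUT file:///usr/local/datasyncagent/tempfiles/incident.csv @incident_stage;

-- Bulk load the staged file into the target table in a single operation,
-- instead of issuing one INSERT/UPDATE per record over the network
COPY INTO incident
  FROM @incident_stage
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```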

  • Snowflake support is available in Krypton 8.0.8 and newer
  • The following features are not currently supported and will be added in a future version:
    • Altering tables to add new columns or increase column lengths as is supported with other databases
    • Deleting records
    • Table Compare
    • Temporal


Prerequisites


(warning) First, you will need to Install the DataSync Agent.

(warning) You will also need to create a ServiceNow dynamic share/bulk share.

(warning) Make sure to stop running your DataSync Agent before making any Agent configuration changes.


Procedure

To set up your DataSync Agent to share application data to Snowflake, follow these steps:


Navigate to the directory where you saved your agent.xml file when installing your DataSync Agent.

Open your agent.xml file in a text editing application. Then, locate the <task> directive(s) within your <subscribe> directive, and update the following database-related directives:

For each directive, enter the value described:

  • <database_type>: snowflake

  • <database_server>: URL/IP address for your Snowflake database

  • <database_user>: Username used to access your Snowflake database

  • <database_password>: Password used to access your Snowflake database

  • <database_parms>: The schema, warehouse, and role used to access your Snowflake database, in the format:

    schema=<SCHEMA_NAME>&amp;warehouse=<WAREHOUSE_NAME>&amp;role=<ROLE>

    For example, if your schema name is PUBLIC, your warehouse name is COMPUTE_WH, and your role is ACCOUNTADMIN:

    schema=PUBLIC&amp;warehouse=COMPUTE_WH&amp;role=ACCOUNTADMIN

  • <database>: Name of your Snowflake database

  • <database_port>: (Optional) Port number to access your Snowflake database

  • <enable_snowflake>: Enables the DataSync Agent for Snowflake. By default, the Agent is not enabled for Snowflake because of the aforementioned performance limitations with standard SQL operations.


  • <files_directory>: A local directory that the running Agent has access to, where it saves its temporary CSV files. These files are pushed to Snowflake staging tables so that the COPY command can bulk load data from the local file system.

  • <file_max_size>: How large the temporary files can grow before they are pushed to Snowflake. To optimize performance, records should accumulate in the temporary files before being pushed, so that files are not continuously pushed to Snowflake, which can degrade performance.

    (info) NOTE:

      • The size can be in KB, MB, or GB, i.e. 50KB, 250MB, 1GB. Make sure to have NO space between the number and the unit.
      • The minimum value for this directive is 25KB. If you input a value less than 25KB, the value will be set to 25KB.
      • Conversely, the maximum value for this directive is 10GB. If you input a value greater than 10GB, the value will be set to 10GB.
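The size parsing and clamping behavior described in the note above can be sketched as follows (a hypothetical helper mirroring the documented rules, not the Agent's actual code):

```python
import re

# Documented clamping bounds: 25KB minimum, 10GB maximum
MIN_BYTES = 25 * 1024
MAX_BYTES = 10 * 1024 ** 3

UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def parse_file_max_size(value: str) -> int:
    """Parse a <file_max_size> value like '250MB' (no space allowed)
    into bytes, clamped to the documented 25KB..10GB range."""
    match = re.fullmatch(r"(\d+)(KB|MB|GB)", value)
    if match is None:
        raise ValueError(f"invalid <file_max_size> value: {value!r}")
    size = int(match.group(1)) * UNITS[match.group(2)]
    return max(MIN_BYTES, min(MAX_BYTES, size))
```

For example, `parse_file_max_size("10KB")` is clamped up to 25KB, and `parse_file_max_size("20GB")` is clamped down to 10GB.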
  • <uppercase_columns>: (Optional) Saves column names in uppercase lettering in the Snowflake database. Since ServiceNow's columns are lowercase, this converts them to uppercase for querying and saving content into Snowflake (i.e. sys_id becomes SYS_ID).

    It is recommended to enable this configuration, since Snowflake uses uppercase lettering by default.

  • <flush_interval>: (Optional) How often the DataSync Agent flushes its temporary files and pushes them to Snowflake if they have not reached the size specified in <file_max_size>. Values are entered as whole numbers representing minutes, such as 10 for 10 minutes. If this configuration is not specified, the default value is 5 (5 minutes).

    This way, the Agent does not wait too long to push changes when there are no more records to be shared. For example, suppose you share 100 records and your <file_max_size> configuration is 400KB: if 80 records fit within 400KB, those 80 records will be pushed, while the remaining 20 records stay in the next temporary file until it reaches 400KB. The flush interval ensures they still get pushed if no further sharing occurs, since the file might not reach 400KB again for some time.
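The interplay between the size threshold and the timed flush in the example above can be sketched like this (hypothetical names; the real Agent's internals differ):

```python
class TempFileBuffer:
    """Accumulate records and push them in batches, mimicking how a
    size limit and a timed flush interact (illustrative only)."""

    def __init__(self, max_size_bytes: int):
        self.max_size_bytes = max_size_bytes
        self.buffered = []        # records waiting in the temporary file
        self.pushed_batches = []  # batches "pushed" to Snowflake

    def add_record(self, record: bytes) -> None:
        self.buffered.append(record)
        # Push as soon as the temporary file reaches the size limit
        if sum(len(r) for r in self.buffered) >= self.max_size_bytes:
            self._push()

    def flush(self) -> None:
        # Called every <flush_interval> minutes: push whatever remains,
        # even if the file never reached <file_max_size>
        if self.buffered:
            self._push()

    def _push(self) -> None:
        self.pushed_batches.append(self.buffered)
        self.buffered = []

# 100 records of 5KB each with a 400KB limit: 80 records fill the first
# file and are pushed immediately; the last 20 wait for the timed flush.
buf = TempFileBuffer(max_size_bytes=400 * 1024)
for _ in range(100):
    buf.add_record(b"x" * 5 * 1024)
buf.flush()
```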

  • <delete_files_interval>: (Optional) How old the temporary files can be before they are deleted. To ensure the temporary files don't take up too much space on the file system, while still keeping them around for troubleshooting, the Agent only deletes files whose age (based on last modified time) exceeds this value relative to the current date and time.

    Each time the Agent runs a flush of its temporary files (per the <flush_interval> configuration above), it checks for any temporary files older than the value specified here relative to the current time. For example, if the value is 12H (for 12 hours), when the Agent runs its flush, it will check for any temporary files last modified more than 12 hours ago and delete them.

    (info) NOTE:

      • The value can be entered in hours (H) or days (D), i.e. 12H (for 12 hours), 3D (for 3 days). Make sure to have NO space between the number and the unit.
      • If this configuration is not specified, the default value is 24 hours.
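The age check described above can be sketched as follows (a hypothetical helper mirroring the documented behavior, not the Agent's actual code):

```python
import re
from datetime import datetime, timedelta

def parse_delete_files_interval(value: str = "24H") -> timedelta:
    """Parse a <delete_files_interval> value like '12H' or '3D'
    (no space allowed); the documented default is 24 hours."""
    match = re.fullmatch(r"(\d+)([HD])", value)
    if match is None:
        raise ValueError(f"invalid <delete_files_interval> value: {value!r}")
    amount = int(match.group(1))
    return timedelta(hours=amount) if match.group(2) == "H" else timedelta(days=amount)

def files_to_delete(last_modified: dict, interval: timedelta, now: datetime) -> list:
    """Return file names whose last-modified time is older than the interval."""
    cutoff = now - interval
    return [name for name, mtime in last_modified.items() if mtime < cutoff]
```

For instance, with a 12H interval, a file last modified 24 hours ago is deleted at the next flush, while one modified 6 hours ago is kept.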

Your agent.xml file should look similar to the example shown below:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<config>
    <agent>
        <share>
        </share>
        <subscribe>
            <task>
                <task_name>test_snowflake_subscribe</task_name>
                <message_connection password="encrypted:vlOtU71yu8N/EFIJH85SSBtaIt7qEEfvqiqft9VZyYE=" user="exampleuser">amqps://example.perspectium.net</message_connection>
                <instance_connection password="encrypted:vlOtU71yu8N/EFIJH85SSBtaIt7qEEfvqiqft9VZyYE=" user="example.user">https://myinstance.service-now.com</instance_connection>
                <handler>com.perspectium.replicator.sql.SQLSubscriber</handler>
                <decryption_key>The cow jumped over the moon</decryption_key>
                <database_type>snowflake</database_type>
                <database_server>servername.snowflakecomputing.com</database_server>
                <database_user>exampleuser</database_user>
                <database_password>examplepassword</database_password>
                <database_parms>schema=PUBLIC&amp;warehouse=COMPUTE_WH&amp;role=ACCOUNTADMIN</database_parms>
                <database>SAMPLE</database>

                <enable_snowflake/>
                <uppercase_columns/>

                <files_directory>/usr/local/datasyncagent/tempfiles</files_directory>
                <file_max_size>500KB</file_max_size>
                <flush_interval>3</flush_interval>
                <delete_files_interval>12H</delete_files_interval>
            </task>
        </subscribe>
        <max_reads_per_connect>4000</max_reads_per_connect>
        <polling_interval>30</polling_interval>
    </agent>
</config>

Save the changes you made to your agent.xml file and close the file.

After configuring your agent.xml file to support replication to your Snowflake database, start running your DataSync Agent again.
