You can configure your DataSync Agent to share data from ServiceNow to a Snowflake database by adding some additional configurations to your agent.xml file.

The DataSync Agent uses the Snowflake COPY command to bulk load data from the local file system where the Agent is running in order to optimize performance. Because Snowflake is a cloud-based database, standard SQL operations (e.g. INSERT, UPDATE) executed on a record-by-record basis would be slow due to network latency, so the COPY command is used to load records in bulk instead.
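
As a rough sketch of this pattern (these are not the Agent's exact commands; the incident table name and file path are illustrative assumptions), the bulk load is conceptually equivalent to staging a local CSV file and then copying it into the target table:

-- Illustrative only: stage a local CSV file into the table's internal stage...
PUT file:///usr/local/datasyncagent/tempfiles/incident_1.csv @%incident;

-- ...then bulk load the staged file into the table in a single operation
COPY INTO incident
  FROM @%incident
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
  PURGE = TRUE;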

  • Snowflake support is available as of Krypton 8.0.8 and newer
  • The following features are not currently supported and will be added in future versions:
    • Altering tables to add new columns or increase column lengths as is supported with other databases
    • Agent plugins such as the IODatetime plugin
    • Deleting records
    • Table Compare
    • Temporal


Prerequisites


(warning) First, you will need to Install the DataSync Agent.

(warning) You will also need to create a ServiceNow dynamic share/bulk share.

(warning) Make sure to stop running your DataSync Agent before making any Agent configuration changes.


Procedure

To set up your DataSync Agent to share application data to Snowflake, follow these steps:


Navigate to the conf directory in the installation folder where you installed your DataSync Agent and look for the agent.xml configuration file.

Open your agent.xml file in a text editing application. Then, locate the <task> directive(s) within your <subscribe> directive and update the following database-related directives, entering the value described for each one:

<handler>

com.perspectium.replicator.sql.SnowflakeSQLSubscriber

<database_type>

snowflake

<database_server>

URL/IP address for your Snowflake database

<database_user>

Username used to access your Snowflake database

<database_password>

Password used to access your Snowflake database

<database_parms>

To specify the schema, warehouse, and role used to access your Snowflake database:

schema=<SCHEMA_NAME>&amp;warehouse=<WAREHOUSE_NAME>&amp;role=<ROLE>


For example:

schema=PUBLIC&amp;warehouse=COMPUTE_WH&amp;role=ACCOUNTADMIN

would be used if your schema name is PUBLIC, your warehouse name is COMPUTE_WH, and your role is ACCOUNTADMIN.

<database>

Name of your Snowflake database

<database_port>

(Optional) Port number to access your Snowflake database

<enable_snowflake>

To enable the DataSync Agent for Snowflake. By default, the Agent is not enabled for Snowflake because of the aforementioned performance limitations with standard SQL operations.


<files_directory>

To specify a local directory on the machine where the Agent is running that the Agent has access to for saving its temporary CSV files. These files are pushed to Snowflake staging tables so that the COPY command can bulk load the data from the local file system.

(info) NOTE: The folder specified should only be used for the Agent. If you store other files in this directory, they will be deleted as part of the cleanup using the <delete_files_interval> configuration.

<file_max_size>

To specify how large the temporary files can grow before they are pushed to Snowflake. To optimize performance, records are accumulated in the temporary files before being pushed so that files are not continuously pushed to Snowflake, which can degrade performance.

(info) NOTE:

  • The size can be in KB, MB, or GB, e.g. 50KB, 250MB, 1GB. Make sure there is NO space between the number and the unit.
  • The minimum value for this directive is 25KB. If you input a value less than 25KB, the value will be set to 25KB.    
  • Conversely, the maximum value for this directive is 10GB. If you input a value greater than 10GB, the value will be set to 10GB.
<uppercase_columns>

(Optional) To specify whether columns should be saved in uppercase lettering in the Snowflake database. Since ServiceNow's columns are lowercase, this converts them to uppercase when querying and saving content into Snowflake (i.e. sys_id becomes SYS_ID).

It is recommended to enable this configuration since Snowflake uses uppercase identifiers by default (see the example following this table).

<flush_interval>

(Optional) To specify how often the DataSync Agent will flush its temporary files and push to Snowflake if they have not accumulated to the size specified in <file_max_size>. Values are entered as whole numbers to represent minutes, such as 10 to represent 10 minutes. The default value is 5 for 5 minutes if this configuration is not specified.

This way, the Agent doesn't wait too long to push changes when no further records are being shared. For example, if you share 100 records, your <file_max_size> configuration is 400KB, and 80 records fit within 400KB, those 80 records will be pushed while the remaining 20 records stay in the next temporary file until it reaches 400KB. The flush interval ensures those remaining records still get pushed even if no further sharing occurs and the 400KB threshold would not be reached for some time.

<delete_files_interval>

(Optional) To specify how old the temporary files can be before they are deleted. To ensure the temporary files don't take up too much space on the file system while still being kept for troubleshooting purposes, the Agent only deletes files whose last modified time is older than the specified interval relative to the current date and time.

This value is checked each time the Agent flushes its temporary files (per the <flush_interval> configuration above): any temporary files older than the specified value relative to the current time are deleted. For example, if the value is 12H (for 12 hours), each time the Agent runs its flush, it will delete any temporary files whose last modified time is more than 12 hours ago.

(info) NOTE:

  • The value can be entered in hours (H) or days (D), e.g. 12H (for 12 hours) or 3D (for 3 days). Make sure there is NO space between the number and the unit.
  • The default value is 24 hours if this configuration is not specified.
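
For context on the <uppercase_columns> recommendation above, the following sketch (using an assumed incident table, not one created by the Agent) shows Snowflake's identifier-case behavior: unquoted identifiers resolve to uppercase, while quoted lowercase identifiers must be quoted in every query.

-- Columns stored as uppercase (SYS_ID, SHORT_DESCRIPTION) can be queried without quotes,
-- since Snowflake resolves unquoted identifiers to uppercase
SELECT sys_id, short_description FROM incident;

-- Columns created as quoted lowercase identifiers (e.g. "sys_id") are case-sensitive
-- and must be quoted in every query
SELECT "sys_id", "short_description" FROM incident;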

Your agent.xml file should look similar to the example shown below:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<config>
    <agent>
        <share>
        </share>
        <subscribe>
            <task>
                <task_name>test_snowflake_subscribe</task_name>
                <message_connection password="encrypted:vlOtU71yu8N/EFIJH85SSBtaIt7qEEfvqiqft9VZyYE=" user="exampleuser">amqps://example.perspectium.net</message_connection>
                <instance_connection password="encrypted:vlOtU71yu8N/EFIJH85SSBtaIt7qEEfvqiqft9VZyYE=" user="example.user">https://myinstance.service-now.com</instance_connection>
                <decryption_key>The cow jumped over the moon</decryption_key>

                <handler>com.perspectium.replicator.sql.SnowflakeSQLSubscriber</handler>
                <database_type>snowflake</database_type>
                <database_server>servername.snowflakecomputing.com</database_server>
                <database_user>exampleuser</database_user>
                <database_password>examplepassword</database_password>
                <database_parms>schema=PUBLIC&amp;warehouse=COMPUTE_WH&amp;role=ACCOUNTADMIN</database_parms>
                <database>SAMPLE</database>

                <enable_snowflake/>
                <uppercase_columns/>

                <files_directory>/usr/local/datasyncagent/tempfiles</files_directory>
                <file_max_size>500KB</file_max_size>
                <flush_interval>3</flush_interval>
                <delete_files_interval>12H</delete_files_interval>
            </task>
        </subscribe>
        <max_reads_per_connect>4000</max_reads_per_connect>
        <polling_interval>30</polling_interval>
    </agent>
</config>

Save the changes you made to your agent.xml file and close the file.

Navigate back to the conf directory in the DataSync Agent installation folder and look for the wrapper.conf file.

Open the wrapper.conf file in a text editing application. Look for the # Java Additional Parameters section and add the following two lines:

wrapper.java.additional.4=--add-opens 
wrapper.java.additional.5=java.base/java.nio=ALL-UNNAMED

The changes should look as follows:

# Java Additional Parameters
wrapper.java.additional.1=-Djava.util.logging.config.file=../conf/logging.properties
wrapper.java.additional.2=-Dlog4j.configurationFile=../conf/log4j2.xml
wrapper.java.additional.3=-Dfile.encoding=UTF-8
wrapper.java.additional.4=--add-opens 
wrapper.java.additional.5=java.base/java.nio=ALL-UNNAMED  

This configuration is needed so the Agent can access the Java modules it uses to hold data in memory before pushing it to Snowflake with the COPY command.

Save the changes you made to your wrapper.conf file and close the file.

After configuring your agent.xml and wrapper.conf files to support replication to your Snowflake database, start running your DataSync Agent again.

Agent Log Reporting Notes

The Agent reports both the records it processes (i.e. records it reads from the Perspectium Integration Mesh queue and prepares for saving into Snowflake) and the records it pushes into Snowflake (records that are sent to Snowflake and then copied into the database tables with the COPY command). Because the Agent batches records into files up to the maximum file size before running the COPY command, these two actions do not happen at the same time and are reported separately. For example, looking at the Agent's log file (perspectium.log), you will see log entries such as the following:

2024-03-28 23:33:51.049 INFO  - PerformanceReportGenerator - PerformanceReport - change_request.bulk=[100, 425560]: 100 messages processed and 425560 bytes processed

2024-03-28 23:37:21.049 INFO  - Timer-4 - SnowflakeSQLSubscriber - Scheduled flushing to push records

2024-03-28 23:37:28.144 INFO  - Timer-4 - SnowflakeSQLSubscriber - Processed 100 record(s) to table: change_request

The PerformanceReportGenerator logging represents the Agent processing records from the Integration Mesh queue (reading the messages, decrypting the content, formatting it, and saving it into the local temporary files). The SnowflakeSQLSubscriber logging represents the actual pushing of the records into the Snowflake database (taking the temporary files, pushing them into Snowflake, and using the COPY command to save them into the Snowflake database tables).