To enhance your DataSync integration for Parquet, you can configure the Parquet Bulk Load Meshlet with the directives listed below.


To check out the general meshlet configurations, see General Meshlet Configurations for DataSync.



Directive | Default Value | Description

maxFileSize


Required configuration. Specifies the maximum number of records written to each Parquet file. If the value is over 10000, 10000 is used instead to prevent possible performance and memory issues. A suggested value is 5000.

NOTE: Negative values are not supported and will result in 1 record for each Parquet file.

Code Block
languageyml
perspectium:
  filesubscriber:
    maxFileSize: 5000
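
The capping rules described for maxFileSize can be sketched in Python as follows. This is only an illustration of the documented behavior, not the meshlet's implementation. The function name `effective_max_file_size` and the constant `HARD_CAP` are hypothetical, and rejecting 0 is an assumption, since the documentation does not say how 0 is handled.

```python
HARD_CAP = 10000  # documented upper limit on records per Parquet file


def effective_max_file_size(configured: int) -> int:
    """Return the records-per-file limit implied by a configured maxFileSize.

    Hypothetical helper: it mirrors the documented rules only.
    """
    if configured < 0:
        # Documented: negative values result in 1 record per file.
        return 1
    if configured == 0:
        # Assumption: the documentation does not cover 0, so reject it here.
        raise ValueError("maxFileSize of 0 is not covered by the documentation")
    # Documented: values over 10000 fall back to 10000.
    return min(configured, HARD_CAP)
```

For example, a configured value of 25000 would be treated as 10000, while the suggested value of 5000 is used unchanged.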


customFileName

$table-$randomid

Naming pattern for the created files. The following keywords in the pattern are replaced when a file is created. File names MUST be unique.

Key | Description
$table | Name of table file
$zonedatetime | Time of file writing. Default format: yyyy-MM-dd'T'HH:mm:ss.SSSZ. NOTE: Format can be changed (see dateFormat configuration)
$randomID | Random ID to ensure unique file naming


Code Block
languageyml
perspectium:
  filesubscriber:
    customFileName: $table-$randomid
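
To illustrate how the keywords in customFileName are replaced, here is a minimal Python sketch. It is not the meshlet's code: the function `render_file_name` is hypothetical, the timestamp only approximates the default yyyy-MM-dd'T'HH:mm:ss.SSSZ format, and using a UUID as the random ID is an assumption. Keywords are matched as spelled in the default value ($table-$randomid).

```python
import uuid
from datetime import datetime, timezone
from string import Template
from typing import Optional


def render_file_name(pattern: str, table: str,
                     now: Optional[datetime] = None,
                     random_id: Optional[str] = None) -> str:
    """Replace $table, $zonedatetime and $randomid in a file name pattern."""
    now = now or datetime.now(timezone.utc)
    # Approximates the default yyyy-MM-dd'T'HH:mm:ss.SSSZ format.
    millis = now.microsecond // 1000
    stamp = f"{now:%Y-%m-%dT%H:%M:%S}.{millis:03d}{now:%z}"
    values = {
        "table": table,
        "zonedatetime": stamp,
        # Assumption: any unique string works as the random ID.
        "randomid": random_id or uuid.uuid4().hex,
    }
    # safe_substitute leaves unknown $tokens untouched instead of raising.
    return Template(pattern).safe_substitute(values)
```

With the default pattern, a record from the incident table would produce a name such as incident- followed by a random ID, which keeps file names unique.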


fileDirectory

/files

Directory where the locally created files are written, relative to where the application is running.

Code Block
languageyml
perspectium:
  filesubscriber:
    fileDirectory: /files


postInterval

2

Interval, in minutes, for checking dead periods. Every x minutes, the in-memory collection is compared with its state x minutes earlier. If it is the same, the records are written to file and pushed to Parquet.

Code Block
languageyml
perspectium:
  parquet:
    postInterval: 2
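
The dead-period check can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the meshlet's implementation: the class `DeadPeriodChecker` and its `flush` callback are hypothetical, "the same" is approximated by comparing the number of buffered records, and the scheduling that calls `check()` every postInterval minutes is left out.

```python
from typing import Callable, List


class DeadPeriodChecker:
    """Flushes buffered records once no new record arrived for one interval."""

    def __init__(self, flush: Callable[[List[dict]], None]) -> None:
        self._flush = flush          # hypothetical callback: write file and push
        self._records: List[dict] = []
        self._last_count = 0         # buffer size seen at the previous check

    def add(self, record: dict) -> None:
        self._records.append(record)

    def check(self) -> bool:
        """Run once every postInterval minutes; return True if a flush happened."""
        unchanged = len(self._records) == self._last_count
        flushed = False
        if unchanged and self._records:
            # Nothing new since the last check: hand the records over.
            self._flush(list(self._records))
            self._records.clear()
            flushed = True
        self._last_count = len(self._records)
        return flushed
```

In this sketch, records that arrive between two checks postpone the flush by one interval, so a file is only written after a quiet period.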


dateFormat

yyyy-MM-dd'T'HH:mm:ss.SSSZ

Date format used to create the file name. Must be a valid SimpleDateFormat pattern.

Code Block
languageyml
perspectium:
  parquet:
    dateFormat: yyyy-MM-dd'T'HH:mm:ss.SSSZ


timeZone

GMT

ID of timezone to be used in $zonedatetime.

Code Block
languageyml
perspectium:
  parquet:
    timeZone: GMT


file_prefix

Prefix used for file naming.

Code Block
languageyml
perspectium:
  parquet:
    file_prefix: psp


file_suffix

Suffix used for file naming.

Code Block
languageyml
perspectium:
  parquet:
    file_suffix: bt






Azure External Storage

To enable sharing Parquet files to Azure, use the following directives: 

Directive | Default Value | Description
connectionString

Connection string for your Azure storage account. To find it, go to Azure Portal > Storage Account > Access Keys > Show Keys > Connection String.

Code Block
perspectium:
  azure:
    connectionString: DefaultEndpointsProtocol=.....EndpointSuffix=core.windows.net


destinationContainer

Name of your Azure Blob Storage container, optionally followed by subdirectories, that specifies where the records will be uploaded, e.g. container/folder1/folder2.

For example, the following will save records into the pspcontainer blob storage container:

Code Block
perspectium:
  azure:
    destinationContainer: pspcontainer

If the following is configured:

Code Block
perspectium:
  azure:
    destinationContainer: pspcontainer/tables/$table

With this configuration, an incident record that is processed and uploaded to Azure Blob Storage is saved in the pspcontainer container under the /tables/incident directory. The incident directory is created automatically.

NOTE: The $table token is replaced by the table name of the record.
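
The way a destinationContainer value maps to a container and a directory can be sketched in Python as follows. This is a hypothetical helper (`resolve_destination` is not part of the meshlet) that assumes the first path segment is the container name and everything after it is the directory inside that container.

```python
from typing import Tuple


def resolve_destination(destination: str, table: str) -> Tuple[str, str]:
    """Split destinationContainer into (container, directory) for one record."""
    # Replace the $table token with the record's table name.
    resolved = destination.replace("$table", table)
    # Assumption: the first segment is the container, the rest is the directory.
    container, _, directory = resolved.partition("/")
    return container, directory
```

For an incident record, pspcontainer/tables/$table would resolve to the pspcontainer container and the tables/incident directory, while a plain pspcontainer value resolves to the container with no subdirectory.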



To enable the sharing of Parquet files to Azure, the Spring profile must include azure:

Code Block
java -jar -Dspring.profiles.active=dev,azure