To enhance your DataSync integration for Parquet, you can configure the Parquet Bulk Load Meshlet with the directives listed below:
| Directive | Default Value | Description |
|---|---|---|
| maxFileSize | | Required. Maximum size of the temporary files created as the meshlet pushes records to Parquet. Values above 10000 are capped at 10000 to prevent possible performance and memory issues. A suggested value is 5000. Set under `perspectium` > `filesubscriber`, e.g. `maxFileSize: 15000`. |
| customFileName | `$table-$randomid` | Naming pattern for created files. Keywords such as `$table` and `$randomid` are replaced when each file is created. File names MUST be unique. Set under `perspectium` > `filesubscriber`, e.g. `customFileName: $table-$randomid`. |
| fileDirectory | `/files` | Directory where the local files are created, relative to where the application is running. Set under `perspectium` > `filesubscriber`, e.g. `fileDirectory: /files`. |
| postInterval | 2 | Interval, in minutes, for checking dead periods. Every x minutes, the meshlet compares the in-memory collection with its state x minutes earlier. If nothing has changed, it writes the records to file and pushes them to Parquet. Set under `perspectium` > `parquet`, e.g. `postInterval: 2`. |
| dateFormat | `yyyy-MM-dd'T'HH:mm:ss.SSSZ` | Date format used to create the file name. Must be a valid SimpleDateFormat pattern. |
| timeZone | GMT | ID of the time zone used in `$zonedatetime`. |
| file_prefix | | Prefix used for file naming. |
| file_suffix | | Suffix used for file naming. |
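
The inline examples in the table are written as flattened key paths. As a minimal sketch, assuming the meshlet reads a standard nested YAML configuration file, the four directives whose paths are given above could be combined as follows. The value 5000 for `maxFileSize` is the suggested value from the table, not a tested recommendation. `dateFormat`, `timeZone`, `file_prefix`, and `file_suffix` are left out because the table does not state which parent key they belong under.

```yaml
# Sketch only: nesting is inferred from the key paths shown in the table.
perspectium:
  filesubscriber:
    maxFileSize: 5000                 # required; values above 10000 are capped at 10000
    customFileName: $table-$randomid  # resulting file names must be unique
    fileDirectory: /files             # relative to where the application runs
  parquet:
    postInterval: 2                   # minutes between dead-period checks
```

Check the indentation carefully when adapting this, since YAML uses it to determine which section each directive belongs to.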