# Configure With Hadoop - Altinity/parquet-regression GitHub Wiki
## Hadoop Configurations
An alternative way to specify configurations is to use the Hadoop library and set the configuration options there. For example:
```json
"hadoop": {
    "options": {
        "parquet.compression": "UNCOMPRESSED",
        "parquet.enable.dictionary": "true",
        "parquet.page.size": "1048576"
    }
}
```
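Hadoop tools conventionally accept the same key/value options on the command line as `-Dkey=value` pairs. As a minimal sketch of how the `options` block maps onto that convention, the helper below (`hadoop_options_to_args` is a hypothetical name, not part of the test suite) flattens the JSON fragment into such arguments:

```python
import json

def hadoop_options_to_args(config_json):
    """Hypothetical helper: flatten the "hadoop" -> "options" block of a
    test configuration into Hadoop-style -Dkey=value arguments."""
    options = json.loads(config_json)["hadoop"]["options"]
    return ["-D{}={}".format(key, value) for key, value in options.items()]

# The same fragment as in the example above, wrapped in a complete JSON object.
config = """
{
  "hadoop": {
    "options": {
      "parquet.compression": "UNCOMPRESSED",
      "parquet.enable.dictionary": "true",
      "parquet.page.size": "1048576"
    }
  }
}
"""

args = hadoop_options_to_args(config)
print(args)
```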
## Possible Configurations To Set With Hadoop
A full description of each parameter can be found here.
| Property | Values | Default |
|---|---|---|
| `parquet.summary.metadata.level` | `all`, `common_only`, `none` | `all` |
| `parquet.enable.summary-metadata` | `true`, `false`, `NONE`, `all` | `true` |
| `parquet.block.size` | any integer value | `134217728` |
| `parquet.page.size` | any integer value | `1048576` |
| `parquet.compression` | `uncompressed`, `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd`, `lz4_raw` | `uncompressed` |
| `parquet.enable.dictionary` | `true`, `false` | `true` |
| `parquet.dictionary.page.size` | any integer value | `1048576` |
| `parquet.writer.version` | `PARQUET_1_0`, `PARQUET_2_0` | `PARQUET_1_0` |
| `parquet.validation` | `true`, `false` | `false` |
| `parquet.columnindex.truncate.length` | any integer value | `64` |
| `parquet.statistics.truncate.length` | any integer value | `2147483647` |
| `parquet.bloom.filter.enabled` | `true`, `false` | `false` |
| `parquet.bloom.filter.enabled#column.path` | `true`, `false` | `false` |
| `parquet.bloom.filter.adaptive.enabled` | `true`, `false` | `false` |
| `parquet.bloom.filter.candidates.number` | any integer value | `5` |
| `parquet.bloom.filter.expected.ndv#column.path` | any integer value | `200` |
| `parquet.bloom.filter.fpp#column.path` | any double value between 0 and 1 | `0.01` |
| `parquet.bloom.filter.max.bytes` | any integer value | `1048576` |
| `parquet.page.row.count.limit` | any integer value | `20000` |
| `parquet.page.write-checksum.enabled` | `true`, `false` | `true` |
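Several of the properties above use the `#column.path` suffix to target a single column. As an illustrative sketch combining them, the fragment below enables a bloom filter for one column (`user_id` is a placeholder column name, not one defined by the test suite); as in the earlier example, all values are passed as strings:

```json
"hadoop": {
    "options": {
        "parquet.writer.version": "PARQUET_2_0",
        "parquet.compression": "zstd",
        "parquet.bloom.filter.enabled#user_id": "true",
        "parquet.bloom.filter.fpp#user_id": "0.01",
        "parquet.page.row.count.limit": "20000"
    }
}
```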