Usage Guide - MiguelElGallo/iparq GitHub Wiki
Usage Guide
iparq is a CLI tool for inspecting Parquet files. The main command is iparq inspect.
Basic Inspection
iparq inspect yourfile.parquet
This shows:
- A ParquetMetaModel with:
  - created_by
  - num_columns
  - num_rows
  - num_row_groups
  - format_version
  - serialized_size
- A Rich table with:
  - Row Group
  - Column Name
  - Index
  - Compression
  - Bloom
  - Encrypted
  - Min Value
  - Max Value
  - Exact
- A compression codec summary
Example:
ParquetMetaModel(
created_by='parquet-cpp-arrow version 14.0.2',
num_columns=3, num_rows=3, num_row_groups=1,
format_version='2.6', serialized_size=2223
)
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ Row Group ┃ Column Name ┃ Index ┃ Compression ┃ Bloom ┃ Encrypted ┃ Min Value ┃ Max Value ┃ Exact ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ 0         │ one         │ 0     │ SNAPPY      │ ✅    │ ❌        │ -1.0      │ 2.5       │ N/A   │
│ 0         │ two         │ 1     │ SNAPPY      │ ✅    │ ❌        │ bar       │ foo       │ N/A   │
│ 0         │ three       │ 2     │ SNAPPY      │ ✅    │ ❌        │ False     │ True      │ N/A   │
└───────────┴─────────────┴───────┴─────────────┴───────┴───────────┴───────────┴───────────┴───────┘
Compression codecs: {'SNAPPY'}
Output Formats
Rich (default)
Rich output is the default format:
iparq inspect yourfile.parquet
It prints the metadata model, the formatted table, and the compression codec summary.
JSON format
iparq inspect yourfile.parquet --format json
This outputs structured JSON containing:
- metadata
- columns array, with each column entry including all available fields
- compression_codecs list
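The JSON output is the natural format to feed into scripts. The sketch below parses a report and pulls out a few headline values. It relies only on the three top-level keys named above; the sample text is hypothetical, and both the keys inside metadata (assumed to mirror ParquetMetaModel) and the per-column field names are assumptions rather than a documented schema.

```python
import json

# Hypothetical sample built around the three documented top-level keys.
# ASSUMPTION: "metadata" mirrors the ParquetMetaModel fields, and the
# per-column names ("row_group", "column_name", "compression") are invented
# for illustration. Check real output before relying on them.
SAMPLE = """
{
  "metadata": {"created_by": "parquet-cpp-arrow version 14.0.2",
               "num_columns": 3, "num_rows": 3, "num_row_groups": 1,
               "format_version": "2.6", "serialized_size": 2223},
  "columns": [
    {"row_group": 0, "column_name": "one", "compression": "SNAPPY"},
    {"row_group": 0, "column_name": "two", "compression": "SNAPPY"}
  ],
  "compression_codecs": ["SNAPPY"]
}
"""


def summarize(report_text: str) -> dict:
    """Reduce one JSON report to a few headline values."""
    report = json.loads(report_text)
    return {
        "rows": report["metadata"]["num_rows"],
        "column_entries": len(report["columns"]),
        "codecs": sorted(report["compression_codecs"]),
    }


print(summarize(SAMPLE))
```

In practice the text would come from the captured standard output of iparq inspect yourfile.parquet --format json instead of a literal string.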
Metadata Only
iparq inspect yourfile.parquet --metadata-only
This shows only the ParquetMetaModel without the column table.
Column Filtering
iparq inspect yourfile.parquet --column column_name
This limits the table output to the specified column.
Size Information
iparq inspect yourfile.parquet --sizes
This adds extra columns to the table:
- Values
- Compressed (human-readable size)
- Ratio (compression ratio such as 1.0x)
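To make the two derived columns concrete, here is a minimal sketch of how a human-readable size and a ratio such as 1.0x could be computed. The function names are hypothetical. Defining the ratio as uncompressed bytes divided by compressed bytes, and using 1024-based units, are assumptions, not a description of iparq's internals.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (an assumed convention)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def compression_ratio(uncompressed: int, compressed: int) -> str:
    """Assumed definition: uncompressed size divided by compressed size."""
    if compressed <= 0:
        return "N/A"
    return f"{uncompressed / compressed:.1f}x"


print(human_size(2223))              # 2.2 KB
print(compression_ratio(3000, 1000)) # 3.0x
```

Under this definition a ratio of 1.0x means the codec saved nothing for that column chunk.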
Multiple Files
iparq inspect file1.parquet file2.parquet file3.parquet
When inspecting multiple files:
- each file gets its own header
- a file listed more than once is inspected only once
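The deduplication behaviour can be pictured as an order-preserving filter over the argument list. This is a sketch under the assumption that two arguments count as the same file when they resolve to the same absolute path; unique_paths is a hypothetical helper, not iparq's own code.

```python
from pathlib import Path


def unique_paths(paths):
    """Drop repeated files while keeping first-seen order."""
    seen = set()
    result = []
    for raw in paths:
        # ASSUMPTION: compare by resolved absolute path, so "a.parquet"
        # and "./a.parquet" are treated as the same file.
        key = Path(raw).resolve()
        if key not in seen:
            seen.add(key)
            result.append(raw)
    return result


print(unique_paths(["a.parquet", "b.parquet", "./a.parquet", "a.parquet"]))
```

Preserving the original order keeps the per-file headers in the sequence the user typed.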
Glob Patterns
You can pass shell-expanded file patterns:
iparq inspect *.parquet
iparq inspect yellow*.parquet data_*.parquet
iparq inspect important.parquet temp_*.parquet
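Because the patterns above are expanded by the shell, a script that launches iparq without a shell has to expand them itself. The sketch below shows one way to do that with the standard glob module. expand_patterns is a hypothetical helper, and passing an unmatched pattern through unchanged is a design choice made here so that the CLI can report the missing file.

```python
import glob


def expand_patterns(patterns):
    """Expand wildcard patterns into a flat list of file paths."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Keep a pattern that matches nothing, so the caller (or the CLI)
        # can report it instead of silently dropping it.
        files.extend(matches or [pattern])
    return files


print(expand_patterns(["*.parquet", "important.parquet"]))
```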
Combining Options
Options can be combined as needed:
iparq inspect *.parquet --format json --sizes --column my_col
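When iparq is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch using only the flags shown in this guide. build_command and run are hypothetical helpers, and the sketch assumes iparq is installed and on PATH.

```python
import subprocess


def build_command(files, fmt=None, sizes=False, column=None, metadata_only=False):
    """Assemble an `iparq inspect` argument list from the documented flags."""
    cmd = ["iparq", "inspect", *files]
    if fmt:
        cmd += ["--format", fmt]
    if sizes:
        cmd.append("--sizes")
    if column:
        cmd += ["--column", column]
    if metadata_only:
        cmd.append("--metadata-only")
    return cmd


def run(cmd):
    """Run the command and return its captured standard output."""
    # ASSUMPTION: the `iparq` executable is available on PATH.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.stdout


print(build_command(["data.parquet"], fmt="json", sizes=True, column="my_col"))
```

Passing a list instead of a single string avoids shell quoting issues, at the cost of having to expand wildcard patterns yourself.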
Understanding the Output
The Rich table columns mean:
- Row Group: Parquet row group index
- Column Name: Name of the column
- Index: Column index within the schema
- Compression: Codec used, such as SNAPPY, ZSTD, GZIP, or LZ4
- Bloom: ✅ if a bloom filter is present, ❌ if not
- Encrypted: 🔒 if encrypted, ❌ if not
- Min Value / Max Value: Column statistics
- Exact: ✅ if min/max are exact (PyArrow 22+), ❌ if approximate, ~ if partial, N/A if unknown
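The four states of the Exact column can be read as a small decision table over two optional flags, one for the minimum and one for the maximum. The sketch below is one plausible reading, not iparq's implementation. It assumes that "partial" means exactly one of the two values is exact and that "unknown" means at least one flag is unavailable, for example on PyArrow versions older than 22.

```python
from typing import Optional


def exactness(min_exact: Optional[bool], max_exact: Optional[bool]) -> str:
    """Classify min/max statistics as exact, approximate, partial or unknown."""
    # ASSUMPTION: a missing flag (None) makes the whole result unknown.
    if min_exact is None or max_exact is None:
        return "unknown"
    if min_exact and max_exact:
        return "exact"
    # ASSUMPTION: "partial" means only one of the two values is exact.
    if min_exact or max_exact:
        return "partial"
    return "approximate"


for flags in [(True, True), (False, False), (True, False), (None, True)]:
    print(flags, "->", exactness(*flags))
```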
Error Handling
- A file that does not exist produces an error, and processing continues with the remaining files
- A file that is not valid Parquet is reported as an error instead of aborting the run
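The continue-on-error behaviour follows a common pattern: handle each file inside its own try block and collect the failures. The sketch below illustrates that pattern in general terms. inspect_all and the inspect_one callback are hypothetical names, and the exception types are assumptions about what a reader might raise.

```python
import sys


def inspect_all(paths, inspect_one):
    """Call inspect_one on every path and keep going after a failure."""
    failures = []
    for path in paths:
        try:
            inspect_one(path)
        except FileNotFoundError:
            print(f"error: {path} does not exist", file=sys.stderr)
            failures.append(path)
        except Exception as exc:  # e.g. a file that is not valid Parquet
            print(f"error: could not read {path}: {exc}", file=sys.stderr)
            failures.append(path)
    return failures


def fake_inspect(path):
    """Stand-in for real inspection, used only to demonstrate the loop."""
    if path == "missing.parquet":
        raise FileNotFoundError(path)
    if path == "broken.parquet":
        raise ValueError("not a Parquet file")
    print(f"inspected {path}")


print(inspect_all(["ok.parquet", "missing.parquet", "broken.parquet", "also_ok.parquet"], fake_inspect))
```

Returning the list of failed paths lets a caller decide on an exit status after every file has been attempted.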
See also Home.