Usage Guide - MiguelElGallo/iparq GitHub Wiki

Usage Guide

iparq is a CLI tool for inspecting Parquet files. The main command is iparq inspect.

Basic Inspection

iparq inspect yourfile.parquet

This shows:

  • ParquetMetaModel with:
    • created_by
    • num_columns
    • num_rows
    • num_row_groups
    • format_version
    • serialized_size
  • A Rich table with:
    • Row Group
    • Column Name
    • Index
    • Compression
    • Bloom
    • Encrypted
    • Min Value
    • Max Value
    • Exact
  • A compression codec summary

Example:

ParquetMetaModel(
    created_by='parquet-cpp-arrow version 14.0.2',
    num_columns=3, num_rows=3, num_row_groups=1,
    format_version='2.6', serialized_size=2223
)
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Row Group โ”ƒ Column Name โ”ƒ Index โ”ƒ Compression โ”ƒ Bloom โ”ƒ Encrypted โ”ƒ Min Value โ”ƒ Max Value โ”ƒ Exact โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚     0     โ”‚ one         โ”‚   0   โ”‚ SNAPPY      โ”‚  โœ…   โ”‚     โ€”     โ”‚ -1.0      โ”‚ 2.5       โ”‚  N/A  โ”‚
โ”‚     0     โ”‚ two         โ”‚   1   โ”‚ SNAPPY      โ”‚  โœ…   โ”‚     โ€”     โ”‚ bar       โ”‚ foo       โ”‚  N/A  โ”‚
โ”‚     0     โ”‚ three       โ”‚   2   โ”‚ SNAPPY      โ”‚  โœ…   โ”‚     โ€”     โ”‚ False     โ”‚ True      โ”‚  N/A  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Compression codecs: {'SNAPPY'}

Output Formats

Rich (default)

Rich output is the default format:

iparq inspect yourfile.parquet

It prints the metadata model, the formatted table, and the compression codec summary.

JSON format

iparq inspect yourfile.parquet --format json

This outputs structured JSON containing:

  • metadata
  • columns array, with each column entry including all available fields
  • compression_codecs list
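
As a rough sketch of consuming this output from Python: the three top-level keys come from the list above, while the sample payload and the nested field names (row_group, name, compression) are assumptions made for illustration and may not match what iparq actually emits.

```python
import json

# Hypothetical payload: only the three top-level keys are documented above;
# the nested field names are assumed for illustration.
sample = """
{
  "metadata": {"num_rows": 3, "num_columns": 3},
  "columns": [
    {"row_group": 0, "name": "one", "compression": "SNAPPY"},
    {"row_group": 0, "name": "two", "compression": "SNAPPY"}
  ],
  "compression_codecs": ["SNAPPY"]
}
"""


def summarize(raw: str) -> dict:
    """Build a small summary from the three top-level keys."""
    doc = json.loads(raw)
    return {
        "metadata_keys": sorted(doc["metadata"]),
        "column_count": len(doc["columns"]),
        "codecs": sorted(doc["compression_codecs"]),
    }


summary = summarize(sample)
print(summary)
```

In practice the string would come from the command's standard output (for example via subprocess.run with capture_output=True) rather than from a literal.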

Metadata Only

iparq inspect yourfile.parquet --metadata-only

This shows only the ParquetMetaModel without the column table.

Column Filtering

iparq inspect yourfile.parquet --column column_name

This limits the table output to the specified column.

Size Information

iparq inspect yourfile.parquet --sizes

This adds extra columns to the table:

  • Values
  • Compressed (human-readable size)
  • Ratio (compression ratio such as 1.0x)
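
To make the Compressed and Ratio columns concrete, here is a minimal sketch of how such values could be derived. It assumes the ratio is uncompressed bytes divided by compressed bytes and that sizes use binary units; both helpers (human_size, ratio) are hypothetical and iparq's own formatting may differ.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (assumed convention)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit that fits, or at the largest unit we know.
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024


def ratio(uncompressed: int, compressed: int) -> str:
    """Assumed definition: uncompressed bytes divided by compressed bytes."""
    if compressed == 0:
        return "N/A"
    return f"{uncompressed / compressed:.1f}x"


print(human_size(1536), ratio(2048, 1024))
```

Under this definition a ratio of 1.0x means the column chunk takes the same space compressed as uncompressed.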

Multiple Files

iparq inspect file1.parquet file2.parquet file3.parquet

When inspecting multiple files:

  • each file gets its own header
  • a file listed more than once is inspected only once
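
One way such deduplication could work is sketched below. This is an illustration, not iparq's actual code: the helper name unique_paths and the choice to compare resolved absolute paths are assumptions.

```python
from pathlib import Path


def unique_paths(paths: list[str]) -> list[Path]:
    """Drop repeated paths while keeping first-seen order."""
    seen: set[Path] = set()
    result: list[Path] = []
    for raw in paths:
        # Resolve so that 'a.parquet' and './a.parquet' count as the same file.
        key = Path(raw).resolve()
        if key not in seen:
            seen.add(key)
            result.append(Path(raw))
    return result


print(unique_paths(["a.parquet", "./a.parquet", "b.parquet"]))
```

Keeping first-seen order means the per-file headers appear in the order the files were given on the command line.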

Glob Patterns

You can pass shell-expanded file patterns:

iparq inspect *.parquet
iparq inspect yellow*.parquet data_*.parquet
iparq inspect important.parquet temp_*.parquet
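
Because the shell expands these patterns, iparq receives a plain list of file names. The sketch below mimics that expansion with Python's glob module in a temporary directory; the file names are made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; only their names matter here.
    for name in ("yellow_1.parquet", "yellow_2.parquet", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    # Roughly what the shell does with yellow*.parquet before iparq starts.
    pattern = os.path.join(tmp, "yellow*.parquet")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)
```

Only the two names matching the pattern are passed on; notes.txt is never seen by the command.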

Combining Options

Options can be combined as needed:

iparq inspect *.parquet --format json --sizes --column my_col

Understanding the Output

The Rich table columns mean:

  • Row Group: Parquet row group index
  • Column Name: Name of the column
  • Index: Column index within the schema
  • Compression: Codec used, such as SNAPPY, ZSTD, GZIP, or LZ4
  • Bloom: ✅ if a bloom filter is present, ❌ if not
  • Encrypted: 🔒 if encrypted, — if not
  • Min Value / Max Value: Column statistics
  • Exact:
    • ✅ if min/max are exact (PyArrow 22+)
    • ❌ if approximate
    • ~ if partial
    • ~ if partial
    • N/A if unknown
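
The four Exact states can be read as a function of two optional flags, one for the minimum and one for the maximum. The sketch below is one plausible mapping, not iparq's implementation: it assumes "partial" means exactly one of the two values is exact and "unknown" means either flag is unavailable. The function name exact_symbol is hypothetical.

```python
from typing import Optional


def exact_symbol(min_exact: Optional[bool], max_exact: Optional[bool]) -> str:
    """Map two optional exactness flags to the symbol shown in the table."""
    if min_exact is None or max_exact is None:
        return "N/A"  # exactness not reported (assumed meaning of unknown)
    if min_exact and max_exact:
        return "✅"   # both statistics are exact
    if not min_exact and not max_exact:
        return "❌"   # both statistics are approximate
    return "~"        # only one of the two is exact (assumed meaning of partial)


print(exact_symbol(True, True), exact_symbol(True, False), exact_symbol(None, None))
```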

Error Handling

  • Non-existent files show an error, but processing continues for other files
  • Invalid Parquet files are reported gracefully
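
The continue-on-error behaviour can be pictured as a loop that records a failure for one file and moves on to the next. The sketch below is an illustration only: check_file is a simplified stand-in that tests for the PAR1 magic bytes found at the start and end of Parquet files with a plaintext footer, and is not how iparq validates files.

```python
import os
import tempfile


def check_file(path: str) -> str:
    """Stand-in check: look for the PAR1 magic bytes at both ends."""
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(0, os.SEEK_END)
        if fh.tell() < 8:
            raise ValueError("too small to be a Parquet file")
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    if head != b"PAR1" or tail != b"PAR1":
        raise ValueError("missing PAR1 magic bytes")
    return "ok"


def inspect_all(paths: list[str]) -> dict[str, str]:
    """Check every path, recording errors instead of stopping."""
    results: dict[str, str] = {}
    for path in paths:
        try:
            results[path] = check_file(path)
        except FileNotFoundError:
            results[path] = "error: file not found"
        except ValueError as exc:
            results[path] = f"error: {exc}"
    return results


with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "bad.parquet")
    with open(bad, "wb") as fh:
        fh.write(b"not a parquet file")
    missing = os.path.join(tmp, "missing.parquet")
    statuses = list(inspect_all([missing, bad]).values())

print(statuses)
```

The missing file is reported first, and the loop still goes on to check the second file.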

See also Home.