Data Format Parquet - keshavbaweja-git/guides GitHub Wiki

  • Columnar, binary data storage format.
  • Design Goals
    • Interoperability
    • Space efficiency
    • Query efficiency
  • Language-agnostic format specification
  • Java converters available for the following object models
    • Avro
    • Thrift
    • Protocol Buffers
    • Pig
    • Hive SerDe
  • C++ implementation (encoder/decoder) used by Impala
  • Columnar storage is more space-efficient: values of one column share the same type and are stored together, which allows compact type-specific encodings (e.g. dictionary and run-length encoding) and better compression.
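The space-efficiency point can be illustrated with a toy sketch. It assumes a small, hypothetical table of row records, transposes it into one list per column, and then applies dictionary encoding and run-length encoding to a single column. This only shows why storing homogeneous values together helps. It is not Parquet's on-disk layout: real Parquet files add row groups, pages, and a hybrid RLE/bit-packing encoding. The names `to_columns`, `dictionary_encode` and `run_length_encode` are illustrative only.

```python
from itertools import groupby

# Hypothetical sample data in a row-oriented layout (one dict per record).
rows = [
    {"country": "US", "year": 2020, "clicks": 5},
    {"country": "US", "year": 2020, "clicks": 7},
    {"country": "US", "year": 2021, "clicks": 3},
    {"country": "DE", "year": 2021, "clicks": 9},
    {"country": "DE", "year": 2021, "clicks": 2},
]


def to_columns(records):
    """Transpose row records into one list per column (a columnar layout)."""
    names = list(records[0].keys())
    return {name: [record[name] for record in records] for name in names}


def dictionary_encode(values):
    """Replace each value with an index into a list of its distinct values."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
# The "country" column holds only strings, many of them repeated.
dictionary, codes = dictionary_encode(columns["country"])
runs = run_length_encode(codes)

if __name__ == "__main__":
    print(columns["country"])   # ['US', 'US', 'US', 'DE', 'DE']
    print(dictionary, codes)    # ['US', 'DE'] [0, 0, 0, 1, 1]
    print(runs)                 # [(0, 3), (1, 2)]
```

Five country strings shrink to a two-entry dictionary plus two runs. In a row-oriented layout the same values would be interleaved with `year` and `clicks`, so consecutive repeats would not line up and these encodings would gain much less.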