Hive Partition (Still to Consolidate) - ignacio-alorre/Hive GitHub Wiki

Hive Static Partitioning

  • Insert input data files individually into a partition table is Static Partition.
  • Usually when loading files (big files) into Hive tables static partitions are preferred.
  • Static Partition saves your time in loading data compared to dynamic partition.
  • You “statically” add a partition in the table and move the file into the partition of the table.
  • We can alter the partition in the static partition.
  • You can get the partition column value from the filename, day of date etc without reading the whole big file.
  • If you want to use the Static partition in the hive you should set property set hive.mapred.mode = strict This property set by default in hive-site.xml
  • Static partition is in Strict Mode.
  • You should use where clause to use limit in the static partition.
  • You can perform Static partition on Hive Manage table or external table.

Hive Dynamic Partitioning

  • Single insert to partition table is known as a dynamic partition.
  • Usually, dynamic partition loads the data from the non-partitioned table.
  • Dynamic Partition takes more time in loading data compared to static partition.
  • When you have large data stored in a table then the Dynamic partition is suitable.
  • If you want to partition a number of columns but you don’t know how many columns then also dynamic partition is suitable.
  • Dynamic partition there is no required where clause to use limit.
  • we can’t perform alter on the Dynamic partition.
  • You can perform dynamic partition on hive external table and managed table.
  • If you want to use the Dynamic partition in the hive then the mode is in non-strict mode.

Note: You should run MSCK REPAIR TABLE when the structure or partition of the external table is changed. This command updates the metadata of the table.

If you are using a field named dt, which represent a date to partition the table

  • Yesterday, you inserted some data which is dt=2018-06-12, then you should run MSCK REPAIR TABLE to update the metadata to tell hive to aware a new partition dt=2018-06-12.