Hive Partition (Still to Consolidate) - ignacio-alorre/Hive GitHub Wiki
Hive Static Partitioning
- Insert input data files individually into a partition table is Static Partition.
- Usually when loading files (big files) into Hive tables static partitions are preferred.
- Static Partition saves your time in loading data compared to dynamic partition.
- You “statically” add a partition in the table and move the file into the partition of the table.
- We can alter the partition in the static partition.
- You can get the partition column value from the filename, day of date etc without reading the whole big file.
- If you want to use the Static partition in the hive you should set property set hive.mapred.mode = strict This property set by default in hive-site.xml
- Static partition is in Strict Mode.
- You should use where clause to use limit in the static partition.
- You can perform Static partition on Hive Manage table or external table.
Hive Dynamic Partitioning
- Single insert to partition table is known as a dynamic partition.
- Usually, dynamic partition loads the data from the non-partitioned table.
- Dynamic Partition takes more time in loading data compared to static partition.
- When you have large data stored in a table then the Dynamic partition is suitable.
- If you want to partition a number of columns but you don’t know how many columns then also dynamic partition is suitable.
- Dynamic partition there is no required where clause to use limit.
- we can’t perform alter on the Dynamic partition.
- You can perform dynamic partition on hive external table and managed table.
- If you want to use the Dynamic partition in the hive then the mode is in non-strict mode.
Note: You should run MSCK REPAIR TABLE
when the structure or partition of the external table is changed. This command updates the metadata of the table.
If you are using a field named dt, which represent a date to partition the table
- Yesterday, you inserted some data which is
dt=2018-06-12
, then you should runMSCK REPAIR TABLE
to update the metadata to tell hive to aware a new partitiondt=2018-06-12
.