GCS BigQuery partitioning clustering - ghdrako/doc_snipets GitHub Wiki

Partitioning

Column type:

  • Time unit (TIMESTAMP,DATE,DATETIME) included time of ingetion (data zaladowania danych)
  • Integer

Partition by date

CREATE TABLE <name>
PARTITION BY <column_name to partition>
AS
SELECT ...,<column_name to partition>,...
FROM <table>

In type DATE default partitioning is DAY. Other option is Weekly,Monthly itd

Partition by integers

CREATE TABLE <tab_name> (
id INT64,
fname STRING,
lname STRING,
level INT64)
PARTITION BY 
  RANGE_BUCKET(level,GENERATE_ARRAY(1,10,5))  -- generate two partition first for level 1 -5 and second level 6-10 
OPTIONS (
  require_partition_filter = true
)  

select * from <tab_name>  --not work because require_partition_filter = true

Partition by ingestion time

CREATE TABLE <tab_name> (
id INT64,
fname STRING,
lname STRING,
level INT64)
PARTITION BY _PARTITIONDATE  -- default partitioning DAY
OPTION (
  partition_expiration_days=30
)

SELECT * from <tab_name> where _PARTITIONDATE = '2023-10-02'

Clustering

  • Text column can be use to clustering but not partitioning
  • Can be used to multiple columns
  • only if table or partition is lerger then 1GB

Combine partitioning and clustering

CREATE TBLE <tab_name>
PARTITION BY 
  RANGE_BUCKET(power_engine,GENERATE_ARRAY(1,1000,100))
CLUSTER BY maker,manufacter_yer
AS
SELECT * FROM 'sales_data.automobile'



⚠️ **GitHub.com Fallback** ⚠️