GCS BigQuery partitioning clustering - ghdrako/doc

Partitioning

Column type:

Time unit (TIMESTAMP,DATE,DATETIME) included time of ingetion (data zaladowania danych)
Integer

Partition by date

CREATE TABLE <name>
PARTITION BY <column_name to partition>
AS
SELECT ...,<column_name to partition>,...
FROM <table>

In type DATE default partitioning is DAY. Other option is Weekly,Monthly itd

Partition by integers

CREATE TABLE <tab_name> (
id INT64,
fname STRING,
lname STRING,
level INT64)
PARTITION BY 
  RANGE_BUCKET(level,GENERATE_ARRAY(1,10,5))  -- generate two partition first for level 1 -5 and second level 6-10 
OPTIONS (
  require_partition_filter = true
)  

select * from <tab_name>  --not work because require_partition_filter = true

Partition by ingestion time

CREATE TABLE <tab_name> (
id INT64,
fname STRING,
lname STRING,
level INT64)
PARTITION BY _PARTITIONDATE  -- default partitioning DAY
OPTION (
  partition_expiration_days=30
)

SELECT * from <tab_name> where _PARTITIONDATE = '2023-10-02'

Clustering

Text column can be use to clustering but not partitioning
Can be used to multiple columns
only if table or partition is lerger then 1GB

Combine partitioning and clustering

CREATE TBLE <tab_name>
PARTITION BY 
  RANGE_BUCKET(power_engine,GENERATE_ARRAY(1,1000,100))
CLUSTER BY maker,manufacter_yer
AS
SELECT * FROM 'sales_data.automobile'

GCS BigQuery partitioning clustering - ghdrako/doc_snipets GitHub Wiki

Partitioning

Partition by date

Partition by integers

Partition by ingestion time

Clustering

Combine partitioning and clustering

⚠️ GitHub.com Fallback ⚠️

GCS BigQuery partitioning clustering - ghdrako/doc_snipets GitHub Wiki

Partitioning

Partition by date

Partition by integers

Partition by ingestion time

Clustering

Combine partitioning and clustering

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️