Partitioning Criteria vs Methods - rFronteddu/general_wiki GitHub Wiki

Summary

  • Partitioning criteria: what you partition on (the property of the data that decides placement).
  • Partitioning method: how you apply that property to split the data.

Let’s say you’re storing log entries:

  • Criteria: timestamp
  • Method: Range partitioning (e.g., logs from Jan go in Partition A, Feb in Partition B)

Or:

  • Criteria: user_id
  • Method: Hash partitioning (distribute users across N shards)
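
The two pairings above can be sketched in a few lines of Python. This is only an illustration: the function names, the `logs_YYYY_MM` naming scheme, and the default of 4 shards are assumptions made for the example, not part of any particular database.

```python
import hashlib
from datetime import datetime


def range_partition(timestamp: datetime) -> str:
    """Criteria: timestamp. Method: range (one partition per calendar month)."""
    # The partition name is a hypothetical naming scheme, e.g. "logs_2024_01".
    return f"logs_{timestamp:%Y_%m}"


def hash_partition(user_id: int, num_shards: int = 4) -> int:
    """Criteria: user_id. Method: hash (stable hash modulo the shard count)."""
    # hashlib is used instead of the built-in hash() so the result is
    # the same across processes and restarts.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print(range_partition(datetime(2024, 1, 15)))  # January log entry
    print(range_partition(datetime(2024, 2, 3)))   # February log entry
    print(hash_partition(42))                      # shard index in [0, 4)
```

Note that the criteria (the argument) and the method (the body of the function) can be changed independently: the same `timestamp` could be hashed, and the same `user_id` could be split into ranges.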

Methods

  • Range Partition: Divides data into segments based on a specified range of values for a partition key column.
  • Key/Hash Based: Divides a table based on a hash function applied to a specified column, typically the ID column. Consistent hashing is worth remembering here: it limits how much data moves when the number of partitions changes.
  • List: Each partition is assigned an explicit list of values, and a row goes to the partition whose list contains its key value.
  • Vertical (or Column): Splits a table by columns based on the frequency or type of access. For example, separating frequently accessed columns from rarely accessed columns.
  • Composite (or Hybrid): Combines multiple methods to create more fine-grained and adaptable partitions. For example, first range and then hash.
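
As a rough sketch of the list, composite, and vertical methods from the list above, assuming made-up region codes, partition names, and column names (all hypothetical, chosen only for illustration):

```python
import zlib
from datetime import date

# List method: each partition is assigned an explicit list of values.
# The region codes and partition names are hypothetical.
REGION_PARTITIONS = {
    "US": "p_americas",
    "CA": "p_americas",
    "DE": "p_europe",
    "FR": "p_europe",
}


def list_partition(region: str) -> str:
    """Look up the partition for a value; unknown values go to a default."""
    return REGION_PARTITIONS.get(region, "p_default")


def composite_partition(created: date, user_id: int, buckets: int = 4) -> str:
    """Composite method: range by year first, then hash within that range."""
    bucket = zlib.crc32(str(user_id).encode("utf-8")) % buckets
    return f"y{created.year}_h{bucket}"


def vertical_split(row: dict, hot_columns: set, key: str = "id"):
    """Vertical method: split one row into hot and cold column groups.

    The key column is kept in both halves so they can be joined again.
    """
    hot = {k: v for k, v in row.items() if k in hot_columns or k == key}
    cold = {k: v for k, v in row.items() if k not in hot_columns or k == key}
    return hot, cold


if __name__ == "__main__":
    print(list_partition("DE"))
    print(composite_partition(date(2024, 5, 1), user_id=7))
    print(vertical_split({"id": 1, "name": "a", "bio": "long text"}, {"name"}))
```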

More examples

  • Partitioning Criteria - The basis or rule used to decide where data goes
    • Think: What property of the data are we using to decide the partition?
    • Examples:
      • A date column (created_at)
      • A user ID
      • A product category
      • A hash of a key
  • Partitioning Methods - The technique or algorithm used to apply the criteria
    • Think: How do we use the chosen criteria to divide the data?
    • Examples:
      • Range method: Based on ranges of values (e.g., dates)
      • Hash method: Use a hash function on the value
      • List method: Explicitly map specific values to partitions
      • Composite method: Combine two or more methods
      • Vertical method: Split by columns instead of rows
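
Since the hash method is usually mentioned together with consistent hashing, here is a minimal sketch of a hash ring. The `HashRing` class, the use of MD5 as the placement hash, and the 100 virtual nodes per node are assumptions made for this example; real systems differ in the details.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 used only for placement)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(1000)]
    before = HashRing(["a", "b", "c"])
    after = HashRing(["a", "b", "c", "d"])
    moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
    # Only keys that now belong to the new node "d" change owner.
    print(len(moved), "of", len(keys), "keys moved")
```

With a plain `hash(key) % N` scheme, changing N remaps most keys. On the ring, adding a node only takes over the keys that fall just before its points, so the rest stay where they were.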