Heap vs Append Optimized table - yellia1989/greenplum-install-doc GitHub Wiki

本文主要探讨的内容如下:

  1. 存储模式的类型和它们各自的使用场景
  2. 针对10W,100W数据,测试采用不同的存储模式所占用的空间大小。

存储模式的类型和它们各自的使用场景

  1. Heap Storage 默认的存储模式,对于数据量不多的(小于100w),支持大量更新和删除的表比较合适。
  • 使用MVCC来实现一致性(处理update和delete的时候,Heap表并没有真正删除数据,而只是依靠version信息屏蔽老的数据,因此如果你的表有大量的update或者delete,表占用的物理空间会不断增大,这个时候需要依靠vacuum来清理老数据)
  • 支持分区表,只支持行存,不支持列存和压缩
  1. Append-Optimized Storage 针对查询做了优化,适用于large table,对需要经常更新的表不能选用此选项,主要用于数据分析。
  • AO表支持行存和列存,并且也支持对数据进行压缩
  • 不同于Heap表使用MVCC控制update和delete之后数据的可见性,AO表使用一个附加的bitmap表来实现,这个表的的内容就是表示AO表中哪些数据是可见的。对于有大量update和delete的AO表,同样需要vacuum进行维护,不过在AO表中,vacuum需要对bitmap进行重置并压缩物理文件,因此通常比Heap的vacuum要慢。

Storage Orientation Model

分为Row-Orientation Model,Column-Orientation Model两种,更进一步指定表中的每一行采用的存储模式。

  1. Row-Orientation Model 当行中的很多列都需要在查询中使用到的时候采用
  2. Column-Orientation Model 当行中列只有很少一部分需要用到的时候采用,只能在Append-Optimized Storage上采用。

Compression

压缩,只针对append-Optimized Storage

测试数据

CREATE TABLE test10w.add_item_heap_10w (
    _rid integer,
    "time" timestamp without time zone,
    logymd text,
    loghms text,
    roleid bigint,
    zoneid integer,
    usercreatetime timestamp without time zone,
    usercreateymd text,
    usercreatehms text,
    moneytotal double precision,
    activedays integer,
    viplevel integer,
    country text,
    ostype integer,
    operate text,
    baseid integer,
    add_num integer,
    cur_num integer
)
DISTRIBUTED BY (_rid);

CREATE TABLE test10w.add_item_append_10w (
    _rid integer,
    "time" timestamp without time zone,
    logymd text,
    loghms text,
    roleid bigint,
    zoneid integer,
    usercreatetime timestamp without time zone,
    usercreateymd text,
    usercreatehms text,
    moneytotal double precision,
    activedays integer,
    viplevel integer,
    country text,
    ostype integer,
    operate text,
    baseid integer,
    add_num integer,
    cur_num integer
)
WITH (appendonly='true')
DISTRIBUTED BY (_rid)

CREATE TABLE test10w.add_item_append_column_10w (
    _rid integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    "time" timestamp without time zone ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    logymd text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    loghms text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    roleid bigint ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    zoneid integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    usercreatetime timestamp without time zone ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    usercreateymd text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    usercreatehms text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    moneytotal double precision ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    activedays integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    viplevel integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    country text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    ostype integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    operate text ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    baseid integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    add_num integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0),
    cur_num integer ENCODING (compresstype=none,blocksize=32768,compresslevel=0)
)
WITH (appendonly='true', orientation='column')
DISTRIBUTED BY (_rid);

CREATE TABLE test10w.add_item_append_column_compression9_10w (
    _rid integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    "time" timestamp without time zone ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    logymd text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    loghms text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    roleid bigint ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    zoneid integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    usercreatetime timestamp without time zone ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    usercreateymd text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    usercreatehms text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    moneytotal double precision ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    activedays integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    viplevel integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    country text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    ostype integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    operate text ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    baseid integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    add_num integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768),
    cur_num integer ENCODING (compresstype=zlib,compresslevel=9,blocksize=32768)
)   
WITH (appendonly='true', orientation='column', compresstype=zlib, compresslevel='9')
DISTRIBUTED BY (_rid);

CREATE TABLE test10w.add_item_append_column_compression_10w (
    _rid integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    "time" timestamp without time zone ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    logymd text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    loghms text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    roleid bigint ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    zoneid integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    usercreatetime timestamp without time zone ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    usercreateymd text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    usercreatehms text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    moneytotal double precision ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    activedays integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    viplevel integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    country text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    ostype integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    operate text ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    baseid integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    add_num integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768),
    cur_num integer ENCODING (compresstype=zlib,compresslevel=1,blocksize=32768)
)
WITH (appendonly='true', orientation='column', compresstype=zlib, compresslevel='1')
DISTRIBUTED BY (_rid);

统计语句

SELECT sotaidtablename,sotaidtablesize/1024.0/1024 FROM gp_toolkit.gp_size_of_table_and_indexes_disk where sotaidschemaname='test10w' ORDER BY sotaidtablename;
数据量行数 Heap Storage Append-Optimized Storage(row) Append-Optimized Storage(column no compression) Append-Optimized Storage(column no zlib-1) Append-Optimized Storage(column no zlib-9)
10w 16MB 14MB 11MB 1MB 1MB
100w 162MB 140MB 115MB 14MB 11MB
500w 812MB