gp configuration - 9dian/Index GitHub Wiki
面向列的存储:比较适合数据仓库工作负载即在少量列上计算聚合的数据,或者只需要定期更新单个列而不修改其他列数据。 混合工作负载或通常情况下面向行储提供了灵活性和性能的最佳结合,但是,也有一些用例中面向列的存储模型提供了更高效的I/O和存储。按照以下需求考虑表的存储模型:
-
表数据更新。如果频繁地加载和更新表数据,请选择一个面向行的堆表。面向列的表存储只能在append-optimized表上可用。
-
频繁的插入。如果经常将行插入表中,请考虑面向行的模型。面向列的表对写入操作没有进行优化,因为列的列值必须写到磁盘上的不同位置。
-
压缩.列数据具有相同的数据类型,因此在面向列的数据中可用存储大小优化,这些数据在面向行的数据中不可用。例如,许多压缩方案使用相邻数据的相似性来压缩。然而,越大的相邻压缩实现,就越难获得随机访问,因为数据必须被压缩为被读取。
分区表中混合使用列存储和行存储
归档数据使用列存储、当前月份数据使用行存储,随时间推移不但将之前换存储的“热”数据切换为归档的列存储。 示例,一开始(2017-11)的创建表语句:
CREATE TABLE pay_order
(
order_no character varying(32) NOT NULL,
mch_id bigint,
out_trade_no character varying(32),
money bigint,
trade_state bigint,
add_time timestamp without time zone,
trade_time timestamp without time zone,
total_fee bigint,
transaction_id character varying(40),
trade_type character varying(32),
center_id bigint,
fee_type character varying(10),
cash_fee bigint,
cash_fee_type character varying(10),
termtype character varying(32),
groupno character varying(32),
mch_no character varying(16),
operno character varying(32),
agentid character varying(32),
bank_no character varying(32),
bank_type character varying(32),
device_info character varying(32),
appid character varying(64) ,
partner character varying(32),
openid character varying(255) ,
is_subscribe character varying(16),
coupon_fee bigint,
mch_create_ip character varying(32),
pay_create_ip character varying(32),
sub_partner character varying(32),
sub_appid character varying(64),
sub_openid character varying(255),
sub_is_subscribe character varying(16),
sign_agentno character varying(32),
client_ip character varying(32),
client_type character varying(32),
fld_n2 bigint,
modify_time timestamp without time zone,
fld_s1 character varying(256)
)
-- WITH (appendonly=true, orientation=column)
-- tablespace detail_tbs
DISTRIBUTED BY (order_no)
PARTITION BY RANGE (trade_time)
( PARTITION Before201607 START (date '2010-01-01') END (date '2016-07-01') WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5),
PARTITION Q3Q4of2016 START (date '2016-07-01') END (date '2017-01-01') WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5),
PARTITION Q1Q2of2017 START (date '2017-01-01') END (date '2017-07-01') WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=5),
PARTITION Q3of2017 START (date '2017-07-01') END (date '2017-10-01') WITH (appendonly=true, orientation=column, compresstype=quicklz),
PARTITION Oct2017 START (date '2017-10-01') END (date '2017-11-01') WITH (appendonly=true, orientation=column, compresstype=quicklz),
PARTITION Nov2017 START (date '2017-11-01') END (date '2017-12-01')
) ;
到2017-12后,增加2017-12的分区并切换2017-11的分区为列存储。
首先我们将2017-11的数据转移到一个新表,然后使用交换分区(Exchanging a Partition)与上表中的nov2017分区交换,最后就是为2017-12数据增加新的分区。
CREATE TABLE pay_order_201711
(
order_no character varying(32) NOT NULL,
mch_id bigint,
out_trade_no character varying(32),
money bigint,
trade_state bigint,
add_time timestamp without time zone,
trade_time timestamp without time zone,
total_fee bigint,
transaction_id character varying(40),
trade_type character varying(32),
center_id bigint,
fee_type character varying(10),
cash_fee bigint,
cash_fee_type character varying(10),
termtype character varying(32),
groupno character varying(32),
mch_no character varying(16),
operno character varying(32),
agentid character varying(32),
bank_no character varying(32),
bank_type character varying(32),
device_info character varying(32),
appid character varying(64) ,
partner character varying(32),
openid character varying(255) ,
is_subscribe character varying(16),
coupon_fee bigint,
mch_create_ip character varying(32),
pay_create_ip character varying(32),
sub_partner character varying(32),
sub_appid character varying(64),
sub_openid character varying(255),
sub_is_subscribe character varying(16),
sign_agentno character varying(32),
client_ip character varying(32),
client_type character varying(32),
fld_n2 bigint,
modify_time timestamp without time zone,
fld_s1 character varying(256)
)
WITH (appendonly=true, orientation=column, compresstype=quicklz)
DISTRIBUTED BY (order_no);
insert into pay_order_201711
select * from pay_order where trade_time >= timestamp '20171101 00:00:00' and trade_time < timestamp '20171201 00:00:00'
ANALYZE pay_order_201711 ;
(Note: got the partition table name from pg_partitions.) 下一步可选,可以通过truncate 'pay_order_1_prt_nov2017'子表(分区表)加快交换分区。
TRUNCATE TABLE pay_order_1_prt_nov2017 ;
将新建的列表pay_order_201711与分区表Nov2017交换分区。
ALTER TABLE pay_order EXCHANGE PARTITION FOR (RANK(6)) WITH TABLE pay_order_201711 WITH VALIDATION;
or
ALTER TABLE pay_order EXCHANGE PARTITION FOR (pay_order_1_prt_nov2017) WITH TABLE pay_order_201711 WITH VALIDATION;
(Note: got the RANK from pg_partitions.)
删除临时表 pay_order_201711
drop table pay_order_201711 cascade;
最后为2017-12创建dec2017分区
ALTER TABLE pay_order ADD PARTITION Dec2017 START (date '2017-11-01') END (date '2017-12-01')
验证下更新后的创建表的DDL语句如下:
Greenplum集群增加Master Standby
假定之前的集群中只有一个Master,为了提高集群的高可用性增加Master Standby节点。 安装Greenplum软件包之前的准备工作与安装Master、Segment节点一样,安装软件包有两种方法
-
直接在Master Standby解压安装包。
-
在Master节点使用 gpseginstall 命令
gpseginstall -f standby -u gpadmin -p 'Password'
增加到Greenplum集群 直接使用 gpinitstandby -s 11_2_smdw,如果有filespace会提示创建与Master同样路径的目录, [gpadmin@11_3_mdw ~]$ gpinitstandby -s 11_2_smdw
The filespace locations on the master must be mapped to
locations on the standby. These locations must be empty on the
standby master host. The default provided is the location of
the filespace on the master (except if the master and the
standby are hosted on the same node or host). In most cases the defaults can be used.
Enter standby filespace location for filespace detail (default: /data/detail/master/gpseg-1):
Enter standby filespace location for filespace transtemp (default: /data/transtemp/master/gpseg-1):
Enter standby filespace location for filespace dwh00 (default: /data/dwh00/master/gpseg-1):
20171211:15:47:53:018641 gpinitstandby:11_3_mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization... 20171211:15:47:53:018641 gpinitstandby:11_3_mdw:gpadmin-[ERROR]:-Parent directory /data/master does not exist on host 11_2_smdw 20171211:15:47:53:018641 gpinitstandby:11_3_mdw:gpadmin-[ERROR]:-This directory must be created before running gpactivatestandby 20171211:15:47:53:018641 gpinitstandby:11_3_mdw:gpadmin-[ERROR]:-Failed to create standby 20171211:15:47:53:018641 gpinitstandby:11_3_mdw:gpadmin-[ERROR]:-Error initializing standby master: Parent directory /data/master does not exist
也可以直接使用
gpinitstandby -s 11_2_smdw -F detail:/data/detail/master/gpseg-1,dwh00:/data/dwh00/master/gpseg-1,pg_system:/data/master/gpseg-1,transtemp:/data/transtemp/master/gpseg-1