Greenplum Quick-Start Practice

飞虹147

Category: Greenplum. Tags: postgresql, database, big data

Article link: https://blog.csdn.net/zjsearching/article/details/110086416

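Contents
1. Table Creation
  1.1 Getting Help
  1.2 Specifying the Distribution Key
  1.3 Other Ways to Create Tables
  1.4 Ordering of SELECT Query Results
2. Partitioned Tables
  2.1 Range Partitioning
  2.2 List Partitioning
  2.3 Partition Management
3. Data Loading
  3.1 Insert
  3.2 Copy
  3.3 External Tables
    3.3.1 Introduction
    3.3.2 Practical Examples
    3.3.3 Executable External Tables
    3.3.4 Exporting Data with Writable External Tables
  3.4 gpload
4. Zipper Table Implementation
5. Data Dictionary OID
6. Cluster Maintenance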

1. Table Creation

1.1 Getting Help

-- Get help (psql meta-command)
\h create table

1.2 Specifying the Distribution Key

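Specify the distribution key when creating a table.
Greenplum is a distributed database, so a table's rows are spread across the segments. There are two distribution policies: hash distribution and random distribution.

Hash distribution: one or more columns are designated as the distribution key. A hash value is computed from them, and each row is routed to a specific segment according to that hash value.

Syntax: DISTRIBUTED BY (key). When no distribution key is specified and the table has no primary key or unique constraint, the first column is used by default.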

create table test004(id int,name varchar(10)) distributed by (id);

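Random distribution: also called even distribution. Rows are scattered across the segments at random, independent of their content. Joins on such tables still require data redistribution, so join performance is worse. (When a table is created with no DISTRIBUTED clause at all, the primary key or unique key is used as the distribution key by default.)

Syntax: DISTRIBUTED RANDOMLY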

create table test003(id int, name varchar(10)) distributed randomly;
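
To check how a distribution policy actually spreads rows, one option is to group by the gp_segment_id system column, which records the segment a row is stored on. The following is a minimal sketch that assumes test004 and test003 from above already contain some rows; the row_count alias is only illustrative.

```sql
-- Rows per segment for the hash-distributed table.
select gp_segment_id, count(*) as row_count
from test004
group by gp_segment_id
order by gp_segment_id;

-- Same check for the randomly distributed table.
select gp_segment_id, count(*) as row_count
from test003
group by gp_segment_id
order by gp_segment_id;
```

Roughly equal counts per segment suggest an even distribution; a single segment holding most of the rows points to skew.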

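Note: in Greenplum Database 6.10.0 the following statement fails immediately, because the DISTRIBUTED BY columns must be a subset of the UNIQUE constraint columns.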

create table test007(id int unique, name varchar(20)) distributed by(id,name);
-- Error output:
ERROR:  UNIQUE constraint and DISTRIBUTED BY definitions are incompatible
HINT:  When there is both a UNIQUE constraint and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be a subset of the UNIQUE constraint.

-- No DISTRIBUTED clause: the unique key or primary key is used as the distribution key by default
create table test005(id int unique, name varchar(20));
create table test006(id int primary key, name varchar(200));

/*
postgres=# \d test007
           Table "public.test007"
 Column |         Type          | Modifiers 
--------+-----------------------+-----------
 id     | integer               | 
 name   | character varying(20) | 
Indexes:
    "test007_id_key" UNIQUE CONSTRAINT, btree (id)
Distributed by: (id)
*/

1.3 Other Ways to Create Tables

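CREATE TABLE ... LIKE

The new table copies only the column structure of the source table; storage attributes such as compression and append-only (appendonly) are not carried over. The distribution key defaults to that of the source table.

-- Correct syntax: the LIKE clause goes inside the parentheses
create table test008_like (like test007);
-- Incorrect syntax
create table test008_like like test007;
create table test008_like like (test007);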

SELECT INTO

The distribution key cannot be specified; the default is always used.

select * into test4_003 from test004;
-- Specifying a distribution key manually raises a syntax error
postgres=# select * into test4_004 from test004 distributed by (id);
ERROR:  syntax error at or near "distributed"
LINE 1: select * into test4_004 from test004 distributed by (id);

CREATE TABLE AS

The distribution key can be specified explicitly, or the default can be used.

create table test004_2 as select * from test004;
create table test004_3 as select * from test004 distributed by(id);

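Notes on INSERT, UPDATE, and DELETE

INSERT: do not leave the distribution key empty. Otherwise it defaults to NULL, and all such rows are sent to the same segment, which makes the data distribution uneven.

UPDATE: the distribution key could not be updated in bulk, because updating it requires redistributing the data. That is how the book describes older versions. As the statement below shows, newer versions do support updating it.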
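
The skew caused by NULL distribution keys can be reproduced with a small experiment. This is a sketch only: the table test_null_key and its sample data are hypothetical and not part of the original article.

```sql
-- Hypothetical table used only for this illustration.
create table test_null_key(id int, name varchar(10)) distributed by (id);

-- Insert 1000 rows without supplying id, so the distribution key is NULL in every row.
insert into test_null_key(name)
select 'n' || i::text
from generate_series(1, 1000) as i;

-- Count rows per segment; all NULL keys hash to the same value.
select gp_segment_id, count(*) as row_count
from test_null_key
group by gp_segment_id;
```

If the claim above holds, the last query should report all the rows under a single gp_segment_id.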

update test004 set id = 6 where id = 4;

-- TRUNCATE empties the whole table by dropping its physical files and creating new ones
truncate test004_2;

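1.4 Ordering of SELECT Query Results

The order of query results is not deterministic. Greenplum data is spread across all segments, and the master displays rows in the order in which they arrive from the segments, which varies from run to run. Only an ORDER BY clause guarantees a fixed order.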

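-- Example: observe the non-deterministic order of query results
-- test004 was already created in section 1.2, so drop it before recreating it
drop table if exists test004;
create table test004(id int unique, name varchar(20));
insert into test004 values(1,'jun'),(2,'jun2'),(3,'jun3'),(4,'jun4');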
/*
-- Query 1
postgres=# select * from test004;
 id | name 
----+------
  2 | jun2
  3 | jun3
  4 | jun4
  1 | jun
(4 rows)

-- Query 2
postgres=# select * from test004;
 id | name 
----+------
  1 | jun
  2 | jun2
  3 | jun3
  4 | jun4
(4 rows)
*/
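
For comparison, a minimal sketch of the same query with an explicit sort, using the test004 table from the example above:

```sql
-- ORDER BY makes the master sort the rows gathered from the segments,
-- so repeated runs return them in the same order.
select * from test004 order by id;
```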

2. Partitioned Tables

2.1 Range Partitioning

-- 1. Range partitioning
-- Name each partition individually.
-- Note: each range is half-open, [start, end)
create table test_partition_range(id int, name varchar(8), dw_end_date date)
distributed by (id)
partition by range(dw_end_date)
(partition p20111230 start('2011-12-30'::date) end('2011-12-31'::date),
 partition p20111231 start('2011-12-31'::date) end('2012-01-01'::date)
);

-- Using EVERY to generate one partition per interval
create table test_partition_range2(id int, name varchar(8), dw_end_date date)
distributed by (id)
partition by range(dw_end_date)
(partition p201112 start('2011-12-01'::date) end('2011-12-31'::date) every ('1 days'::interval));
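
To see the half-open [start, end) boundaries in action, something along these lines could be tried. It is a sketch under assumptions: test_partition_range exists with the two named partitions p20111230 and p20111231, the sample rows are invented for illustration, and tableoid::regclass is assumed to report the child partition table that holds each row, as it does for PostgreSQL inheritance children.

```sql
-- Two made-up rows, one on each boundary date.
insert into test_partition_range values
  (1, 'a', '2011-12-30'),
  (2, 'b', '2011-12-31');

-- Show which child table each row was routed to.
select tableoid::regclass as partition_table, id, dw_end_date
from test_partition_range
order by id;
```

Because the end bound is exclusive, the row dated 2011-12-31 should land in p20111231 rather than p20111230.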

2.2 List Partitioning

-- 2. List partitioning
create table test_partition_list(member_id int, city character varying(32))
distributed by (member_id)
partition by list(city)
(
partition guangzhou values('guangz
