apache paimon-update

apache paimon

Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

在不同的merge-engine下的update效果

Deduplicate
Partial Update
Aggregation
基本概念详见官网链接

创建并使用catalog

tableEnv.executeSql("CREATE CATALOG paimon WITH ('type' = 'paimon', 'warehouse'='file:///D://data')");
tableEnv.executeSql("USE CATALOG paimon");

Deduplicate

1:准备数据

Row.ofKind(RowKind.INSERT, 1,183.0,15)
Row.ofKind(RowKind.INSERT, 1,11.9,20)

2:建表

CREATE TABLE if not exists MyTable (
    product_id BIGINT,
    price DOUBLE,
    sales BIGINT,
    PRIMARY KEY (product_id) NOT ENFORCED
)

3:先插入第一条数据之后的结果

//Row.ofKind(RowKind.INSERT, 1,183.0,15)
+---------------+--------+---------+
|    product_id |  price |   sales |
+---------------+--------+---------+
|             1 |  183.0 |      15 |
+---------------+--------+---------+

4:插入第二条数据的结果

+----+-------------+---------+-------------+
| op |  product_id |   price |       sales |
+----+-------------+---------+-------------+
| +I |           1 |    11.9 |          20 |
+----+-------------+---------+-------------+
//查询最终结果
+----------------+--------+--------+
|     product_id |  price |  sales |
+----------------+--------+--------+
|              1 |   11.9 |     20 |
+----------------+--------+--------+

5:删除操作

+----+-------------+----------+--------+
| op |  product_id |    price |  sales |
+----+-------------+----------+--------+
| -D |           1 |     11.9 |     20 |
+----+-------------+----------+--------+
//查询结果
Empty set

Process finished with exit code 0

数据读取方式

流批都支持

结论

deduplicate 是默认的 merge engine 同一主键只保留最后一条数据,如果最后的操作是删除则被删除。简而言之就是支持update和delete功能。

Partial Update

1:准备数据

Row.ofKind(RowKind.INSERT, "李思思", 22,"男",null,"上海")
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",null,null)

Row.ofKind(RowKind.INSERT, "李思思", 22,"男","中国",null)
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",null,"北京")

2:建表

CREATE TABLE if not exists MyTable(
name STRING,
age INT,
sex STRING,
country STRING,
city STRING,
PRIMARY KEY (name,sex) NOT ENFORCED
) PARTITIONED BY (sex)
WITH (
'merge-engine'='partial-update',
'bucket' = '2'
)

3:结果

//第一次插入数据之后查询结果
+---------+------+-----+-----------+----------+
|    name |  age | sex |   country |     city |
+---------+------+-----+-----------+----------+
|  汪呜呜  |    1 ||    (NULL) |   (NULL) |
|  李思思  |   22 ||    (NULL) |     上海 |
+---------+------+-----+-----------+----------+
//第二次插入数据
+----+---------+-------------+------+-----------+------------+
| op |    name |         age |  sex |   country |       city |
+----+---------+-------------+------+-----------+------------+
| +I |  李思思  |          22 ||      中国  |     (NULL) |
| +I |  汪呜呜  |           1 ||    (NULL) |       北京 |
+----+---------+-------------+------+-----------+------------+
//查询结果
+-----------+-------------+-------+-----------+--------+
|      name |         age |   sex |   country |   city |
+-----------+-------------+-------+-----------+--------+
|    李思思 |          22 ||       中国 |   上海 |
|    汪呜呜 |           1 ||    (NULL) |   北京 |
+-----------+-------------+-------+-----------+--------+

3:删除数据

//不支持删除数据查询会报错
Caused by: java.lang.IllegalArgumentException: Partial update can not accept delete records. Partial delete is not supported!
//设置'partial-update.ignore-delete'='true'参数忽略删除操作可正常读取
//删除
+----+--------+------+------+---------+----------+
| op |   name |  age |  sex | country |     city |
+----+--------+------+------+---------+----------+
| -D | 李思思  |   22 ||    中国 |   (NULL)  |
+----+--------+------+------+---------+----------+
//查询
|-----------+------+------+------------+--------+
|      name |  age |  sex |    country |   city |
+-----------+------+------+------------+--------+
|    李思思  |   22 ||       中国  |   上海  |
|    汪呜呜  |    1 ||     (NULL) |   北京  |
+-----------+------+------+------------+--------+

数据读取方式

批模式
流模式报错:Exception in thread “main” java.lang.RuntimeException: Partial update streaming reading is not supported. You can use ‘lookup’ or ‘full-compaction’ changelog producer to support streaming reading.

结论

支持update操作,但是当最后数据为null时不会进行update,不支持delete操作,不支持流读

Sequence Group

在’merge-engine’='partial-update’时配合sequence group实现多流的更新写入

1:准备数据

//第一次插入的数据
Row.ofKind(RowKind.INSERT, "李思思", 22,"男",1,null,1,"上海")
Row.ofKind(RowKind.INSERT, "汪呜呜", 1,"女",1,null,1,null)
//第二次插入的数据
Row.ofKind(RowKind.INSERT, "李思思", 32,"男",null,"中国",2,null)
Row.ofKind(RowKind.INSERT, "汪呜呜", 11,"女",1,"中国",2,"北京")

2:建表

CREATE TABLE if not exists MyTable(
name STRING,
age INT,
sex STRING,
version INT,
country STRING,
version2 INT,
city STRING,
PRIMARY KEY (name,sex) NOT ENFORCED
) PARTITIONED BY (sex)
WITH (
'merge-engine'='partial-update',
'fields.version.sequence-group'='age',
'fields.version2.sequence-group'='country,city',
'bucket' = '2'
)

3:结果

//第一次插入数据之后的查询结果
+-------------+-------------+--------+-------------+------------+-------------+----------+
|        name |         age |    sex |     version |    country |    version2 |     city |
+-------------+-------------+--------+-------------+------------+-------------+----------+
|      汪呜呜 |           1 ||           1 |     (NULL) |           1 |   (NULL) |
|      李思思 |          22 ||           1 |     (NULL) |           1 |     上海 |
+-------------+-------------+--------+-------------+------------+-------------+----------+
//插入数据
+----+--------+------+-----+-------------+----------+-------------+---------+
| op |   name |  age | sex |     version |  country |    version2 |    city |
+----+--------+------+-----+-------------+----------+-------------+---------+
| +I | 李思思  |   32 ||      (NULL) |     中国  |           2 |  (NULL) |
| +I | 汪呜呜  |   11 ||           1 |     中国  |           2 |    北京 |
+----+--------+------+-----+-------------+----------+-------------+---------+
//最终查询结果
+---------+------+------+-------------+----------+-------------+---------+
|    name |  age |  sex |     version |  country |    version2 |    city |
+---------+------+------+-------------+----------+-------------+---------+
|  汪呜呜  |   11 ||           1 |     中国 |           2 |    北京 |
|  李思思  |   22 ||           1 |     中国 |           2 |  (NULL) |
+---------+------+------+-------------+----------+-------------+---------+

4:删除
同样不支持删除操作

数据读取方式

批模式

结论

当sequence-group >=原来的sequence-group是会做update,不支持删除操作,要配合’partial-update.ignore-delete’='true’使用,只支持批读取。
sequence-group 的作用是用来解决并发写的问题,这个后续会专门测试

Aggregation

1:准备数据

Row.ofKind(RowKind.INSERT, 1,1.0,1)
Row.ofKind(RowKind.INSERT, 1,12.9,2)

2:建表

CREATE TABLE if not exists MyTable (
    product_id BIGINT,
    price DOUBLE,
    sales BIGINT,
    PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
    'merge-engine' = 'aggregation',
    'fields.price.aggregate-function' = 'max',
    'fields.sales.aggregate-function' = 'sum'
)

3:结果

//插入数据
+----+-------------+-------+-------------+
| op |  product_id | price |       sales |
+----+-------------+-------+-------------+
| +I |           1 |   1.0 |           1 |
+----+-------------+-------+-------------+
//查询结果
+-------------+---------+-------------+
|  product_id |   price |       sales |
+-------------+---------+-------------+
|           1 |     1.0 |           1 |
+-------------+---------+-------------+
//插入数据
+----+-------------+---------+-------------+
| op |  product_id |   price |       sales |
+----+-------------+---------+-------------+
| +I |           1 |    12.9 |           2 |
+----+-------------+---------+-------------+
//最终结果
+-------------+--------+-------------+
|  product_id |  price |       sales |
+-------------+--------+-------------+
|           1 |   12.9 |           3 |
+-------------+--------+-------------+

4:删除

//不支持删除操作,报如下错
 Aggregate function 'max' does not support retraction, If you allow this function to ignore retraction messages, you can configure 'fields.${field_name}.ignore-retract'='true'.
 //添加参数忽略删除操作
 CREATE TABLE if not exists MyTable (
    product_id BIGINT,
    price DOUBLE,
    sales BIGINT,
    PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
    'merge-engine' = 'aggregation',
    'fields.price.aggregate-function' = 'max',
    'fields.price.ignore-retract' = 'true',
    'fields.sales.ignore-retract' = 'true',
    'fields.sales.aggregate-function' = 'sum'
);

数据读取方式

批模式

结论

不支持删除操作,只支持批模式读取,按照指定的聚合函数对指定字段进行聚合

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值