Common Paimon Commands

Issue: clicking the ApplicationMaster link of a job running on YARN does not respond.

First, add the following configuration to yarn-site.xml:
  <property>
    <name>yarn.web-proxy.address</name>
    <value>master01:8089</value>
  </property>
 
Then start the YARN web proxy server:

cd /opt/installs/hadoop3.1.4/
./sbin/yarn-daemon.sh start proxyserver 
Run HiveServer2 in the background:
nohup hiveserver2 > /tmp/hivesql2.log 2>&1 &

Run a Flink yarn-session in the background:

cd /opt/installs/flink-1.17.0
./bin/yarn-session.sh -d

Enter the Flink SQL client for the Paimon demo:

./bin/sql-client.sh  -s yarn-session

Switch the result display mode to tableau:

SET 'sql-client.execution.result-mode' = 'tableau';

Use the filesystem as the catalog's storage location (filesystem catalog):

create catalog fs_catalog with(
	'type'='paimon',
	'warehouse'='hdfs://192.168.130.200:8020/paimon/fs'
);

To use a Hive catalog for metadata management, the Hive Metastore must be started first.

A self-built Hive installation usually uses the embedded metastore; in production it is usually replaced with a remote (external) metastore, configured as follows:
cd /opt/installs/hive/conf/
vim hive-site.xml


<!-- remote (external) Metastore -->
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://pa:9083</value>
        </property>

Start the Hive metastore in the background:
nohup hive  --service metastore &
Check whether it is running:
[root@pa flink-1.17.0]# netstat -anp| grep 9083
tcp6       0      0 :::9083                 :::*                    LISTEN      16327/java

Create the Hive catalog:

 create catalog  hive_catalog with(
 'type'='paimon'
 ,'metastore'='hive'
 ,'uri'='thrift://pa:9083'
 ,'hive-conf-dir'='/opt/installs/hive/conf'
 ,'warehouse'='hdfs://pa:8020/paimon/hive'
 );

use catalog hive_catalog;

The catalogs created above can no longer be found after the SQL client is closed and reopened, although the actual data still exists on the filesystem. To avoid re-creating them by hand, create an initialization file:

cd /opt/installs/flink-1.17.0/conf
vim sql-client-init.sql

Put the catalog-creation statements in this file; they will be loaded every time the client starts:


create catalog fs_catalog with(
        'type'='paimon',
        'warehouse'='hdfs://192.168.130.200:8020/paimon/fs'
);

 create catalog  hive_catalog with(
 'type'='paimon'
 ,'metastore'='hive'
 ,'uri'='thrift://pa:9083'
 ,'hive-conf-dir'='/opt/installs/hive/conf'
 ,'warehouse'='hdfs://pa:8020/paimon/hive'
 );
use catalog fs_catalog;
SET 'sql-client.execution.result-mode' = 'tableau';

The command to start the SQL client again is then:

 ./bin/sql-client.sh  -s yarn-session -i conf/sql-client-init.sql

Paimon DDL

Flink SQL> show current catalog;
+----------------------+
| current catalog name |
+----------------------+
|           fs_catalog |
+----------------------+

Tables fall into managed tables (similar to Hive internal tables), external tables, and temporary tables.
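
For reference, a managed table such as the test table used below could be created roughly as follows (a sketch only: the column names come from the DESC output further down, and the PARTITIONED BY clause is an assumption, since the original does not show this DDL):

CREATE TABLE test (
    user_id BIGINT,
    item_id BIGINT,
    behavior STRING,
    dt STRING,
    hh STRING,
    PRIMARY KEY (dt, hh, user_id) NOT ENFORCED
) PARTITIONED BY (dt, hh);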

CTAS: the created table copies only the source table's schema; information such as the primary key and partitioning is not copied and has to be declared yourself via the WITH properties.
Flink SQL> desc test;
+----------+--------+-------+----------------------+--------+-----------+
|     name |   type |  null |                  key | extras | watermark |
+----------+--------+-------+----------------------+--------+-----------+
|  user_id | BIGINT | FALSE | PRI(dt, hh, user_id) |        |           |
|  item_id | BIGINT |  TRUE |                      |        |           |
| behavior | STRING |  TRUE |                      |        |           |
|       dt | STRING | FALSE | PRI(dt, hh, user_id) |        |           |
|       hh | STRING | FALSE | PRI(dt, hh, user_id) |        |           |
+----------+--------+-------+----------------------+--------+-----------+
create table test_n1 with ('primary-key'='user_id,dt','partition'='dt','file.format'='parquet') as select * from test;
Flink SQL> desc test_n1;
+----------+--------+-------+------------------+--------+-----------+
|     name |   type |  null |              key | extras | watermark |
+----------+--------+-------+------------------+--------+-----------+
|  user_id | BIGINT | FALSE | PRI(user_id, dt) |        |           |
|  item_id | BIGINT |  TRUE |                  |        |           |
| behavior | STRING |  TRUE |                  |        |           |
|       dt | STRING | FALSE | PRI(user_id, dt) |        |           |
|       hh | STRING | FALSE |                  |        |           |
+----------+--------+-------+------------------+--------+-----------+

The CTAS above copies the data and the basic table structure, but the primary key and other information are not copied. CREATE TABLE ... LIKE copies all of the table's structural information but none of the data.

Flink SQL> create table test_like1 LIKE test;
[INFO] Execute statement succeed.

Flink SQL> desc test_like1;
+------+--------+-------+-----------------+--------+-----------+
| name |   type |  null |             key | extras | watermark |
+------+--------+-------+-----------------+--------+-----------+
|   id | BIGINT | FALSE | PRI(dt, hh, id) |        |           |
| name | STRING |  TRUE |                 |        |           |
|  age | BIGINT |  TRUE |                 |        |           |
|   dt | STRING | FALSE | PRI(dt, hh, id) |        |           |
|   hh | STRING | FALSE | PRI(dt, hh, id) |        |           |
+------+--------+-------+-----------------+--------+-----------+
5 rows in set

Creating a Paimon external table (same idea as in Hive):

CREATE TABLE ex (
    user_id BIGINT,
    item_id BIGINT,
    behavior STRING,
    dt STRING,
    hh STRING,
    PRIMARY KEY (dt, hh, user_id) NOT ENFORCED
) WITH (
    'connector' = 'paimon',
    'path' = 'hdfs://pa:8020/paimon/external/ex',
    'auto-create' = 'true'     -- create the table path automatically if it does not exist
);

insert into  ex VALUES(1001,1001,'搭建','2023/03/18','21:56:23');

Create a temporary table:

CREATE TEMPORARY TABLE temp (
    k INT,
    v STRING
) WITH (
    'connector' = 'filesystem',
    'path' = 'hdfs://hadoop102:8020/temp.csv',
    'format' = 'csv'
);

INSERT OVERWRITE is only suitable for batch mode:

RESET 'execution.checkpointing.interval';         -- reset the checkpoint interval
SET 'execution.runtime-mode' = 'batch';           -- set the runtime mode to batch
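
To switch back to streaming writes afterwards, you would typically re-enable checkpointing and reset the runtime mode (a sketch; the 10 s interval is just an example value, not from the original):

SET 'execution.checkpointing.interval' = '10s';   -- streaming writes to Paimon require checkpointing
SET 'execution.runtime-mode' = 'streaming';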

Overwrite a non-partitioned table:
select * from test;
+---------+---------+----------+------------+----------+
| user_id | item_id | behavior |         dt |       hh |
+---------+---------+----------+------------+----------+
|    1001 |    1001 |     喝酒 | 2023-12-21 | 12:30:21 |
|    1001 |    1001 |     喝酒 | 2023-12-22 | 12:30:21 |
|    1004 |    1004 |     喝酒 | 2024-12-22 | 12:30:21 |
+---------+---------+----------+------------+----------+
insert  OVERWRITE  test values(2001,2001,'搭建','2023-12-31','12:32:45');
+---------+---------+----------+------------+----------+
| user_id | item_id | behavior |         dt |       hh |
+---------+---------+----------+------------+----------+
|    2001 |    2001 |     搭建 | 2023-12-31 | 12:32:45 |
+---------+---------+----------+------------+----------+

Overwrite a partitioned table:
select * from test_p;
+---------+---------+----------+------------+----------+
| user_id | item_id | behavior |         dt |       hh |
+---------+---------+----------+------------+----------+
|    1004 |    1004 |     喝酒 | 2024-12-22 | 12:30:21 |
|    1001 |    1001 |     喝酒 | 2023-12-22 | 12:30:21 |
|    1001 |    1001 |     喝酒 | 2023-12-21 | 12:30:21 |
+---------+---------+----------+------------+----------+
The partition field does not have to be written explicitly; it is recognized automatically from the inserted data:
insert  OVERWRITE  test_p values(2001,2001,'搭建','2023-12-22','12:32:45');
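
Alternatively, a single partition can be overwritten explicitly with a static partition spec (a sketch, assuming test_p is partitioned by dt only; the partition value is just an example):

INSERT OVERWRITE test_p PARTITION (dt = '2023-12-22')
SELECT user_id, item_id, behavior, hh FROM test WHERE dt = '2023-12-22';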

Paimon's merge-into action

CREATE TABLE ws1 (
    id INT,
    ts BIGINT,
    vc INT,
    PRIMARY KEY (id) NOT ENFORCED
);

INSERT INTO ws1 VALUES(1,1,1),(2,2,2),(3,3,3);


CREATE TABLE ws_t (
    id INT,
    ts BIGINT,
    vc INT,
    PRIMARY KEY (id) NOT ENFORCED
);
INSERT INTO ws_t VALUES(2,2,2),(3,3,3),(4,4,4),(5,5,5);


Case 1: match ws_t against ws1 on id; for matched rows in ws_t, set vc to 10 where ts > 2 and delete the rows where ts <= 2.
bin/flink run \
    /opt/jar/paimon-flink-action-0.5-20230703.002437-53.jar \
    merge-into \
    --warehouse hdfs://pa:8020/paimon/hive \
    --database test \
    --table ws_t \
    --source-table test.ws1 \
    --on "ws_t.id = ws1.id" \
    --merge-actions matched-upsert,matched-delete    \
    --matched-upsert-condition "ws_t.ts > 2" \
    --matched-upsert-set "vc = 10"          \
	--matched-delete-condition "ws_t.ts<=2" 

Case 2: match ws_t against ws1 on id; for matched rows, add 10 to vc in ws_t; rows of ws1 that are not matched are inserted into ws_t.
bin/flink run \
    /opt/jar/paimon-flink-action-0.5-20230703.002437-53.jar \
    merge-into \
    --warehouse hdfs://pa:8020/paimon/hive \
    --database test \
    --table ws_t \
    --source-table test.ws1 \
    --on "ws_t.id = ws1.id" \
    --merge-actions matched-upsert,not-matched-insert    \
    --matched-upsert-set "vc = ws_t.vc + 10"          \
	--not-matched-insert-values "*"  

Case 3: match ws_t against ws1 on id; for rows of ws_t that are not matched by the source, add 20 to vc where id > 4 and delete the row where id = 4.

bin/flink run \
	/opt/jar/paimon-flink-action-0.5-20230703.002437-53.jar \
	merge-into \
	--warehouse hdfs://pa:8020/paimon/hive \
	--database test \
	--table ws_t \
	--source-table test.ws1 \
	--on "ws_t.id = ws1.id" \
	--merge-actions not-matched-by-source-upsert,not-matched-by-source-delete    \
	--not-matched-by-source-upsert-condition "ws_t.id > 4"      \
	--not-matched-by-source-upsert-set "vc = ws_t.vc + 20"          \
	--not-matched-by-source-delete-condition "ws_t.id = 4"  

Case 4: use --source-sql to create a source table under a new catalog, match on ws_t.id, and insert the unmatched rows into ws_t.

bin/flink run \
	/opt/jar/paimon-flink-action-0.5-20230703.002437-53.jar \
	merge-into \
	--warehouse hdfs://pa:8020/paimon/hive \
	--database test \
	--table ws_t \
    --source-sql "CREATE CATALOG fs2 WITH ('type' = 'paimon','warehouse' = 'hdfs://pa:8020/paimon/fs2')" \
    --source-sql "CREATE DATABASE IF NOT EXISTS fs2.test" \
    --source-sql "CREATE TEMPORARY VIEW fs2.test.ws2 AS SELECT id+10 as id,ts,vc FROM test.ws1" \
    --source-table fs2.test.ws2 \
    --on "ws_t.id = ws2. id" \
    --merge-actions not-matched-insert\
    --not-matched-insert-values "*"
Time travel (batch queries)

In the Flink web UI, select the corresponding snapshot and click into it to get its timestamp.
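
The snapshot id and its commit time can also be read directly from the snapshots system table instead of the web UI (a sketch; the system tables are covered in more detail below):

select snapshot_id, commit_time, commit_kind from ws_t$snapshots;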

Consume data of a specific version.
Read snapshot data by snapshot id:
select * from ws_t /*+ OPTIONS('scan.snapshot-id'='1') */;
Read snapshot data by timestamp:
select * from ws_t /*+ OPTIONS('scan.timestamp-millis'='1711379413843') */;

Query the system tables:
select * from ws_t$snapshots;
Query the tags system table:
select * from ws_t$tags;

Incremental query (not yet fully mastered)

Reads the incremental changes in the range (start snapshot, end snapshot]. For example, the query below reads the data that changed between snapshot 1 and snapshot 8:
select * from ws_t /*+ OPTIONS('incremental-between'='1,8') */;  -- the rows returned are the changed data

Streaming queries
Read the latest snapshot data:
select * from sxc /*+ OPTIONS('scan.mode'='latest') */;

Read starting from the latest, setting the scan parameter:
select * from ws_t /*+ OPTIONS('scan.mode'='latest') */;

Streaming queries also support time travel; the options are the same as for batch time travel.
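
For example, a streaming read can start from a given snapshot and emit only the changes after it (a sketch; 'from-snapshot' is the streaming counterpart of the batch snapshot-id option above):

select * from ws_t /*+ OPTIONS('scan.mode'='from-snapshot','scan.snapshot-id'='3') */;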

Read the full data of a specified snapshot and then keep reading subsequent changes:
select * from ws_t /*+ OPTIONS('scan.mode'='from-snapshot-full','scan.snapshot-id'='3') */;
-- the returned data consists of the full data of snapshot 3 plus all subsequent changes

consumer-id (streaming): the consumer-id feature of streaming queries.
Specify a consumer-id, similar to a Kafka consumer; when consuming again with the same consumer-id, reading continues from where the previous consumption stopped.
Demo:
select * from ws_t /*+ OPTIONS('consumer-id'='aa') */;
+----+-------------+----------------------+-------------+
| op |          id |                   ts |          vc |
+----+-------------+----------------------+-------------+
| +I |           1 |                    1 |           1 |
| +I |           2 |                    2 |           2 |
| +I |           3 |                    3 |          20 |
| +I |           5 |                    5 |          25 |
| +I |          11 |                    1 |           1 |
| +I |          12 |                    2 |           2 |
| +I |          13 |                    3 |           3 |
insert into ws_t values(9,9,9);
+----+-------------+----------------------+-------------+
| op |          id |                   ts |          vc |
+----+-------------+----------------------+-------------+
| +I |           9 |                    9 |           9 |

Inspecting the system tables

-- view snapshot information
select * from ws_t$snapshots;
-- view schema information
select * from ws_t$schemas;
-- view the table's options
select * from ws_t$options;    -- options not shown are at their default values
-- view the audit log (the change history of the data, batch)
select * from ws_t$audit_log;
-- query the files of the latest snapshot
select * from ws_t$files;
-- query the files of a specified snapshot
select * from ws_t$files /*+ OPTIONS('scan.snapshot-id'='1') */;

Lookup join (dimension table join)

CREATE TABLE customers (
    id INT PRIMARY KEY NOT ENFORCED,
    name STRING,
    country STRING,
    zip STRING
);

INSERT INTO customers VALUES(1,'zs','ch','123'),(2,'ls','ch','456'), (3,'ww','ch','789');


CREATE TEMPORARY TABLE Orders (
    order_id INT,
    total INT,
    customer_id INT,
    proc_time AS PROCTIME()
) WITH (
    'connector' = 'datagen', 
    'rows-per-second'='1', 
    'fields.order_id.kind'='sequence', 
    'fields.order_id.start'='1', 
    'fields.order_id.end'='1000000', 
    'fields.total.kind'='random', 
    'fields.total.min'='1', 
    'fields.total.max'='1000', 
    'fields.customer_id.kind'='random', 
    'fields.customer_id.min'='1', 
    'fields.customer_id.max'='3'
);



Perform the lookup join:

select * from Orders as o
JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;

CDC

MySQL CDC
Copy the Flink CDC connector jar(s) into Flink's lib directory:
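
A common way to run the sync is the mysql-sync-table action shipped in the same paimon-flink-action jar used for merge-into above. The sketch below follows the Paimon 0.5 action syntax, and all connection parameters (host, port, credentials, database and table names) are placeholders rather than values from the original setup:

bin/flink run \
    /opt/jar/paimon-flink-action-0.5-20230703.002437-53.jar \
    mysql-sync-table \
    --warehouse hdfs://pa:8020/paimon/hive \
    --database test \
    --table orders_cdc \
    --primary-keys id \
    --mysql-conf hostname=pa \
    --mysql-conf port=3306 \
    --mysql-conf username=root \
    --mysql-conf password=123456 \
    --mysql-conf database-name=test \
    --mysql-conf table-name=orders \
    --table-conf bucket=4 \
    --table-conf changelog-producer=input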




Paimon official documentation (table operation API):
https://paimon.apache.org/docs/master/how-to/altering-tables/
