# Adapting Flink to S3 for Savepoint Upload


Amazon Simple Storage Service (Amazon S3) provides cloud object storage for a wide range of use cases. S3 can be used with Flink for reading and writing data, and can also back Flink's streaming state backends.

Flink ships two file system plugins for interacting with S3: flink-s3-fs-presto and flink-s3-fs-hadoop. The Flink documentation recommends flink-s3-fs-presto for checkpoint storage, but when authenticating via STS (temporary credentials), only the flink-s3-fs-hadoop plugin works.

See the Flink documentation: https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/deployment/filesystems/s3/

Reference code: the flink-filesystems module in the Flink source tree, and the plugin sources.

I. Standalone Flink setup, operated through the Flink Web UI

1. Install the flink-s3-fs-presto plugin

# From the Flink root directory, create a folder under plugins/ and copy the plugin jar from opt/ into it
mkdir ./plugins/s3-fs-presto
cp ./opt/flink-s3-fs-presto-1.16.0.jar ./plugins/s3-fs-presto/

2. Edit the flink-conf.yaml configuration file

Insert the following settings below the commented-out line # state.backend: hashmap:

state.backend: filesystem
fs.allowed-fallback-filesystems: s3
state.checkpoints.dir: s3://test0001/checkpoints # format: "s3://<your-bucket>/<path>"
state.savepoints.dir: s3://test0001/savepoints
state.backend.incremental: true
s3.access-key: ffjdgkjdslk
s3.secret-key: lzsEitkc4YT0MAWgZq1j6g
s3.ssl.enabled: true
s3.path.style.access: true
s3.endpoint: https://myoss.store:10000
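These settings are plain `key: value` pairs. As an illustration only (a toy sketch, not Flink's actual configuration loader), a minimal parser for lines in this format, assuming one pair per line with optional `#` comments:

```python
# Minimal sketch of parsing flink-conf.yaml-style "key: value" lines.
# Illustration only -- Flink has its own configuration loader.

def parse_flink_conf(text: str) -> dict:
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        key, _, value = line.partition(":")  # split at the FIRST colon only
        value = value.split("#", 1)[0].strip()  # drop an inline trailing comment
        conf[key.strip()] = value
    return conf

conf = parse_flink_conf("""
state.backend: filesystem
state.checkpoints.dir: s3://test0001/checkpoints  # bucket + prefix
s3.ssl.enabled: true
""")
print(conf["state.checkpoints.dir"])  # s3://test0001/checkpoints
```

Splitting only at the first colon matters: values such as https://myoss.store:10000 contain colons themselves.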

3. Restart Flink

./bin/stop-cluster.sh
./bin/start-cluster.sh

4. Write a Flink jar job with checkpointing enabled

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.Arrays;

/**
 * @Author wangkeshuai
 * @Date 2023/4/7 10:37
 * @Description Unbounded stream processing (word count) with Flink
 */
public class StreamWordCount {

    public static void main(String[] args) throws Exception {
        //1. Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Read a text stream from a socket; feed it input with: nc -lk 7777
        DataStreamSource<String> lineDSS = env.socketTextStream("localhost", 7777);
        //3. Transform the data: split lines into words, pair each word with a count of 1
        SingleOutputStreamOperator<Tuple2<String, Long>> wordAndOne = lineDSS.flatMap((String line, Collector<String> words) -> {
                    Arrays.stream(line.split(" ")).forEach(words::collect);
                })
                .returns(Types.STRING)
                .map(word -> Tuple2.of(word, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG));
        //4. Enable checkpointing, triggered every 20 seconds
        env.enableCheckpointing(20000);
        //5. Key by word
        KeyedStream<Tuple2<String, Long>, String> wordAndOneKS = wordAndOne.keyBy(t -> t.f0);
        //6. Sum the counts
        SingleOutputStreamOperator<Tuple2<String, Long>> result = wordAndOneKS.sum(1);
        //7. Print the results
        result.print();
        //8. Execute
        env.execute();
    }

}

5. Package the jar, submit the job to Flink, and check the job's checkpoint information


6. Trigger a savepoint from Flink's bin directory

wangkeshuai@wangkeshuaideMacBook-Air bin % ./flink savepoint efe270f12e0fa756ed9711b64b97c4ac
Triggering savepoint for job efe270f12e0fa756ed9711b64b97c4ac.
Waiting for response...
Savepoint completed. Path: s3://test0001/savepoints/savepoint-efe270-c62ef333961f
You can resume your program from this savepoint with the run command.
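Notice how the savepoint path relates to the job ID: the directory name is `savepoint-`, then the first six characters of the job ID, then a random suffix, which is why job `efe270f1…` produced `savepoint-efe270-c62ef333961f`. A sketch of that naming scheme (the random suffix below is illustrative; Flink generates its own):

```python
import secrets

def savepoint_dir(job_id: str) -> str:
    # Flink names savepoint directories "savepoint-<first 6 chars of job id>-<random suffix>"
    return f"savepoint-{job_id[:6]}-{secrets.token_hex(6)}"

path = "s3://test0001/savepoints/" + savepoint_dir("efe270f12e0fa756ed9711b64b97c4ac")
print(path)  # e.g. s3://test0001/savepoints/savepoint-efe270-c62ef333961f
```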

7. List the stored objects in the OSS bucket via the API

S3ObjectSummary{bucketName='test0001', key='checkpoints/50cd5e82bce1167bffa76a932a737a81/chk-1232/_metadata', eTag='1d93edcc2bfae7ea401de3224ab5ea5d', size=3876, lastModified=Thu May 11 14:28:15 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}

S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-efe270-67bc83ccd4fd/_metadata', eTag='281c94e3d96c3e782690f4ebe0a9a923', size=2686, lastModified=Wed May 10 17:05:58 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}

S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-efe270-c62ef333961f/_metadata', eTag='68c921824c52456bbda0c56732a60be4', size=2686, lastModified=Wed May 10 17:06:53 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
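The lines above are the toString() output of the AWS SDK's S3ObjectSummary. A small sketch (the regex targets exactly this textual format, nothing else) that pulls out the bucket, key, and size for further processing:

```python
import re

# Match the bucket, key, and size fields in an S3ObjectSummary toString() line.
SUMMARY_RE = re.compile(r"bucketName='([^']*)', key='([^']*)'.*?size=(\d+)")

def parse_summary(line: str):
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    bucket, key, size = m.groups()
    return {"bucket": bucket, "key": key, "size": int(size)}

line = ("S3ObjectSummary{bucketName='test0001', "
        "key='savepoints/savepoint-efe270-c62ef333961f/_metadata', "
        "eTag='68c921824c52456bbda0c56732a60be4', size=2686, ...}")
print(parse_summary(line))
# {'bucket': 'test0001', 'key': 'savepoints/savepoint-efe270-c62ef333961f/_metadata', 'size': 2686}
```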

8. In the Flink Web UI, specify the jar and the savepoint's OSS path to restore the job


II. Configuration in StreamPark's K8S mode (flink-s3-fs-presto plugin)

1. Add the plugin and rebuild the Flink base image

  1. The Dockerfile:
FROM apache/flink:1.14.3-scala_2.12
RUN mkdir -p $FLINK_HOME/usrlib

RUN chown flink:flink $FLINK_HOME/lib/*.jar

COPY flink-s3-fs-presto-1.14.3.jar $FLINK_HOME/plugins/s3-fs-presto/flink-s3-fs-presto-1.14.3.jar

RUN chown -R flink:flink $FLINK_HOME/plugins/
  2. Build the image with docker build, then push it to the image registry
docker build -t lvlin241/wks-flink-xos:v1 .
docker push lvlin241/wks-flink-xos:v1

2. Configure a savepoint-enabled job, upload to OSS, and restore it

  1. When creating the Flink job in StreamPark, enter the image name from step 1 as the Flink base image. Since the Flink base image input has been hidden on the page, you can create the job first and then change the value in the database. A sample Flink SQL job:
drop table if exists StreamSourceTable;
drop table if exists StreamSinkTable;
CREATE TABLE StreamSourceTable (
  content STRING
) WITH (
  'connector' = 'datagen'
);
CREATE TABLE StreamSinkTable (
  origin STRING
) WITH (
  'connector' = 'print'
);
INSERT INTO StreamSinkTable SELECT content FROM StreamSourceTable;
  2. Enter the parameters needed to connect to OSS as dynamic parameters:
-Dstate.backend=filesystem
-Dfs.allowed-fallback-filesystems=s3
-Dstate.checkpoints.dir=s3://test0001/checkpoints
-Dstate.savepoints.dir=s3://test0001/savepoints
-Dstate.backend.incremental=true
-Ds3.access-key=gnX50XbxqwFIEYqwTPAW
-Ds3.secret-key=lzsEitk914WzhumYu7G16iM0Hc4YT0MAWgZq1j6g
-Ds3.ssl.enabled=true
-Ds3.path.style.access=true
-Ds3.endpoint=https://nmtcoss.ctyun.store:10003
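The dynamic parameters above are ordinary Flink configuration keys passed as -Dkey=value. A hypothetical helper (the keys are from this doc; the function itself is mine) that renders a configuration map into that form:

```python
def to_dynamic_params(conf: dict) -> list:
    # Render each configuration entry as a "-Dkey=value" CLI argument.
    return [f"-D{key}={value}" for key, value in conf.items()]

args = to_dynamic_params({
    "state.backend": "filesystem",
    "fs.allowed-fallback-filesystems": "s3",
    "state.checkpoints.dir": "s3://test0001/checkpoints",
})
print(args[0])  # -Dstate.backend=filesystem
```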
  3. Fill in the remaining fields, publish, and run the job. When cancelling the job the following options appear; if no custom path is entered, the system generates a path automatically and uploads the savepoint to OSS.


  4. On cancellation the job state changes to savepoint.


  5. The job detail page shows where the savepoint is stored in OSS.


  6. When restarting the job you can select a savepoint; after apply, the job is restored from it.


3. Configure a checkpoint-enabled job, upload to OSS, and restore it

  1. Enable checkpointing via Flink SQL:
drop table if exists StreamSourceTable;
drop table if exists StreamSinkTable;
CREATE TABLE StreamSourceTable (
  content STRING
) WITH (
  'connector' = 'datagen'
);
CREATE TABLE StreamSinkTable (
  origin STRING
) WITH (
  'connector' = 'print'
);
SET execution.checkpointing.interval = 10000;
INSERT INTO StreamSinkTable SELECT content FROM StreamSourceTable;
  2. Enter the parameters needed to connect to OSS as dynamic parameters:
-Dstate.backend=filesystem
-Dfs.allowed-fallback-filesystems=s3
-Dstate.checkpoints.dir=s3://test0001/checkpoints
-Dstate.savepoints.dir=s3://test0001/savepoints
-Dstate.backend.incremental=true
-Ds3.access-key=gnX50XbxqwFIEYqwTPAW
-Ds3.secret-key=lzsEitk914WzhumYu7G16iM0Hc4YT0MAWgZq1j6g
-Ds3.ssl.enabled=true
-Ds3.path.style.access=true
-Ds3.endpoint=https://nmtcoss.ctyun.store:10003
  3. Check the checkpoint information on the job detail page.


  4. List the stored objects in OSS via the API:
S3ObjectSummary{bucketName='test0001', key='checkpoints/50cd5e82bce1167bffa76a932a737a81/chk-2048/_metadata', eTag='9879cacb07bc17533bf3db32ac9eb35d', size=3876, lastModified=Thu May 11 19:03:40 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (14).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Thu Apr 13 16:13:38 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (19).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Thu Apr 13 15:39:37 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (2).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Wed Apr 12 14:36:52 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (20).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Thu Apr 13 15:38:46 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (3).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Wed Apr 12 14:36:57 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (4).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Wed Apr 12 14:37:04 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (5).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Wed Apr 12 14:37:09 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本 (8).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Thu Apr 13 16:43:47 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46 - 副本.jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Wed Apr 12 14:36:33 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='eflink/zkx-1/mysql-connector-java-5.1.46.jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Thu Apr 13 16:15:29 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='hello', eTag='fc3ff98e8c6a0d3087d515c0473f8677', size=12, lastModified=Tue Apr 04 13:54:20 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='hello world huch!', eTag='1fb53c2ac4f48714170c518cf3c2f793', size=50, lastModified=Tue Apr 11 15:02:14 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='hello world!', eTag='1fb53c2ac4f48714170c518cf3c2f793', size=50, lastModified=Tue Apr 11 15:00:15 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='mysql-connector-java-5.1.46 - 副本 (2).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Tue Apr 11 09:44:06 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='mysql-connector-java-5.1.46 - 副本 (3).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Tue Apr 11 09:44:12 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='mysql-connector-java-5.1.46 - 副本 (4).jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Tue Apr 11 09:44:18 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='mysql-connector-java-5.1.46 - 副本.jar', eTag='83ff44cdb987f4acaf59c09bd0d560d4', size=1004838, lastModified=Tue Apr 11 09:44:01 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-b9d51d-29d670f70c04/_metadata', eTag='fba87a310f733bb1858484000e14191d', size=82, lastModified=Mon May 15 18:29:11 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-cb694d-fa713bcea08d/_metadata', eTag='9d05a4a12945bde7889b3a2faa2b8532', size=82, lastModified=Fri May 12 16:58:55 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-efe270-67bc83ccd4fd/_metadata', eTag='281c94e3d96c3e782690f4ebe0a9a923', size=2686, lastModified=Wed May 10 17:05:58 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-efe270-c62ef333961f/_metadata', eTag='68c921824c52456bbda0c56732a60be4', size=2686, lastModified=Wed May 10 17:06:53 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='savepoints/savepoint-fe2c3e-3dc2d9fca7fa/_metadata', eTag='8c495d3fcb46318d05d2aa1f35fca092', size=82, lastModified=Fri May 12 16:25:24 CST 2023, storageClass='STANDARD', owner=S3Owner [name=test-flink,id=test-flink]}
S3ObjectSummary{bucketName='test0001', key='zkx-file-1.txt', eTag='8ee0a895435e8468f3fa4b15333b0ad5', size=21, lastModified=Tue Apr 11 15:33:44 CST 2023, storageClass='STANDARD', owner=S3Owner [name=,id=test-flink]}
Count: 24
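Only a few of the 24 objects above belong to Flink state. A quick sketch of grouping keys by their top-level prefix to separate checkpoints and savepoints from unrelated uploads (the sample keys are abbreviated from the listing above):

```python
from collections import Counter

keys = [
    "checkpoints/50cd5e82bce1167bffa76a932a737a81/chk-2048/_metadata",
    "savepoints/savepoint-b9d51d-29d670f70c04/_metadata",
    "savepoints/savepoint-efe270-c62ef333961f/_metadata",
    "eflink/zkx-1/mysql-connector-java-5.1.46.jar",
    "hello",
]

# Group by the segment before the first "/"; keys with no "/" go under "(root)".
counts = Counter(k.split("/", 1)[0] if "/" in k else "(root)" for k in keys)
print(counts["savepoints"])  # 2
```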

III. STS authentication in StreamPark's K8S mode (flink-s3-fs-hadoop plugin)

With STS authentication the steps are the same as in Part II, with the following differences:

1. When building the image, replace the jar under plugins with flink-s3-fs-hadoop-1.14.3.jar

2. Use the following dynamic parameters instead (note that the STS AK/SK/token must be obtained from the OSS API and are valid for at most 12 hours):

-Dstate.backend=filesystem
-Dfs.allowed-fallback-filesystems=s3
-Dstate.checkpoints.dir=s3://test0001/checkpoints
-Dstate.savepoints.dir=s3://test0001/savepoints
-Dstate.backend.incremental=true
-Dfs.s3a.access.key=vFHeUzrhFGna8X78hMl
-Dfs.s3a.secret.key=1PXKFE8TQJ4JS60AAQAMC9YAHSGPVWR0WLCFFE5
-Dfs.s3a.session.token=lMFucXqgPxpjWLWh0hg9eh5nOB3u9MaD6d6cQLsuk1yzUBdnTim7m+thYoUixuASwcLq4IPpEmolW3YgAVzbYPbADyDtaLPaRSEF2gOQm1AV56+wVr+b+kjvuiZmcYziw9NJoZjMvO0CNUYmg46y32cYOIp4JsFkKLPL36gTHdFs4gL0d1oixlCRidQz/KaJQaxPC0XQXaD8CBX3w2XOLGBbikRMEcwhbKCEsHR1M9PP9s5Qkchn7VxYj/bjdzxf0ZWfr9E3O4yvmeL1jz5KpkDIYbIn15U4uXJH6MTVpKorHRgJfNd2eA47g00zfVAcNLQNLpCYXrg7Z/KZD5fUn1UO/u7z//Mrnv/z6vZ5JrCxOazVml6W8Rwi4YFCy7AahzVb2HakFUd8slNAe0NsGvfKtvCXodhfd6St4QXIUfOAZn7t+rw5FhNYpDA8wj5F2qRbbmURvGGFr8nbdiBnnZXUjVLd4qRDBEZmBQPdfWZiuDwVb2fk2Do/yXQv7agc
-Dfs.s3a.ssl.enabled=true
-Dfs.s3a.path.style.access=true
-Dfs.s3a.endpoint=https://nmtcoss.ctyun.store:10003
-Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
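Because the STS credentials are valid for at most 12 hours, a job that outlives them loses access to OSS. A sketch of computing the refresh deadline from the token's issue time (the 12-hour cap comes from this doc; the helper itself is illustrative):

```python
from datetime import datetime, timedelta

MAX_STS_LIFETIME = timedelta(hours=12)  # maximum validity stated by the OSS STS API

def token_expiry(issued_at: datetime, lifetime: timedelta = MAX_STS_LIFETIME) -> datetime:
    # Credentials must be refreshed before issued_at + lifetime, capped at 12h.
    return issued_at + min(lifetime, MAX_STS_LIFETIME)

issued = datetime(2023, 5, 15, 9, 0)
print(token_expiry(issued))  # 2023-05-15 21:00:00
```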

3. After upload, the checkpoint layout in OSS differs from the presto plugin: the hadoop plugin additionally creates shared and taskowned directories, and these are not deleted automatically when the job is cancelled.
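Those leftovers can at least be located by key prefix for manual cleanup. A sketch that, given a listing of bucket keys, finds objects under a cancelled job's shared/taskowned paths (the path layout matches what the hadoop plugin writes under checkpoints/<job-id>/; the helper itself is mine):

```python
def leftover_state_keys(keys, job_id):
    # The flink-s3-fs-hadoop checkpoint layout keeps shared/ and taskowned/
    # directories per job; after cancellation these may survive in the bucket.
    prefixes = (f"checkpoints/{job_id}/shared/", f"checkpoints/{job_id}/taskowned/")
    return [k for k in keys if k.startswith(prefixes)]

keys = [
    "checkpoints/50cd5e82bce1167bffa76a932a737a81/shared/abc123",
    "checkpoints/50cd5e82bce1167bffa76a932a737a81/chk-2048/_metadata",
]
print(leftover_state_keys(keys, "50cd5e82bce1167bffa76a932a737a81"))
# ['checkpoints/50cd5e82bce1167bffa76a932a737a81/shared/abc123']
```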

Open issues:

1. Dynamically reloading configuration in Flink under STS authentication

2. Expiry of the STS AK/SK/token

3. Under STS, the shared and taskowned directories are not removed after the checkpointed job is cancelled
