Business Background
We need to run fairly complex queries against product attributes, which are spread across five or six tables, so the data is extracted into Elasticsearch to make filtering and querying easier. Because the business has high real-time requirements, Flink was chosen for real-time online synchronization.
Technology Selection
In the initial phase we developed with Flink SQL, joining all of the tables and writing the result into Elasticsearch.
CREATE TEMPORARY TABLE es_sink_beesforce_poc_list (
id STRING,
pocMiddleId STRING,
...,
validationType INT,
PRIMARY KEY (id) NOT ENFORCED -- The primary key is optional; if defined, it is used as the document ID, otherwise the document ID is random.
) WITH (
'connector' = 'elasticsearch-7',
'index' = '*****',
'hosts' = '*****',
'username' ='*****',
'password' ='*****'
);
INSERT INTO es_sink_beesforce_poc_list WITH -- channel level 1
channel_info_level_1 AS (
SELECT
channel.channel_code,
channel.channel_name,
channel.channel_code AS parent_channel_code,
channel.channel_name AS parent_channel_name
FROM
`****`.`****`.poc_channel_info /*+ OPTIONS('server-id'='5400-5409') */
AS channel
WHERE
channel.channel_level = 1
),-- channel level 2
channel_info_level_2 AS (
SELECT
channel.channel_code,
channel.channel_name,
concat( parent.parent_channel_code, ',', channel.channel_code ) AS parent_channel_code,
concat( parent.parent_channel_name, '-', channel.channel_name ) AS parent_channel_name
FROM
`****`.`****`.poc_channel_info /*+ OPTIONS('server-id'='5410-5419') */
AS channel
LEFT JOIN channel_info_level_1 AS parent ON channel.parent_channel_code = parent.channel_code
WHERE
channel.channel_level = 2
),-- channel level 3
channel_info_level_3 AS (
SELECT
channel.channel_code,
channel.channel_name,
concat( parent.parent_channel_code, ',', channel.channel_code ) AS parent_channel_code,
concat( parent.parent_channel_name, '-', channel.channel_name ) AS parent_channel_name
FROM
`****`.`****`.poc_channel_info /*+ OPTIONS('server-id'='5420-5429') */
AS channel
LEFT JOIN channel_info_level_2 AS parent ON channel.parent_channel_code = parent.channel_code
WHERE
channel.channel_level = 3
),-- channel level 4
channel_info_level_4 AS (
SELECT
channel.channel_code,
channel.channel_name,
concat( parent.parent_channel_code, ',', channel.channel_code ) AS parent_channel_code,
concat( parent.parent_channel_name, '-', channel.channel_name ) AS parent_channel_name
FROM
`****`.`****`.poc_channel_info /*+ OPTIONS('server-id'='5430-5439') */
AS channel
LEFT JOIN channel_info_level_3 AS parent ON channel.parent_channel_code = parent.channel_code
WHERE
channel.channel_level = 4
),-- union of all channel levels
channel_info_level AS ( SELECT * FROM channel_info_level_1 UNION ALL SELECT * FROM channel_info_level_2 UNION ALL SELECT * FROM channel_info_level_3 UNION ALL SELECT * FROM channel_info_level_4 )
SELECT
concat(
poc_info.poc_middle_id,
'_',
IF
( salesman_ref.id IS NOT NULL, salesman_ref.id, '' )) AS id,
poc_info.poc_middle_id AS pocMiddleId,
...,
poc_info.validation_type AS validationType
FROM
`****`.`****`.poc_base_info /*+ OPTIONS('server-id'='5440-5449') */
AS poc_info
LEFT JOIN (
SELECT
label_ref.poc_middle_id,
LISTAGG ( label_info.label_code ) label_code,
LISTAGG ( label_info.label_name ) label_name
FROM
`****`.`****`.poc_label_ref /*+ OPTIONS('server-id'='5450-5459') */
AS label_ref
INNER JOIN `****`.`****`.poc_label_info /*+ OPTIONS('server-id'='5460-5469') */
AS label_info ON label_ref.label_code = label_info.label_code
AND label_ref.deleted = 0
GROUP BY
label_ref.poc_middle_id
) label_info ON poc_info.poc_middle_id = label_info.poc_middle_id
LEFT JOIN channel_info_level AS channel_info ON poc_info.channel_format_code = channel_info.channel_code
LEFT JOIN `****`.`****`.poc_salesman_ref /*+ OPTIONS('server-id'='5470-5479') */
AS salesman_ref ON poc_info.poc_middle_id = salesman_ref.poc_middle_id
LEFT JOIN `****`.`****`.poc_extend_info /*+ OPTIONS('server-id'='5480-5489') */
AS extend_info ON poc_info.poc_middle_id = extend_info.poc_middle_id
LEFT JOIN `****`.`****`.wccs_dict_info /*+ OPTIONS('server-id'='5490-5499') */
AS wccs_chain ON extend_info.wccs_chain_code_2022_version = wccs_chain.dict_code
AND wccs_chain.dict_type = 4
LEFT JOIN `****`.`****`.wccs_dict_info /*+ OPTIONS('server-id'='5500-5509') */
AS wccs_grade ON extend_info.wccs_grade_code_2022_version = wccs_grade.dict_code
AND wccs_grade.dict_type = 6
LEFT JOIN `****`.`****`.poc_bees_project_info /*+ OPTIONS('server-id'='6300-6309') */
AS bees_project_info ON poc_info.poc_middle_id = bees_project_info.poc_middle_id
AND bees_project_info.deleted = 0
WHERE
poc_info.deleted = 0
AND poc_info.poc_middle_id IS NOT NULL;
During step-by-step development and testing we found that Flink SQL has quite a few shortcomings, and the Flink community has no mature solution for some of them:
1. In a Flink SQL streaming join, the historical data of both streams is kept in state, and that state has a time-to-live (36 hours by default in our setup). When the job starts, all of the involved table data is loaded into state; after 36 hours the state expires (an update within that window refreshes the 36-hour TTL). From then on, rows read from the MySQL binlog can no longer find their join partners, and part of the data is lost (a configuration sketch for this TTL follows the list).
2. If several Flink jobs start at the same time and listen to the same MySQL address, errors occur frequently and cause Flink to restart (this can be fixed by specifying the binlog server-id with the /*+ OPTIONS('server-id'='5430-5439') */ hint). The root cause is that the MySQL CDC source pretends to be one of the slaves of the MySQL cluster; each slave must have an id that is unique from the primary's point of view, the server-id, and each server-id also tracks its own binlog position. If several slaves share the same server-id, the data they pull gets mixed up. When no server-id is specified, a random one is assigned and collisions are easy to hit, so it is best for the teams involved to agree on unique server-ids up front.
3. Flink SQL by itself cannot write Elasticsearch nested fields directly (we later found this can be worked around with a custom function, uploading a UDF jar).
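For reference, the join state retention mentioned in problem 1 is controlled by the table.exec.state.ttl option. Below is a minimal sketch, assuming a Java TableEnvironment-based job (class name and TTL value are illustrative); raising the TTL or disabling expiry only trades missing joins for unbounded state growth, which is why it did not count as a real fix:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
public class StateTtlSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // extend the streaming-join state TTL; "0 ms" would disable expiry but lets state grow without bound
        tableEnv.getConfig().getConfiguration().setString("table.exec.state.ttl", "72 h");
        // ... register the CDC source tables and run the INSERT INTO statement here ...
    }
}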
Because problem 1 could not be solved, the project team decided to switch to a Flink DataStream job.
Technical Feasibility Study
To synchronize the tables independently, we abandoned the original approach of joining all the data first and then writing it to ES, and instead update ES every time a binlog record is read. This requires all tables to be decoupled from each other with respect to a single ES document. To cover the case where a child-table row is picked up by Flink before its parent row, we introduced a new esLastOptTime field: whether or not the parent row has been seen yet, a document is upserted into ES. Such a document has no id field, only the child table's business fields; the id field is only assigned once the parent row is read (this id is the business-system id, not the ES document _id).
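The sketch below shows just the upsert part of this pattern, keyed by poc_middle_id, using the same Elasticsearch request types as the job shown later (index name and field values are illustrative; the real job adds the operation-type handling on top):
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.Requests;
public class UpsertSketch {
    public static UpdateRequest buildUpsert(String indexName, String pocMiddleId, Map<String, Object> fields) {
        Map<String, Object> data = new HashMap<>(fields);
        // every write refreshes esLastOptTime, no matter which table the row came from
        data.put("esLastOptTime", new Date().getTime());
        IndexRequest indexRequest = Requests.indexRequest(indexName).id(pocMiddleId).source(data);
        // doc + upsert: update the document if it already exists, otherwise create it with the same fields
        return new UpdateRequest(indexName, pocMiddleId).doc(indexRequest).upsert(indexRequest);
    }
}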
Code Implementation
Maven dependencies (the provided scopes must be commented out for the project to start locally)
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>abi-cloud-beesforce-data-board</artifactId>
<groupId>com.abi</groupId>
<version>1.0.0-SNAPSHOT</version>
</parent>
<artifactId>abi-cloud-beesforce-data-board-flink</artifactId>
<name>abi-cloud-beesforce-data-board-flink</name>
<description>abi api project</description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.13.1</flink.version>
<target.java.version>1.8</target.java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${target.java.version}</maven.compiler.source>
<maven.compiler.target>${target.java.version}</maven.compiler.target>
<log4j.version>2.12.4</log4j.version>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch7_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>com.ververica</groupId>
<artifactId>flink-connector-mysql-cdc</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-base</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-common</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.24</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.1.10</version>
</dependency>
<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-json</artifactId>
<version>5.7.16</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.9</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${target.java.version}</source>
<target>${target.java.version}</target>
</configuration>
</plugin>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.1</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>org.apache.logging.log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<!-- <transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.example.StreamingJob</mainClass>
</transformer>
</transformers>-->
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.1.1,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
Log level control, log4j2.properties
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = ConsoleAppender
appender.console.name = ConsoleAppender
appender.console.type = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
Binlog stream entity class
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.Data;
import lombok.NoArgsConstructor;
import java.io.Serializable;
import java.util.Map;
@NoArgsConstructor
@Data
public class BinlogStreamBean implements Serializable {
@JsonProperty("before")
private Map<String,Object> before;
@JsonProperty("after")
private Map<String,Object> after;
@JsonProperty("source")
private SourceDTO source;
@JsonProperty("op")
private String op;
@JsonProperty("ts_ms")
private Long tsMs;
@JsonProperty("transaction")
private Object transaction;
@NoArgsConstructor
@Data
public static class SourceDTO {
@JsonProperty("version")
private String version;
@JsonProperty("connector")
private String connector;
@JsonProperty("name")
private String name;
@JsonProperty("ts_ms")
private Long tsMs; // epoch millis, which would overflow Integer
@JsonProperty("snapshot")
private String snapshot;
@JsonProperty("db")
private String db;
@JsonProperty("sequence")
private Object sequence;
@JsonProperty("table")
private String table;
@JsonProperty("server_id")
private Integer serverId;
@JsonProperty("gtid")
private Object gtid;
@JsonProperty("file")
private String file;
@JsonProperty("pos")
private Integer pos;
@JsonProperty("row")
private Integer row;
@JsonProperty("thread")
private Object thread;
@JsonProperty("query")
private Object query;
}
}
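For reference, the Debezium-style change event that this bean maps looks roughly like the following sketch (all values are made up; the real payload carries every column of the changed row):
import cn.hutool.json.JSONUtil;
public class BinlogParseSketch {
    public static void main(String[] args) {
        String row = "{\"before\":null,"
                + "\"after\":{\"poc_middle_id\":\"P0001\",\"deleted\":0},"
                + "\"source\":{\"db\":\"poc\",\"table\":\"poc_base_info\",\"file\":\"mysql-bin.000001\",\"pos\":4},"
                + "\"op\":\"c\",\"ts_ms\":1660000000000}";
        BinlogStreamBean bean = JSONUtil.toBean(row, BinlogStreamBean.class);
        // op is one of: c (insert), u (update), d (delete), r (initial snapshot read)
        System.out.println(bean.getOp() + " on " + bean.getSource().getTable());
    }
}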
Part of the synchronization code
@Slf4j
public class SinkToEsSteaming {
public static void main(String[] args) {
//1. Parse the main method arguments
ParameterTool parameterTool = ParameterTool.fromArgs(args);
//2. Data processing
ElasticsearchSinkFunction<String> elasticsearchSinkFunction = (row, ctx, indexer) -> {
//1 - prepare the data
//fetch the global job parameters
ParameterTool parameterTool1 = (ParameterTool) ctx.getExecutionConfig().getGlobalJobParameters();
//index name
String indexName = "es_sink_jar_beesforce_poc_list" + EnvUtil.getUnderscoreEnv(parameterTool1.get("env"));
//binlog payload
BinlogStreamBean binlogStreamBean = JSONUtil.toBean(row, BinlogStreamBean.class);
//2 - determine the actual operation type
String optType = binlogStreamBean.getOp();
if("u".equals(optType)){
String deleted = String.valueOf(binlogStreamBean.getAfter().get("deleted"));
//special handling for logical (soft) delete
if("1".equals(deleted)){
optType = "d";
}else{
optType = "u";
}
}
//3 - build the request for this operation
List<UpdateRequest> updateRequestList = Lists.newArrayList();
Map<String, Object> data = Maps.newHashMap();
if("c".equals(optType)){
data = binlogStreamBean.getAfter();
}else if("u".equals(optType)){
data = binlogStreamBean.getAfter();
}else if("d".equals(optType)){
data = binlogStreamBean.getBefore();
}else if ("r".equals(optType)) {
//during the initial snapshot read, only rows with deleted = 0 are needed
String deleted = String.valueOf(binlogStreamBean.getAfter().get("deleted"));
if (!"1".equals(deleted)) {
data = binlogStreamBean.getAfter();
}
}
//time of the last ES operation
data.put("esLastOptTime",new Date().getTime());
//id
String id = IStringUtil.valueOf(data.get("poc_middle_id"));
IndexRequest indexRequest = Requests.indexRequest(indexName)
.id(id)
.source(data);
UpdateRequest updateRequest = new UpdateRequest(indexName, id)
.doc(indexRequest)
.upsert(indexRequest);
updateRequestList.add(updateRequest);
//4 - hand the requests to the indexer
for (UpdateRequest esUpdateRequest : updateRequestList) {
indexer.add(esUpdateRequest);
}
};
//3. Configure the Elasticsearch connection and the sink function
ElasticsearchSink<String> esSink = getEsSink(elasticsearchSinkFunction, parameterTool);
//4 - obtain the MySQL binlog source stream and attach the sink
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3000);
DataStreamSource<String> dataStreamSource = env
.fromSource(getMysqlSource(parameterTool), WatermarkStrategy.noWatermarks(), "MySQL Source");
//set the global job parameters
env.getConfig().setGlobalJobParameters(parameterTool);
String parallelism = parameterTool.get("parallelism");
if (StringUtils.isNotBlank(parallelism)) {
env.setParallelism(Integer.parseInt(parallelism));
}
dataStreamSource.addSink(esSink);
try {
env.execute();
} catch (Exception e) {
log.error("Job failed to start", e);
}
}
private static MySqlSource<String> getMysqlSource(ParameterTool parameterTool) {
String mysqlHost = parameterTool.get("mysqlHost");
String mysqlPort = parameterTool.get("mysqlPort");
String mysqlUsername = parameterTool.get("mysqlUsername");
String mysqlPassword = parameterTool.get("mysqlPassword");
String serverId = parameterTool.get("serverId");
String pocBaseInfo = "poc_base_info";
String pocLabelRef = "poc_label_ref";
String pocSalesmanRef = "poc_salesman_ref";
String pocExtendInfo = "poc_extend_info";
String pocBeesProjectInfo = "poc_bees_project_info";
String pocWholesalerRef = "poc_wholesaler_ref";
String databaseMiddle = "abi-cloud-middle-platform-poc" + EnvUtil.getHorizontalEnv(parameterTool.get("env"));
//create the JSON deserializer
Map<String,Object> config = Maps.newHashMap();
config.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, DecimalFormat.NUMERIC.name());
JsonDebeziumDeserializationSchema jdd = new JsonDebeziumDeserializationSchema(false, config);
MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
.hostname(mysqlHost)
.port(Integer.parseInt(mysqlPort))
.databaseList(databaseMiddle)
.tableList(
MessageFormat.format("{0}.{1}", databaseMiddle, pocBaseInfo),
MessageFormat.format("{0}.{1}", databaseMiddle, pocLabelRef),
MessageFormat.format("{0}.{1}", databaseMiddle, pocSalesmanRef),
MessageFormat.format("{0}.{1}", databaseMiddle, pocExtendInfo),
MessageFormat.format("{0}.{1}", databaseMiddle, pocBeesProjectInfo),
MessageFormat.format("{0}.{1}", databaseMiddle, pocWholesalerRef)
)
.username(mysqlUsername)
.password(mysqlPassword)
.startupOptions(StartupOptions.initial())
.deserializer(jdd)
.serverId(serverId)
.build();
return mySqlSource;
}
private static ElasticsearchSink<String> getEsSink(ElasticsearchSinkFunction<String> elasticsearchSinkFunction, ParameterTool parameterTool) {
String esAddress = parameterTool.get("esAddress");
String esPort = parameterTool.get("esPort");
String esUsername = parameterTool.get("esUsername");
String esPassword = parameterTool.get("esPassword");
//1-httpHosts
List<HttpHost> httpHosts = new ArrayList<>();
httpHosts.add(new HttpHost(esAddress, Integer.parseInt(esPort), "http"));
//2-restClientFactory
RestClientFactory restClientFactory = restClientBuilder -> {
Node node = new Node(new HttpHost(esAddress, Integer.parseInt(esPort), "https"));
List<Node> nodes = new ArrayList<>();
nodes.add(node);
Header[] header = new Header[1];
BasicHeader authHeader = new BasicHeader("Authorization", "Basic " + Base64.encode((esUsername + ":" + esPassword).getBytes()));
header[0] = authHeader;
restClientBuilder.setDefaultHeaders(header);
restClientBuilder.build().setNodes(
nodes
);
};
//3 - build the ElasticsearchSink
ElasticsearchSink.Builder<String> esSinkBuilder = new ElasticsearchSink.Builder<String>(
httpHosts, elasticsearchSinkFunction
);
esSinkBuilder.setRestClientFactory(restClientFactory);
//enable backoff retries when a bulk request fails
esSinkBuilder.setBulkFlushBackoff(true);
//maximum number of actions per bulk request
esSinkBuilder.setBulkFlushMaxActions(3000);
//maximum bulk request size in MB
esSinkBuilder.setBulkFlushMaxSizeMb(50);
//bulk flush interval in milliseconds
esSinkBuilder.setBulkFlushInterval(100);
//number of backoff retries
esSinkBuilder.setBulkFlushBackoffRetries(1);
//delay between backoff retries in milliseconds
esSinkBuilder.setBulkFlushBackoffDelay(2000L);
//backoff type: CONSTANT, e.g. a 2s delay with 3 retries fires at 2s -> 4s -> 6s; EXPONENTIAL, e.g. a 2s delay with 3 retries fires at 2s -> 4s -> 8s
esSinkBuilder.setBulkFlushBackoffType(ElasticsearchSinkBase.FlushBackoffType.CONSTANT);
//failure handler that retries requests rejected by a full bulk queue
esSinkBuilder.setFailureHandler(new RetryRejectedExecutionFailureHandler());
ElasticsearchSink<String> esSink = esSinkBuilder.build();
return esSink;
}
}
Sample main method arguments
--esAddress
********
--esPort
********
--esUsername
********
--esPassword
********
--mysqlHost
********
--mysqlJdbcUrl
********
--mysqlPort
********
--mysqlDriver
********
--mysqlUsername
********
--mysqlPassword
********
--env
dev
--serverId
5500-5529
Pitfalls
1. The Flink task managers and the job manager run in isolated environments, so a variable defined in the main method body cannot be read inside the process method (this can be solved with Flink's built-in ParameterTool by promoting such variables to global job parameters).
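A minimal sketch of that pattern, assuming a simple RichMapFunction (the Elasticsearch sink above reads the parameters the same way through its runtime context):
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class GlobalParamsSketch {
    public static void main(String[] args) throws Exception {
        ParameterTool parameterTool = ParameterTool.fromArgs(args);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // registered here on the client/job manager side, shipped to every task manager with the job
        env.getConfig().setGlobalJobParameters(parameterTool);
        env.fromElements("a", "b")
                .map(new RichMapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        // read back on the task manager side, inside the operator
                        ParameterTool params = (ParameterTool) getRuntimeContext()
                                .getExecutionConfig().getGlobalJobParameters();
                        return params.get("env", "dev") + "-" + value;
                    }
                })
                .print();
        env.execute("global-params-sketch");
    }
}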
2. A custom deserializer must be configured to parse the binlog records, otherwise fields such as decimal end up encoded (e.g. as Base64 bytes) rather than plain numbers:
Map<String,Object> config = Maps.newHashMap();
config.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, DecimalFormat.NUMERIC.name());
JsonDebeziumDeserializationSchema jdd = new JsonDebeziumDeserializationSchema(false, config);
3. Nested fields must be handled separately; updating a nested field requires a painless script to cover the various edge cases:
public static List<UpdateRequest> buildNestedUpdateRequest(String indexName, String id,
String fieldName, String optType,
String nestedIdField, String nestedId, Map<String,Object> nestedObj){
List<UpdateRequest> updateRequestList = new ArrayList<>();
if (StringUtils.isBlank(id) || null == nestedObj || nestedObj.isEmpty()) {
return updateRequestList;
}
//1 - make sure the document exists: create it if missing, otherwise this is effectively a no-op update
List<UpdateRequest> createDocIfNotExistReqList = buildUpdateRequest(indexName,id,new HashMap<>());
updateRequestList.addAll(createDocIfNotExistReqList);
//2 - build the real update request
Script script = null;
//build a different script depending on the operation type
if("u".equals(optType) || "c".equals(optType)){
//script source: create the nested array if missing, remove any old entry with the same id, then add the new one
String code = String.format("if (ctx._source.%s == null) {ctx._source.%s = [];}ctx._source.%s.removeIf(item -> item.%s == params.detail.%s);ctx._source.%s.add(params.detail);",
fieldName,
fieldName,
fieldName,
nestedIdField,
nestedIdField,
fieldName);
//parameters
Map<String, Object> paramMap = new HashMap<>();
paramMap.put("detail", nestedObj);
//build the script
script = new Script(ScriptType.INLINE, "painless", code, paramMap);
}else if("d".equals(optType)){
//script source: remove the matching entry from the nested array, and drop the field entirely if the array becomes empty
String code = String.format("if(ctx._source.%s==null){return;}ctx._source.%s.removeIf(item -> item.%s == params.nestedId);if(ctx._source.%s.length==0){ctx._source.remove('%s')}",
fieldName,
fieldName,
nestedIdField,
fieldName,
fieldName);
//parameters
Map<String, Object> paramMap = new HashMap<>();
paramMap.put("nestedId", nestedId);
//build the script
script = new Script(ScriptType.INLINE, "painless", code, paramMap);
}
//create the update request
UpdateRequest updateRequest = new UpdateRequest(indexName, id)
.script(script);
updateRequestList.add(updateRequest);
return updateRequestList;
}
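A hypothetical usage example (the labels field, the label values, and the surrounding variables are illustrative, not taken from the real job): upserting one label row into the nested labels field of its parent document and handing the resulting requests to the indexer.
String indexName = "es_sink_jar_beesforce_poc_list_dev";
String pocMiddleId = "P0001"; // business id of the parent document
Map<String, Object> labelRow = new HashMap<>();
labelRow.put("label_code", "L001");
labelRow.put("label_name", "key account");
List<UpdateRequest> requests = buildNestedUpdateRequest(
        indexName, pocMiddleId,
        "labels",       // nested field on the parent document
        "u",            // operation type taken from the binlog event
        "label_code",   // id field inside each nested object
        "L001",         // id value of the nested object being written
        labelRow);
// inside the ElasticsearchSinkFunction these would be passed on to the RequestIndexer
requests.forEach(indexer::add);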