Background
This was my first time developing a Flink jar job, and I hit quite a few problems. Debugging was inconvenient: every code change meant repackaging the jar, uploading it to the company's real-time computing platform, and restarting the job to verify, which consumed a lot of time. I am recording the problems here to avoid stepping into the same pits next time.
Problem Log
Flink version: 1.13.5
ES version: 7.10.2
Issues:
1. Submitting the job to the YARN cluster failed at startup with an error saying an object is not serializable:
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.api.common.InvalidProgramException: The implementation of the ElasticsearchSinkBase is not serializable. The object probably contains or references non serializable fields.
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:164)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2053)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:203)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1243)
2024-05-27 13:57:31 [error] taskId-3004147 at com.xxx.demo.MetricsApplication.main(MetricsApplication.java:68)
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-05-27 13:57:31 [info] Flink submit task process destroyed: Task id: 3004147, ip: 10.206.23.140, pid: 355597, stop time: 2024-05-27 13:57:31
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-05-27 13:57:31 [error] taskId-3004147 at java.lang.reflect.Method.invoke(Method.java:497)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
2024-05-27 13:57:31 [error] taskId-3004147 at java.security.AccessController.doPrivileged(Native Method)
2024-05-27 13:57:31 [error] taskId-3004147 at javax.security.auth.Subject.doAs(Subject.java:422)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
2024-05-27 13:57:31 [error] taskId-3004147 Caused by: java.io.NotSerializableException: org.apache.flink.shaded.hadoop2.org.apache.commons.httpclient.HttpHost
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
2024-05-27 13:57:31 [error] taskId-3004147 at java.util.ArrayList.writeObject(ArrayList.java:762)
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-05-27 13:57:31 [error] taskId-3004147 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-05-27 13:57:31 [error] taskId-3004147 at java.lang.reflect.Method.invoke(Method.java:497)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:624)
2024-05-27 13:57:31 [error] taskId-3004147 at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:143)
2024-05-27 13:57:31 [error] taskId-3004147 ... 21 more
2024-05-27 13:57:31 [error] taskId-3004147
2024-05-27 13:57:31 [error] taskId-3004147 ------------------------------------------------------------
2024-05-27 13:57:31 [error] taskId-3004147 The program finished with the following exception:
2024-05-27 13:57:31 [error] taskId-3004147
2024-05-27 13:57:31 [error] taskId-3004147 The implementation of the ElasticsearchSinkBase is not serializable. The object probably contains or references non serializable fields.
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:164)
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2053)
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:203)
2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1243)
2024-05-27 13:57:31 [error] taskId-3004147 com.xxx.demo.MetricsApplication.main(MetricsApplication.java:68)
The error says that org.apache.flink.shaded.hadoop2.org.apache.commons.httpclient.HttpHost does not implement Serializable. The class comes from
org.apache.flink:flink-shaded-hadoop-2-uber:2.7.5-10.0
At first I suspected the dependency simply needed an upgrade, so I tried the newer 2.8.3-10.0. That version no longer contains this class, but it does contain org.apache.flink.shaded.hadoop2.org.apache.http.HttpHost, which does implement Serializable, so I switched the import to that class.
After resubmitting, the job started successfully this time, but the Flink processing logic then failed at runtime with:
java.lang.ArrayStoreException
at java.util.ArrayList.toArray(ArrayList.java:408)
at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.createClient(Elasticsearch7ApiCallBridge.java:69)
at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.createClient(Elasticsearch7ApiCallBridge.java:46)
at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.open(ElasticsearchSinkBase.java:317)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:745)
Source code at the failing location:
@Override
public RestHighLevelClient createClient(Map<String, String> clientConfig) {
RestClientBuilder builder =
RestClient.builder(httpHosts.toArray(new HttpHost[httpHosts.size()]));
restClientFactory.configureRestClientBuilder(builder);
RestHighLevelClient rhlClient = new RestHighLevelClient(builder);
return rhlClient;
}
The failing call is httpHosts.toArray(). The HttpHost expected here is actually
org.apache.http.HttpHost
so the whole incident came down to importing the wrong class.
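To summarize the mix-up, here is a minimal sketch of the three imports involved; only the last one works with the elasticsearch7 connector:
// 1. Original import (via flink-shaded-hadoop-2-uber 2.7.5-10.0): not Serializable,
//    so the ClosureCleaner rejects the sink at submission time
// import org.apache.flink.shaded.hadoop2.org.apache.commons.httpclient.HttpHost;
// 2. Second attempt (from 2.8.3-10.0): Serializable, so submission succeeds, but
//    httpHosts.toArray(new HttpHost[...]) in Elasticsearch7ApiCallBridge throws
//    ArrayStoreException because the list elements are not org.apache.http.HttpHost
// import org.apache.flink.shaded.hadoop2.org.apache.http.HttpHost;
// 3. Correct import: the class the elasticsearch7 connector actually expects
import org.apache.http.HttpHost;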
2. The ES connection returned 401 Unauthorized.
The initial connection code:
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
……
esSinkBuilder.setRestClientFactory(restClientBuilder -> {
CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,
new UsernamePasswordCredentials("es_username", "es_password"));
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
@Override
public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpAsyncClientBuilder) {
httpAsyncClientBuilder.disableAuthCaching();
return httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
}
});
});
Because the ES cluster has security authentication enabled, the username/password CredentialsProvider approach above could not connect directly, so I switched to a Base64-encoded Basic authentication header.
esSinkBuilder.setRestClientFactory(restClientBuilder -> {
String username = "es_username";
String password = "es_password";
String auth = Base64.encodeBase64String((username + ":" + password).getBytes());
restClientBuilder.setDefaultHeaders(new BasicHeader[]{new BasicHeader("Authorization", "Basic " + auth)});
});
3. Writing a document that contains _id to ES fails.
In ES the record id of a document is the _id metadata field. I originally took the request id from the business data and assigned it directly to _id in the JSON payload, which caused an error; the fix is to set the document id by calling the id() method instead.
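A minimal sketch of the change (field names follow the MetricsDto shown below; the commented-out line is the failing variant):
Map<String, Object> map = BeanUtil.beanToMap(metricsDto);
// Wrong: ES rejects a document whose source contains the reserved _id metadata field
// map.put("_id", metricsDto.getReqId());
// Correct: set the document id via the IndexRequest's id() method
requestIndexer.add(Requests.indexRequest().index(index).id(metricsDto.getReqId()).source(map));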
Complete Program Example
Not directly copy-and-run code; only the key parts are shown, and internal second-party packages have been replaced with xxx.demo.
- pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.xxx.demo</groupId>
<artifactId>scp-metrics-flink</artifactId>
<version>1.0.0-SNAPSHOT</version>
<name>scp-metrics-flink</name>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.13.5</flink.version>
<scala.binary.version>2.11</scala.binary.version>
</properties>
<dependencies>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.83</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Required for Flink versions below 1.14; in 1.14 the blink planner artifact was renamed to the non-blink one -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<!-- Web UI for local testing; when no env is specified at startup, the "Web frontend" log line prints the web UI address (not port 8081) -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- flink table language support -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Common connectors -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch7_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- Common formats -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-json</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- For writing checkpoints to HDFS; can be changed to write to a local directory -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-hadoop-2-uber</artifactId>
<version>2.8.3-10.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.xxx.demo</groupId>
<artifactId>sf-split-flow</artifactId>
<version>1.0.5</version>
</dependency>
<dependency>
<groupId>com.xxx.kafka</groupId>
<artifactId>sf-kafka-api-check-valid</artifactId>
<version>1.17.6</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>4.3.2.RELEASE</version>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>1.2.83</version>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.4</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<!-- Exclude the following dependencies to avoid conflicts with services such as the YARN timeline server -->
<exclude>javax.ws.rs:javax.ws.rs-api</exclude>
<exclude>org.glassfish.jersey.core:jersey-common</exclude>
<exclude>jakarta.ws.rs:jakarta.ws.rs-api</exclude>
<exclude>com.sun.jersey:jersey-core</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/services/javax.*</exclude>
<exclude>**/*.proto</exclude>
<exclude>**/*.xml</exclude>
<exclude>hbase-webapps/**</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Common Utility Classes
- BeanUtil
package com.xxx.demo.util;
import org.springframework.cglib.beans.BeanMap;
import java.util.HashMap;
import java.util.Map;
public class BeanUtil {
public static <T> Map<String, Object> beanToMap(T bean) {
Map<String, Object> map = new HashMap<>();
if (bean != null) {
BeanMap beanMap = BeanMap.create(bean);
for (Object key : beanMap.keySet()) {
map.put(key.toString(), beanMap.get(key));
}
}
return map;
}
}
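A quick usage sketch (MetricsDto is the DTO shown later; the output values are illustrative):
// Flattens the bean's properties into a Map via cglib's BeanMap,
// e.g. {reqId=..., orderNo=..., userId=..., reqTime=...}
Map<String, Object> doc = BeanUtil.beanToMap(metricsDto);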
- ConfigUtil
package com.xxx.demo.util;
import org.apache.flink.api.java.utils.ParameterTool;
import java.io.IOException;
import java.util.Properties;
/**
* Static configuration utility.
*/
public class ConfigUtil {
/**
* Loads static configuration from the project resource directory matching the "env" command-line argument.
* @param args
* @param fileName
* @return
* @throws IOException
*/
public static Properties getStaticConfig(String[] args, String fileName) throws IOException {
Properties properties = new Properties();
String filePath = getConfigFromArgs(args, "env").concat("/").concat(fileName);
properties.load(ConfigUtil.class.getClassLoader().getResourceAsStream(filePath));
return properties;
}
/**
* Reads a configuration value from the command-line arguments.
* @param args
* @param configKey
* @return
*/
public static String getConfigFromArgs(String[] args, String configKey) {
return ParameterTool.fromArgs(args).get(configKey);
}
}
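For example, assuming the standard Maven resource layout, launching the job with --env prod makes the call below read prod/kafka.properties from the classpath (i.e. src/main/resources/prod/kafka.properties):
Properties kafkaConfig = ConfigUtil.getStaticConfig(new String[]{"--env", "prod"}, "kafka.properties");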
- DateUtil
package com.xxx.demo.util;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
public class DateUtil {
/**
* Date-time pattern: yyyy-MM-dd HH:mm:ss
*/
public static final DateTimeFormatter DEFAULT_TIME_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
/**
* Date pattern: yyyyMMdd
*/
public static final DateTimeFormatter DEFAULT_DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyyMMdd");
public static Date stringToDate(String timeString) {
return stringToDate(timeString, DEFAULT_TIME_FORMATTER);
}
public static Date stringToDate(String timeString, DateTimeFormatter formatter) {
LocalDateTime localDateTime = LocalDateTime.parse(timeString, formatter);
Date result = Date.from(localDateTime.atZone(ZoneId.systemDefault()).toInstant());
return result;
}
public static String dateToString(Date date) {
return dateToString(date, DEFAULT_DATE_FORMATTER);
}
public static String dateToString(Date date, DateTimeFormatter formatter) {
LocalDateTime localDateTime = LocalDateTime.ofInstant(date.toInstant(), ZoneId.systemDefault());
return formatter.format(localDateTime);
}
}
- OtherUtils
package com.xxx.demo.util;
import org.apache.http.HttpHost;
import java.util.ArrayList;
import java.util.List;
public class OtherUtils {
public static List<HttpHost> loadHostArray(String nodes) {
List<HttpHost> httpHostList = new ArrayList<>();
String[] split = nodes.split(",");
for (int i = 0; i < split.length; ++i) {
String item = split[i];
httpHostList.add(new HttpHost(item.split(":")[0], Integer.parseInt(item.split(":")[1]), "http"));
}
return httpHostList;
}
}
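Usage sketch with placeholder addresses: a comma-separated host:port list becomes HttpHost instances on the http scheme:
List<HttpHost> hosts = OtherUtils.loadHostArray("10.0.0.1:9200,10.0.0.2:9200");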
- MetricsDto (DTO)
package com.xxx.demo.dto;
import java.io.Serializable;
public class MetricsDto implements Serializable {
/**
* Data id, guaranteed unique
*/
private String reqId;
/**
* Order number
*/
private String orderNo;
/**
* User id
*/
private String userId;
/**
* Request time
*/
private String reqTime;
// getters and setters omitted
}
- MetricsMap (map operator)
package com.xxx.demo.map;
import com.xxx.demo.dto.MetricsDto;
import com.xxx.demo.utils.JsonUtils;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class MetricsMap extends RichMapFunction<String, MetricsDto> {
private static final Logger logger = LoggerFactory.getLogger(MetricsMap.class);
@Override
public MetricsDto map(String value) throws Exception {
return JsonUtils.json2Object(value, MetricsDto.class);
}
}
- Main program
package com.xxx.demo;
import com.xxx.kafka.check.util.AuthUtil;
import com.xxx.demo.dto.MetricsDto;
import com.xxx.demo.map.MetricsMap;
import com.xxx.demo.util.BeanUtil;
import com.xxx.demo.util.DateUtil;
import com.xxx.demo.util.OtherUtils;
import com.xxx.demo.util.ConfigUtil;
import org.apache.commons.lang3.StringUtils;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.calcite.shaded.org.apache.commons.codec.binary.Base64;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.util.RetryRejectedExecutionFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.ExceptionUtils;
import org.apache.http.HttpHost;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.ElasticsearchParseException;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.*;
public class MetricsApplication {
private static final Logger logger = LoggerFactory.getLogger(MetricsApplication.class);
@SuppressWarnings("unchecked")
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
String sourceParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "sourceParallelism"), "4");
String mapParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "mapParallelism"), "4");
String sinkParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "sinkParallelism"), "4");
logger.info("source -> map -> sink parallelism config is {} -> {} -> {}", sourceParallelism, mapParallelism, sinkParallelism);
try {
env.addSource(getKafkaConsumer(args))
.setParallelism(Integer.parseInt(sourceParallelism))
.name("SCP_PRINT_METRICS_SOURCE")
.uid("SCP_PRINT_METRICS_SOURCE")
.map(new MetricsMap())
.setParallelism(Integer.parseInt(mapParallelism))
.name("SCP_PRINT_METRICS_MAP")
.uid("SCP_PRINT_METRICS_MAP")
.addSink(getEsSinkBuilder(args).build())
.setParallelism(Integer.parseInt(sinkParallelism))
.name("SCP_PRINT_METRICS_ES")
.uid("SCP_PRINT_METRICS_ES");
env.execute("scp-print-metrics");
} catch (Exception e) {
logger.error("error msg is {}", e.getMessage(), e);
throw e;
}
}
/**
* Builds the Kafka consumer.
* @param args
* @return
* @throws IOException
*/
private static FlinkKafkaConsumer<String> getKafkaConsumer(String[] args) throws IOException {
Properties kafkaConfig = ConfigUtil.getStaticConfig(args, "kafka.properties");
logger.info("load kafka config: {}", kafkaConfig);
String cluster = kafkaConfig.getProperty("scpPrintMetricsCluster");
String topic = kafkaConfig.getProperty("scpPrintMetricsTopic");
String token = kafkaConfig.getProperty("scpPrintMetricsToken");
String momUrl = kafkaConfig.getProperty("scpPrintMetricsMomUrl");
String brokers = AuthUtil.getBrokers(cluster, topic + ":" + token, momUrl);
logger.info("brokers is " + brokers);
kafkaConfig.setProperty("bootstrap.servers", brokers);
return new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), kafkaConfig);
}
/**
* Builds the Elasticsearch sink.
* @param args
* @return
* @throws IOException
*/
private static ElasticsearchSink.Builder<MetricsDto> getEsSinkBuilder(String[] args) throws IOException {
Properties esConfig = ConfigUtil.getStaticConfig(args, "es.properties");
logger.info("load es config: {}", esConfig);
List<HttpHost> esHosts = OtherUtils.loadHostArray(esConfig.getProperty("es.host"));
ElasticsearchSink.Builder<MetricsDto> esSinkBuilder = new ElasticsearchSink.Builder<>(esHosts, (ElasticsearchSinkFunction<MetricsDto>) (metricsDto, runtimeContext, requestIndexer) -> {
String suffix = null;
if (StringUtils.isNotBlank(metricsDto.getReqTime())) {
try {
Date reqTime = DateUtil.stringToDate(metricsDto.getReqTime());
suffix = DateUtil.dateToString(reqTime);
} catch (Exception e) {
logger.warn("parse date {} fail", metricsDto.getReqTime());
}
}
if (StringUtils.isBlank(suffix)) {
suffix = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
}
String index = esConfig.getProperty("es.index.prefix") + suffix;
Map<String, Object> map = BeanUtil.beanToMap(metricsDto);
logger.info("sink index: {} : {}", index, map);
requestIndexer.add(Requests.indexRequest().index(index).id(metricsDto.getReqId()).source(map));
});
esSinkBuilder.setFailureHandler((ActionRequestFailureHandler) (actionRequest, throwable, i, requestIndexer) -> {
if (ExceptionUtils.findThrowable(throwable, EsRejectedExecutionException.class).isPresent()) {
requestIndexer.add(actionRequest);
} else if (ExceptionUtils.findThrowable(throwable, ElasticsearchParseException.class).isPresent()) {
logger.error("sink fail: {}", throwable.getMessage(), throwable);
} else {
logger.error("sink fail2: {}", throwable.getMessage(), throwable);
}
});
esSinkBuilder.setRestClientFactory(restClientBuilder -> {
String username = esConfig.getProperty("es.username");
String password = esConfig.getProperty("es.password");
String auth = Base64.encodeBase64String((username + ":" + password).getBytes());
restClientBuilder.setDefaultHeaders(new BasicHeader[]{new BasicHeader("Authorization", "Basic " + auth)});
});
esSinkBuilder.setBulkFlushMaxActions(200);
esSinkBuilder.setBulkFlushInterval(3000);
esSinkBuilder.setBulkFlushMaxSizeMb(10);
esSinkBuilder.setBulkFlushBackoff(true);
esSinkBuilder.setBulkFlushBackoffRetries(2);
esSinkBuilder.setBulkFlushBackoffDelay(2);
esSinkBuilder.setBulkFlushBackoffType(ElasticsearchSinkBase.FlushBackoffType.EXPONENTIAL);
return esSinkBuilder;
}
}