Notes on Pitfalls in Flink Jar Program Development

Background

This was my first time developing a Flink jar program, and I ran into quite a few problems. Debugging was inconvenient: every code change meant repackaging, uploading to the company's real-time computing platform, and restarting the job to verify, which cost a lot of time. So I am recording the problems encountered here to avoid stepping into the same pits next time.

Issues Encountered

Flink version: 1.13.5

ES version: 7.10.2

1. Submitting the job to the YARN cluster, the job failed to start with an error saying an object is not serializable:

2024-05-27 13:57:31 [error] taskId-3004147 org.apache.flink.api.common.InvalidProgramException: The implementation of the ElasticsearchSinkBase is not serializable. The object probably contains or references non serializable fields.
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:164)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2053)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:203)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1243)
2024-05-27 13:57:31 [error] taskId-3004147 	at com.xxx.demo.MetricsApplication.main(MetricsApplication.java:68)
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-05-27 13:57:31 [info] Flink submit task process destroyed: Task id: 3004147, ip: 10.206.23.140, pid: 355597, stop time: 2024-05-27 13:57:31
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.lang.reflect.Method.invoke(Method.java:497)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.security.AccessController.doPrivileged(Native Method)
2024-05-27 13:57:31 [error] taskId-3004147 	at javax.security.auth.Subject.doAs(Subject.java:422)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
2024-05-27 13:57:31 [error] taskId-3004147 Caused by: java.io.NotSerializableException: org.apache.flink.shaded.hadoop2.org.apache.commons.httpclient.HttpHost
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.util.ArrayList.writeObject(ArrayList.java:762)
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-05-27 13:57:31 [error] taskId-3004147 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.lang.reflect.Method.invoke(Method.java:497)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
2024-05-27 13:57:31 [error] taskId-3004147 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:624)
2024-05-27 13:57:31 [error] taskId-3004147 	at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:143)
2024-05-27 13:57:31 [error] taskId-3004147 	... 21 more
2024-05-27 13:57:31 [error] taskId-3004147 
2024-05-27 13:57:31 [error] taskId-3004147 ------------------------------------------------------------
2024-05-27 13:57:31 [error] taskId-3004147  The program finished with the following exception:
2024-05-27 13:57:31 [error] taskId-3004147 
2024-05-27 13:57:31 [error] taskId-3004147 The implementation of the ElasticsearchSinkBase is not serializable. The object probably contains or references non serializable fields.
2024-05-27 13:57:31 [error] taskId-3004147 	org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:164)
2024-05-27 13:57:31 [error] taskId-3004147 	org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:69)
2024-05-27 13:57:31 [error] taskId-3004147 	org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:2053)
2024-05-27 13:57:31 [error] taskId-3004147 	org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:203)
2024-05-27 13:57:31 [error] taskId-3004147 	org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1243)
2024-05-27 13:57:31 [error] taskId-3004147 	com.xxx.demo.MetricsApplication.main(MetricsApplication.java:68)

The error message says that the class org.apache.flink.shaded.hadoop2.org.apache.commons.httpclient.HttpHost does not implement the Serializable interface. The class comes from

org.apache.flink:flink-shaded-hadoop-2-uber:2.7.5-10.0

My first suspicion was that the dependency needed an upgrade, so I tried the newer version 2.8.3-10.0. That version no longer contains this class, but it does contain org.apache.flink.shaded.hadoop2.org.apache.http.HttpHost, which implements Serializable, so I switched the import to this class.
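For reference, this is how the upgraded dependency is declared in the full pom shown later in this post (the provided scope is taken from that pom):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-shaded-hadoop-2-uber</artifactId>
    <!-- upgraded from 2.7.5-10.0 -->
    <version>2.8.3-10.0</version>
    <scope>provided</scope>
</dependency>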

After submitting again, the job started successfully this time, but it then failed while the Flink processing program was executing:

java.lang.ArrayStoreException
	at java.util.ArrayList.toArray(ArrayList.java:408)
	at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.createClient(Elasticsearch7ApiCallBridge.java:69)
	at org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.createClient(Elasticsearch7ApiCallBridge.java:46)
	at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.open(ElasticsearchSinkBase.java:317)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
	at java.lang.Thread.run(Thread.java:745)

Source code at the error location:

    @Override
    public RestHighLevelClient createClient(Map<String, String> clientConfig) {
        RestClientBuilder builder =
                RestClient.builder(httpHosts.toArray(new HttpHost[httpHosts.size()]));
        restClientFactory.configureRestClientBuilder(builder);

        RestHighLevelClient rhlClient = new RestHighLevelClient(builder);

        return rhlClient;
    }

The failing call is httpHosts.toArray(), and it turns out that the HttpHost expected here is actually

org.apache.http.HttpHost

So the whole mix-up was caused by importing the wrong class.
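In other words, the host list passed to the ES sink has to be built with the un-shaded class. A minimal sketch of the corrected import and host construction (host name and port are placeholders; the OtherUtils class later in this post builds the same list from a config string):

// Wrong import: the shaded class that caused the ArrayStoreException in createClient()
// import org.apache.flink.shaded.hadoop2.org.apache.http.HttpHost;

// Correct import: the plain httpcore class the ES connector expects
import org.apache.http.HttpHost;

import java.util.ArrayList;
import java.util.List;

public class EsHostsExample {

    public static List<HttpHost> buildHosts() {
        List<HttpHost> httpHosts = new ArrayList<>();
        httpHosts.add(new HttpHost("es-node-1", 9200, "http")); // placeholder host
        return httpHosts;
    }
}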

2. The ES connection returned 401 Unauthorized.

The initial connection code:

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
……



esSinkBuilder.setRestClientFactory(restClientBuilder -> {
    CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
    credentialsProvider.setCredentials(AuthScope.ANY,
            new UsernamePasswordCredentials("es_username", "es_password"));
    restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
        @Override
        public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpAsyncClientBuilder) {
            httpAsyncClientBuilder.disableAuthCaching();
            return httpAsyncClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
        }
    });
});

Because the ES cluster has security authentication enabled, connecting this way with only the username/password credentials provider did not work, so I switched to sending a Base64-encoded Basic authentication header instead.

esSinkBuilder.setRestClientFactory(restClientBuilder -> {
    String username = "es_username";
    String password = "es_password";
    String auth = Base64.encodeBase64String((username + ":" + password).getBytes());
    restClientBuilder.setDefaultHeaders(new BasicHeader[]{new BasicHeader("Authorization", "Basic " + auth)});
});

3. Including _id in the data object written to ES caused an error.

A document's record id in ES is the _id field. The original idea was to take the request id from the business data and assign it to _id directly in the JSON payload being written, which caused an error. The fix was to set the document id by calling the id() method on the index request instead.
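A minimal sketch of the difference, as it would sit inside the ElasticsearchSinkFunction lambda shown in the full program below (the index name is a placeholder, and getReqId() is the unique request id from the DTO):

Map<String, Object> source = BeanUtil.beanToMap(metricsDto);

// Wrong: _id is a metadata field and may not be placed in the document body
// source.put("_id", metricsDto.getReqId());

// Correct: set the document id through IndexRequest.id()
requestIndexer.add(Requests.indexRequest()
        .index("scp_print_metrics_20240527")   // placeholder index name
        .id(metricsDto.getReqId())             // document _id set via id()
        .source(source));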

Full Program Example

This cannot be copied and run directly; only the key code is shown, and internal company dependencies have been replaced with xxx.demo.

  • Dependencies: pom.xml
<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xxx.demo</groupId>
    <artifactId>scp-metrics-flink</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <name>scp-metrics-flink</name>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.13.5</flink.version>
        <scala.binary.version>2.11</scala.binary.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.83</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- This dependency is required for Flink versions below 1.14; -->
        <!-- from 1.14 on, the blink planner artifact is replaced by the non-blink flink-table-planner -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Unit test dependency -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

        <!-- Local test web UI; when no env is specified at startup, the "Web frontend" log line prints the web UI address (not port 8081) -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>


        <!-- flink table language support  -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Common connectors -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-elasticsearch7_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- Common formats -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- For writing checkpoints to HDFS; can be changed to write to a local directory -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop-2-uber</artifactId>
            <version>2.8.3-10.0</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>com.xxx.demo</groupId>
            <artifactId>sf-split-flow</artifactId>
            <version>1.0.5</version>
        </dependency>

        <dependency>
            <groupId>com.xxx.kafka</groupId>
            <artifactId>sf-kafka-api-check-valid</artifactId>
            <version>1.17.6</version>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>4.3.2.RELEASE</version>
        </dependency>

    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.83</version>
            </dependency>
        </dependencies>
    </dependencyManagement>


    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-source-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <id>attach-sources</id>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <!-- Exclude these dependencies to avoid conflicts with services such as the YARN timeline server -->
                                    <exclude>javax.ws.rs:javax.ws.rs-api</exclude>
                                    <exclude>org.glassfish.jersey.core:jersey-common</exclude>
                                    <exclude>jakarta.ws.rs:jakarta.ws.rs-api</exclude>
                                    <exclude>com.sun.jersey:jersey-core</exclude>
                                </excludes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>META-INF/services/javax.*</exclude>
                                        <exclude>**/*.proto</exclude>
                                        <exclude>**/*.xml</exclude>
                                        <exclude>hbase-webapps/**</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>

</project>

Common Utility Classes

  • BeanUtil
package com.xxx.demo.util;

import org.springframework.cglib.beans.BeanMap;
import java.util.HashMap;
import java.util.Map;

public class BeanUtil {

    public static <T> Map<String, Object> beanToMap(T bean) {
        Map<String, Object> map = new HashMap<>();
        if (bean != null) {
            BeanMap beanMap = BeanMap.create(bean);
            for (Object key : beanMap.keySet()) {
                map.put(key.toString(), beanMap.get(key));
            }
        }
        return map;
    }

}
  • ConfigUtil
package com.xxx.demo.util;

import org.apache.flink.api.java.utils.ParameterTool;
import java.io.IOException;
import java.util.Properties;

/**
 * Static configuration utility class
 */
public class ConfigUtil {

    /**
     * Loads static configuration from the local project resource directory corresponding to the console parameter "env"
     * @param args
     * @param fileName
     * @return
     * @throws IOException
     */
    public static Properties getStaticConfig(String[] args, String fileName) throws IOException {
        Properties properties = new Properties();
        String filePath = getConfigFromArgs(args, "env").concat("/").concat(fileName);
        properties.load(ConfigUtil.class.getClassLoader().getResourceAsStream(filePath));
        return properties;
    }

    /**
     * Gets a configuration value from the console arguments
     * @param args
     * @param configKey
     * @return
     */
    public static String getConfigFromArgs(String[] args, String configKey) {
        return ParameterTool.fromArgs(args).get(configKey);
    }

}
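The main program below loads kafka.properties and es.properties through this class, so the configuration presumably lives under resources/<env>/. A hypothetical es.properties using only the keys the main program reads (all values are placeholders):

# resources/prod/es.properties (hypothetical values)
es.host=es-node-1:9200,es-node-2:9200
es.index.prefix=scp_print_metrics_
es.username=es_username
es.password=es_password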
  • DateUtil
package com.xxx.demo.util;

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class DateUtil {

    /**
     * Date-time format: yyyy-MM-dd HH:mm:ss
     */
    public static final DateTimeFormatter DEFAULT_TIME_FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /**
     * Date format: yyyyMMdd
     */
    public static final DateTimeFormatter DAFAULT_DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyyMMdd");

    public static Date stringToDate(String timeString) {
        return stringToDate(timeString, DEFAULT_TIME_FORMATTER);
    }

    public static Date stringToDate(String timeString, DateTimeFormatter formatter) {
        LocalDateTime localDateTime = LocalDateTime.parse(timeString, formatter);
        Date result = Date.from(localDateTime.atZone(ZoneId.systemDefault()).toInstant());
        return result;
    }

    public static String dateToString(Date date) {
       return dateToString(date, DAFAULT_DATE_FORMATTER);
    }

    public static String dateToString(Date date, DateTimeFormatter formatter) {
        LocalDateTime localDateTime = LocalDateTime.ofInstant(date.toInstant(), ZoneId.systemDefault());
        return formatter.format(localDateTime);
    }

}
  • OtherUtils
package com.xxx.demo.util;

import org.apache.http.HttpHost;
import java.util.ArrayList;
import java.util.List;

public class OtherUtils {

    public static List<HttpHost> loadHostArray(String nodes) {
        List<HttpHost> httpHostList = new ArrayList<>();
        String[] split = nodes.split(",");
        for (int i = 0; i < split.length; ++i) {
            String item = split[i];
            httpHostList.add(new HttpHost(item.split(":")[0], Integer.parseInt(item.split(":")[1]), "http"));
        }
        return httpHostList;
    }

}
  • DTO
package com.xxx.demo.dto;

import java.io.Serializable;

public class MetricsDto implements Serializable {

    /**
     * Data id, guaranteed to be unique
     */
    private String reqId;

    /**
     * Order number
     */
    private String orderNo;

    /**
     * User id
     */
    private String userId;

    /**
     * Request time
     */
    private String reqTime;

    // getters and setters omitted

}
  • Map operator
package com.xxx.demo.map;

import com.xxx.demo.dto.MetricsDto;
import com.xxx.demo.utils.JsonUtils;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MetricsMap extends RichMapFunction<String, MetricsDto> {

    private static final Logger logger = LoggerFactory.getLogger(MetricsMap.class);

    @Override
    public MetricsDto map(String value) throws Exception {
        return JsonUtils.json2Object(value, MetricsDto.class);
    }
}
  • Main program
package com.xxx.demo;

import com.xxx.kafka.check.util.AuthUtil;
import com.xxx.demo.dto.MetricsDto;
import com.xxx.demo.map.MetricsMap;
import com.xxx.demo.util.BeanUtil;
import com.xxx.demo.util.DateUtil;
import com.xxx.demo.util.OtherUtils;
import com.xxx.demo.util.ConfigUtil;
import org.apache.commons.lang3.StringUtils;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.calcite.shaded.org.apache.commons.codec.binary.Base64;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.util.RetryRejectedExecutionFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.ExceptionUtils;
import org.apache.http.HttpHost;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.ElasticsearchParseException;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.*;

public class MetricsApplication {

    private static final Logger logger = LoggerFactory.getLogger(MetricsApplication.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);

        String sourceParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "sourceParallelism"), "4");
        String mapParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "mapParallelism"), "4");
        String sinkParallelism = StringUtils.defaultIfBlank(ConfigUtil.getConfigFromArgs(args, "sinkParallelism"), "4");

        logger.info("source -> map -> sink parallelism config is {} -> {} -> {}", sourceParallelism, mapParallelism, sinkParallelism);

        try {
            env.addSource(getKafkaConsumer(args))
                    .setParallelism(Integer.parseInt(sourceParallelism))
                    .name("SCP_PRINT_METRICS_SOURCE")
                    .uid("SCP_PRINT_METRICS_SOURCE")
                    .map(new MetricsMap())
                    .setParallelism(Integer.parseInt(mapParallelism))
                    .name("SCP_PRINT_METRICS_MAP")
                    .uid("SCP_PRINT_METRICS_MAP")
                    .addSink(getEsSinkBuilder(args).build())
                    .setParallelism(Integer.parseInt(sinkParallelism))
                    .name("SCP_PRINT_METRICS_ES")
                    .uid("SCP_PRINT_METRICS_ES");
            env.execute("scp-print-metrics");
        } catch (Exception e) {
            logger.error("error msg is {}", e.getMessage(), e);
            throw e;
        }
    }

    /**
     * Builds the Kafka consumer
     * @param args
     * @return
     * @throws IOException
     */
    private static FlinkKafkaConsumer getKafkaConsumer(String[] args) throws IOException {
        Properties kafkaConfig = ConfigUtil.getStaticConfig(args, "kafka.properties");
        logger.info("load kafka config: {}", kafkaConfig);
        String cluster = kafkaConfig.getProperty("scpPrintMetricsCluster");
        String topic = kafkaConfig.getProperty("scpPrintMetricsTopic");
        String token = kafkaConfig.getProperty("scpPrintMetricsToken");
        String momUrl = kafkaConfig.getProperty("scpPrintMetricsMomUrl");
        String brokers = AuthUtil.getBrokers(cluster, topic + ":" + token, momUrl);
        logger.info("brokers is " + brokers);
        kafkaConfig.setProperty("bootstrap.servers", brokers);
        return new FlinkKafkaConsumer(topic, new SimpleStringSchema(), kafkaConfig);
    }

    /**
     * Builds the ES sink
     * @param args
     * @return
     * @throws IOException
     */
    private static ElasticsearchSink.Builder<MetricsDto> getEsSinkBuilder(String[] args) throws IOException {
        Properties esConfig = ConfigUtil.getStaticConfig(args, "es.properties");
        logger.info("load es config: {}", esConfig);
        List<HttpHost> esHosts = OtherUtils.loadHostArray(esConfig.getProperty("es.host"));
        ElasticsearchSink.Builder<MetricsDto> esSinkBuilder = new ElasticsearchSink.Builder(esHosts, (ElasticsearchSinkFunction<MetricsDto>) (metricsDto, runtimeContext, requestIndexer) -> {
            String suffix = null;
            if (StringUtils.isNotBlank(metricsDto.getPrintDateTime())) {
                try {
                    Date printDateTime = DateUtil.stringToDate(metricsDto.getPrintDateTime());
                    suffix = DateUtil.dateToString(printDateTime);
                } catch (Exception e) {
                    logger.warn("parse date {} fail", metricsDto.getPrintDateTime());
                }
            }
            if (StringUtils.isBlank(suffix)) {
                suffix = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
            }
            String index = esConfig.getProperty("es.index.prefix") + suffix;
            Map<String, Object> map = BeanUtil.beanToMap(metricsDto);
            logger.info("sink index: {} : {}", index, map);
            requestIndexer.add(Requests.indexRequest().index(index).id(metricsDto.getDataId()).source(map));
        });

        esSinkBuilder.setFailureHandler((ActionRequestFailureHandler) (actionRequest, throwable, i, requestIndexer) -> {
            if (ExceptionUtils.findThrowable(throwable, EsRejectedExecutionException.class).isPresent()) {
                requestIndexer.add(actionRequest);
            } else if (ExceptionUtils.findThrowable(throwable, ElasticsearchParseException.class).isPresent()) {
                logger.error("sink fail: {}", throwable.getMessage(), throwable);
            } else {
                logger.error("sink fail2: {}", throwable.getMessage(), throwable);
            }
        });
        esSinkBuilder.setRestClientFactory(restClientBuilder -> {
            String username = esConfig.getProperty("es.username");
            String password = esConfig.getProperty("es.password");
            String auth = Base64.encodeBase64String((username + ":" + password).getBytes());
            restClientBuilder.setDefaultHeaders(new BasicHeader[]{new BasicHeader("Authorization", "Basic " + auth)});
        });
        esSinkBuilder.setBulkFlushMaxActions(200);
        esSinkBuilder.setBulkFlushInterval(3000);
        esSinkBuilder.setBulkFlushMaxSizeMb(10);
        esSinkBuilder.setBulkFlushBackoff(true);
        esSinkBuilder.setBulkFlushBackoffRetries(2);
        esSinkBuilder.setBulkFlushBackoffDelay(2);
        esSinkBuilder.setBulkFlushBackoffType(ElasticsearchSinkBase.FlushBackoffType.EXPONENTIAL);
        // Note: this call replaces the custom failure handler configured above
        esSinkBuilder.setFailureHandler(new RetryRejectedExecutionFailureHandler());
        return esSinkBuilder;
    }

}
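For completeness, the job presumably receives its runtime parameters as program arguments in ParameterTool's --key value form; an illustrative submit-time argument string (values are made up):

--env prod --sourceParallelism 4 --mapParallelism 4 --sinkParallelism 4

With --env prod, getStaticConfig would then load kafka.properties and es.properties from the prod resource directory.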
