flink
Problems
1. gc overhead limit exceeded: after repeated GC passes, heap space is still insufficient.
Solution: switch to the G1 garbage collector.
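A minimal sketch of switching to G1, assuming the JVM flag is passed through Flink's env.java.opts configuration key (the same key this document later sets via -yD on the command line):

```yaml
# flink-conf.yaml: run the JobManager/TaskManager JVMs with the G1 collector
env.java.opts: -XX:+UseG1GC
```

The same flag can be appended to an existing -yD env.java.opts="..." value at submit time instead of editing flink-conf.yaml.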
2. After changing the job's parallelism, submission reports: Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
Solution: verify that the target YARN queue actually has enough free memory and vcores for the requested containers, or lower the parallelism / container sizes.
3. Environment problem; retrying the submission a few times can get past it (the temp directory referenced by the JAAS error below may have been cleaned up). Example stack trace:
2021-03-09 11:55:48|main|ERROR|org.apache.flink.runtime.entrypoint.ClusterEntrypoint|runClusterEntrypoint|520 - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.lang.Exception: unable to establish the security context
at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:73)
at org.apache.flink.yarn.entrypoint.YarnEntrypointUtils.installSecurityContext(YarnEntrypointUtils.java:57)
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.installSecurityContext(YarnJobClusterEntrypoint.java:58)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:166)
... 2 common frames omitted
Caused by: java.lang.RuntimeException: unable to generate a JAAS configuration file
at org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:170)
at org.apache.flink.runtime.security.modules.JaasModule.install(JaasModule.java:94)
at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:67)
... 5 common frames omitted
Caused by: java.nio.file.NoSuchFileException: /alidata1/soft/flink/tmp/jaas-6589995419609878833.conf
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
at java.nio.file.Files.createTempFile(Files.java:852)
at org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:163)
... 7 common frames omitted
4. Pending record count must be zero at this point
Reference: "【Flink基础】-- 写入 Kafka 的两种方式" (CSDN blog post on the two ways of writing to Kafka).
When the Flink job's parallelism is lower than the downstream topic's partition count, some partitions never receive any data. The culprit is the default partitioner:
package org.apache.flink.streaming.connectors.kafka.partitioner;

import org.apache.flink.util.Preconditions;

/**
 * Default partitioner: each sink subtask writes all of its records to a single
 * Kafka partition, chosen as parallelInstanceId % partitions.length.
 */
public class FlinkFixedPartitioner<T> extends FlinkKafkaPartitioner<T> {

    private int parallelInstanceId;

    @Override
    public void open(int parallelInstanceId, int parallelInstances) {
        Preconditions.checkArgument(parallelInstanceId >= 0, "Id of this subtask cannot be negative.");
        Preconditions.checkArgument(parallelInstances > 0, "Number of subtasks must be larger than 0.");
        this.parallelInstanceId = parallelInstanceId;
    }

    @Override
    public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
        Preconditions.checkArgument(partitions != null && partitions.length > 0, "Partitions of the target topic is empty.");
        // Every record from this subtask goes to the same fixed partition.
        return partitions[this.parallelInstanceId % partitions.length];
    }
}
Flink takes the sink subtask ID modulo the number of Kafka partitions. For example:
- With parallelism 3 (F0, F1, F2) and 2 partitions (P0, P1): F0 -> P0, F1 -> P1, F2 -> P0.
- With parallelism 2 (F0, F1) and 3 partitions (P0, P1, P2): F0 -> P0, F1 -> P1.
The default partitioner therefore has two pitfalls:
- When the sink's parallelism is lower than the number of topic partitions, some partitions receive no data at all.
- After the topic's partition count is increased, the job must be restarted to pick up the new partitions.
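The mapping above can be reproduced with a standalone sketch (class and method names below are hypothetical, not Flink API):

```java
// Demonstrates FlinkFixedPartitioner's subtask -> partition mapping:
// each sink subtask picks exactly one partition by modulo.
public class FixedPartitionerMapping {

    // Mirrors partitions[parallelInstanceId % partitions.length] for
    // partitions numbered 0..numPartitions-1.
    static int partitionFor(int subtaskId, int numPartitions) {
        return subtaskId % numPartitions;
    }

    public static void main(String[] args) {
        // Parallelism 3, 2 partitions: F0->P0, F1->P1, F2->P0
        // (all partitions covered, but F0 and F2 both write to P0).
        for (int subtask = 0; subtask < 3; subtask++) {
            System.out.println("F" + subtask + " -> P" + partitionFor(subtask, 2));
        }
        // Parallelism 2, 3 partitions: F0->P0, F1->P1 -- P2 never receives data.
        for (int subtask = 0; subtask < 2; subtask++) {
            System.out.println("F" + subtask + " -> P" + partitionFor(subtask, 3));
        }
    }
}
```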
5. Caused by: java.nio.file.NoSuchFileException: /alidata1/soft/flink/tmp/jaas-4073091767736825725.conf
Caused by: java.lang.RuntimeException: unable to generate a JAAS configuration file
at org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:170)
at org.apache.flink.runtime.security.modules.JaasModule.install(JaasModule.java:94)
at org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:67)
... 5 more
Caused by: java.nio.file.NoSuchFileException: /alidata1/soft/flink/tmp/jaas-4073091767736825725.conf
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.createFile(Files.java:632)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
at java.nio.file.Files.createTempFile(Files.java:852)
at org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:163)
The root cause was that no Flink restart strategy had been configured, so a transient failure killed the job. For example:
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10)); // 3 attempts; the long overload takes the delay in milliseconds (use Time.seconds(...) for an explicit unit)
Flink on YARN submit command (-yqu specifies the YARN queue; -ynm sets a custom application name on YARN):
flink run \
-d \
-m yarn-cluster \
-p 4 \
-ys 1 \
-yjm 1024m \
-ytm 4096m \
-yqu root.users.test \
-ynm DataAPI-MarvelTradeMonitorJob \
-c com.ppdai.tsflink.druidHS.DruidLogToES6 \
-yD env.java.opts="-Dlogback.configurationFile=file:///var/lib/hadoop-hdfs/jar_flink/test/logback.xml" \
/var/lib/hadoop-hdfs/jar_flink/test/tsflink-druids-1.0-SNAPSHOT-jar-with-dependencies.jar
/var/lib/hadoop-hdfs/flink-1.9.0/bin/flink run -d -m yarn-cluster -p 2 -ys 1 -yjm 1024m -ytm 4096m -yqu root.users.test \
-ynm DataAPI-{JobName} \
-s hdfs://caasnameservice/flink-checkpoints/test/{JobName}/34375eb4cd7f24b8f4d8fc9d8913a7f0/chk-2460/_metadata \
-c com.test.realtime.{JobName} \
-yD env.java.opts="-Dlogback.configurationFile=file:///var/lib/hadoop-hdfs/jar_flink/test/logback.xml" \
/var/lib/hadoop-hdfs/jar_flink/test/{jarName}.jar hdfs://caasnameservice /user/hdfs/test/{JobName}.properties
/var/lib/hadoop-hdfs/flink-1.9.0/bin/flink run \
-d \
-m yarn-cluster \
-p 2 \
-ys 1 \
-yjm 1024m \
-ytm 2048m \
-yqu root.users.test \
-ynm DataAPI-MarvelTradeMonitorJob \
-c com.test.datacloud.MarvelTradeMonitorJob \
-yD env.java.opts="-Dlogback.configurationFile=file:///var/lib/hadoop-hdfs/jar_flink/test/logback.xml" \
/var/lib/hadoop-hdfs/jar_flink/test/datacloud-query-task-1.0-SNAPSHOT-jar-with-dependencies.jar hdfs://nameservice1:8020 /user/hdfs/test/MarvelTradeMonitorJob.properties
savepoint
Take a savepoint to a given directory and stop the job:
flink stop -p [savepointDir] jobId
/var/lib/hadoop-hdfs/flink-1.9.0/bin/flink savepoint -yid application_1578367242038_3489 bcb89804600c227efc4d30d8af3d3d00 hdfs://nameservice1/flink-savepoints/test/TBillRepaymentTradeJob
Restore a job from a given savepoint:
flink run -s [savepointDir] xxxx.jar
flink run -s hdfs://namenode01.td.com/tmp/flink/savepoints/savepoint-40dcc6-a90008f0f82f flink-app-jobs.jar
/var/lib/hadoop-hdfs/flink-1.9.0/bin/flink run -d -m yarn-cluster -p 2 -yjm 1024m -ytm 2048m -yqu root.users.test \
-ynm DataAPI-RepayResultRecordNewJob \
-s hdfs://caasnameservice/flink-checkpoints/test/RepayResultRecordNewJob/d2d5322888094e7ab16911dc01d927ef/chk-9/_metadata \
-c com.test.realtime.RepayResultRecordNewJob \
/var/lib/hadoop-hdfs/jar_flink/test/user-rt-compute-1.1-jar-with-dependencies.jar hdfs://caasnameservice /user/hdfs/test/RepayResultRecordNewJob.properties
flink stop
/var/lib/hadoop-hdfs/flink-1.9.0/bin/flink stop -p hdfs://nameservice1/flink_savepoints/test/TBillRepaymentTradeJob -yid application_1578367242038_3489 bcb89804600c227efc4d30d8af3d3d00
yarn
View Task Manager logs:
yarn logs -applicationId application_1575946844259_176643
List applications:
yarn application -list
Kill an application:
yarn application -kill <applicationId>
kafka
Check whether Kafka is running:
ps -ef|grep server.properties
Stop Kafka:
bin/kafka-server-stop.sh
Start Kafka:
bin/kafka-server-start.sh -daemon config/server.properties
Create a topic:
bin/kafka-topics.sh --create --zookeeper ip1:2181,ip2:2181,ip3:2181/kafka --replication-factor 2 --partitions 2 --topic test
List topics / describe a topic:
bin/kafka-topics.sh --list --zookeeper ip:2181
bin/kafka-topics.sh --describe --zookeeper ip:2181 --topic test
Delete a topic:
bin/kafka-topics.sh --delete --zookeeper ip:2181 --topic test
Send messages to a topic (console producer; sent messages are not echoed back):
bin/kafka-console-producer.sh --broker-list ip1:9092,ip2:9092,ip3:9092 --topic test
Consume and display messages from a topic (console consumer):
bin/kafka-console-consumer.sh --bootstrap-server ip1:9092,ip2:9092,ip3:9092 --topic test