Integrating Flink with DolphinScheduler

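This post covers the main problems encountered when integrating Flink 1.13.1 with DolphinScheduler 1.3.6: the classloader.check-leaked-classloader error and the HADOOP_CLASSPATH environment variable. The fixes are to disable the classloader check in flink-conf.yaml, to install and configure Hadoop on the DolphinScheduler worker nodes, and to set HADOOP_CLASSPATH before submitting a job. If a non-leaf queue error occurs, the correct YARN queue has to be specified through a custom parameter.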

Versions

  • dolphinscheduler 1.3.6
  • hadoop 3.2.1
  • flink 1.13.1

Main issues

classloader.check-leaked-classloader

With the default setup (Flink copied to /opt/soft/flink on the worker node), the following error occurs:

Exception in thread "Thread-5" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
        at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
        at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
        at org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
        at org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
        at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
        at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)

Following the hint in the message, it is enough to add "classloader.check-leaked-classloader: false" to flink-conf.yaml:

[root@37e3e6d56452 flink]# tail  conf/flink*.yaml  
# The port under which the web-based HistoryServer listens.
#historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
#historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
#historyserver.archive.fs.refresh-interval: 10000
classloader.check-leaked-classloader: false
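
When several worker nodes need the same change, the edit can be scripted. The following is a minimal sketch, not something taken from the Flink distribution: the function name set_flink_conf is a hypothetical helper, and the path in the usage comment is the install location assumed in this post. It rewrites the key if it is already present and appends it otherwise, so running it twice does not create duplicate entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key: value" in a flink-conf.yaml style file.
# An existing uncommented entry for the key is rewritten in place;
# otherwise the entry is appended at the end of the file.
set_flink_conf() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}:" "$file"; then
        # Rewrite through a temporary file to avoid relying on `sed -i`
        sed "s|^${key}:.*|${key}: ${value}|" "$file" > "${file}.tmp" \
            && mv "${file}.tmp" "$file"
    else
        # Make sure the file ends with a newline before appending
        if [ -n "$(tail -c1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (path assumed from the setup described above):
# set_flink_conf /opt/soft/flink/conf/flink-conf.yaml classloader.check-leaked-classloader false
```

Note that this switches off a safety check rather than fixing the underlying classloader leak, which matches the workaround the error message itself suggests.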

HADOOP_CLASSPATH environment

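With Flink 1.13.1, HADOOP_CLASSPATH must be set before jobs can be submitted to YARN. This takes two steps:

  1. Install Hadoop on the DolphinScheduler worker nodes as well, and configure it correctly.
  2. Set HADOOP_CLASSPATH before submitting. The suggested approach is to edit the bin/flink script and add export HADOOP_CLASSPATH=`hadoop classpath` at the top, after the license header and before the first command.

The modified bin/flink file looks like this: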

[root@37e3e6d56452 flink]# head -n 30 bin/flink
#!/usr/bin/env bash
################################################################################
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

export HADOOP_CLASSPATH=`hadoop classpath`
target="$0"
# For the case, the executable has been directly symlinked, figure out
# the correct bin path by following its symlink up to an upper bound.
# Note: we can't use the readlink utility here if we want to be POSIX
# compatible.
iteration=0
while [ -L "$target" ]; do
    if [ "$iteration" -gt 100 ]; then
        echo "Cannot resolve path: You have a cyclic symlink in $target."
        break
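
The one-line export shown in the listing leaves HADOOP_CLASSPATH empty without any visible failure if the hadoop command is missing from the worker's PATH. A slightly more defensive variant is sketched below; the function name export_hadoop_classpath is hypothetical, and the only external command it relies on is hadoop classpath, the same one used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export HADOOP_CLASSPATH from `hadoop classpath`,
# returning non-zero when hadoop is missing or prints nothing.
export_hadoop_classpath() {
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH; install and configure Hadoop on this worker" >&2
        return 1
    fi
    local cp
    cp="$(hadoop classpath)" || return 1
    if [ -z "$cp" ]; then
        echo "'hadoop classpath' returned an empty result" >&2
        return 1
    fi
    export HADOOP_CLASSPATH="$cp"
}

# Usage near the top of bin/flink, in place of the plain export:
# export_hadoop_classpath || exit 1
```

Failing early here makes a missing Hadoop installation show up in the task log as a clear message instead of a later YARN submission error.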

non-leaf queue

The following error may also occur:

Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1626093777623_0001 to YARN : Application application_1626093777623_0001 submitted by user : root to non-leaf queue : root
		at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
		at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1178)
		at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:593)
		at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:474)

The logs show the Flink command that was executed:

flink run -m yarn-cluster -ys 1 -ynm WordCount -yjm 1G -ytm 1G -yqu root -p 1 -sae -c org.apache.flink.examples.java.wordcount.WordCount WordCount.jar --input  hdfs:///griffin/env.json    --output hdfs:///opt/dol-wcoutput/

Note the -yqu option. Running bin/flink run -h shows the following description:

 -yqu,--yarnqueue <arg>               Specify YARN queue.

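Checking the YARN queue configuration shows that root has child queues, which is why the non-leaf error is reported. This can be fixed through a custom parameter that names a leaf queue. After configuring it, the logs show that the command submitted to YARN has changed to: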

flink run -m yarn-cluster -ys 1 -ynm WordCount -yjm 1G -ytm 1G -p 1 -sae -yqu default -c org.apache.flink.examples.java.wordcount.WordCount WordCount.jar --input  hdfs:///griffin/env.json    --output hdfs:///opt/dol-wcoutput/
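
To see the effect of the queue choice before touching the scheduler, the command line can be assembled in a dry run. The sketch below is illustrative only: flink_cmd_for_queue is a hypothetical helper, it prints the command instead of submitting anything, and it assumes that root is a parent queue, as it is on the cluster described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `flink run` command line for a given YARN queue.
# It only builds the string (dry run); nothing is submitted to YARN.
# Assumption: "root" is a parent (non-leaf) queue, as on the cluster above.
flink_cmd_for_queue() {
    local queue="${1:-default}"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    if [ "$queue" = "root" ]; then
        echo "queue 'root' has child queues here; pass a leaf queue such as 'default'" >&2
        return 1
    fi
    # Remaining arguments are passed through unchanged
    echo "flink run -m yarn-cluster -yqu ${queue} $*"
}

# Usage:
# flink_cmd_for_queue default -p 1 -c org.apache.flink.examples.java.wordcount.WordCount WordCount.jar
```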