Flink发布了新版本1.11.0,增加了很多重要新特性,包括增加了对Hadoop3.0.0以及更高版本Hadoop的支持,不再提供“flink-shaded-hadoop-*” jars,而是通过配置YARN_CONF_DIR或者HADOOP_CONF_DIR和HADOOP_CLASSPATH环境变量完成与yarn集群的对接。
具体步骤如下:
- 下载Flink1.11.0安装包
flink安装包下载地址 - 确保安装有Hadoop集群,版本至少Hadoop 2.4.1
- 配置环境变量
增加环境变量如下:
HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
- 启动Hadoop集群(HDFS和Yarn)
- 解压Flink安装包,并进入conf目录,修改flink-conf.yaml文件,修改以下配置,若在提交命令中不特定指明,这些配置将作为默认配置
# The total process memory size for the JobManager.
#
# Note this accounts for all memory usage within the JobManager process, including JVM metaspace and other overhead.
jobmanager.memory.process.size: 1600m
# The total process memory size for the TaskManager.
#
# Note this accounts for all memory usage within the TaskManager process, including JVM metaspace and other overhead.
taskmanager.memory.process.size: 1728m
# To exclude JVM metaspace and overhead, please, use total Flink memory size instead of 'taskmanager.memory.process.size'.
# It is not recommended to set both 'taskmanager.memory.process.size' and Flink memory.
#
# taskmanager.memory.flink.size: 1280m
# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
taskmanager.numberOfTaskSlots: 8
# The parallelism used for programs that did not specify and other parallelism.
parallelism.default: 1
- 向yarn集群申请资源
执行以下命令向yarn集群申请资源
bin/yarn-session -nm test
可以使用以下参数:
-D <arg> Dynamic properties
-d,--detached Start detached
-jm,--jobManagerMemory <arg> Memory for JobManager Container with optional unit (default: MB)
-nm,--name Set a custom name for the application on YARN
-at,--applicationType Set a custom application type on YARN
-q,--query Display available YARN resources (memory, cores)
-qu,--queue <arg> Specify YARN queue.
-s,--slots <arg> Number of slots per TaskManager
-tm,--taskManagerMemory <arg> Memory per TaskManager Container with optional unit (default: MB)
-z,--zookeeperNamespace <arg> Namespace to create the Zookeeper sub-paths for HA mode
- 提交任务
yarn session启动之后会给出一个web UI地址以及一个yarn application ID,用户可以通过web UI或者命令行两种方式提交任务。
web UI提交任务比较简单,通过命令行提交需要执行以下命令
bin/flink run ./examples/batch/WordCount.jar
可选参数如下:
-c,--class <classname> Class with the program entry point ("main"
method or "getPlan()" method. Only needed
if the JAR file does not specify the class
in its manifest.
-m,--jobmanager <host:port> Address of the JobManager to
which to connect. Use this flag to connect
to a different JobManager than the one
specified in the configuration.
-p,--parallelism <parallelism> The parallelism with which to run the
program. Optional flag to override the
default value specified in the
configuration
客户端可以自行确定JobManager的地址,也可以通过-m或者–jobmanager 参数指定JobManager的地址,JobManager的地址在yarn session的启动页面中可以找到。