[Recommended] Java for Flink learning series: from beginner to expert

Official Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.8/

Official download page: https://flink.apache.org/downloads.html

A. Deployment and Running

Generally speaking, there are three common Flink deployment modes. Below, each is illustrated by launching the simplest example, WordCount.

The third mode is the most widely used.

  • Running locally on a single machine (Linux/Windows)

 1) Flink Getting Started, Part 1: Installing and starting Flink on Windows

 2) Flink Getting Started, Part 2: Installing and starting Flink on Linux

 

  • Running as a standalone cluster on Linux

 1) Flink Getting Started, Part 3: Deploying a multi-machine standalone cluster

2) Detailed steps for deploying a Flink standalone cluster

 

  • Running on Hadoop YARN

There are two ways to run Flink on YARN: session mode and per-job (single-job) mode.
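The walkthrough below uses session mode. For contrast, per-job mode starts a dedicated YARN application for a single job via `-m yarn-cluster`. A sketch (the install path and memory sizes are assumptions; the command is only echoed here because actually running it needs a live YARN cluster):

```shell
# Per-job mode sketch: "-m yarn-cluster" makes the client spin up a
# dedicated YARN application for this one job; -yjm/-ytm set the
# JobManager/TaskManager memory, mirroring yarn-session.sh's -jm/-tm.
FLINK_HOME=flink-1.8.1   # hypothetical install directory
CMD="$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 1024 -ytm 1024 ./examples/batch/WordCount.jar"
echo "$CMD"              # echoed, not executed: running it requires a YARN cluster
```

Unlike session mode, the YARN application exits when the job finishes, so no long-running cluster is left behind.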

  1. Session mode

First, you must set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable to point at the Hadoop configuration directory.

# vi /etc/profile

Then add the following line:

export HADOOP_CONF_DIR=/etc/hadoop/conf

# . /etc/profile

Alternatively, you can simply run the following in the shell console:

# export HADOOP_CONF_DIR=/etc/hadoop/conf

The purpose is to let the Flink YARN client find the Hadoop cluster's configuration.
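A quick pre-flight check can be sketched as follows (per the note at the end of this section, any one of the three variables is enough):

```shell
# Check that the Flink YARN client will be able to discover the Hadoop config.
if [ -n "$HADOOP_CONF_DIR" ] || [ -n "$YARN_CONF_DIR" ] || [ -n "$HADOOP_HOME" ]; then
  echo "OK: Hadoop configuration is discoverable"
else
  echo "Missing: set HADOOP_CONF_DIR, YARN_CONF_DIR, or HADOOP_HOME" >&2
fi
```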

 

Second: if you are running on CDH, you need to set the user to hdfs (CDH uses this user to manage HDFS; otherwise you will run into HDFS read/write permission problems).

 

# vi yarn-session.sh

# add user begin
export HADOOP_USER_NAME=hdfs
# add user end

JVM_ARGS="$JVM_ARGS -Xmx512m"

CC_CLASSPATH=`manglePathList $(constructFlinkClassPath):$INTERNAL_HADOOP_CLASSPATHS`
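Editing yarn-session.sh works, but exporting the variable in the launching shell (or in /etc/profile, like HADOOP_CONF_DIR above) has the same effect without modifying the script, since Hadoop clients read HADOOP_USER_NAME from the environment:

```shell
# Non-invasive alternative: export HADOOP_USER_NAME before starting the session.
# The startup log should then show "Hadoop user set to hdfs", as in Step 1 below.
export HADOOP_USER_NAME=hdfs
echo "$HADOOP_USER_NAME"
```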

  Step 1: Start a session

# cd <your Flink installation directory>
# cd flink-1.8.1
# bin/yarn-session.sh -n 3 -s 3 -jm 1024  -tm 1024 -nm pinpoint-flink-job

Example output:

[root@sltuenym6bh flink-1.8.1]# bin/yarn-session.sh -n 3 -s 3 -jm 1024  -tm 1024 -nm pinpoint-flink-job
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/tanghb2/flink/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-08-25 11:02:49,561 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2019-08-25 11:02:49,564 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-08-25 11:02:49,564 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-08-25 11:02:49,564 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-08-25 11:02:49,564 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-08-25 11:02:49,564 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-08-25 11:02:50,121 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to hdfs (auth:SIMPLE)
2019-08-25 11:02:50,346 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The argument n is deprecated in will be ignored.
2019-08-25 11:02:50,494 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=3, slotsPerTaskManager=3}
2019-08-25 11:02:50,950 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-08-25 11:02:50,967 WARN  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - The configuration directory ('/home/tanghb2/flink/flink-1.8.1/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2019-08-25 11:02:52,178 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Submitting application master application_1565019835797_0516
2019-08-25 11:02:52,204 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1565019835797_0516
2019-08-25 11:02:52,204 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Waiting for the cluster to be allocated
2019-08-25 11:02:52,206 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deploying cluster, current state ACCEPTED
2019-08-25 11:02:57,250 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - YARN application has been deployed successfully.
2019-08-25 11:02:57,771 INFO  org.apache.flink.runtime.rest.RestClient                      - Rest client endpoint started.
Flink JobManager is now running on sltlrdwgty2.novalocal:44648 with leader id 00000000-0000-0000-0000-000000000000.
JobManager Web Interface: http://sltlrdwgty2.novalocal:44648

Note that the output above gives the JobManager web UI address:

Flink JobManager is now running on sltlrdwgty2.novalocal:44648 with leader id 00000000-0000-0000-0000-000000000000.
JobManager Web Interface: http://sltlrdwgty2.novalocal:44648

Your exact output depends on your YARN cluster configuration. You can open this address to view the Flink JobManager UI.

 

Note: after deploying a long-running Flink-on-YARN instance, the TaskManagers and Slots shown in the Flink web UI are both 0. Resources are allocated to a job only when it is submitted.

Step 2: Submit a job to the long-running Flink-on-YARN instance

# cd <your Flink installation directory>
# wget -O LICENSE-2.0.txt http://www.apache.org/licenses/LICENSE-2.0.txt
# hadoop fs -put ./LICENSE-2.0.txt hdfs://nameservice1:/billflink/
# bin/flink run ./examples/batch/WordCount.jar --input hdfs://nameservice1:/billflink/LICENSE-2.0.txt --output  hdfs://nameservice1:/billflink/wordcount_result.txt

 

The output looks like this:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/tanghb2/flink/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-08-25 11:22:22,965 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-root.
2019-08-25 11:22:22,965 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Found Yarn properties file under /tmp/.yarn-properties-root.
2019-08-25 11:22:23,385 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 9
2019-08-25 11:22:23,385 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - YARN properties set default parallelism to 9
YARN properties set default parallelism to 9
2019-08-25 11:22:23,523 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2019-08-25 11:22:23,523 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2019-08-25 11:22:23,588 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'sltlrdwgty2.novalocal' and port '44648' from supplied application id 'application_1565019835797_0516'
Starting execution of program
Program execution finished
Job with JobID f391357c52e0be2d518d743f858fb15e has finished.
Job Runtime: 10516 ms

Note: the output contains this line: 2019-08-25 11:33:40,546 INFO  org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Found application JobManager host name 'sltlrdwgty2.novalocal' and port '44648' from supplied application id 'application_1565019835797_0516'.

So when submitting the job above, there is no need to specify the JobManager address: the Flink client can find it on its own (the JobManager and the YARN ApplicationMaster run on the same host).
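Concretely, yarn-session.sh caches the session's YARN application id in a properties file under /tmp (the log above shows "Found Yarn properties file under /tmp/.yarn-properties-root"; the suffix is the submitting user), and bin/flink run reads that file to locate the JobManager. A sketch for inspecting it:

```shell
# List and print any cached Flink-on-YARN session handles.
# Nothing is printed if no session has been started on this machine.
for f in /tmp/.yarn-properties-*; do
  [ -e "$f" ] && { echo "== $f"; cat "$f"; }
done
echo "done"
```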

Look at the Flink JobManager UI, and you can see that a job was submitted and has completed.

# hadoop fs -text hdfs://nameservice1:/billflink/wordcount_result.txt
0 3
1 2
2 4
2004 1
3 1
4 1
5 1

You can see the word-count results.
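Each output line is a `word count` pair. The same counting can be sketched locally with standard shell tools (illustrative only, not how Flink computes it):

```shell
# Count words in a tiny sample and print "word count" pairs,
# matching the shape of WordCount's output above.
printf 'the license of the license\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}'
# prints:
# license 2
# of 1
# the 2
```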

You can then delete the previous result and submit the job again:

# hadoop fs -rm hdfs://nameservice1:/billflink/wordcount_result.txt
# bin/flink run ./examples/batch/WordCount.jar --input hdfs://nameservice1:/billflink/LICENSE-2.0.txt --output  hdfs://nameservice1:/billflink/wordcount_result.txt

At this point the Flink JobManager UI shows two completed jobs.

That concludes the demo. For more information, see the articles below.

1. The official guide to configuring Flink YARN HA

2. The official guide to running Flink on YARN

A reasonably good third-party article:

3) Apache Flink on YARN high-availability (HA) cluster deployment

 

Please note: the client must set the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable, which it uses to read the YARN and HDFS configuration; otherwise startup fails. In my tests, having HADOOP_HOME set also works: any one of HADOOP_HOME, YARN_CONF_DIR, or HADOOP_CONF_DIR is sufficient.
 

Of course, Flink also supports running on Kubernetes (k8s), Docker, AWS, Aliyun, and others; see the official documentation for details.

 

B. Systematic Learning

For a systematic course of study, see the articles below. They are quite good, so I am bookmarking them here!

Flink Study, Part 1: Preparation

Flink Study, Part 2: Basic concepts

Flink Study, Part 3: Integrating Spring

Flink Study, Part 4: Using Kafka as a data source

Flink Study, Part 5: Persisting data to MySQL

Flink Study, Part 6: Persisting data to Kafka

Flink Study, Part 8: keyBy & reduce

Flink Study, Part 9: Understanding Window & Time concepts

Flink Study, Part 10: Window & ProcessingTime example

Flink Study, Part 11: Window & EventTime example

Flink Study, Part 12: Watermarks in EventTime

Official resources:

https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#start-a-long-running-flink-cluster-on-yarn

Ways to run Flink:

https://www.jianshu.com/p/c47e8f438291

C. Flink 1.8 Troubleshooting Notes

1. Flink 1.8 pitfalls on a Hadoop YARN cluster

 

 
