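[Flink Principles and Applications]: High-Availability Setup for Flink Standalone Mode

1. Introduction

A high-availability deployment of Flink in standalone mode depends on ZooKeeper and Hadoop HDFS. The three servers used for this cluster have the following IPs: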

10.35.108.81
10.35.108.82
10.35.108.83

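Prerequisites:

  1. ZooKeeper is deployed as a three-server ensemble, in which 81 is the leader and 82 and 83 are followers.
  2. HDFS is installed: node 83 runs the NameNode and SecondaryNameNode, while 81 and 82 run DataNodes. The block replication factor is 2.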
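The ZooKeeper roles listed above can be confirmed before going further. The following is a minimal sketch, assuming ZooKeeper's four-letter-word command `srvr` is reachable on port 2181 and that `nc` is installed; `zk_mode` is a hypothetical helper name, not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Extract the server role ("leader" or "follower") from the text that
# ZooKeeper returns for the "srvr" command. Input is read from stdin.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Usage against the three servers (needs nc and network access):
#   for h in 10.35.108.81 10.35.108.82 10.35.108.83; do
#     printf '%s: ' "$h"; echo srvr | nc "$h" 2181 | zk_mode
#   done
```

With the layout described here, 81 would be expected to report leader and the other two follower, although ZooKeeper may elect a different leader after a restart.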

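With these prerequisites in place, we can start setting up the Flink cluster.

2. Setting up the Flink standalone cluster

Installation goals

  1. Start a TaskManager on each of the three servers 81, 82 and 83.
  2. Start a JobManager on each of the three servers 81, 82 and 83. These three JobManagers must support failover.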

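2.1. Configuring host names

  1. Configure /etc/hosts on all three servers:
10.35.108.83 master
10.35.108.82 node1
10.35.108.81 node2
  2. Configure /etc/hostname on all three servers.
    This is set separately on each machine:
  • On 83: master
  • On 82: node1
  • On 81: node2

The new host name does not take effect immediately; the machine has to be rebooted.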
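The /etc/hosts step can be scripted so that it is safe to run more than once. This is only a sketch: `add_cluster_hosts` is a hypothetical helper, and the target file is a parameter so it can be tried on a copy before touching the real /etc/hosts (which needs root).

```shell
#!/usr/bin/env bash
# Append the three name mappings to a hosts file unless they are already there.
# $1 = hosts file to edit (defaults to /etc/hosts).
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry
  for entry in "10.35.108.83 master" "10.35.108.82 node1" "10.35.108.81 node2"; do
    # -x matches the whole line, -F treats the entry as a fixed string
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || echo "$entry" >> "$hosts_file"
  done
}

# Usage (as root, on each of the three servers):
#   add_cluster_hosts /etc/hosts
```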

2.2. Downloading the Flink package

Download the Hadoop build of Flink from the official site, flink-1.7.0-bin-hadoop28-scala_2.11.tgz, and unpack it:

tar zvxf flink-1.7.0-bin-hadoop28-scala_2.11.tgz
cd flink-1.7.0

The package must be placed on all three servers.

2.3. Configuration

Common configuration

Apply the same configuration on all three servers:

  1. Edit conf/masters in the installation directory:
master:8081
node1:8081
node2:8081
  2. Edit conf/slaves in the installation directory:
master
node1
node2
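Since both files are identical on every server, they can be generated once and copied. A small sketch, with `write_cluster_files` as a hypothetical helper name; the remote path in the usage comment is an assumption and has to match wherever the package was unpacked.

```shell
#!/usr/bin/env bash
# Write conf/masters and conf/slaves with the contents listed above.
# $1 = Flink conf directory.
write_cluster_files() {
  local conf_dir="$1"
  printf '%s\n' master:8081 node1:8081 node2:8081 > "$conf_dir/masters"
  printf '%s\n' master node1 node2 > "$conf_dir/slaves"
}

# Usage, run from the flink-1.7.0 directory on one server:
#   write_cluster_files conf
#   for h in node1 node2; do
#     scp conf/masters conf/slaves "$h":/path/to/flink-1.7.0/conf/   # path is a placeholder
#   done
```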

Per-server configuration
This mainly concerns conf/flink-conf.yaml.

  1. Configuration for server 10.35.108.81:
#============== Common settings =========
jobmanager.rpc.address: 10.35.108.81
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 24
parallelism.default: 24

#================ Web UI settings ========
jobmanager.web.address: 10.35.108.81
web.port: 8081

#=============== High-availability settings =========
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha
high-availability.zookeeper.quorum: master:2181, node1:2181, node2:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /cluster_one

#=============== Checkpoint settings ==========
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink-checkpoints
state.savepoints.dir: hdfs:///flink-savepoints
state.checkpoints.num-retained: 20
  2. Configuration for server 10.35.108.82:
#============== Common settings =========
jobmanager.rpc.address: 10.35.108.82
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 24
parallelism.default: 24

#================ Web UI settings ========
jobmanager.web.address: 10.35.108.82
web.port: 8081

#=============== High-availability settings =========
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha
high-availability.zookeeper.quorum: master:2181, node1:2181, node2:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /cluster_one

#=============== Checkpoint settings ==========
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink-checkpoints
state.savepoints.dir: hdfs:///flink-savepoints
state.checkpoints.num-retained: 20
  3. Configuration for server 10.35.108.83:
#============== Common settings =========
jobmanager.rpc.address: 10.35.108.83
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 24
parallelism.default: 24

#================ Web UI settings ========
jobmanager.web.address: 10.35.108.83
web.port: 8081

#=============== High-availability settings =========
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha
high-availability.zookeeper.quorum: master:2181, node1:2181, node2:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /cluster_one

#=============== Checkpoint settings ==========
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink-checkpoints
state.savepoints.dir: hdfs:///flink-savepoints
state.checkpoints.num-retained: 20

In flink-conf.yaml, only jobmanager.rpc.address and jobmanager.web.address are set to each server's own IP; all other settings are identical on the three servers.
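Because only those two keys differ, one file can serve as the master copy and the per-server variants can be derived from it. A minimal sketch using sed; `render_flink_conf` is a hypothetical helper, and the approach assumes each key appears at the start of its own line as in the listings above.

```shell
#!/usr/bin/env bash
# Print a copy of flink-conf.yaml with the two host-specific keys rewritten.
# $1 = source flink-conf.yaml, $2 = IP of the target server.
render_flink_conf() {
  local src="$1" ip="$2"
  sed -E \
    -e "s|^(jobmanager\.rpc\.address:).*|\1 ${ip}|" \
    -e "s|^(jobmanager\.web\.address:).*|\1 ${ip}|" \
    "$src"
}

# Usage:
#   render_flink_conf conf/flink-conf.yaml 10.35.108.82 > /tmp/flink-conf-82.yaml
```

Every other line passes through unchanged, which keeps the three files from drifting apart.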

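Pay particular attention to the high-availability section. The three options high-availability, high-availability.storageDir and high-availability.zookeeper.quorum are mandatory. The other two, high-availability.zookeeper.path.root and high-availability.cluster-id, are optional, but it is best to set them to fixed values.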
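A quick check for the three mandatory options can catch a typo before the cluster is started. This is a sketch only; `missing_ha_keys` is a hypothetical helper, and it assumes the plain `key: value` layout used in the listings above.

```shell
#!/usr/bin/env bash
# Print every mandatory HA key that is missing (or empty) in flink-conf.yaml.
# $1 = path to flink-conf.yaml. Returns non-zero if any key is missing.
missing_ha_keys() {
  local conf="$1" key missing=0
  for key in high-availability \
             high-availability.storageDir \
             high-availability.zookeeper.quorum; do
    # Compare the text before the first ": " exactly, so the short key
    # "high-availability" does not match its longer siblings.
    if ! awk -v k="$key" -F': *' '$1 == k && $2 != "" { found = 1 } END { exit !found }' "$conf"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_ha_keys conf/flink-conf.yaml || echo "fix the keys listed above"
```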

2.4. Startup

Run on server 83:

./bin/start-cluster.sh

The output is as follows:

Starting HA cluster with 3 masters.
Starting standalonesession daemon on host master.
Starting standalonesession daemon on host node1.
Starting standalonesession daemon on host node2.
Starting taskexecutor daemon on host node2.
Starting taskexecutor daemon on host node1.
Starting taskexecutor daemon on host master.

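The output shows clearly that when node 83 starts the Flink cluster, it also brings up the JobManager and TaskManager on 82 and 81 remotely.

Note that if this step fails, set up passwordless SSH login: the start script on 83 has to log in to 82 and 81 without a password in order to operate on them remotely.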
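Passwordless login from 83 to the other two nodes is usually set up with ssh-keygen and ssh-copy-id. The sketch below assumes the same user account exists on all three servers and that node1 and node2 resolve as configured in /etc/hosts; `run` and `setup_passwordless_ssh` are hypothetical helper names, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Give the current user on this node key-based SSH access to node1 and node2.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_rsa" host
  # Create a key pair without a passphrase only if none exists yet
  [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
  for host in node1 node2; do
    # Asks for the remote password once, then installs the public key
    run ssh-copy-id "$host"
  done
}

# Usage on 83:
#   DRY_RUN=1 setup_passwordless_ssh   # preview
#   setup_passwordless_ssh             # apply
```

Afterwards `ssh node1 hostname` should return without a password prompt.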

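Once startup has finished, check the state of the cluster in the Flink Web UI.

Open http://10.35.108.81:8081, http://10.35.108.82:8081 and http://10.35.108.83:8081 in turn. All of them redirect to http://10.35.108.83:8081, which shows that 83 is currently the leader. The page also shows that the three TaskManagers managed by the JobManager are all running normally.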
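The same check can be made from the command line through Flink's REST interface. The sketch below assumes the /overview endpoint returns JSON containing a "taskmanagers" field, as it does in the Flink versions the editor is aware of; `taskmanager_count` is a hypothetical helper that parses that JSON with sed.

```shell
#!/usr/bin/env bash
# Read the JSON body of Flink's /overview endpoint on stdin and print the
# number of registered TaskManagers.
taskmanager_count() {
  sed -n 's/.*"taskmanagers":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (needs curl and a running cluster):
#   curl -s http://10.35.108.83:8081/overview | taskmanager_count
```

For the cluster described here the expected value is 3.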

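Summary

This high-availability setup can still be improved, specifically in how host names are handled. Here the mapping between IPs and host names relies on each Linux server's own /etc/hosts, which has the following drawbacks:

  1. Every server has to maintain the name-to-IP mappings itself, which is tedious.
  2. When the leading Flink JobManager goes down, a standby takes over as leader, but the applications on top are not aware of the change. After a failover, upper-layer applications cannot reach the Flink service through one fixed domain name.

To address this, the author set up a private DNS service, which helps a great deal when building cluster environments. See: "etcd+skydns构建私有域名解析服务器" (Building a Private DNS Server with etcd + skydns).

With this private DNS service, on the one hand, the name-to-IP mappings of the cluster can be managed centrally, which is very convenient. On the other hand, etcd's watch mechanism makes it possible, when the JobManager fails over, to repoint the domain name under which Flink is exposed to the IP of the new leader, so that upper-layer services continue to be served seamlessly. Upper-layer services only need to remember this one fixed Flink service domain name.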

##### I deployed HDFS, ZooKeeper and Flink clusters by following the tutorial.

##### HDFS and ZooKeeper work normally, and flink-standalone starts normally. When setting up the HA cluster, startup reports no error, but jps shows no Flink process, and the log (flink-root-standalonesession-14-master.log) contains the output below. (Startup order: zkServer.sh start ==> start-dfs.sh, start-yarn.sh, bin/start-cluster.sh)

##### Cluster environment: CentOS 7.3, Hadoop-2.8.5, Java 1.8, scala-2.12, flink-1.9.0-2.12, ZooKeeper 3.4.14

```
2019-09-05 21:38:02,658 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.9.0, Rev:9c32ed9, Date:19.08.2019 @ 16:16:55 UTC)
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: root
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.221-b11
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 989 MiBytes
2019-09-05 21:38:02,660 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/java/jdk1.8.0_221-amd64
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/opt/software/flink/log/flink-root-standalonesession-14-master.log
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/software/flink/conf/log4j.properties
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/software/flink/conf/logback.xml
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/software/flink/conf
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2019-09-05 21:38:02,661 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2019-09-05 21:38:02,662 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - master
2019-09-05 21:38:02,662 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --webui-port
2019-09-05 21:38:02,662 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 8081
2019-09-05 21:38:02,662 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/software/flink/lib/flink-table_2.12-1.9.0.jar:/opt/software/flink/lib/flink-table-blink_2.12-1.9.0.jar:/opt/software/flink/lib/log4j-1.2.17.jar:/opt/software/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/software/flink/lib/flink-dist_2.12-1.9.0.jar::/usr/hadoop/hadoop/etc/hadoop:
2019-09-05 21:38:02,662 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-09-05 21:38:02,663 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-09-05 21:38:02,696 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.java.home, /usr/java/jdk1.8.0_221-amd64
2019-09-05 21:38:02,696 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, master
2019-09-05 21:38:02,696 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-09-05 21:38:02,696 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability, zookeeper
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.storageDir, hdfs:///flink/ha/
2019-09-05 21:38:02,697 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.quorum, master:2181,slave02:2181,slave03:2181
2019-09-05 21:38:02,698 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.root, /flink
2019-09-05 21:38:02,698 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, /cluster_one
2019-09-05 21:38:02,698 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.client.acl, open
2019-09-05 21:38:02,698 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region
2019-09-05 21:38:02,698 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2019-09-05 21:38:02,699 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.address, master,slave03
2019-09-05 21:38:02,699 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.bind-port, 8080-8090
2019-09-05 21:38:02,699 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.bind-address, master,slave03
2019-09-05 21:38:02,699 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.submit.enable, false
2019-09-05 21:38:02,842 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint.
2019-09-05 21:38:02,842 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2019-09-05 21:38:02,877 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-09-05 21:38:02,903 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install security context.
2019-09-05 21:38:02,914 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-09-05 21:38:02,926 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-09-05 21:38:02,927 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2019-09-05 21:38:03,430 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at master:0
2019-09-05 21:38:04,268 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-09-05 21:38:04,314 INFO akka.remote.Remoting - Starting remoting
2019-09-05 21:38:04,566 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@master:36882]
2019-09-05 21:38:04,674 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@master:36882
2019-09-05 21:38:04,701 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting StandaloneSessionClusterEntrypoint down with application status FAILED. Diagnostics java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:92)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:120)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:202)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
    at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:65)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:447)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:359)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:116)
    ... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
    at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:443)
    ... 13 more
.
2019-09-05 21:38:04,708 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2019-09-05 21:38:04,738 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2019-09-05 21:38:04,738 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2019-09-05 21:38:04,765 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2019-09-05 21:38:04,815 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2019-09-05 21:38:04,816 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint StandaloneSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneSessionClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:182)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
    at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:65)
Caused by: java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
    at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:92)
    at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:120)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:202)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
    ... 2 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:447)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:359)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
    at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:116)
    ... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
    at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:443)
    ... 13 more
```

##### The environment variables are configured as follows (/etc/profile):

```
export JAVA_HOME=/usr/java/jdk1.8.0_221-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:/lib
export PATH=$PATH:$JAVA_HOME/bin:.
export ZOOKEEPER_HOME=/opt/software/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin:.
export HADOOP_HOME=/usr/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:.
export PYTHON_HOME=/usr/local/python3
export PATH=$PATH:$PYTHON_HOME/bin:.
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:/usr/local/scala/bin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:/usr/hadoop/hadoop/lib/native
```

##### HA settings in flink-conf.yaml:

```
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: master:2181,slave02:2181,slave03:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /cluster_one
```

##### Because the log contains the messages below, my guess is that the problem lies in the environment variables or in the Hadoop dependency path.

```
2019-09-06 11:44:39,820 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
……
2019-09-06 11:44:41,943 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting StandaloneSessionClusterEntrypoint down with application status FAILED. Diagnostics java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
……
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
……
2019-09-06 11:44:42,075 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint StandaloneSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneSessionClusterEntrypoint.
……
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
……
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
```

##### Next, I added the following HDFS settings to flink-conf.yaml.

```
fs.hdfs.hadoopconf: /usr/hadoop/hadoop/etc/hadoop
fs.hdfs.hdfsdefault: /usr/hadoop/hadoop/etc/hadoop/hdfs-default.xml
fs.hdfs.hdfssite: /usr/hadoop/hadoop/etc/hadoop/hdfs-site.xml
```

##### The Flink cluster still fails to start, and the log is no different from before.

##### I would like to ask the community: has anyone run into this situation, and is there a fix?

##### Many thanks.

##### PS: I have not yet tried the Flink on YARN setup.
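One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the `Classpath:` line in the log ends with `::/usr/hadoop/hadoop/etc/hadoop:`, so it lists Hadoop's configuration directory but no Hadoop jars, which is consistent with "Hadoop is not in the classpath/dependencies". The `fs.hdfs.*` settings only point at configuration files and would not add jars. `has_hadoop_jars` below is a hypothetical helper that inspects a classpath string; the patterns it looks for are assumptions about typical Hadoop jar and directory names.

```shell
#!/usr/bin/env bash
# Succeed if a Java classpath string mentions a Hadoop jar or a Hadoop
# share/ directory; fail if it only lists other jars or config directories.
has_hadoop_jars() {
  case "$1" in
    *hadoop-common*|*flink-shaded-hadoop*|*share/hadoop/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible remedy, to be set on every node before ./bin/start-cluster.sh
# (assumes the hadoop command is on PATH, as in the /etc/profile above):
#   export HADOOP_CLASSPATH=$(hadoop classpath)
```

Because start-cluster.sh starts the daemons on the other nodes over SSH, the variable would also need to be visible to those non-interactive shells. Flink's documentation also describes the alternative of placing a flink-shaded-hadoop uber jar in Flink's lib/ directory.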