Failed to get physical plan for topology 'WordCountTopology'
yitian@ubuntu:~/.heron/conf/aurora$ heron activate aurora/yitian/devel WordCountTopology
[2018-02-18 08:18:37 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
[2018-02-18 08:18:37 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron01:2181
[2018-02-18 08:18:37 -0800] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting
[2018-02-18 08:18:37 -0800] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
[2018-02-18 08:18:37 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.
[2018-02-18 08:18:37 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /home/yitian/heron/state/topologies/WordCountTopology
[2018-02-18 08:18:38 -0800] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Failed to fetch data from path: /home/yitian/heron/state/pplans/WordCountTopology
[2018-02-18 08:18:38 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181
[2018-02-18 08:18:38 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
[2018-02-18 08:18:38 +0000] [ERROR]: Failed to get physical plan for topology 'WordCountTopology'
[2018-02-18 08:18:38 +0000] [ERROR]: Failed to activate topology: WordCountTopology
Solution: this happens because the earlier `heron submit` failed. The Aurora instances were stuck in the PENDING state, so the Heron tasks were never scheduled and no physical plan was written to ZooKeeper. Once the topology has been submitted successfully, it can be activated without error.
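One way to confirm this diagnosis before retrying the activation (a sketch; the Aurora job-key components and the ZooKeeper state root below are taken from this deployment's logs and may differ on another cluster):

```shell
# Check whether the Aurora instances are still PENDING; the job key is
# <cluster>/<role>/<env>/<job> as defined in the local clusters.json
# (the cluster name "devcluster" here is an assumption).
aurora job status devcluster/yitian/devel/WordCountTopology

# Verify the physical-plan znode exists before running heron activate
# (the state root /home/yitian/heron/state comes from the logs above).
zkCli.sh -server heron01:2181 ls /home/yitian/heron/state/pplans
```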
HDFS Error
Master log:
18/02/18 07:16:09 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /home/yitian/heron/topologies/aurora/WordCountTopology-yitian-tag-0--590937850643635237.tar.gz._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1628)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3121)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3045)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:493)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1588)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
copyFromLocal: File /home/yitian/heron/topologies/aurora/WordCountTopology-yitian-tag-0--590937850643635237.tar.gz._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
[2018-02-18 07:16:10 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181
[2018-02-18 07:16:10 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
[2018-02-18 07:16:10 +0000] [ERROR]: Failed to upload the topology package at '/tmp/tmpcJkdON/topology.tar.gz' to: '/home/yitian/heron/topologies/aurora/WordCountTopology-yitian-tag-0--590937850643635237.tar.gz'
[2018-02-18 07:16:10 +0000] [ERROR]: Failed to launch topology 'WordCountTopology'
Slave log:
[DISK]file:/home/yitian/hadoop/hadoop-2.7.4/tmp/dfs/data/
java.io.IOException: Incompatible clusterIDs in /home/yitian/hadoop/hadoop-2.7.4/tmp/dfs/data: namenode clusterID = CID-f04bfe7d-c4b6-4ae5-9a08-cf8d55692d7a; datanode clusterID = CID-635be1f6-eabf-4245-ba0e-1bccc1f45b11
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:777)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:300)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:416)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:395)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:573)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1386)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1351)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:313)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:637)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 07:25:36,446 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to heron01/192.168.201.131:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:574)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1386)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1351)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:313)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:637)
at java.lang.Thread.run(Thread.java:748)
2018-02-18 07:25:36,447 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to heron01/192.168.201.131:9000
2018-02-18 07:25:36,550 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2018-02-18 07:25:38,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2018-02-18 07:25:38,554 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
Solution: the namenode and datanode have incompatible clusterIDs (this typically happens after re-running `hdfs namenode -format` while old datanode storage survives). Delete the Hadoop tmp/ directory on the datanode host heron02 and restart HDFS, so the datanode re-registers with the namenode's current clusterID.
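A minimal cleanup sketch, assuming the same directory layout as in the log above (stop HDFS first so the datanode releases its storage directory; note this deletes all block replicas on that node):

```shell
# Run stop/start on the namenode host; the rm runs on the datanode (heron02).
~/hadoop/hadoop-2.7.4/sbin/stop-dfs.sh
# Remove the stale storage directory that still carries the old clusterID:
rm -rf ~/hadoop/hadoop-2.7.4/tmp/dfs/data
~/hadoop/hadoop-2.7.4/sbin/start-dfs.sh
```

A less destructive alternative is to edit `tmp/dfs/data/current/VERSION` on the datanode and copy in the namenode's clusterID by hand.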
Mesos quorum error
Log file created at: 2018/02/17 08:25:45
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0217 08:25:45.890030 6973 main.cpp:369] EXIT with status 1: Need to specify --quorum for replicated log based registry when using ZooKeeper
Solution: when the cluster uses ZooKeeper as the state host, the Mesos master's `--quorum` option must be set for the replicated-log registry.
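For reference, a minimal sketch of a master invocation that satisfies the check (the ZooKeeper URL and `--work_dir` path are assumptions based on this cluster):

```shell
# With a ZooKeeper-backed registry, --quorum is mandatory. For a single
# master, 1 is sufficient; with N masters use a strict majority (N/2 + 1).
mesos-master \
  --zk=zk://heron01:2181/mesos \
  --quorum=1 \
  --work_dir=/var/lib/mesos
```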
Vagrant up error
yitian@ubuntu:~/auroraclone/aurora$ vagrant up
Bringing machine 'devcluster' up with 'virtualbox' provider...
==> devcluster: Box 'apache-aurora/dev-environment' could not be found. Attempting to find and install...
devcluster: Box Provider: virtualbox
devcluster: Box Version: 0.0.14
==> devcluster: Loading metadata for box 'apache-aurora/dev-environment'
devcluster: URL: https://vagrantcloud.com/apache-aurora/dev-environment
==> devcluster: Adding box 'apache-aurora/dev-environment' (v0.0.14) for provider: virtualbox
devcluster: Downloading: https://vagrantcloud.com/apache-aurora/boxes/dev-environment/versions/0.0.14/providers/virtualbox.box
==> devcluster: Box download is resuming from prior download progress
devcluster: Progress: 99% (Rate: 3239k/s, Estimated time remaining: --:--:--)
==> devcluster: Successfully added box 'apache-aurora/dev-environment' (v0.0.14) for 'virtualbox'!
==> devcluster: Importing base box 'apache-aurora/dev-environment'...
==> devcluster: Matching MAC address for NAT networking...
==> devcluster: Checking if box 'apache-aurora/dev-environment' is up to date...
==> devcluster: Setting the name of the VM: aurora_devcluster_1518401875602_20147
==> devcluster: Clearing any previously set forwarded ports...
Vagrant is currently configured to create VirtualBox synced folders with
the `SharedFoldersEnableSymlinksCreate` option enabled. If the Vagrant
guest is not trusted, you may want to disable this option. For more
information on this option, please refer to the VirtualBox manual:
https://www.virtualbox.org/manual/ch04.html#sharedfolders
This option can be disabled globally with an environment variable:
VAGRANT_DISABLE_VBOXSYMLINKCREATE=1
or on a per folder basis within the Vagrantfile:
config.vm.synced_folder '/host/path', '/guest/path', SharedFoldersEnableSymlinksCreate: false
==> devcluster: Clearing any previously set network interfaces...
==> devcluster: Preparing network interfaces based on configuration...
devcluster: Adapter 1: nat
devcluster: Adapter 2: hostonly
==> devcluster: Forwarding ports...
devcluster: 22 (guest) => 2222 (host) (adapter 1)
==> devcluster: Running 'pre-boot' VM customizations...
==> devcluster: Booting VM...
There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.
Command: ["startvm", "cb997990-9f9e-40c1-97f8-ca8ac7422288", "--type", "headless"]
Stderr: VBoxManage: error: VT-x is not available (VERR_VMX_NO_VMX)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component ConsoleWrap, interface IConsole
Solution: run the following commands to check whether this host supports hardware virtualization:
yitian@ubuntu:~/auroraclone/aurora$ sudo apt-get install cpu-checker
[sudo] password for yitian:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
bsdtar bundler fonts-lato javascript-common libgmp-dev libgmpxx4ldbl
libjs-jquery libruby2.3 rake ruby ruby-bundler ruby-childprocess ruby-dev
ruby-did-you-mean ruby-domain-name ruby-erubis ruby-ffi ruby-http-cookie
ruby-i18n ruby-listen ruby-log4r ruby-mime-types ruby-minitest
ruby-molinillo ruby-net-http-persistent ruby-net-scp ruby-net-sftp
ruby-net-ssh ruby-net-telnet ruby-netrc ruby-nokogiri ruby-power-assert
ruby-rb-inotify ruby-rest-client ruby-sqlite3 ruby-test-unit ruby-thor
ruby-unf ruby-unf-ext ruby2.3 ruby2.3-dev rubygems-integration sqlite3
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
msr-tools
The following NEW packages will be installed:
cpu-checker msr-tools
0 upgraded, 2 newly installed, 0 to remove and 156 not upgraded.
Need to get 17.5 kB of archives.
After this operation, 87.0 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 msr-tools amd64 1.3-2 [10.6 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 cpu-checker amd64 0.7-0ubuntu7 [6,862 B]
Fetched 17.5 kB in 1s (12.7 kB/s)
Selecting previously unselected package msr-tools.
(Reading database ... 222442 files and directories currently installed.)
Preparing to unpack .../msr-tools_1.3-2_amd64.deb ...
Unpacking msr-tools (1.3-2) ...
Selecting previously unselected package cpu-checker.
Preparing to unpack .../cpu-checker_0.7-0ubuntu7_amd64.deb ...
Unpacking cpu-checker (0.7-0ubuntu7) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up msr-tools (1.3-2) ...
Setting up cpu-checker (0.7-0ubuntu7) ...
yitian@ubuntu:~/auroraclone/aurora$ sudo kvm-ok
INFO: Your CPU does not support KVM extensions
KVM acceleration can NOT be used
The result: virtualization is not supported on this host, which is exactly why VirtualBox fails with VERR_VMX_NO_VMX. Enable VT-x/AMD-V in the BIOS, or, if this Ubuntu host is itself a VM, enable nested virtualization in the outer hypervisor.
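A quick check that avoids installing any packages: count the hardware-virtualization CPU flags in `/proc/cpuinfo` (`vmx` is Intel VT-x, `svm` is AMD-V). A count of 0 reproduces the condition kvm-ok reported.

```shell
# Prints the number of logical CPUs advertising virtualization support;
# 0 means VirtualBox cannot use hardware acceleration on this host.
grep -E -c '(vmx|svm)' /proc/cpuinfo || true
```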
Heron Uploader using HDFS error
yitian@ubuntu:~/heron$ heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/.heron/examples/heron-api-examples.jar com.twitter.heron.examples.api.WordCountTopology WordCountTopology
[2018-02-18 04:00:19 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
[2018-02-18 04:00:19 +0000] [INFO]: Launching topology: 'WordCountTopology'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
[2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron01:2181
[2018-02-18 04:00:20 -0800] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting
[2018-02-18 04:00:20 -0800] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
[2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.
[2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /home/yitian/heron/state/topologies/WordCountTopology
-test: java.net.UnknownHostException: heron
Usage: hadoop fs [generic options] -test -[defsz] <path>
[2018-02-18 04:00:23 -0800] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: The destination directory does not exist. Creating it now at URI 'hdfs://heron/topologies'
-mkdir: java.net.UnknownHostException: heron
Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
[2018-02-18 04:00:26 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181
[2018-02-18 04:00:26 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
[2018-02-18 04:00:26 +0000] [ERROR]: Failed to create directory for topology package at URI 'hdfs://heron/topologies'
[2018-02-18 04:00:26 +0000] [ERROR]: Failed to launch topology 'WordCountTopology'
yitian@ubuntu:~/heron$
Solution:
[2018-02-18 04:00:23 -0800] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: The destination directory does not exist. Creating it now at URI 'hdfs://heron/topologies'
-mkdir: java.net.UnknownHostException: heron
Note this error message: in the Heron configuration, even when HDFS is used as the file system, the upload path must not carry an `hdfs://` prefix; a plain path of the form `/.../...` is all that is needed, because Hadoop resolves the scheme and namenode authority from `fs.defaultFS` in core-site.xml. Here the URI `hdfs://heron/topologies` made Hadoop treat `heron` as a host name, hence the `UnknownHostException`.
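The corrected uploader setting then looks like this (a sketch of `uploader.yaml`; the key names follow the stock Heron HDFS uploader, and the paths are this cluster's):

```yaml
# ~/.heron/conf/aurora/uploader.yaml
heron.class.uploader: com.twitter.heron.uploader.hdfs.HdfsUploader
# Directory holding core-site.xml / hdfs-site.xml; fs.defaultFS there
# supplies the hdfs:// scheme and namenode address.
heron.uploader.hdfs.config.directory: /home/yitian/hadoop/hadoop-2.7.4/etc/hadoop
# Plain path, no hdfs:// prefix:
heron.uploader.hdfs.topologies.directory.uri: /home/yitian/heron/topologies/${CLUSTER}
```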
HDFS safe mode
Solution: HDFS is in safe mode (for example after an unclean restart), which blocks all writes, including topology-package uploads. Leave safe mode manually:
yitian@heron01:~/hadoop/hadoop-2.7.4$ bin/hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is OFF
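The deprecation warning in the output points at the replacement entry point; the equivalent non-deprecated commands are:

```shell
# Query, then clear, safe mode via the hdfs CLI (run from the Hadoop
# installation directory, as above):
bin/hdfs dfsadmin -safemode get
bin/hdfs dfsadmin -safemode leave
```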