Running Status of Aurora and Mesos After Submitting a Heron Topology (Problems and Solutions)

Mesos and Aurora configuration in the cluster

Topology submission command

yitian@heron04:~$ heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/.heron/examples/heron-api-examples.jar com.twitter.heron.examples.api.WordCountTopology WordCountTopology --deploy-deactivated
[2018-03-12 06:35:51 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
[2018-03-12 06:35:52 +0000] [INFO]: Launching topology: 'WordCountTopology'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
[2018-03-12 06:35:53 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron04:2181 
[2018-03-12 06:35:53 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting 
[2018-03-12 06:35:53 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED 
[2018-03-12 06:35:53 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized. 
[2018-03-12 06:35:53 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/WordCountTopology 
[2018-03-12 06:35:57 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0-4050739266681926687.tar.gz'. Overwriting it now 
[2018-03-12 06:35:57 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmpSkEzuj/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0-4050739266681926687.tar.gz' 
[2018-03-12 06:36:01 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology 
[2018-03-12 06:36:01 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/WordCountTopology 
[2018-03-12 06:36:01 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/executionstate/WordCountTopology 
[2018-03-12 06:36:02 -0700] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Launching topology in aurora 
[2018-03-12 06:36:02 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Updating scheduled-resource in packing plan: WordCountTopology 
[2018-03-12 06:36:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/WordCountTopology 
[2018-03-12 06:36:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/WordCountTopology  
  INFO] Creating job WordCountTopology
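When the submit output gets long, it can help to pull out just the state changes it reports. The sketch below is a hypothetical helper (not part of Heron) that scans `heron submit` output for the `CuratorStateManager: Created/Deleted node for path:` lines and the HdfsUploader target path seen above, so you can confirm which ZooKeeper nodes and which HDFS package a submission touched. It only relies on the log wording shown in this post; other Heron versions may phrase these lines differently.

```python
import re
from typing import Dict, List

# Patterns for two kinds of lines that appear in the `heron submit` output above.
ZK_NODE = re.compile(r"CuratorStateManager: (Created|Deleted) node for path: (\S+)")
HDFS_TARGET = re.compile(r"to target HDFS at '([^']+)'")


def summarize_submit_log(text: str) -> Dict[str, List[str]]:
    """Collect ZooKeeper node operations and HDFS upload targets, in log order."""
    summary: Dict[str, List[str]] = {"created": [], "deleted": [], "hdfs_targets": []}
    for line in text.splitlines():
        node = ZK_NODE.search(line)
        if node:
            key = "created" if node.group(1) == "Created" else "deleted"
            summary[key].append(node.group(2))
            continue
        target = HDFS_TARGET.search(line)
        if target:
            summary["hdfs_targets"].append(target.group(1))
    return summary


if __name__ == "__main__":
    # A few lines copied from the submit output above.
    sample = "\n".join([
        "Uploading topology package at '/tmp/tmpSkEzuj/topology.tar.gz' to target HDFS at "
        "'/heron/topologies/aurora/WordCountTopology-yitian-tag-0-4050739266681926687.tar.gz'",
        "CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology",
        "CuratorStateManager: Created node for path: /heron/packingplans/WordCountTopology",
        "CuratorStateManager: Deleted node for path: /heron/packingplans/WordCountTopology",
    ])
    for key, values in summarize_submit_log(sample).items():
        print(key, values)
```

Applied to the full output above, it would show that `/heron/packingplans/WordCountTopology` is created, deleted, and created again, which matches the "Updating scheduled-resource in packing plan" step.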

Solution: see "Topology submission problems after setting up the Heron cluster, and their solutions"

Mesos status

Status on the agent host:

[Screenshot: tasks on the agent host]

The kill command was the last one to run; every task before the kill ended in the FAILED state.

[Screenshot]
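Instead of counting FAILED rows by hand in the web UI, the same information can be read from the Mesos master's state endpoint. The sketch below assumes the master listens on the default port 5050 and uses `heron04` only as a placeholder host (this post does not say which machine runs the master). It also assumes the Aurora scheduler registered under the framework name `Aurora`; adjust `framework_name` if yours differs.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: Mesos master on its default port 5050; the host name is a placeholder.
MASTER_URL = "http://heron04:5050"


def fetch_master_state(master_url: str = MASTER_URL) -> dict:
    """Download the master's state JSON from the /master/state endpoint."""
    with urlopen(f"{master_url}/master/state", timeout=10) as resp:
        return json.load(resp)


def count_task_states(state: dict, framework_name: str = "Aurora") -> Counter:
    """Count Mesos task states (TASK_FAILED, TASK_KILLED, ...) for one framework."""
    counts: Counter = Counter()
    for framework in state.get("frameworks", []):
        if framework.get("name") != framework_name:
            continue
        # Non-terminal tasks are listed under "tasks", finished ones under "completed_tasks".
        for task in framework.get("tasks", []) + framework.get("completed_tasks", []):
            counts[task.get("state", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    print(count_task_states(fetch_master_state()))
```

A result dominated by `TASK_FAILED` with a single `TASK_KILLED` would correspond to the pattern described above.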

Clicking Browse shows the task's detailed run logs:

[Screenshot: Browse view of the task]

The Sandbox shows the detailed logs of the task run:

[Screenshot: task sandbox]

Clicking the stderr file shows the following:

Log file created at: 2018/03/12 05:05:25
Running on machine: heron06
[DIWEF]mmdd hh:mm:ss.uuuuuu pid file:line] msg
Command line: /home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee/thermos_runner.pex --setuid=yitian --task_id=yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156 --log_to_disk=DEBUG --hostname=heron06 --thermos_json=/home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee/task.json --sandbox=/home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee/sandbox --log_dir=/home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee --checkpoint_root=/home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee/checkpoints --container_sandbox=/home/yitian/mesosdata/run/slaves/0f1a6aac-4d22-40f6-a8a1-1044bcd0a605-S0/frameworks/6663765c-74c6-4af4-8d75-18a8e11ad493-0000/executors/thermos-yitian-devel-WordCountTopology-0-ddf849f3-c077-48ba-b2f4-bc3d8b943156/runs/75bd9591-854a-47d0-8f16-6f96e2aa1cee/sandbox --port=port4:31665 --port=http:31795 --port=metricscachemgr_masterport:31451 --port=yourkit:31819 --port=aurora:31795 --port=metricscachemgr_statsport:31052 --port=scheduler:31768 --port=ckptmgr_port:31438 --port=port2:31209 --port=port3:31829 --port=port1:31471
E0312 05:05:38.146308 2767 runner.py:299] Regular plan unhealthy!
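The Thermos runner command line above passes every named port as `--port=name:number`. The hypothetical helper below turns those arguments into a dictionary and lists any port number that more than one name points to. In this log, for example, `http` and `aurora` both map to 31795. The sketch only parses the text; whether a shared port matters for your setup is not something it can tell you.

```python
import shlex
from collections import defaultdict
from typing import Dict, List


def parse_ports(command_line: str) -> Dict[str, int]:
    """Map each --port=name:number argument to {name: number}."""
    ports: Dict[str, int] = {}
    for arg in shlex.split(command_line):
        if arg.startswith("--port="):
            name, _, number = arg[len("--port="):].partition(":")
            ports[name] = int(number)
    return ports


def shared_ports(ports: Dict[str, int]) -> Dict[int, List[str]]:
    """Return port numbers that more than one name points to."""
    by_number: Dict[int, List[str]] = defaultdict(list)
    for name, number in ports.items():
        by_number[number].append(name)
    return {number: sorted(names) for number, names in by_number.items() if len(names) > 1}


if __name__ == "__main__":
    # The port arguments from the command line above (the long paths are omitted).
    cmd = ("thermos_runner.pex --port=port4:31665 --port=http:31795 "
           "--port=metricscachemgr_masterport:31451 --port=yourkit:31819 --port=aurora:31795 "
           "--port=metricscachemgr_statsport:31052 --port=scheduler:31768 --port=ckptmgr_port:31438 "
           "--port=port2:31209 --port=port3:31829 --port=port1:31471")
    ports = parse_ports(cmd)
    print(len(ports), "named ports")  # 11 named ports
    print(shared_ports(ports))  # {31795: ['aurora', 'http']}
```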

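[Screenshot: error log in the deepest sandbox directory]

The error log above comes from the deepest directory in the sandbox. Why does this error occur?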
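The stderr header documents its own record format, `[DIWEF]mmdd hh:mm:ss.uuuuuu pid file:line] msg`, which appears to be glog-style: the first letter is the severity (D, I, W, E, F). The sketch below is an illustrative parser based only on that header line. It pulls out records such as `E0312 05:05:38.146308 2767 runner.py:299] Regular plan unhealthy!` so that errors can be filtered out of long sandbox logs. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Record format stated in the log header: [DIWEF]mmdd hh:mm:ss.uuuuuu pid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[DIWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


@dataclass
class LogRecord:
    level: str
    month: int
    day: int
    time: str
    pid: int
    file: str
    line: int
    msg: str


def parse_line(text: str) -> Optional[LogRecord]:
    """Parse one record; header lines such as 'Log file created at: ...' return None."""
    m = GLOG_LINE.match(text)
    if m is None:
        return None
    return LogRecord(m["level"], int(m["month"]), int(m["day"]), m["time"],
                     int(m["pid"]), m["file"], int(m["line"]), m["msg"])


def errors(lines: Iterable[str]) -> List[LogRecord]:
    """Keep only E (error) and F (fatal) records."""
    records = (parse_line(text) for text in lines)
    return [r for r in records if r is not None and r.level in ("E", "F")]
```

For example, `errors(open("stderr").read().splitlines())` would return only the `Regular plan unhealthy!` record for the log above.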

Solution: see "Successfully starting the cluster: solving the 'Regular plan unhealthy!' problem"

Aurora status

Checking heron04:8081:

[Screenshot: Aurora scheduler UI]

The two instances in the topology are in the following states:

[Screenshot: instance states]

THROTTLED means the task is being "throttled". What does this state mean, and why does it occur?

Checking the completed tasks:

[Screenshot: completed tasks]

The figure above shows instance 0 going through several state changes, but it ends up in the FAILED state.

  • What does the message "No health-check defined, task is assumed healthy." mean?
  • Also, clicking heron06 on the right leads to a page that cannot be found. Why?
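To reason about a status history like instance 0's, it helps to know which Aurora states are still "active" and which are final. The sketch below assumes the two sets mirror the `ACTIVE_STATES` and `TERMINAL_STATES` constants in Aurora's `api.thrift`; verify them against the Aurora version you run. The function name and the example history are hypothetical and not read from the screenshot.

```python
from typing import Dict, Sequence

# Assumption: these sets mirror ACTIVE_STATES / TERMINAL_STATES in Aurora's api.thrift.
ACTIVE_STATES = frozenset({
    "ASSIGNED", "DRAINING", "KILLING", "PENDING", "PREEMPTING",
    "RESTARTING", "RUNNING", "STARTING", "THROTTLED",
})
TERMINAL_STATES = frozenset({"FAILED", "FINISHED", "KILLED", "LOST"})


def summarize_history(statuses: Sequence[str]) -> Dict[str, object]:
    """Summarize one instance's status history, oldest status first."""
    if not statuses:
        raise ValueError("empty status history")
    final = statuses[-1]
    return {
        "final": final,
        "active": final in ACTIVE_STATES,
        "terminal": final in TERMINAL_STATES,
        "times_running": list(statuses).count("RUNNING"),
        "was_throttled": "THROTTLED" in statuses,
    }


if __name__ == "__main__":
    # Hypothetical history, for illustration only.
    print(summarize_history(["PENDING", "ASSIGNED", "STARTING", "RUNNING", "FAILED"]))
```

A history that ends in a terminal state after reaching RUNNING (as in the example) points to a failure inside the task itself, which is where the sandbox logs become the next place to look.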

[Screenshot]

Solution:

Heron status

heron-tracker status:

[Screenshot: heron-tracker]

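heron-ui status:

The page stays unresponsive for a long time, and its response time is very long. Even after the problems above were fixed, it still responds very slowly. Why?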
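Because heron-ui gets its data from heron-tracker, one way to narrow down the slowness is to time a request to each of them separately. The sketch below assumes both run on `heron04` with what I understand to be their default ports (8888 for the tracker, 8889 for the UI) and uses the tracker's `/topologies` endpoint; all three are assumptions to adjust for your cluster. `time_request` is a hypothetical helper that accepts an injectable `fetch` function so it can be exercised without a live cluster.

```python
import time
from typing import Callable, Optional
from urllib.request import urlopen

# Assumptions: host heron04, tracker on 8888, UI on 8889.
TRACKER_URL = "http://heron04:8888/topologies"
UI_URL = "http://heron04:8889/"


def _http_get(url: str, timeout: float) -> bytes:
    """Fetch the full response body so the timing covers the whole transfer."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_request(url: str, timeout: float = 60.0,
                 fetch: Optional[Callable[[str, float], bytes]] = None) -> float:
    """Return the wall-clock seconds needed to fetch `url` completely."""
    get = fetch or _http_get
    start = time.perf_counter()
    get(url, timeout)
    return time.perf_counter() - start


if __name__ == "__main__":
    for name, url in (("heron-tracker", TRACKER_URL), ("heron-ui", UI_URL)):
        try:
            print(f"{name}: {time_request(url):.2f}s")
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print(f"{name}: failed ({exc})")
```

If the tracker alone is already slow, the UI is probably just waiting on it; if the tracker answers quickly while the UI does not, the delay is more likely in the UI process itself.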
