HDP 3.1.5: fixing restart errors in custom Ambari components

Out of the box, Ambari cannot manage Elasticsearch or Flink, but third-party custom component definitions for both can be found online.

Installation reference for the Elasticsearch custom component: ElasticAmbari/README.md at master · ChengYingOpenSource/ElasticAmbari · GitHub

Installation reference for the Flink custom component: Ambari 2.7.5安装Flink1.13.2_韦不二的博客-CSDN博客

1. PID file error

In practice, however, both custom components share the same problem: every time the service is restarted, it fails with an error roughly like this:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.1/services/FLINK/package/scripts/flink.py", line 173, in <module>
    Master().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 980, in restart
    self.stop(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.1/services/FLINK/package/scripts/flink.py", line 98, in stop
    pid = str(sudo.read_file(status_params.flink_pid_file))
  File "/usr/lib/ambari-agent/lib/resource_management/core/sudo.py", line 151, in read_file
    with open(filename, "rb") as fp:
IOError: [Errno 2] No such file or directory: u'/var/run/flink/flink.pid'

The error shows that the directory holding the PID file is missing. A restart first calls stop(), which reads the PID file, and that read fails because /var/run/flink does not exist; creating the directory by hand lets the service start normally. The likely cause is that something changed in Ambari under HDP 3.1.5 and the custom components were never updated for it. Since manually recreating the PID directory before every restart is not acceptable in practice, the fix is a one-line change to the component's startup script.

Locate the script named in the traceback and add one line to its start function that creates the PID directory:

Directory([status_params.flink_pid_dir], owner=params.flink_user, group=params.flink_group)
The file to modify is /var/lib/ambari-agent/cache/stacks/HDP/3.1/services/FLINK/package/scripts/flink.py.
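
For orientation, here is a minimal sketch of where the line goes. Everything except the Directory(...) call is an assumption about how the component's flink.py is laid out (only the class name Master comes from the traceback above):

from resource_management import *  # the wildcard import Ambari service scripts typically use

class Master(Script):
    def start(self, env):
        import params
        import status_params
        env.set_params(params)
        # Recreate the PID directory on every start so that a later
        # stop()/restart() can read /var/run/flink/flink.pid.
        Directory([status_params.flink_pid_dir],
                  owner=params.flink_user,
                  group=params.flink_group)
        # ... the component's original start logic, which launches the
        # Flink cluster and writes the PID file, continues unchanged ...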

The elasticsearch component needs the same treatment:

In /var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/7.13.4/package/scripts/ElasticSearchService.py, add a call to "self.__creatPidDirectory()" together with the method itself:

    def __creatPidDirectory(self):
        import params
        # Create the directory that will hold the Elasticsearch PID file
        # and hand it over to the ES user (Utils.chown comes from the
        # component's own helper module).
        name = os.path.dirname(params.elasticSearchPidFile)
        if not os.path.exists(name):
            os.makedirs(name, mode=0o755)
            Utils.chown(name, params.elasticSearchUser, params.elasticSearchGroup)
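
The call site matters: the directory must exist before the daemon writes its PID file. A minimal sketch of where to place the call, assuming the standard Ambari Script start(self, env) entry point (the surrounding lines are illustrative, not the component's exact code):

    def start(self, env):
        import params
        env.set_params(params)
        # Make sure the PID directory exists before Elasticsearch is
        # launched and tries to write its PID file into it.
        self.__creatPidDirectory()
        # ... the component's original start logic follows unchanged ...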

2. The Elasticsearch network.host problem

By default the component sets network.host to 0.0.0.0, and with that setting the cluster fails like this:

[2022-01-17T11:15:21,112][INFO ][o.e.g.DanglingIndicesState] [ubdi-hdp102] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2022-01-17T11:15:21,727][INFO ][o.e.n.Node               ] [ubdi-hdp102] initialized
[2022-01-17T11:15:21,727][INFO ][o.e.n.Node               ] [ubdi-hdp102] starting ...
[2022-01-17T11:15:21,742][INFO ][o.e.x.s.c.f.PersistentCache] [ubdi-hdp102] persistent cache index loaded
[2022-01-17T11:15:21,891][INFO ][o.e.t.TransportService   ] [ubdi-hdp102] publish_address {172.17.0.1:9300}, bound_addresses {[::]:9300}
[2022-01-17T11:15:22,232][INFO ][o.e.b.BootstrapChecks    ] [ubdi-hdp102] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2022-01-17T11:15:22,275][INFO ][o.e.c.c.Coordinator      ] [ubdi-hdp102] cluster UUID [0B9dcTUzTDyNUT0m77mFOQ]
[2022-01-17T11:15:23,322][WARN ][o.e.d.HandshakingTransportAddressConnector] [ubdi-hdp102] [connectToRemoteMasterNode[192.168.32.101:9300]] completed handshake with [{ubdi-hdp101}{f8Ug3BceQVO9OIWObkUN_Q}{TJFhBm6QQVCOrSrJ-GJGIQ}{172.17.0.1}{172.17.0.1:9300}{cdfhilmrstw}{ml.machine_memory=33736523776, rack=r1, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=1037959168, transform.node=true}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [ubdi-hdp101][172.17.0.1:9300] handshake failed. unexpected remote node {ubdi-hdp102}{xOK0Nnj5TBCKo8P1g8fYsw}{iS9woTDDSRSr3kSOmErNJQ}{172.17.0.1}{172.17.0.1:9300}{cdfhilmrstw}{ml.machine_memory=33736523776, rack=r1, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=1037959168, transform.node=true}
        at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:421) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:95) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.transport.TransportService.lambda$handshake$8(TransportService.java:504) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:217) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1273) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1273) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:291) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:279) ~[elasticsearch-7.13.4.jar:7.13.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.13.4.jar:7.13.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]

Looking into it, the fix is to set network.host to each node's own IP address. In the log above, both nodes publish the same address, 172.17.0.1 (a Docker bridge address picked up by the 0.0.0.0 binding and identical on every host), so the follow-up connection reaches an unexpected node. Hardcoding an IP in the config file is not an option, because Ambari pushes the same configuration to every machine, so the startup script has to be changed instead.

The change adds a function that reads the current machine's IP address and sets "network.host" to that address when the configuration is generated, which resolves the problem. The added code is:

import socket
import fcntl
import struct

def get_ip_address(ifname):
    # Ask the kernel for the IPv4 address bound to the given network
    # interface via the SIOCGIFADDR ioctl (Python 2 style, matching the
    # interpreter the Ambari agent runs its scripts with).
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
        s.fileno(),
        0x8915,  # SIOCGIFADDR
        struct.pack('256s', ifname[:15])
    )[20:24])
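
For completeness, a hedged sketch of how the function could be wired in. The variable name network_host and the interface name 'eth0' are assumptions for illustration, not the component's actual identifiers:

# Hypothetical wiring in the component's params.py:
network_host = get_ip_address('eth0')  # 'eth0' must match the host's real interface

# When elasticsearch.yml is rendered, network.host is then filled from
# network_host instead of the hardcoded 0.0.0.0, e.g.:
#   network.host: {{network_host}}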

