情景:
当节点群互连时,会通过心跳包检查所连接节点是不是连接正常,这个心跳时间默认为60s,可以通过
net_kernel:set_net_ticktime(600).
来重设这个时间值,怎么测试?
每次我把其中一个节点kill掉后,与之想连的节点就会立即收到nodedown消息,根本无法测试这个ticktime是不是生效。
原因:
在Erl Doc里面关于节点互连时有一个关于节点间心跳检查的片段:
http://www.erlang.org/doc/man/kernel_app.html
Specifies the net_kernel tick time. TickTime is given in seconds. Once every TickTime/4 second, all connected nodes are ticked (if anything else has been written to a node) and if nothing has been received from another node within the last four (4) tick times that node is considered to be down. This ensures that nodes which are not responding, for reasons such as hardware errors, are considered to be down.
The time T, in which a node that is not responding is detected, is calculated as: MinT < T < MaxT where:
MinT = TickTime - TickTime / 4 MaxT = TickTime + TickTime / 4
TickTime is by default 60 (seconds). Thus, 45 < T < 75 seconds.
Note: All communicating nodes should have the same TickTime value specified.
Note: Normally, a terminating node is detected immediately.
从文档中可以看出,节点会在每TickTime/4 秒检查一下连接的节点是不是正常,如果连续4次没有收到(TickTime时间内)没有收到心跳消息,就会认为这个节点挂了,
可是如果节点是被终结掉(terminating node),其它节点会马上收到down通知.
为了测试这个心跳时间,必须要对Erlang节点互连的基本epmd和net_kernel的工作原理有所了解【Google真是万能!】。
Erlang节点使用TCP连接通讯,当开启一个新节点,它会随机选择一个端口(大约在52300左右),在EPMD(Erlang Port Mapper Daemo)注册,EPMD默认使用4369端口对外连接,