Airflow remote worker log hostname problem
When a worker node is not deployed on the same machine as the webserver, viewing that worker's logs from the webserver sometimes fails with the following error:
*** Log file isn't local.
*** Fetching here: http://kaimanas.serveriai.lt:8793/log/.../1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='kaimanas.serveriai.lt', port=8793): Max retries exceeded with url: /log/.../1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f64da2fab38>: Failed to establish a new connection: [Errno 111] Connection refused',))
Here kaimanas.serveriai.lt (or whatever other name appears) is not the worker node's actual hostname.
Looking at net.py in the Airflow source:
def get_hostname():
    """
    Fetch the hostname using the callable from the config or using
    `socket.getfqdn` as a fallback.
    """
    # First we attempt to fetch the callable path from the config.
    try:
        callable_path = conf.get('core', 'hostname_callable')
    except AirflowConfigException:
        callable_path = None

    # Then we handle the case when the config is missing or empty. This is the
    # default behavior.
    if not callable_path:
        return socket.getfqdn()

    # Since we have a callable path, we try to import and run it next.
    module_path, attr_name = callable_path.split(':')
    module = importlib.import_module(module_path)
    callable = getattr(module, attr_name)
    return callable()
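The lookup order in get_hostname() can be tried outside Airflow. What follows is a stand-alone approximation, not Airflow's own code: resolve_hostname is a hypothetical helper that takes the callable path as an argument instead of reading hostname_callable from the config.

```python
import importlib
import socket


def resolve_hostname(callable_path=None):
    """Mimic the lookup order of the get_hostname() quoted above.

    `callable_path` stands in for the `hostname_callable` config value
    (hypothetical parameter; Airflow reads it from its config instead).
    """
    # Missing or empty value: fall back to socket.getfqdn().
    if not callable_path:
        return socket.getfqdn()
    # Otherwise import "module:attribute" and call the attribute.
    module_path, attr_name = callable_path.split(':')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)()


if __name__ == '__main__':
    print('default (getfqdn):', resolve_hostname())
    print('socket:gethostname:', resolve_hostname('socket:gethostname'))
```

If the two printed names differ on the worker, the default path is the one producing the unexpected hostname. The module:attribute separator matches the version of net.py quoted here; other Airflow versions may expect a different format, so check the documentation for the version in use.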
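When hostname_callable is not configured, this falls through to socket.getfqdn(), which behaves as follows:
def getfqdn(name=''):
    """
    Return a fully qualified domain name for name. If name is omitted or empty, it is interpreted as the local host. To find the fully qualified name, the hostname returned by gethostbyaddr() is checked, followed by the host's aliases, if available.
    The first name that includes a period is selected. If no fully qualified domain name is available, the hostname as returned by gethostname() is returned.
    """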
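Since getfqdn() takes its answer from the gethostbyaddr() lookup, a wrong or missing entry for the worker can yield an unrelated name. The fix is therefore to add a hostname mapping to /etc/hosts on the worker node, mapping the worker's IP address to its own hostname: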
10.xxx.xxx.xxx hostname
After that, the webserver requests the worker's logs using that hostname (which the webserver machine must also be able to resolve):
http://hostname.lt:8793/log/...
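Once the mapping is in place, it can help to confirm from the webserver machine that the worker's log port accepts connections. This sketch assumes the port 8793 seen in the URLs above; can_connect is a hypothetical helper, and it only tests TCP reachability, not whether a given log file exists.

```python
import socket
import sys


def can_connect(host, port=8793, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == '__main__':
    # Usage: python check_worker.py <worker-hostname>
    host = sys.argv[1] if len(sys.argv) > 1 else socket.getfqdn()
    print(host, 'reachable' if can_connect(host) else 'NOT reachable')
```

A "NOT reachable" result for the worker's hostname corresponds to the "Connection refused" or name-resolution failure shown in the original error.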