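The problem
While installing a big data platform (Ambari 2.5.2 + HDP 2.6.4), I ran into a baffling problem:
Execution of 'useradd -m -u 1002 -G hadoop -g hadoop zookeeper' returned 4. useradd: UID 1002 is not unique
I had never seen this in earlier deployments. The error message shows that a UID conflict occurred while Ambari was creating the service users such as hdfs and zookeeper.
Error message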
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py", line 35, in <module>
BeforeAnyHook().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py", line 29, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
fetch_nonlocal_groups = params.fetch_nonlocal_groups,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/accounts.py", line 84, in action_create
shell.checked_call(command, sudo=True)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'useradd -m -u 1002 -G hadoop -g hadoop zookeeper' returned 4. useradd: UID 1002 is not unique
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-81.json', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-81.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
stdout:
2019-05-31 17:41:26,817 - Stack Feature Version Info: Cluster Stack=2.6, Cluster Current Version=None, Command Stack=None, Command Version=None -> 2.6
2019-05-31 17:41:26,824 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
User Group mapping (user_group) is missing in the hostLevelParams
2019-05-31 17:41:26,825 - Group['hadoop'] {}
2019-05-31 17:41:26,826 - Group['users'] {}
2019-05-31 17:41:26,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2019-05-31 17:41:26,828 - call['/var/lib/ambari-agent/tmp/changeUid.sh httpfs'] {}
2019-05-31 17:41:26,837 - call returned (0, '1001')
2019-05-31 17:41:26,837 - User['httpfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': 1001}
2019-05-31 17:41:26,843 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2019-05-31 17:41:26,844 - call['/var/lib/ambari-agent/tmp/changeUid.sh zookeeper'] {}
2019-05-31 17:41:26,854 - call returned (0, '1002')
2019-05-31 17:41:26,855 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop'], 'uid': 1002}
2019-05-31 17:41:26,856 - Adding user User['zookeeper']
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-81.json', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-81.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
Command failed after 1 tries
Solution
First, check whether UID 1002 is already taken.
id 1002
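Sure enough, UID 1002 was already assigned to another user (here, the test user). That leaves two options: change the test user's UID (or delete the user), or modify the Ambari script.
I tried changing the UID first.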
usermod -u 1102 test
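Normally this change resolves the problem, but in my case the user's entry could not be found. Because of the customer's security requirements, some accounts are handled specially, so these users and groups do not appear in /etc/passwd.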
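A minimal sketch for checking who owns a UID when the account is not listed in /etc/passwd. It assumes the hidden accounts are still resolvable through the system's name service (for example a directory service configured in /etc/nsswitch.conf), which the post does not confirm. The function names uid_owner and uid_is_local are made up for illustration.

```shell
#!/bin/sh
# Print the login name that owns a UID, or nothing if the UID is free.
# getent consults every source configured in /etc/nsswitch.conf,
# not only the local /etc/passwd file.
uid_owner() {
    getent passwd "$1" | cut -d: -f1
}

# Succeed if the UID is defined in a passwd-format file (default: /etc/passwd).
uid_is_local() {
    awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

uid=1002
owner="$(uid_owner "$uid")"
if [ -z "$owner" ]; then
    echo "UID $uid is free"
elif uid_is_local "$uid"; then
    echo "UID $uid belongs to local user $owner"
else
    echo "UID $uid belongs to $owner, which is not defined in /etc/passwd"
fi
```

If the last branch is reached, usermod cannot help, because it only edits the local account files.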
With no other option, I fell back to modifying the Ambari script.
Following the error output, I first located the file /var/lib/ambari-agent/tmp/changeUid.sh:
vim /var/lib/ambari-agent/tmp/changeUid.sh
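I changed 1001 and 2000 to a UID range that was not in use and made the same change on the other agents. After running ambari-agent restart, I checked the file again and found that my changes had been reverted. That suggested the file is fetched from ambari-server, so I went looking for its source.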
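Before picking a replacement range, it helps to confirm that the range really is unused. The sketch below probes each UID individually with getent, so it also sees accounts that are not in /etc/passwd, provided they are resolvable through the name service (an assumption). The function names and the candidate range 3001-3100 are hypothetical.

```shell
#!/bin/sh
# Print every UID in [start, end] that already resolves to an account.
used_uids_in_range() (
    uid="$1"
    end="$2"
    while [ "$uid" -le "$end" ]; do
        if getent passwd "$uid" >/dev/null; then
            echo "$uid"
        fi
        uid=$((uid + 1))
    done
)

# Succeed only if no UID in [start, end] is taken.
range_is_free() {
    [ -z "$(used_uids_in_range "$1" "$2")" ]
}

# Hypothetical candidate range; pick one wide enough for all service users.
if range_is_free 3001 3100; then
    echo "3001-3100 is free"
else
    echo "3001-3100 has UIDs in use:"
    used_uids_in_range 3001 3100
fi
```

Probing one UID at a time is slower than listing all accounts, but it does not depend on the directory service allowing enumeration.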
That led to /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/shared_initialization.py. The script references a file named changeToSecureUid.sh, and a search for that file turned it up at /var/lib/ambari-server/resources/stacks/HDP/2.0.6/hooks/before-ANY/files/changeToSecureUid.sh.
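The search can be sketched as two small helpers: one lists the files that mention a name, the other finds files with that name. The helper names are made up for illustration. The paths are the ones from the log and the post, and the second call only finds anything on the ambari-server host.

```shell
#!/bin/sh
# List files under a directory whose contents mention a given string.
files_mentioning() (
    grep -rl -- "$2" "$1" 2>/dev/null
)

# List regular files with a given name under a directory.
find_by_name() (
    find "$1" -type f -name "$2" 2>/dev/null
)

hook_dir=/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY

# Which hook scripts refer to the UID helper? (run on an agent host)
files_mentioning "$hook_dir" changeToSecureUid.sh \
    || echo "no references found under $hook_dir"

# Where does the server keep the original? (run on the ambari-server host)
find_by_name /var/lib/ambari-server/resources changeToSecureUid.sh
```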
vim /var/lib/ambari-server/resources/stacks/HDP/2.0.6/hooks/before-ANY/files/changeToSecureUid.sh
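Its contents matched the copy on the agent. I edited this file, changing 1001 and 2000 to a free UID range, restarted ambari-server and ambari-agent, and reinstalled the components. The problem was solved.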
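For repeatability, the same edit can be scripted instead of done by hand in vim. This is only a sketch: it assumes the two bounds appear in the script as the standalone numbers 1001 and 2000 with a non-digit character on each side, and the replacement range 3001-4000 and the function name retarget_uid_range are hypothetical. Review the diff before restarting anything.

```shell
#!/bin/sh
# Replace the literal bounds 1001 and 2000 in a file, keeping a .bak copy.
# Only occurrences with a non-digit character on both sides are rewritten,
# so a bound at the very start or end of a line is left untouched.
retarget_uid_range() (
    file="$1"
    new_start="$2"
    new_end="$3"
    cp -p "$file" "$file.bak" || exit 1
    sed -e "s/\([^0-9]\)1001\([^0-9]\)/\1${new_start}\2/g" \
        -e "s/\([^0-9]\)2000\([^0-9]\)/\1${new_end}\2/g" \
        "$file.bak" > "$file"
)

target=/var/lib/ambari-server/resources/stacks/HDP/2.0.6/hooks/before-ANY/files/changeToSecureUid.sh
if [ -w "$target" ]; then
    retarget_uid_range "$target" 3001 4000   # hypothetical free range
    diff "$target.bak" "$target" || true     # show what changed
    # Then restart so the agents pick up the edited file:
    #   ambari-server restart   (on the server host)
    #   ambari-agent restart    (on every agent host)
fi
```

Editing the server-side copy matters because, as observed above, changes made only to the agent's copy were reverted after the agent restarted.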