Problem description
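In Hadoop 3.2.1, after configuring cgroups to isolate CPU resources in YARN, I found that jobs submitted to YARN as the root user could not be submitted and failed with this error:
Running as root is not allowed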
Searching the source code for this message turned up the following:
/**
 * Is the user a real user account?
 * Checks:
 * 1. Not root
 * 2. UID is above the minimum configured.
 * 3. Not in banned user list
 * Returns NULL on failure
 */
struct passwd* check_user(const char *user) {
  if (strcmp(user, "root") == 0) {
    fprintf(LOGFILE, "Running as root is not allowed\n");
    fflush(LOGFILE);
    return NULL;
  }
  ...
So the restriction is written into the source itself: once LCE (LinuxContainerExecutor) is enabled, it checks the submitting user, and jobs cannot be submitted as root.
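To make the order of the three checks in that comment concrete, here is a minimal, self-contained C sketch. It is not the Hadoop source: `check_user_sketch`, the result codes and `example_banned_users` are hypothetical, and the UID and minimum UID are passed in as parameters instead of being looked up from the system user database and container-executor.cfg.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical result codes; the real check_user() returns a
 * struct passwd* and NULL on failure. */
enum check_result {
  USER_OK = 0,
  USER_IS_ROOT,
  USER_UID_TOO_LOW,
  USER_BANNED
};

/* Example banned list for this sketch only (NULL-terminated);
 * not necessarily Hadoop's default list. */
static const char *const example_banned_users[] = {
  "hdfs", "yarn", "mapred", "bin", NULL
};

/* Hypothetical stand-in for check_user(), applying the three
 * checks in the order the comment lists them. */
static enum check_result check_user_sketch(const char *user, uid_t uid,
                                           uid_t min_uid,
                                           const char *const *banned) {
  /* 1. Not root: a hard-coded name comparison, checked first. */
  if (strcmp(user, "root") == 0) {
    return USER_IS_ROOT;
  }
  /* 2. UID must not be below the configured minimum. */
  if (uid < min_uid) {
    return USER_UID_TOO_LOW;
  }
  /* 3. Not in the banned user list. */
  for (size_t i = 0; banned[i] != NULL; i++) {
    if (strcmp(user, banned[i]) == 0) {
      return USER_BANNED;
    }
  }
  return USER_OK;
}
```

For example, `check_user_sketch("root", 0, 1000, example_banned_users)` returns `USER_IS_ROOT` before the UID is even looked at, which is why no configuration value can let root through while the name check is in place.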
Root cause
The user check is performed during the init phase of org.apache.hadoop.mapreduce.Job.
Roughly, a container goes through these stages:
resource localization -> container launch -> container execution -> resource cleanup
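The sketch below models those stages as an ordered sequence, under the assumption that the user check gates the first stage: a rejected user never reaches localization, so no container starts at all. The `stage` enum, `user_check_fn`, `reject_root` and `run_container_sketch` are illustrative names, not NodeManager APIs.

```c
#include <string.h>

/* Hypothetical names for the stages listed above, in order. */
enum stage { LOCALIZE, LAUNCH, RUN, CLEANUP, STAGE_COUNT };

static const char *const stage_names[STAGE_COUNT] = {
  "resource localization",
  "container launch",
  "container execution",
  "resource cleanup"
};

/* Caller-supplied user check: returns nonzero if the user is allowed. */
typedef int (*user_check_fn)(const char *user);

/* Example check that rejects root by name, like check_user() does. */
static int reject_root(const char *user) {
  return strcmp(user, "root") != 0;
}

/* Runs the stages in order and returns how many completed. *last is
 * set to the last completed stage, or "none" if the user was rejected.
 * Assumption: the user check runs before the first stage. */
static int run_container_sketch(const char *user, user_check_fn check,
                                const char **last) {
  *last = "none";
  if (!check(user)) {
    return 0;  /* rejected: not even localization runs */
  }
  int done = 0;
  for (int s = LOCALIZE; s < STAGE_COUNT; s++) {
    /* A real NodeManager does the stage's work here; the sketch
     * only records progress. */
    *last = stage_names[s];
    done++;
  }
  return done;
}
```

With `reject_root`, a job from root completes zero stages, while any other user runs through all four and ends at "resource cleanup".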
Solution
In \hadoop-3.2.1-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\src\main\native\container-executor\impl\container-executor.c, find the code snippet above and comment out the root check as follows:
/**
 * Is the user a real user account?
 * Checks:
 * 1. Not root
 * 2. UID is above the minimum configured.
 * 3. Not in banned user list
 * Returns NULL on failure
 */
struct passwd* check_user(const char *user) {
  // if (strcmp(user, "root") == 0) {
  //   fprintf(LOGFILE, "Running as root is not allowed\n");
  //   fflush(LOGFILE);
  //   return NULL;
  // }
  char *min_uid_str = get_value(MIN_USERID_KEY);
  int min_uid = DEFAULT_MIN_USERID;
  if (min_uid_str != NULL) {
    char *end_ptr = NULL;
    ...
  ...
Then recompile the code so that the NodeManager uses the rebuilt container-executor binary.
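One point worth verifying after the patch: the comment on check_user lists two more checks, and commenting out the name comparison does not remove them. Root's UID is 0, so in a stock build it may still be rejected by the minimum-UID check unless the configured minimum permits it (the `min.user.id` entry in container-executor.cfg, assuming MIN_USERID_KEY maps to that key as in the stock source) or root is whitelisted. The hypothetical C sketch below parses a minimum-UID string in the same strtol/end-pointer style as the excerpt and shows how UID 0 fares against the default versus a minimum of 0. `parse_min_uid`, `uid_allowed` and the default of 1000 are assumptions for illustration, and the real code's whitelist exemption is omitted.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumed default minimum UID, for illustration only. */
#define SKETCH_DEFAULT_MIN_UID 1000L

/* Hypothetical helper: parse a min.user.id value in the same
 * strtol/end-pointer style as the excerpt. NULL means "not
 * configured" and yields the default; an illegal value yields -1. */
static long parse_min_uid(const char *value) {
  if (value == NULL) {
    return SKETCH_DEFAULT_MIN_UID;
  }
  char *end_ptr = NULL;
  errno = 0;
  long min_uid = strtol(value, &end_ptr, 10);
  /* Reject empty input, trailing garbage, overflow and negatives. */
  if (end_ptr == value || *end_ptr != '\0' || errno != 0 || min_uid < 0) {
    return -1;
  }
  return min_uid;
}

/* Hypothetical version of check 2 with the root-name check gone:
 * UID 0 passes only when the minimum is 0. The real code also
 * exempts whitelisted users, which this sketch omits. */
static int uid_allowed(uid_t uid, long min_uid) {
  return min_uid >= 0 && (long)uid >= min_uid;
}
```

With nothing configured, `uid_allowed(0, parse_min_uid(NULL))` is false, while `uid_allowed(0, parse_min_uid("0"))` is true, so removing the root check alone may not be enough for root jobs to start.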