Source: http://blog.csdn.net/wf1982/article/details/6720043
Starting with Cloudera CDH3B3, hadoop.job.ugi no longer takes effect!
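This problem bothered me for several days before I finally found the cause. Our company used to run the stock hadoop-0.20.2. There, setting hadoop.job.ugi from Java to the correct Hadoop user and group was enough to access HDFS normally, including creating and deleting files.
After we upgraded to CDH3B4, this no longer worked. I searched through a lot of material without finding the reason, until I finally came across this: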
The hadoop.job.ugi configuration no longer has any effect. Instead, please use the UserGroupInformation.doAs API to impersonate other users on a non-secured cluster. (As of CDH3b3)
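In other words, the hadoop.job.ugi setting is now ignored. To perform operations as another user on a non-secured (non-Kerberos) cluster, you must use the UserGroupInformation.doAs method instead.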
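Why does setting a configuration key stop working? With doAs, the acting user is tied to the calling context for the duration of an action instead of being read from the configuration. The toy model below illustrates that scoping idea using only the JDK. `ToyUgi`, `forUser` and `currentUserName` are hypothetical names, not Hadoop's API. As far as I know, Hadoop's real UserGroupInformation.doAs runs the action under a JAAS `Subject` rather than a `ThreadLocal`.

```java
import java.security.PrivilegedExceptionAction;

// Toy model only: ToyUgi is a hypothetical class, NOT Hadoop's UserGroupInformation.
// It illustrates the doAs idea: the acting user is bound to the calling context
// for the duration of an action instead of being read from a configuration key.
final class ToyUgi {
    // User the current thread is acting as; null outside any doAs call.
    private static final ThreadLocal<ToyUgi> CURRENT = new ThreadLocal<ToyUgi>();

    private final String userName;

    private ToyUgi(String userName) {
        this.userName = userName;
    }

    static ToyUgi forUser(String userName) {
        return new ToyUgi(userName);
    }

    // Acting user for the current thread, or loginUser outside any doAs call.
    static String currentUserName(String loginUser) {
        ToyUgi ugi = CURRENT.get();
        return ugi == null ? loginUser : ugi.userName;
    }

    // Runs the action as this user and restores the previous user afterwards,
    // even if the action throws.
    <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
        ToyUgi previous = CURRENT.get();
        CURRENT.set(this);
        try {
            return action.run();
        } finally {
            CURRENT.set(previous);
        }
    }
}

class ToyUgiDemo {
    public static void main(String[] args) throws Exception {
        String inside = ToyUgi.forUser("hadoop").doAs(new PrivilegedExceptionAction<String>() {
            public String run() {
                return ToyUgi.currentUserName("alice");
            }
        });
        System.out.println(inside);                          // hadoop
        System.out.println(ToyUgi.currentUserName("alice")); // alice
    }
}
```

In this model, code outside the doAs call keeps running as the login user. A single global configuration value therefore has no natural place to switch identities.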
Incompatible changes compared with earlier releases:
- The TaskTracker configuration parameter mapreduce.tasktracker.local.cache.numberdirectories has been renamed to mapreduce.tasktracker.cache.local.numberdirectories. (As of CDH3u0)
- The Job-level configuration parameters mapred.max.maps.per.node, mapred.max.reduces.per.node, mapred.running.map.limit, and mapred.running.reduce.limit have been removed. (As of CDH3b4)
- CDH3 no longer contains packages for Debian Lenny, Ubuntu Hardy, Jaunty, or Karmic. Check out these upgrade instructions if you are using an Ubuntu release past its end of life. If you are using a release for which Cloudera's Debian or RPM packages are not available, you can always use the tarballs from the CDH download page. (As of CDH3b4)
- The hadoop.job.ugi configuration no longer has any effect. Instead, please use the UserGroupInformation.doAs API to impersonate other users on a non-secured cluster. (As of CDH3b3)
- The UnixUserGroupInformation class has been removed. Please see the new methods in the UserGroupInformation class. (As of CDH3b3)
- The resolution of groups for a user is now performed on the server side. For a user's group membership to take effect, it must be visible on the NameNode and JobTracker machines. (As of CDH3b3)
- The mapred.tasktracker.procfsbasedprocesstree.sleeptime-before-sigkill configuration has been renamed to mapred.tasktracker.tasks.sleeptime-before-sigkill. (As of CDH3b3)
- The HDFS and MapReduce daemons no longer run as a single shared hadoop user. Instead, the HDFS daemons run as hdfs and the MapReduce daemons run as mapred. See Changes in User Accounts and Groups in CDH3. (As of CDH3b3)
- Due to a change in the internal compression APIs, CDH3 is incompatible with versions of the hadoop-lzo open source project prior to 0.4.9. (As of CDH3b3)
- CDH3 changes the wire format for Hadoop's RPC mechanism. Thus, you must upgrade any existing client software at the same time as the cluster is upgraded. (All versions)
- Zero values for the dfs.socket.timeout and dfs.datanode.socket.write.timeout configuration parameters are now respected. Previously zero values for these parameters resulted in a 5 second timeout. (As of CDH3u1)
- When Hadoop's Kerberos integration is enabled, it is now required that either kinit be on the path for user accounts running the Hadoop client, or that the hadoop.kerberos.kinit.command configuration option be manually set to the absolute path to kinit. (As of CDH3u1)
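The configuration-level items in the list above lend themselves to a mechanical check before upgrading. The sketch below uses `CdhConfigMigrator`, a hypothetical helper that is not part of Hadoop or CDH. On a plain key/value map it renames the two renamed keys, drops the removed per-job limits, and flags hadoop.job.ugi as ignored. It assumes the legacy hadoop.job.ugi value has the form `user,group1,group2` and extracts the user name, so you can pass it to UserGroupInformation.createRemoteUser or createProxyUser instead. It covers only the keys named in this list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Hadoop/CDH: applies the CDH3 configuration
// changes listed above to a plain key/value map and collects warnings.
final class CdhConfigMigrator {
    // Renamed keys: old name -> new name (taken from the list above).
    static final Map<String, String> RENAMED = new LinkedHashMap<String, String>();
    static {
        RENAMED.put("mapreduce.tasktracker.local.cache.numberdirectories",
                    "mapreduce.tasktracker.cache.local.numberdirectories");
        RENAMED.put("mapred.tasktracker.procfsbasedprocesstree.sleeptime-before-sigkill",
                    "mapred.tasktracker.tasks.sleeptime-before-sigkill");
    }

    // Job-level keys removed as of CDH3b4.
    static final List<String> REMOVED = Arrays.asList(
            "mapred.max.maps.per.node", "mapred.max.reduces.per.node",
            "mapred.running.map.limit", "mapred.running.reduce.limit");

    // Returns a migrated copy of conf and appends human-readable notes to warnings.
    static Map<String, String> migrate(Map<String, String> conf, List<String> warnings) {
        Map<String, String> out = new LinkedHashMap<String, String>(conf);
        for (Map.Entry<String, String> e : RENAMED.entrySet()) {
            if (out.containsKey(e.getKey())) {
                String value = out.remove(e.getKey());
                // If the new key is already set explicitly, keep that value.
                if (!out.containsKey(e.getValue())) {
                    out.put(e.getValue(), value);
                }
                warnings.add("renamed " + e.getKey() + " -> " + e.getValue());
            }
        }
        for (String key : REMOVED) {
            if (out.remove(key) != null) {
                warnings.add("removed " + key + " (no longer supported)");
            }
        }
        String ugi = out.remove("hadoop.job.ugi");
        if (ugi != null) {
            warnings.add("hadoop.job.ugi is ignored; run as user '"
                    + legacyUgiUser(ugi) + "' via UserGroupInformation.doAs");
        }
        return out;
    }

    // Assumes the legacy value format "user,group1,group2,..." and returns the user.
    static String legacyUgiUser(String value) {
        int comma = value.indexOf(',');
        return (comma < 0 ? value : value.substring(0, comma)).trim();
    }
}

class CdhConfigMigratorDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<String, String>();
        conf.put("hadoop.job.ugi", "hadoop,supergroup");
        conf.put("mapred.running.map.limit", "4");
        List<String> warnings = new ArrayList<String>();
        Map<String, String> migrated = CdhConfigMigrator.migrate(conf, warnings);
        System.out.println(migrated); // {}
        for (String w : warnings) {
            System.out.println(w);
        }
    }
}
```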
Hive
- The upgrade of Hive from CDH2 to CDH3 requires several manual steps. Please be sure to follow the upgrade guide closely. See Upgrading Hive and Hue in CDH3.
- ......
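    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.security.UserGroupInformation;

    // user: name of the user to impersonate. conf (a JobConf) and someFilePath (a Path)
    // are defined elsewhere and must be final to be used inside the anonymous class.
    UserGroupInformation ugi =
        UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            // Submit a job as the proxy user
            JobClient jc = new JobClient(conf);
            jc.submitJob(conf);
            // OR access HDFS as the proxy user
            FileSystem fs = FileSystem.get(conf);
            fs.mkdirs(someFilePath);
            return null;
        }
    });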
The following must be configured on the NameNode and JobTracker (in core-site.xml), for example:
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>group1,group2</value>
      <description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>host1,host2</value>
      <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
    </property>
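The two properties above work together. The superuser (here `oozie`) may impersonate a user only if that user belongs to one of the listed groups and the request comes from one of the listed hosts. The sketch below is a simplified model of that check, not Hadoop's implementation, and `ProxyUserCheck` is a hypothetical class. It assumes hosts are compared as plain strings, whereas the real check works on the client's address. It also does not handle wildcard values, and it treats a missing property as allowing nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (hypothetical class, NOT Hadoop's implementation) of the
// hadoop.proxyuser.<superuser>.groups / .hosts check. Assumptions: hosts are
// compared as plain strings, wildcards are not supported, and a missing
// property allows nothing.
final class ProxyUserCheck {
    private final Map<String, String> conf;

    ProxyUserCheck(Map<String, String> conf) {
        this.conf = conf;
    }

    // Splits a comma-separated property value into trimmed, non-empty entries.
    private List<String> values(String key) {
        String raw = conf.get(key);
        List<String> out = new ArrayList<String>();
        if (raw != null) {
            for (String s : raw.split(",")) {
                if (!s.trim().isEmpty()) {
                    out.add(s.trim());
                }
            }
        }
        return out;
    }

    // True if superUser may impersonate a user in proxiedUserGroups from remoteHost.
    boolean authorize(String superUser, Collection<String> proxiedUserGroups, String remoteHost) {
        String prefix = "hadoop.proxyuser." + superUser;
        boolean groupAllowed = !Collections.disjoint(values(prefix + ".groups"), proxiedUserGroups);
        boolean hostAllowed = values(prefix + ".hosts").contains(remoteHost);
        return groupAllowed && hostAllowed;
    }
}

class ProxyUserCheckDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("hadoop.proxyuser.oozie.groups", "group1,group2");
        conf.put("hadoop.proxyuser.oozie.hosts", "host1,host2");
        ProxyUserCheck check = new ProxyUserCheck(conf);
        System.out.println(check.authorize("oozie", Arrays.asList("group2"), "host1")); // true
        System.out.println(check.authorize("oozie", Arrays.asList("group3"), "host1")); // false
        System.out.println(check.authorize("oozie", Arrays.asList("group1"), "host9")); // false
    }
}
```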
Caveats
The superuser must have Kerberos credentials to be able to impersonate another user; it cannot use delegation tokens for this feature. It would be wrong for the superuser to add its own delegation token to the proxy user's UGI, because that would allow the proxy user to connect to the service with the superuser's privileges.
However, if the superuser does want to give a delegation token to the proxied user (say, joe), it must first impersonate joe and obtain a delegation token for joe, in the same way as in the code example above, and then add that token to joe's UGI. This way, the delegation token's owner is joe.
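The caveat reduces to one rule: a delegation token carries its owner's identity, so the service acts as whoever the token was issued to. The toy model below shows why the superuser's token on joe's UGI lets joe act as the superuser, while a token obtained while impersonating joe is owned by joe. `ToyToken`, `ToyService` and `ToyProxyUgi` are hypothetical classes, not Hadoop's delegation-token API. The superuser name `oozie` is taken from the configuration example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyToken, ToyService and ToyProxyUgi are hypothetical classes,
// NOT Hadoop's delegation-token API. A token simply records its owner.
final class ToyToken {
    final String owner;

    ToyToken(String owner) {
        this.owner = owner;
    }
}

final class ToyService {
    // Issues a token owned by whoever is currently authenticated.
    static ToyToken getDelegationToken(String currentUser) {
        return new ToyToken(currentUser);
    }

    // A token-authenticated connection runs as the token's owner,
    // no matter which UGI the token was attached to.
    static String effectiveUser(ToyToken token) {
        return token.owner;
    }
}

final class ToyProxyUgi {
    final String user;
    final List<ToyToken> tokens = new ArrayList<ToyToken>();

    ToyProxyUgi(String user) {
        this.user = user;
    }

    // Identity the service sees: the first token's owner if a token is attached,
    // otherwise this UGI's own user.
    String connect() {
        return tokens.isEmpty() ? user : ToyService.effectiveUser(tokens.get(0));
    }
}

class DelegationTokenDemo {
    public static void main(String[] args) {
        // Wrong: the superuser (oozie) adds its own token to joe's UGI.
        ToyProxyUgi wrong = new ToyProxyUgi("joe");
        wrong.tokens.add(ToyService.getDelegationToken("oozie"));
        System.out.println(wrong.connect()); // oozie

        // Right: the token is obtained while acting as joe, so joe owns it.
        ToyProxyUgi right = new ToyProxyUgi("joe");
        right.tokens.add(ToyService.getDelegationToken(right.user));
        System.out.println(right.connect()); // joe
    }
}
```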