I recently started using Hadoop. Accessing the service directly on the Hadoop server under the account hadoop works fine.
Now I want to write a program on Windows that writes data to HDFS. The program is as follows:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
 * Read files from a local directory and merge them into a single file
 * on a remote HDFS.
 */
public class PutMerge {

    public static void main(String[] args) throws IOException {
        // Default to a local test directory and a target HDFS file name
        if (args.length < 2) {
            args = new String[2];
            args[0] = "C:/hadoop/test";
            args[1] = "merge";
        }
        System.err.println(System.getenv());

        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://master:9000");
        // conf.set("hadoop.job.ugi", "hadoop,hadoop"); // no longer has any effect
        System.err.println(conf);

        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        Path inputDir = new Path(args[0]);
        Path hdfsFile = new Path(args[1]);

        try {
            FileStatus[] inputFiles = local.listStatus(inputDir);
            FSDataOutputStream out = hdfs.create(hdfsFile);
            for (int i = 0; i < inputFiles.length; i++) {
                if (!inputFiles[i].isDir()) {
                    System.out.println(inputFiles[i].getPath().getName());
                    // Copy each local file into the single HDFS output stream
                    FSDataInputStream in = local.open(inputFiles[i].getPath());
                    byte[] buffer = new byte[256];
                    int bytesRead = 0;
                    while ((bytesRead = in.read(buffer)) > 0) {
                        out.write(buffer, 0, bytesRead);
                    }
                    in.close();
                }
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The program fails with the following error:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x
In other words, the HDFS client authenticated as the local Windows user Administrator, who has no write permission on the directory owned by user hadoop (mode rwxr-xr-x). Some articles online suggest that the following line fixes this, but it had no effect:
conf.set("hadoop.job.ugi", "hadoop,hadoop");
So I downloaded the Hadoop source to investigate, and found the following code in UserGroupInformation:
public boolean commit() throws LoginException {
    if (LOG.isDebugEnabled()) {
        LOG.debug("hadoop login commit");
    }
    // if we already have a user, we are done.
    if (!subject.getPrincipals(User.class).isEmpty()) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("using existing subject:" + subject.getPrincipals());
        }
        return true;
    }
    Principal user = null;
    // if we are using kerberos, try it out
    if (useKerberos) {
        user = getCanonicalUser(KerberosPrincipal.class);
        if (LOG.isDebugEnabled()) {
            LOG.debug("using kerberos user:" + user);
        }
    }
    // If we don't have a kerberos user and security is disabled, check
    // if user is specified in the environment or properties
    if (!isSecurityEnabled() && (user == null)) {
        // this is where the user is read from the environment variable
        String envUser = System.getenv(HADOOP_USER_NAME);
        if (envUser == null) {
            envUser = System.getProperty(HADOOP_USER_NAME);
        }
        user = envUser == null ? null : new User(envUser);
    }
    // use the OS user
    if (user == null) {
        user = getCanonicalUser(OS_PRINCIPAL_CLASS);
        if (LOG.isDebugEnabled()) {
            LOG.debug("using local user:" + user);
        }
    }
    // if we found the user, add our principal
    if (user != null) {
        subject.getPrincipals().add(new User(user.getName()));
        return true;
    }
    LOG.error("Can't find user in " + subject);
    throw new LoginException("Can't find user name");
}
From this we can see that in hadoop-1.0.3 the login user is no longer taken from the hadoop.job.ugi property. Instead, the HADOOP_USER_NAME environment variable controls which user identity is used to access the Hadoop service. After setting that environment variable to hadoop, the program ran normally.
The above is based on hadoop-1.0.3; I don't know whether other versions behave the same way, so readers may want to verify this for themselves.
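As the commit() code shows, the value is also read from System.getProperty(HADOOP_USER_NAME) when the environment variable is absent, so setting the system property before the first FileSystem.get() call should work as well. A minimal sketch, assuming the hadoop-1.0.3 behavior shown above (the class name PutMergeAsHadoopUser is only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PutMergeAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Must be set before the first FileSystem.get(): the JAAS login in
        // UserGroupInformation happens on first use and is cached afterwards.
        // (Assumption: hadoop-1.0.3 behavior, per the commit() code above.)
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://master:9000");

        // The client now authenticates as "hadoop" instead of the OS user.
        FileSystem hdfs = FileSystem.get(conf);
        System.out.println("home directory: " + hdfs.getHomeDirectory());
    }
}

Equivalently, define HADOOP_USER_NAME=hadoop in the Windows environment (or in the IDE's run configuration) before launching the program, which is the route taken above.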
This is my first post since becoming an architect; I note that here for the record.