1,程序代码如下:
- package wc;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class W2 {
public static class TokenizerMapper extends Mapper {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends
Reducer {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
System.setProperty("hadoop.home.dir", "E:/hadoop/hadoop-2.3.0");
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount ");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(W2.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
2,运行方式:
在eclipse中W2.java代码区点击右键,点击里面的run on hadoop即可运行该程序。
3,运行报错(1):
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
at org.apache.hadoop.conf.Configuration$DeprecationDelta.(Configuration.java:314)
at org.apache.hadoop.conf.Configuration$DeprecationDelta.(Configuration.java:327)
at org.apache.hadoop.conf.Configuration.(Configuration.java:409)
at wc.WordCount.main(WordCount.java:82)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Preconditions
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 4 more
少了guava-r07.jar包。
4,运行报错(2):
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
缺少hadoop-auth-2.2.0.jar包,这个包在. /eclipse/configuration/org.eclipse.osgi/bundles/230/1/.cp/lib/hadoop-auth-2.2.0.jar里面
5,运行报错(3):
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
缺少2个包:
/usr/local/eclipse/configuration/org.eclipse.osgi/bundles/230/1/.cp/lib/slf4j-api-1.7.5.jar
/usr/local/eclipse/configuration/org.eclipse.osgi/bundles/230/1/.cp/lib/slf4j-log4j12-1.7.5.jar
6,运行报错(4):
在Eclipse运行hadoop报错:
2014-12-11 20:12:01,750 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - fs.default.name is deprecated. Instead, use fs.defaultFS
SLF4J: This version of SLF4J requires log4j version 1.2.12 or later. See also http://www.slf4j.org/codes.html#log4j_version
2014-12-11 20:12:02,760 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 20:12:02,812 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(336)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
解决:
代码里加上 System.setProperty("hadoop.home.dir", "d:/hadoop");并查看Windows环境下Hadoop目录下的bin目录下有没有winutils.exe,没有就下一个拷贝过去。
7,运行报错(5):
报错:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/protobuf/ServiceException
at org.apache.hadoop.ipc.ProtobufRpcEngine.(ProtobufRpcEngine.java:69)
at java.lang.Class.forName0(Native Method)
缺乏/usr/local/app/apache-tomcat-6.0.37_9090/webapps/solr/WEB-INF/lib/protobuf-java-2.4.0a.jar
Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
需要换成protobuf-java-2.5.0.jar包。
8,运行报错(6):
Caused by: java.lang.ClassNotFoundException: com.google.common.cache.CacheBuilder
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 12 more
少guava-11.0.2.jar包
9,运行报错(7):
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:187)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5433)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5415)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOwner(FSNamesystem.java:5371)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermissionInt(FSNamesystem.java:1462)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1443)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:536)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:368)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
10,运行报错(8):
报错如下:
2014-12-16 10:16:09,632 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-16 10:16:11,597 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Job start!
2014-12-16 10:16:28,819 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager at /192.168.52.128:8032
2014-12-16 10:16:29,714 WARN [main] security.UserGroupInformation (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:Administrator (auth:SIMPLE) cause:java.io.IOException: The ownership on the staging directory /tmp/hadoop-yarn/staging/Administrator/.staging is not as expected. It is owned by hadoop. The directory must be owned by the submitter Administrator or by Administrator
Exception in thread "main" java.io.IOException: The ownership on the staging directory /tmp/hadoop-yarn/staging/Administrator/.staging is not as expected. It is owned by hadoop. The directory must be owned by the submitter Administrator or by Administrator
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:112)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at wc.WordCount.main(WordCount.java:147)
(permission denied: use r=administrator)解决方法:
接着选择"本地用户和组",展开"用户",找到系统管理员"Administrator",修改其为"hadoop",操作结果如下图:
配置系统环境变量,新建HADOOP_USER_NAME,后面的值是你的用户名。以hadoop为例。
项目根路径下新建hdfs-site.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
最后,把电脑进行"注销"或者"重启电脑",这样才能使管理员才能用这个名字。再次运行之后,显示正常,能连接到linux下的hadoop服务了,控制台信息如下显示:
2014-12-16 11:01:07,009 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-16 11:01:12,938 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Job start!
2014-12-16 11:01:39,646 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager at /192.168.52.128:8032
2014-12-16 11:01:49,297 INFO [main] mapreduce.JobSubmissionFiles (JobSubmissionFiles.java:getStagingDir(119)) - Permissions on staging directory /tmp/hadoop-yarn/staging/hadoop/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
2014-12-16 11:01:56,366 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2014-12-16 11:02:14,657 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-12-16 11:02:15,781 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2014-12-16 11:02:16,057 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-16 11:02:16,711 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_1418698686855_0001
2014-12-16 11:02:20,493 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(166)) - Submitted application application_1418698686855_0001
2014-12-16 11:02:21,353 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://name01:8088/proxy/application_1418698686855_0001/
2014-12-16 11:02:21,393 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_1418698686855_0001
2014-12-16 11:02:45,306 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_1418698686855_0001 running in uber mode : false
2014-12-16 11:02:45,392 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 0% reduce 0%
2014-12-16 11:02:45,543 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1418698686855_0001 failed with state FAILED due to: Application application_1418698686855_0001 failed 2 times due to AM Container for appattempt_1418698686855_0001_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2014-12-16 11:02:45,955 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0
error!
11,运行报错(9):
2014-12-16 15:31:45,980 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-12-16 15:31:45,986 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-12-16 15:31:46,213 WARN [main] security.UserGroupInformation (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.52.128:9000/data/output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.52.128:9000/data/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
删除原来的/data/output目录
12,运行报错(10):
Could not locate executable null\bin\winutils.exe in the Hadoop binaries
老掉牙的问题了,系统变量未设置HADOOP_HOME,系统变量设置HADOOP_HOME,或者直接加一句代码指定路径地址:
System.setProperty("hadoop.home.dir", "E:/hadoop/hadoop-2.3.0");
13,运行报错(11):
2014-12-16 14:28:58,589 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-16 14:29:08,664 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-12-16 14:29:08,665 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-12-16 14:29:10,026 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-12-16 14:29:11,164 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2014-12-16 14:29:11,761 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1985238633_0001
2014-12-16 14:29:11,810 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1985238633/.staging/job_local1985238633_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-12-16 14:29:11,811 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1985238633/.staging/job_local1985238633_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-12-16 14:29:11,916 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(441)) - Cleaning up the staging area file:/tmp/hadoop-hadoop/mapred/staging/hadoop1985238633/.staging/job_local1985238633_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:560)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:177)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:164)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:98)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
at org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163)
at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at wc.W2.main(W2.java:111)
缺乏hadoop.dll,下载hadoop.dll放到hadoop/bin目录下即可,但是之后运行依然报错,还需要手动设置下hadoop在windows下的运行路径,于是在Eclipse运行环境中,在运行的WordCount.java中,右键点击在下拉菜单栏里面选择Run Configurations,然后加上path的设置,Run顺利通过。参数如下图所示:
之后调试通过,运行结果如下:
2014-12-16 15:34:01,303 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-12-16 15:34:01,309 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-12-16 15:34:02,047 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-12-16 15:34:02,120 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2014-12-16 15:34:02,323 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1764589720_0001
2014-12-16 15:34:02,367 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-12-16 15:34:02,368 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-12-16 15:34:02,703 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://localhost:8080/
2014-12-16 15:34:02,704 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_local1764589720_0001
2014-12-16 15:34:02,707 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2014-12-16 15:34:02,719 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2014-12-16 15:34:02,853 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2014-12-16 15:34:02,857 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:02,919 INFO [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(129)) - ProcfsBasedProcessTree currently is supported only on Linux.
2014-12-16 15:34:03,281 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(581)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@2e1022ec
2014-12-16 15:34:03,287 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(733)) - Processing split: hdfs://192.168.52.128:9000/data/input/README.txt:0+1366
2014-12-16 15:34:03,304 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(388)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-12-16 15:34:03,340 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1181)) - (EQUATOR) 0 kvi 26214396(104857584)
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(975)) - mapreduce.task.io.sort.mb: 100
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(976)) - soft limit at 83886080
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(977)) - bufstart = 0; bufvoid = 104857600
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(978)) - kvstart = 26214396; length = 6553600
2014-12-16 15:34:03,708 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_local1764589720_0001 running in uber mode : false
2014-12-16 15:34:03,710 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 0% reduce 0%
2014-12-16 15:34:04,121 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) -
2014-12-16 15:34:04,128 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1435)) - Starting flush of map output
2014-12-16 15:34:04,128 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1453)) - Spilling map output
2014-12-16 15:34:04,128 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1454)) - bufstart = 0; bufend = 2055; bufvoid = 104857600
2014-12-16 15:34:04,128 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:flush(1456)) - kvstart = 26214396(104857584); kvend = 26213684(104854736); length = 713/6553600
2014-12-16 15:34:04,179 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:sortAndSpill(1639)) - Finished spill 0
2014-12-16 15:34:04,194 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:done(995)) - Task:attempt_local1764589720_0001_m_000000_0 is done. And is in the process of committing
2014-12-16 15:34:04,207 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2014-12-16 15:34:04,208 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:sendDone(1115)) - Task 'attempt_local1764589720_0001_m_000000_0' done.
2014-12-16 15:34:04,208 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:04,208 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2014-12-16 15:34:04,211 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2014-12-16 15:34:04,211 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local1764589720_0001_r_000000_0
2014-12-16 15:34:04,221 INFO [pool-6-thread-1] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(129)) - ProcfsBasedProcessTree currently is supported only on Linux.
2014-12-16 15:34:04,478 INFO [pool-6-thread-1] mapred.Task (Task.java:initialize(581)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@36154615
2014-12-16 15:34:04,483 INFO [pool-6-thread-1] mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@e2b02a3
2014-12-16 15:34:04,500 INFO [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:(193)) - MergerManager: memoryLimit=949983616, maxSingleShuffleLimit=237495904, mergeThreshold=626989184, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2014-12-16 15:34:04,503 INFO [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local1764589720_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2014-12-16 15:34:04,543 INFO [localfetcher#1] reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(140)) - localfetcher#1 about to shuffle output of map attempt_local1764589720_0001_m_000000_0 decomp: 1832 len: 1836 to MEMORY
2014-12-16 15:34:04,548 INFO [localfetcher#1] reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 1832 bytes from map-output for attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:04,553 INFO [localfetcher#1] reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(307)) - closeInMemoryFile -> map-output of size: 1832, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->1832
2014-12-16 15:34:04,564 INFO [EventFetcher for fetching Map Completion Events] reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2014-12-16 15:34:04,566 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,566 INFO [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(667)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2014-12-16 15:34:04,585 INFO [pool-6-thread-1] mapred.Merger (Merger.java:merge(589)) - Merging 1 sorted segments
2014-12-16 15:34:04,585 INFO [pool-6-thread-1] mapred.Merger (Merger.java:merge(688)) - Down to the last merge-pass, with 1 segments left of total size: 1823 bytes
2014-12-16 15:34:04,605 INFO [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(742)) - Merged 1 segments, 1832 bytes to disk to satisfy reduce memory limit
2014-12-16 15:34:04,605 INFO [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(772)) - Merging 1 files, 1836 bytes from disk
2014-12-16 15:34:04,606 INFO [pool-6-thread-1] reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(787)) - Merging 0 segments, 0 bytes from memory into reduce
2014-12-16 15:34:04,607 INFO [pool-6-thread-1] mapred.Merger (Merger.java:merge(589)) - Merging 1 sorted segments
2014-12-16 15:34:04,608 INFO [pool-6-thread-1] mapred.Merger (Merger.java:merge(688)) - Down to the last merge-pass, with 1 segments left of total size: 1823 bytes
2014-12-16 15:34:04,608 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,643 INFO [pool-6-thread-1] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2014-12-16 15:34:04,714 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 0%
2014-12-16 15:34:04,842 INFO [pool-6-thread-1] mapred.Task (Task.java:done(995)) - Task:attempt_local1764589720_0001_r_000000_0 is done. And is in the process of committing
2014-12-16 15:34:04,850 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2014-12-16 15:34:04,850 INFO [pool-6-thread-1] mapred.Task (Task.java:commit(1156)) - Task attempt_local1764589720_0001_r_000000_0 is allowed to commit now
2014-12-16 15:34:04,881 INFO [pool-6-thread-1] output.FileOutputCommitter (FileOutputCommitter.java:commitTask(439)) - Saved output of task 'attempt_local1764589720_0001_r_000000_0' to hdfs://192.168.52.128:9000/data/output/_temporary/0/task_local1764589720_0001_r_000000
2014-12-16 15:34:04,884 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2014-12-16 15:34:04,884 INFO [pool-6-thread-1] mapred.Task (Task.java:sendDone(1115)) - Task 'attempt_local1764589720_0001_r_000000_0' done.
2014-12-16 15:34:04,885 INFO [pool-6-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local1764589720_0001_r_000000_0
2014-12-16 15:34:04,885 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2014-12-16 15:34:05,714 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) - map 100% reduce 100%
2014-12-16 15:34:05,714 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1373)) - Job job_local1764589720_0001 completed successfully
2014-12-16 15:34:05,733 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 38
File System Counters
FILE: Number of bytes read=34542
FILE: Number of bytes written=470650
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2732
HDFS: Number of bytes written=1306
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=31
Map output records=179
Map output bytes=2055
Map output materialized bytes=1836
Input split bytes=113
Combine input records=179
Combine output records=131
Reduce input groups=131
Reduce shuffle bytes=1836
Reduce input records=131
Reduce output records=131
Spilled Records=262
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=13
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=440664064
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1366
File Output Format Counters
Bytes Written=1306
描述:在Windows下使用Eclipse进行Hadoop的程序编写,然后Run on hadoop 后,出现如下错误:
11/10/28 16:05:53 INFO mapred.JobClient: Running job: job_201110281103_0003
11/10/28 16:05:54 INFO mapred.JobClient: map 0% reduce 0%
11/10/28 16:06:05 INFO mapred.JobClient: Task Id : attempt_201110281103_0003_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
解决方法:
到服务器上修改hadoop的配置文件:conf/hdfs-core.xml, 找到 dfs.permissions 的配置项 , 将value值改为 false
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>
If "true", enable permission checking in HDFS.
If "false", permission checking is turned off,
but all other behavior is unchanged.
Switching from one parameter value to the other does not change the mode,
owner or group of files or directories.
</description>
</property>
修改完貌似要重启下hadoop的进程才能生效
开发环境:win xp sp3 , Eclipse 3.3 , hadoop-0.20.2
hadoop服务器部署环境: Ubuntu 10.10 , hadoop-0.20.2
小结: 接触Hadoop没多久,不知道这样修改对集群的安全性有啥影响。
//补充:
因为Eclipse使用hadoop插件提交作业时,会默认以 DrWho 身份去将作业写入hdfs文件系统中,对应的也就是 HDFS 上的/user/xxx , 我的为/user/hadoop , 由于 DrWho 用户对hadoop目录并没有写入权限,所以导致异常的发生。提供的解决方法为:放开 hadoop 目录的权限 , 命令如下 :$ hadoop fs -chmod 777 /user/hadoop
在运行某个Spark Application的时候,需要向Hdfs写入文件,控制台会输出以下错误信息:
- 1
从中很容易看出是因为当前执行Spark Application的用户没有Hdfs“/user”目录的写入权限。这个问题无论是在Windows下还是Linux下提交Spark Application都经常会遇到。常见的解决方法有以下几种。
- 关闭Hdfs的安全检查(permission checking):将hdfs-xml中 dfs.permissions 属性的值设置为 false 。但是这种方法的弊端是会导致Hdfs系统中所有的安全特性都被禁用,使Hdfs的安全性降低。
-
Hdfs的用户权限是与本地文件系统的用户权限绑定在一起的,根据错误中的
Permission denied: user=Administrator, access=WRITE, inode="/user":root:supergroup:drwxr-xr-x
我们可以发现,Hdfs中的/user目录是属于supergroup组里的root用户的。因此我们可以想到用两种方法解决这个问题:
修改执行操作的用户为该目录所属的用户。但是这种方法的弊端在于,与Hdfs进行交互的用户可能有很多,这会导致经常修改执行类似操作的用户。因此,个人推荐使用第三种方法:
-
如果是Linux环境,将执行操作的用户添加到supergroup用户组。
- 1
- 2
如果是Windows用户,在hdfs namenode所在机器添加新用户,用户名为执行操作的Windows用户名,然后将此用户添加到supergroup用户组。
- 1
- 2
- 3
这样,以后每次执行类似操作可以将文件写入Hdfs中属于Administrator用户的目录内,而不会出现上面的Exception。
-------------------------------------------------------------------------------------------------------------------
描述:在window下使用Eclipse进行hadoop的程序编写,然后Run on hadoop 后,出现如下错误:
11/10/28 16:05:53 INFO mapred.JobClient: Running job: job_201110281103_0003
11/10/28 16:05:54 INFO mapred.JobClient: map 0% reduce 0%
11/10/28 16:06:05 INFO mapred.JobClient: Task Id : attempt_201110281103_0003_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
解决方法:
到服务器上修改hadoop的配置文件:conf/hdfs-core.xml , 找到 dfs.permissions 的配置项 , 将value值改为 false
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>
If "true", enable permission checking in HDFS.
If "false", permission checking is turned off,
but all other behavior is unchanged.
Switching from one parameter value to the other does not change the mode,
owner or group of files or directories.
</description>
</property>
修改完貌似要重启下hadoop的进程才能生效
开发环境:win xp sp3 , Eclipse 3.3 , hadoop-0.20.2
hadoop服务器部署环境: ubuntu 10.10 , hadoop-0.20.2
小结: 接触Hadoop没多久,不知道这样修改对集群的安全性有啥影响。
1、在系统的环境变量或java JVM变量里面添加HADOOP_USER_NAME,这个值具体等于多少看自己的情况,以后会运行HADOOP上的Linux的用户名。(修改完重启eclipse,不然可能不生效)
2、将当前系统的帐号修改为hadoop
3、使用HDFS的命令行接口修改相应目录的权限,hadoop fs -chmod 777 /user,后面的/user是要上传文件的路径,不同的情况可能不一样,比如要上传的文件路径为hdfs://namenode/user/xxx.doc,则这样的修改可以,如果要上传的文件路径为hdfs://namenode/java/xxx.doc,则要修改的为hadoop fs -chmod 777 /java或者hadoop fs -chmod 777 /,java的那个需要先在HDFS里面建立Java目录,后面的这个是为根目录调整权限。