Thoughts on Task, MapTask, and ReduceTask in Hadoop 1.x

This article takes a close look at Hadoop's technical stack: the architecture of HDFS and how cleanly it is decoupled, and how MapReduce works internally. It describes what each kind of node does and how the nodes interact, and highlights their role in processing massive data sets.

What is most admirable about Hadoop's design is its attention to detail. The basic principles are easy to understand; the devil is in the details.


HDFS is Hadoop's file-system storage layer. It has three kinds of nodes: NameNode, SecondaryNameNode, and DataNode. A cluster has exactly one of each of the first two, but many DataNodes. HDFS is so well decoupled that it can run on its own as a storage system for massive data, entirely independently of MapReduce.


For MapReduce jobs there are two kinds of nodes: the JobTracker and the TaskTrackers. There is exactly one JobTracker per cluster and many TaskTrackers. As the name suggests, a TaskTracker runs tasks. There are two kinds of task: MapTask and ReduceTask.


A MapReduce job is first split into a number of InputSplits, and each InputSplit is handled by one MapTask. When a MapTask finishes, its output is passed to the ReduceTasks, and each ReduceTask writes its final result to HDFS.
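The split → map → shuffle → reduce flow described above can be sketched with a small, self-contained Java simulation (no Hadoop dependencies; the class and method names here are illustrative, not Hadoop's API):

```java
import java.util.*;

// Toy simulation of the MapReduce data flow: input splits -> map ->
// shuffle (group by key) -> reduce. In real Hadoop these phases are
// distributed across TaskTracker nodes; here everything is in-memory.
public class MapReduceFlow {

    // "map": emit a (word, 1) pair for each word in one split
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // "shuffle": group all map outputs by key
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs)
            for (Map.Entry<String, Integer> kv : output)
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        return grouped;
    }

    // run the whole pipeline: one MapTask per split, then reduce sums per key
    public static Map<String, Integer> run(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String split : splits) mapOutputs.add(map(split));
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapOutputs).forEach((key, values) ->
                result.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // -> {a=2, b=2, c=1}
    }
}
```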


MapTask and ReduceTask share the abstract base class Task. At that level of abstraction they are similar; they differ only in how they process data. A TaskTracker node can run both kinds of task at the same time.


This is where the complicated details live. The TaskTracker sends heartbeats to the JobTracker via remote RPC. Each heartbeat call carries various pieces of information: the TaskTracker reports its own state and the progress of its tasks, and the JobTracker's response may instruct the TaskTracker to launch new tasks, among other things.
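As a rough illustration only (every class and field name below is invented for this sketch; these are not Hadoop's actual heartbeat protocol classes), the exchange can be modeled as: the tracker reports its slots and load, and the response carries launch directives:

```java
import java.util.*;

// Toy model of one TaskTracker -> JobTracker heartbeat exchange.
public class HeartbeatDemo {

    // What the TaskTracker reports about itself each interval.
    public static class HeartbeatRequest {
        public final String trackerName;
        public final int runningMaps, runningReduces;
        public final int maxMapSlots, maxReduceSlots;
        public HeartbeatRequest(String name, int maps, int reduces, int maxMaps, int maxReduces) {
            trackerName = name; runningMaps = maps; runningReduces = reduces;
            maxMapSlots = maxMaps; maxReduceSlots = maxReduces;
        }
    }

    // What the JobTracker sends back: directives such as "launch a task".
    public static class HeartbeatResponse {
        public final List<String> actions = new ArrayList<>();
    }

    // JobTracker side: if the tracker has free map slots, hand it pending work.
    public static HeartbeatResponse handleHeartbeat(HeartbeatRequest req, Deque<String> pendingMapTasks) {
        HeartbeatResponse resp = new HeartbeatResponse();
        int freeSlots = req.maxMapSlots - req.runningMaps;
        while (freeSlots-- > 0 && !pendingMapTasks.isEmpty()) {
            resp.actions.add("LAUNCH_TASK " + pendingMapTasks.poll());
        }
        return resp;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(
                Arrays.asList("task_m_000000", "task_m_000001", "task_m_000002"));
        HeartbeatRequest req = new HeartbeatRequest("tracker-1", 0, 0, 2, 2);
        System.out.println(handleHeartbeat(req, pending).actions);
        // -> [LAUNCH_TASK task_m_000000, LAUNCH_TASK task_m_000001]
    }
}
```

The real protocol carries far more state (task completion events, blacklisting, commit permissions), which is exactly the "detail work" the article refers to.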


Every job has a job ID, and once it has been split into MapTasks and ReduceTasks, each task has a task ID of its own. When a MapTask finishes, it writes its output to the local disk of its node (not to HDFS), and the ReduceTasks then fetch that output over HTTP.
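These IDs follow a readable naming scheme, visible in any job log: `job_<clusterTimestamp>_<sequence>`, then `task_..._m|r_<n>` per task, then `attempt_..._<retry>` per execution attempt. A small hypothetical parser (illustrative only, not Hadoop's own `TaskAttemptID` class) makes the structure explicit:

```java
// Illustrative parser for attempt IDs such as
// "attempt_1750408986691_0005_m_000001_0":
// attempt_<cluster timestamp>_<job seq>_<m|r>_<task seq>_<attempt #>
public class AttemptId {
    final long clusterTimestamp;
    final int jobId;
    final boolean isMap;
    final int taskId;
    final int attempt;

    AttemptId(String s) {
        String[] parts = s.split("_");
        if (parts.length != 6 || !parts[0].equals("attempt"))
            throw new IllegalArgumentException("not an attempt id: " + s);
        clusterTimestamp = Long.parseLong(parts[1]);
        jobId = Integer.parseInt(parts[2]);
        isMap = parts[3].equals("m");
        taskId = Integer.parseInt(parts[4]);
        attempt = Integer.parseInt(parts[5]);
    }

    public static String describe(String s) {
        AttemptId id = new AttemptId(s);
        return String.format("job %d, %s task %d, attempt %d",
                id.jobId, id.isMap ? "map" : "reduce", id.taskId, id.attempt);
    }

    public static void main(String[] args) {
        System.out.println(describe("attempt_1750408986691_0005_m_000001_0"));
        // -> job 5, map task 1, attempt 0
    }
}
```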


In short:

1. All communication is implemented via remote RPC calls.

2. HDFS is the storage layer and can run on its own.

3. MapReduce is the distributed-computation layer, and it uses HDFS.

4. The Task is the core unit of computation on each node.

5. A great deal of detail work goes into guaranteeing reliability and stability.

A reader posted the following job log and asked what the error is and how to fix it (the stack trace, identical across the retry attempts, is shown once):

```
2025-06-21 13:57:16,366 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Masterxd243234025/192.168.40.25:8032
2025-06-21 13:57:16,459 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Masterxd243234025/192.168.40.25:8032
2025-06-21 13:57:16,698 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2025-06-21 13:57:16,733 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1750408986691_0005
2025-06-21 13:57:16,944 INFO mapred.FileInputFormat: Total input files to process : 1
2025-06-21 13:57:17,002 INFO mapreduce.JobSubmitter: number of splits:2
2025-06-21 13:57:17,133 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1750408986691_0005
2025-06-21 13:57:17,133 INFO mapreduce.JobSubmitter: Executing with tokens: []
2025-06-21 13:57:17,890 INFO conf.Configuration: resource-types.xml not found
2025-06-21 13:57:17,890 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2025-06-21 13:57:17,944 INFO impl.YarnClientImpl: Submitted application application_1750408986691_0005
2025-06-21 13:57:17,964 INFO mapreduce.Job: The url to track the job: http://MasterXD243234025:8088/proxy/application_1750408986691_0005/
2025-06-21 13:57:17,965 INFO mapreduce.Job: Running job: job_1750408986691_0005
2025-06-21 13:57:23,092 INFO mapreduce.Job: Job job_1750408986691_0005 running in uber mode : false
2025-06-21 13:57:23,093 INFO mapreduce.Job:  map 0% reduce 0%
2025-06-21 13:57:28,292 INFO mapreduce.Job: Task Id : attempt_1750408986691_0005_m_000001_0, Status : FAILED
Error: java.util.NoSuchElementException: iterate past last value
        at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1625)
        at org.apache.hadoop.mapred.Task$CombineValuesIterator.next(Task.java:1710)
        at com.xd.Reduce.reduce(Reduce.java:21)
        at com.xd.Reduce.reduce(Reduce.java:10)
        at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1839)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1665)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1506)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:473)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)

[the same error and stack trace were reported for attempts _m_000000_0, _m_000000_1, _m_000001_1, _m_000000_2, and _m_000001_2]

2025-06-21 13:57:38,534 INFO mapreduce.Job:  map 100% reduce 100%
2025-06-21 13:57:38,553 INFO mapreduce.Job: Job job_1750408986691_0005 failed with state FAILED due to: Task failed task_1750408986691_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
2025-06-21 13:57:38,654 INFO mapreduce.Job: Counters: 11
        Job Counters
                Failed map tasks=7
                Killed map tasks=1
                Killed reduce tasks=1
                Launched map tasks=8
                Other local map tasks=6
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=16733
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=16733
                Total vcore-milliseconds taken by all map tasks=16733
                Total megabyte-milliseconds taken by all map tasks=17134592
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:876)
        at com.xd.WordCount.main(WordCount.java:24)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
```

What is this error, and how do I fix it?
### Fixing `java.util.NoSuchElementException: iterate past last value` in Hadoop MapReduce

In a Hadoop MapReduce job, `java.util.NoSuchElementException: iterate past last value` is usually caused by calling `next()` on a values iterator more times than `hasNext()` was checked: once the iterator has been exhausted, the next call to `next()` throws. Note that in the stack trace above the exception occurs in `Task$CombineValuesIterator` under `OldCombinerRunner` — that is, while `com.xd.Reduce` is running as a *combiner* during the map-side sort-and-spill — which is why the failure shows up in the map tasks.

Specific ways to address the problem:

#### 1. Use the iterator correctly

In `reduce()`, the values for a key are consumed through an iterator. If `next()` is called more than once per `hasNext()` check inside the same loop, the iterator skips elements and eventually runs past the end. The correct pattern stores the result of each `iterator.next()` call in a local variable and operates on that variable:

```java
while (iterator.hasNext()) {
    String value = iterator.next().toString();
    set.add(value);
    list.add(value);
}
```

This way each value is consumed exactly once and no element is skipped.

#### 2. Don't create extra iterators

In some code, `context.nextKeyValue()` or `context.nextKey()` (new API) is called more than once in the same pass, which advances the underlying stream and breaks the ongoing iteration. Call it once and keep the result in a local variable for later use.

#### 3. Guard against empty value lists

If a key's value list is empty, the first `iterator.next()` call throws immediately. Check before iterating:

```java
if (!iterator.hasNext()) {
    return;
}
```

#### 4. Debugging and logging

To pin down where the exception occurs, add log output at the critical points so the iterator's behavior is visible:

```java
while (iterator.hasNext()) {
    String value = iterator.next().toString();
    System.out.println("Processing value: " + value);
    set.add(value);
    list.add(value);
}
```

---

### Example

Since the stack trace goes through `OldCombinerRunner`, this job uses the old `org.apache.hadoop.mapred` API, so the fix belongs in the old-style `reduce()` signature:

```java
public void reduce(Text key, Iterator<IntWritable> values,
                   OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    int sum = 0;
    while (values.hasNext()) {
        sum += values.next().get(); // exactly one next() per hasNext() check
    }
    output.collect(key, new IntWritable(sum));
}
```

(With the new `org.apache.hadoop.mapreduce` API, an enhanced `for` loop over the `Iterable<IntWritable> values` avoids manual iterator handling entirely.)

---

### Summary

The root cause of `java.util.NoSuchElementException: iterate past last value` is misuse of the values iterator. Storing each `next()` result before using it, avoiding redundant iterator advances, and guarding against empty value lists prevent this class of error. Remember that a class registered as a combiner must obey the same iterator contract as a reducer, because it runs repeatedly over partial map output.

---
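Independently of Hadoop, the failure mode described above — more than one `next()` call per `hasNext()` check — can be reproduced with a plain `java.util.Iterator`. This self-contained sketch shows the buggy pattern and the fix:

```java
import java.util.*;

// Minimal reproduction of the "double next()" bug behind
// NoSuchElementException, using a plain iterator (no Hadoop needed).
public class IteratorBug {

    // Buggy pattern: two next() calls per hasNext() check.
    public static int buggySum(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            System.out.println("seen: " + it.next()); // first next()
            sum += it.next(); // second next(): may run past the end
        }
        return sum;
    }

    // Fixed pattern: call next() once and reuse the value.
    public static int fixedSum(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            int v = it.next();
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> odd = Arrays.asList(1, 2, 3); // odd length triggers the bug
        try {
            buggySum(odd);
        } catch (NoSuchElementException e) {
            System.out.println("buggySum failed: " + e);
        }
        System.out.println("fixedSum = " + fixedSum(odd)); // -> fixedSum = 6
    }
}
```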