Local run mode
1. The MapReduce program is submitted to LocalJobRunner and runs locally as a single process.
2. The input data and the output results may sit either on the local file system or on HDFS.
3. How do you make the program run locally? Do not put the cluster configuration files on its classpath.
Essentially it comes down to whether the job's conf contains mapreduce.framework.name=local, as opposed to mapreduce.framework.name=yarn together with the yarn.resourcemanager.hostname parameter.
4. Local mode is very convenient for debugging business logic: just set breakpoints in Eclipse/IDEA.
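The mode switch described in item 3 can be illustrated with a minimal sketch (plain Java, using a Map in place of Hadoop's Configuration; the class and method names here are hypothetical, not the actual Hadoop internals):

```java
import java.util.HashMap;
import java.util.Map;

public class FrameworkNameDemo {

    // Mimics how the framework decides where a job runs: everything hinges
    // on the value of mapreduce.framework.name in the job's configuration.
    static String pickRunner(Map<String, String> conf) {
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        return framework.equals("yarn") ? "YARNRunner" : "LocalJobRunner";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No cluster config files on the classpath -> the default is "local"
        System.out.println(pickRunner(conf));
        // With cluster config files, mapred-site.xml sets the framework to "yarn"
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(pickRunner(conf));
    }
}
```

This is why simply leaving the cluster's *-site.xml files out of the project resources is enough to fall back to local execution.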
5. Example (wordcount)
/**
 * Driver class for the MR job. It assembles the information the job needs at
 * runtime: which Mapper and Reducer classes to use, where the input data lives,
 * and where the output should be written.
 * <p>
 * Created by caimh on 2019/9/10.
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Use a Job object to wrap up all the information this MR job needs
        final Configuration conf = new Configuration();
        // Ask for local execution explicitly. This can be omitted, because
        // mapred-default.xml already defaults mapreduce.framework.name to "local".
        conf.set("mapreduce.framework.name", "local");
        // Pass conf to getInstance(); otherwise the setting above is ignored
        final Job job = Job.getInstance(conf);
        // Main class of the job jar
        job.setJarByClass(WordCountDriver.class);
        // Mapper and Reducer classes for this job
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Output key/value types of the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final output key/value types of the job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input path and final output path
        FileInputFormat.setInputPaths(job, new Path("E:/hdfsClient/cmh.txt"));
        FileOutputFormat.setOutputPath(job, new Path("E:/hdfsClient/wordcount"));
        // Submit the job and wait for it to finish
        final boolean success = job.waitForCompletion(true);
        // Exit with 0 on success, 1 on failure
        System.exit(success ? 0 : 1);
    }
}
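The WordCountMapper and WordCountReducer classes referenced by the driver are not shown here. Their logic can be sketched in plain Java (no Hadoop dependency; the class and method names below are illustrative stand-ins, not the actual Hadoop Mapper/Reducer API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCountLogic {

    // Map phase: split each input line on whitespace and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: for each word, sum the 1s emitted by the map phase.
    // In the real job the framework groups pairs by key before calling reduce.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[]{"hello hadoop", "hello world"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hello=2, hadoop=1, world=1}
    }
}
```

The key/value types in the driver (Text, IntWritable) correspond to the String word and Integer count in this sketch.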
6. Program execution log
2019-09-10 15:36:06,719 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2019-09-10 15:36:06,721 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-09-10 15:36:07,611 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-09-10 15:36:07,756 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-09-10 15:36:07,770 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2019-09-10 15:36:08,011 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2019-09-10 15:36:08,384 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local725653020_0001
2019-09-10 15:36:08,700 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2019-09-10 15:36:08,701 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local725653020_0001
2019-09-10 15:36:08,704 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2019-09-10 15:36:08,715 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:08,717 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-09-10 15:36:08,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2019-09-10 15:36:08,827 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:08,950 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:08,971 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2019-09-10 15:36:09,145 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@562f5251
2019-09-10 15:36:09,152 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/E:/hdfsClient/cmh.txt:0+3820
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2019-09-10 15:36:09,297 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.LocalJobRunner] -
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 6460; bufvoid = 104857600
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26211760(104847040); length = 2637/6553600
2019-09-10 15:36:09,400 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2019-09-10 15:36:09,420 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local725653020_0001_m_000000_0 is done. And is in the process of committing
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local725653020_0001_m_000000_0' done.
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2019-09-10 15:36:09,438 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for reduce tasks
2019-09-10 15:36:09,438 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local725653020_0001_r_000000_0
2019-09-10 15:36:09,472 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:09,472 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2019-09-10 15:36:09,528 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@734c86cf
2019-09-10 15:36:09,534 INFO [org.apache.hadoop.mapred.ReduceTask] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@14e88cd2
2019-09-10 15:36:09,566 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - MergerManager: memoryLimit=1956118528, maxSingleShuffleLimit=489029632, mergeThreshold=1291038336, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-09-10 15:36:09,569 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - attempt_local725653020_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-09-10 15:36:09,641 INFO [org.apache.hadoop.mapreduce.task.reduce.LocalFetcher] - localfetcher#1 about to shuffle output of map attempt_local725653020_0001_m_000000_0 decomp: 7782 len: 7786 to MEMORY
2019-09-10 15:36:09,649 INFO [org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput] - Read 7782 bytes from map-output for attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:09,651 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - closeInMemoryFile -> map-output of size: 7782, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->7782
2019-09-10 15:36:09,653 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - EventFetcher is interrupted.. Returning
2019-09-10 15:36:09,654 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,654 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-09-10 15:36:09,669 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2019-09-10 15:36:09,669 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 7775 bytes
2019-09-10 15:36:09,679 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merged 1 segments, 7782 bytes to disk to satisfy reduce memory limit
2019-09-10 15:36:09,681 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 1 files, 7786 bytes from disk
2019-09-10 15:36:09,682 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 0 segments, 0 bytes from memory into reduce
2019-09-10 15:36:09,682 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2019-09-10 15:36:09,684 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 7775 bytes
2019-09-10 15:36:09,684 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,690 INFO [org.apache.hadoop.conf.Configuration.deprecation] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-09-10 15:36:09,719 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local725653020_0001 running in uber mode : false
2019-09-10 15:36:09,731 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local725653020_0001_r_000000_0 is done. And is in the process of committing
2019-09-10 15:36:09,733 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,733 INFO [org.apache.hadoop.mapred.Task] - Task attempt_local725653020_0001_r_000000_0 is allowed to commit now
2019-09-10 15:36:09,736 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - Saved output of task 'attempt_local725653020_0001_r_000000_0' to file:/E:/hdfsClient/wordcount/_temporary/0/task_local725653020_0001_r_000000
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local725653020_0001_r_000000_0' done.
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local725653020_0001_r_000000_0
2019-09-10 15:36:09,738 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce task executor complete.
2019-09-10 15:36:09,748 INFO [org.apache.hadoop.mapreduce.Job] - map 100% reduce 100%
2019-09-10 15:36:10,751 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local725653020_0001 completed successfully
2019-09-10 15:36:10,846 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
	File System Counters
		FILE: Number of bytes read=23628
		FILE: Number of bytes written=598323
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=110
		Map output records=660
		Map output bytes=6460
		Map output materialized bytes=7786
		Input split bytes=92
		Combine input records=0
		Combine output records=0
		Reduce input groups=105
		Reduce shuffle bytes=7786
		Reduce input records=660
		Reduce output records=105
		Spilled Records=1320
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=378535936
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=3864
	File Output Format Counters
		Bytes Written=757
Process finished with exit code 0
Cluster run mode
1. The MapReduce program is handed to the YARN cluster, which distributes it to many nodes for concurrent execution.
2. The input data and the output results should live on the HDFS file system.
3. Steps to submit to the cluster:
Package the program as a JAR, then launch it with the hadoop command on any node of the cluster:
hadoop jar HDFSClientDemo-1.0-SNAPSHOT.jar com.caimh.mr.WordCountDriver
4. Program execution & log
[caimh@master-node software]$ hadoop jar HDFSClientDemo-1.0-SNAPSHOT.jar com.caimh.mr.WordCountDriver
19/09/10 02:09:28 INFO client.RMProxy: Connecting to ResourceManager at master-node/192.168.159.10:8032
19/09/10 02:09:30 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/09/10 02:09:31 INFO input.FileInputFormat: Total input paths to process : 1
19/09/10 02:09:31 INFO mapreduce.JobSubmitter: number of splits:1
19/09/10 02:09:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568019398258_0001
19/09/10 02:09:34 INFO impl.YarnClientImpl: Submitted application application_1568019398258_0001
19/09/10 02:09:34 INFO mapreduce.Job: The url to track the job: http://master-node:8088/proxy/application_1568019398258_0001/
19/09/10 02:09:34 INFO mapreduce.Job: Running job: job_1568019398258_0001
19/09/10 02:09:53 INFO mapreduce.Job: Job job_1568019398258_0001 running in uber mode : false
19/09/10 02:09:53 INFO mapreduce.Job: map 0% reduce 0%
19/09/10 02:10:11 INFO mapreduce.Job: map 100% reduce 0%
19/09/10 02:10:26 INFO mapreduce.Job: map 100% reduce 100%
19/09/10 02:10:27 INFO mapreduce.Job: Job job_1568019398258_0001 completed successfully
19/09/10 02:10:27 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=7786
		FILE: Number of bytes written=257229
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=3925
		HDFS: Number of bytes written=741
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13575
		Total time spent by all reduces in occupied slots (ms)=11812
		Total time spent by all map tasks (ms)=13575
		Total time spent by all reduce tasks (ms)=11812
		Total vcore-milliseconds taken by all map tasks=13575
		Total vcore-milliseconds taken by all reduce tasks=11812
		Total megabyte-milliseconds taken by all map tasks=13900800
		Total megabyte-milliseconds taken by all reduce tasks=12095488
	Map-Reduce Framework
		Map input records=110
		Map output records=660
		Map output bytes=6460
		Map output materialized bytes=7786
		Input split bytes=105
		Combine input records=0
		Combine output records=0
		Reduce input groups=105
		Reduce shuffle bytes=7786
		Reduce input records=660
		Reduce output records=105
		Spilled Records=1320
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=560
		CPU time spent (ms)=6180
		Physical memory (bytes) snapshot=424845312
		Virtual memory (bytes) snapshot=4198445056
		Total committed heap usage (bytes)=279969792
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=3820
	File Output Format Counters
		Bytes Written=741
[caimh@master-node software]$