Local run mode
1. The MapReduce program is submitted to LocalJobRunner and runs locally as a single process.
2. The input data and the output results may sit either on the local file system or on HDFS.
3. How do you make the program run locally? Do not put the cluster configuration files on its classpath.
Essentially it comes down to whether the job's conf contains mapreduce.framework.name=local, as opposed to mapreduce.framework.name=yarn together with the yarn.resourcemanager.hostname parameter.
4. Local mode is very convenient for debugging business logic: just set breakpoints in Eclipse/IDEA.
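The mode switch described in item 3 can be illustrated with a minimal sketch (plain Java, using a Map in place of Hadoop's Configuration; the class and method names here are hypothetical, not the actual Hadoop internals):

```java
import java.util.HashMap;
import java.util.Map;

public class FrameworkNameDemo {

    // Mimics how the framework decides where a job runs: everything hinges
    // on the value of mapreduce.framework.name in the job's configuration.
    static String pickRunner(Map<String, String> conf) {
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        return framework.equals("yarn") ? "YARNRunner" : "LocalJobRunner";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No cluster config files on the classpath -> the default is "local"
        System.out.println(pickRunner(conf));
        // With cluster config files, mapred-site.xml sets the framework to "yarn"
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(pickRunner(conf));
    }
}
```

This is why simply leaving the cluster's *-site.xml files out of the project resources is enough to fall back to local execution.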
5. Example (wordcount)
/**
 * Driver class for the MR job. It assembles the information the job needs at
 * runtime: which Mapper and Reducer classes to use, where the input data lives,
 * and where the output should be written.
 * <p>
 * Created by caimh on 2019/9/10.
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Use a Job object to wrap up all the information this MR job needs
        final Configuration conf = new Configuration();
        // Ask for local execution explicitly. This can be omitted, because
        // mapred-default.xml already defaults mapreduce.framework.name to "local".
        conf.set("mapreduce.framework.name", "local");
        // Pass conf to getInstance(); otherwise the setting above is ignored
        final Job job = Job.getInstance(conf);
        // Main class of the job jar
        job.setJarByClass(WordCountDriver.class);
        // Mapper and Reducer classes for this job
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Output key/value types of the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Final output key/value types of the job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input path and final output path
        FileInputFormat.setInputPaths(job, new Path("E:/hdfsClient/cmh.txt"));
        FileOutputFormat.setOutputPath(job, new Path("E:/hdfsClient/wordcount"));
        // Submit the job and wait for it to finish
        final boolean success = job.waitForCompletion(true);
        // Exit with 0 on success, 1 on failure
        System.exit(success ? 0 : 1);
    }
}
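The WordCountMapper and WordCountReducer classes referenced by the driver are not shown here. Their logic can be sketched in plain Java (no Hadoop dependency; the class and method names below are illustrative stand-ins, not the actual Hadoop Mapper/Reducer API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCountLogic {

    // Map phase: split each input line on whitespace and emit (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: for each word, sum the 1s emitted by the map phase.
    // In the real job the framework groups pairs by key before calling reduce.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[]{"hello hadoop", "hello world"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hello=2, hadoop=1, world=1}
    }
}
```

The key/value types in the driver (Text, IntWritable) correspond to the String word and Integer count in this sketch.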
6. Program execution log
2019-09-10 15:36:06,719 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2019-09-10 15:36:06,721 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-09-10 15:36:07,611 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-09-10 15:36:07,756 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-09-10 15:36:07,770 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2019-09-10 15:36:08,011 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2019-09-10 15:36:08,384 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local725653020_0001
2019-09-10 15:36:08,700 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2019-09-10 15:36:08,701 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local725653020_0001
2019-09-10 15:36:08,704 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2019-09-10 15:36:08,715 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:08,717 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-09-10 15:36:08,826 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2019-09-10 15:36:08,827 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:08,950 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:08,971 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2019-09-10 15:36:09,145 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@562f5251
2019-09-10 15:36:09,152 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/E:/hdfsClient/cmh.txt:0+3820
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2019-09-10 15:36:09,292 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2019-09-10 15:36:09,297 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.LocalJobRunner] -
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 6460; bufvoid = 104857600
2019-09-10 15:36:09,349 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26211760(104847040); length = 2637/6553600
2019-09-10 15:36:09,400 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2019-09-10 15:36:09,420 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local725653020_0001_m_000000_0 is done. And is in the process of committing
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local725653020_0001_m_000000_0' done.
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:09,434 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2019-09-10 15:36:09,438 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for reduce tasks
2019-09-10 15:36:09,438 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local725653020_0001_r_000000_0
2019-09-10 15:36:09,472 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2019-09-10 15:36:09,472 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2019-09-10 15:36:09,528 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@734c86cf
2019-09-10 15:36:09,534 INFO [org.apache.hadoop.mapred.ReduceTask] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@14e88cd2
2019-09-10 15:36:09,566 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - MergerManager: memoryLimit=1956118528, maxSingleShuffleLimit=489029632, mergeThreshold=1291038336, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-09-10 15:36:09,569 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - attempt_local725653020_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-09-10 15:36:09,641 INFO [org.apache.hadoop.mapreduce.task.reduce.LocalFetcher] - localfetcher#1 about to shuffle output of map attempt_local725653020_0001_m_000000_0 decomp: 7782 len: 7786 to MEMORY
2019-09-10 15:36:09,649 INFO [org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput] - Read 7782 bytes from map-output for attempt_local725653020_0001_m_000000_0
2019-09-10 15:36:09,651 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - closeInMemoryFile -> map-output of size: 7782, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->7782
2019-09-10 15:36:09,653 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - EventFetcher is interrupted.. Returning
2019-09-10 15:36:09,654 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,654 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-09-10 15:36:09,669 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2019-09-10 15:36:09,669 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 7775 bytes
2019-09-10 15:36:09,679 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merged 1 segments, 7782 bytes to disk to satisfy reduce memory limit
2019-09-10 15:36:09,681 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 1 files, 7786 bytes from disk
2019-09-10 15:36:09,682 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 0 segments, 0 bytes from memory into reduce
2019-09-10 15:36:09,682 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2019-09-10 15:36:09,684 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 7775 bytes
2019-09-10 15:36:09,684 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,690 INFO [org.apache.hadoop.conf.Configuration.deprecation] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-09-10 15:36:09,719 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local725653020_0001 running in uber mode : false
2019-09-10 15:36:09,731 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local725653020_0001_r_000000_0 is done. And is in the process of committing
2019-09-10 15:36:09,733 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2019-09-10 15:36:09,733 INFO [org.apache.hadoop.mapred.Task] - Task attempt_local725653020_0001_r_000000_0 is allowed to commit now
2019-09-10 15:36:09,736 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - Saved output of task 'attempt_local725653020_0001_r_000000_0' to file:/E:/hdfsClient/wordcount/_temporary/0/task_local725653020_0001_r_000000
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local725653020_0001_r_000000_0' done.
2019-09-10 15:36:09,737 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local725653020_0001_r_000000_0
2019-09-10 15:36:09,738 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce task executor complete.
2019-09-10 15:36:09,748 INFO [org.apache.hadoop.mapreduce.Job] - map 100% reduce 100%
2019-09-10 15:36:10,751 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local725653020_0001 completed successfully
2019-09-10 15:36:10,846 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
	File System Counters
		FILE: Number of bytes read=23628
		FILE: Number of bytes written=598323
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=110
		Map output records=660
		Map output bytes=6460
		Map output materialized bytes=7786
		Input split bytes=92
		Combine input records=0
		Combine output records=0
		Reduce input groups=105
		Reduce shuffle bytes=7786
		Reduce input records=660
		Reduce output records=105
		Spilled Records=1320
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=378535936
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=3864
	File Output Format Counters
		Bytes Written=757
Process finished with exit code 0
Cluster run mode
1. The MapReduce program is handed to the YARN cluster, which distributes it to many nodes for concurrent execution.
2. The input data and the output results should live on the HDFS file system.
3. Steps to submit to the cluster:
Package the program as a JAR, then launch it with the hadoop command on any node of the cluster:
hadoop jar HDFSClientDemo-1.0-SNAPSHOT.jar com.caimh.mr.WordCountDriver
4. Program execution & log
[caimh@master-node software]$ hadoop jar HDFSClientDemo-1.0-SNAPSHOT.jar com.caimh.mr.WordCountDriver
19/09/10 02:09:28 INFO client.RMProxy: Connecting to ResourceManager at master-node/192.168.159.10:8032
19/09/10 02:09:30 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/09/10 02:09:31 INFO input.FileInputFormat: Total input paths to process : 1
19/09/10 02:09:31 INFO mapreduce.JobSubmitter: number of splits:1
19/09/10 02:09:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1568019398258_0001
19/09/10 02:09:34 INFO impl.YarnClientImpl: Submitted application application_1568019398258_0001
19/09/10 02:09:34 INFO mapreduce.Job: The url to track the job: http://master-node:8088/proxy/application_1568019398258_0001/
19/09/10 02:09:34 INFO mapreduce.Job: Running job: job_1568019398258_0001
19/09/10 02:09:53 INFO mapreduce.Job: Job job_1568019398258_0001 running in uber mode : false
19/09/10 02:09:53 INFO mapreduce.Job: map 0% reduce 0%
19/09/10 02:10:11 INFO mapreduce.Job: map 100% reduce 0%
19/09/10 02:10:26 INFO mapreduce.Job: map 100% reduce 100%
19/09/10 02:10:27 INFO mapreduce.Job: Job job_1568019398258_0001 completed successfully
19/09/10 02:10:27 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=7786
		FILE: Number of bytes written=257229
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=3925
		HDFS: Number of bytes written=741
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=13575
		Total time spent by all reduces in occupied slots (ms)=11812
		Total time spent by all map tasks (ms)=13575
		Total time spent by all reduce tasks (ms)=11812
		Total vcore-milliseconds taken by all map tasks=13575
		Total vcore-milliseconds taken by all reduce tasks=11812
		Total megabyte-milliseconds taken by all map tasks=13900800
		Total megabyte-milliseconds taken by all reduce tasks=12095488
	Map-Reduce Framework
		Map input records=110
		Map output records=660
		Map output bytes=6460
		Map output materialized bytes=7786
		Input split bytes=105
		Combine input records=0
		Combine output records=0
		Reduce input groups=105
		Reduce shuffle bytes=7786
		Reduce input records=660
		Reduce output records=105
		Spilled Records=1320
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=560
		CPU time spent (ms)=6180
		Physical memory (bytes) snapshot=424845312
		Virtual memory (bytes) snapshot=4198445056
		Total committed heap usage (bytes)=279969792
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=3820
	File Output Format Counters
		Bytes Written=741
[caimh@master-node software]$