_00003 Hadoop MapReduce Architecture

Author: 妳那伊抹微笑
Signature: The greatest distance in the world is neither the ends of the earth nor the corners of the sea; it is that I stand in front of you, yet you cannot feel my presence
Focus: Flume+Kafka+Storm+Redis/HBase+Hadoop+Hive+Mahout+Spark ... cloud computing technologies
Reprint notice: Reprinting is permitted, but you must credit the original source, the author, and this copyright notice with a hyperlink. Thank you!
QQ group: 214293307 云计算之嫣然伊笑 (looking forward to learning and improving together with you)


# Introduction to MapReduce

# MapReduce is Hadoop's distributed computing framework. A job consists of two phases, map and reduce. For programmers it is very simple to use: just override the map method of the map phase's Mapper and the reduce method of the reduce phase's Reducer.

# The key-value pair forms of the map and reduce phase parameters
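
A minimal sketch of those forms, with hypothetical names (KeyValueForms, LineMapper, and LineReducer are mine, not from this post): the new-API Mapper and Reducer each take four generic type parameters, <KEYIN, VALUEIN, KEYOUT, VALUEOUT>, and with the default TextInputFormat the map input key is the line's byte offset (LongWritable) and the value is the line itself (Text).

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class KeyValueForms {

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: with TextInputFormat, the
    // input key is the byte offset of the line and the value is the line.
    public static class LineMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, new IntWritable(1)); // emit <line, 1>
        }
    }

    // Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: the reducer's input types
    // must match the mapper's output types declared above.
    public static class LineReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get(); // values for the same key arrive grouped together
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```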
# The MapReduce execution flow

Bottleneck: disk I/O. Intermediate map output is written to local disk and read back during the shuffle, so disk I/O is usually what limits job throughput.

# How a MapReduce job executes

1.0 Map task processing

1.1 Read the input file and parse it into key-value pairs: each line of the input becomes one key-value pair, and the map function is called once per pair.

1.2 Write your own logic that processes an input key-value pair and converts it into new key-value output.

1.3 Partition the output key-value pairs (a custom partitioner sketch follows this list).

1.4 Within each partition, sort and group the data by key, collecting the values of identical keys into a single collection.

1.5 (Optional) Locally reduce the grouped data (the Combine step).

2.0 Reduce task processing

2.1 Copy the outputs of the map tasks, partition by partition, over the network to the corresponding reduce nodes.

2.2 Merge and sort the map task outputs, then write your own reduce logic that processes the input key and values and converts them into new key-value output.

2.3 Save the reduce output to a file.
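
To make step 1.3 concrete, here is a minimal partitioner sketch; the class name and the routing rule are illustrative assumptions, not part of the original post. The default, used when you configure nothing, is HashPartitioner.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative rule: keys starting with a..m go to partition 0, all
// other keys to partition 1. Wire it in with
// job.setPartitionerClass(AlphabetPartitioner.class).
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions < 2 || key.getLength() == 0) {
            return 0; // with a single reducer everything lands in one partition
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }
}
```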

Example: implementing WordCountApp

# The first word-count Java program (the example source code shipped with Hadoop)

```java
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

@SuppressWarnings("all")
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

# The run command and its output

```
[hadoop@master hadoop-1.1.2]$ hadoop jar hadoop-yting-wordcounter.jar org.apache.hadoop.examples.WordCount /user/hadoop/20140303/test.txt /user/hadoop/20140303/output001
14/03/03 10:43:51 INFO input.FileInputFormat: Total input paths to process : 1
14/03/03 10:43:52 INFO mapred.JobClient: Running job: job_201403020905_0001
14/03/03 10:43:53 INFO mapred.JobClient:  map 0% reduce 0%
14/03/03 10:44:12 INFO mapred.JobClient:  map 100% reduce 0%
14/03/03 10:44:25 INFO mapred.JobClient:  map 100% reduce 100%
14/03/03 10:44:29 INFO mapred.JobClient: Job complete: job_201403020905_0001
14/03/03 10:44:29 INFO mapred.JobClient: Counters: 29
14/03/03 10:44:29 INFO mapred.JobClient:   Job Counters
14/03/03 10:44:29 INFO mapred.JobClient:     Launched reduce tasks=1
14/03/03 10:44:29 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19773
14/03/03 10:44:29 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/03 10:44:29 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/03 10:44:29 INFO mapred.JobClient:     Launched map tasks=1
14/03/03 10:44:29 INFO mapred.JobClient:     Data-local map tasks=1
14/03/03 10:44:29 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13148
14/03/03 10:44:29 INFO mapred.JobClient:   File Output Format Counters
14/03/03 10:44:29 INFO mapred.JobClient:     Bytes Written=188
14/03/03 10:44:29 INFO mapred.JobClient:   FileSystemCounters
14/03/03 10:44:29 INFO mapred.JobClient:     FILE_BYTES_READ=171
14/03/03 10:44:29 INFO mapred.JobClient:     HDFS_BYTES_READ=310
14/03/03 10:44:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=101391
14/03/03 10:44:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=188
14/03/03 10:44:29 INFO mapred.JobClient:   File Input Format Counters
14/03/03 10:44:29 INFO mapred.JobClient:     Bytes Read=197
14/03/03 10:44:29 INFO mapred.JobClient:   Map-Reduce Framework
14/03/03 10:44:29 INFO mapred.JobClient:     Map output materialized bytes=163
14/03/03 10:44:29 INFO mapred.JobClient:     Map input records=8
14/03/03 10:44:29 INFO mapred.JobClient:     Reduce shuffle bytes=163
14/03/03 10:44:29 INFO mapred.JobClient:     Spilled Records=56
14/03/03 10:44:29 INFO mapred.JobClient:     Map output bytes=376
14/03/03 10:44:29 INFO mapred.JobClient:     CPU time spent (ms)=4940
14/03/03 10:44:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=63926272
14/03/03 10:44:29 INFO mapred.JobClient:     Combine input records=45
14/03/03 10:44:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=113
14/03/03 10:44:29 INFO mapred.JobClient:     Reduce input records=28
14/03/03 10:44:29 INFO mapred.JobClient:     Reduce input groups=28
14/03/03 10:44:29 INFO mapred.JobClient:     Combine output records=28
14/03/03 10:44:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=111722496
14/03/03 10:44:29 INFO mapred.JobClient:     Reduce output records=28
14/03/03 10:44:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=468000768
14/03/03 10:44:29 INFO mapred.JobClient:     Map output records=45
[hadoop@master hadoop-1.1.2]$ hadoop fs -ls /user/hadoop/20140303/output001
Found 3 items
-rw-r--r--   1 hadoop supergroup          0 2014-03-03 10:44 /user/hadoop/20140303/output001/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2014-03-03 10:43 /user/hadoop/20140303/output001/_logs
-rw-r--r--   1 hadoop supergroup        188 2014-03-03 10:44 /user/hadoop/20140303/output001/part-r-00000
[hadoop@master hadoop-1.1.2]$ hadoop fs -text /user/hadoop/20140303/output001/part-t-00000
text: File does not exist: /user/hadoop/20140303/output001/part-t-00000
[hadoop@master hadoop-1.1.2]$ hadoop fs -text /user/hadoop/20140303/output001/part-r-00000
a        1
again    1
and      1
changce  1
easy     1
forever  1
give     1
hand     1
heart    2
hold     1
i        1
is       1
it       1
love     1
me       6
meimei   1
miss     1
see      1
show     1
smile    1
so       1
soul     1
take     3
the      2
to       4
until    1
what     1
you      6
```

# The minimal MapReduce job (the default settings made explicit)

```java
// Every setting below restates a framework default (hence "minimal").
Configuration configuration = new Configuration();
Job job = new Job(configuration, "HelloWorld");
job.setInputFormatClass(TextInputFormat.class);   // default input format
job.setMapperClass(Mapper.class);                 // identity mapper (the new-API default)
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setPartitionerClass(HashPartitioner.class);   // default partitioner
job.setNumReduceTasks(1);                         // default reduce task count
job.setReducerClass(Reducer.class);               // identity reducer (the new-API default)
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class); // default output format
job.waitForCompletion(true);
```
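
Every line above restates a framework default: TextInputFormat and TextOutputFormat, the identity Mapper and Reducer, HashPartitioner, one reduce task, and LongWritable/Text output types are what you get when you configure nothing. So apart from the still-missing input and output paths, deleting any of those calls leaves the job's behavior unchanged.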

 

# Serialization

# Writable

# Data streams are one-way

# LongWritable does not support operations such as addition or subtraction (there is no need to: call get() to obtain the primitive long, and Java's primitive types already provide all of that)
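
As a sketch of the Writable contract (the SumCountWritable type is a hypothetical example, not from this post): write() serializes the fields to an output stream and readFields() must read them back in exactly the same order, which is the one-way stream discipline noted above.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// A hypothetical pair type showing the Writable contract.
public class SumCountWritable implements Writable {
    private long sum;
    private int count;

    public SumCountWritable() {} // no-arg constructor required for reflection

    public SumCountWritable(long sum, int count) {
        this.sum = sum;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(sum);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        sum = in.readLong(); // must read fields in the order they were written
        count = in.readInt();
    }

    public long getSum() { return sum; }
    public int getCount() { return count; }
}
```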

# JobTracker and TaskTracker

# JobTracker

Receives jobs submitted by users and is responsible for launching and tracking the execution of their tasks.

JobSubmissionProtocol is the interface over which the JobClient communicates with the JobTracker.

InterTrackerProtocol is the interface over which TaskTrackers communicate with the JobTracker.

# TaskTracker

Executes the tasks assigned to it.

# JobClient

The primary interface through which a user job interacts with the JobTracker.

Responsible for submitting jobs, launching and tracking task execution, and accessing task status and logs, as sketched below.
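
A minimal old-API sketch of that interaction, assuming the two command-line arguments name valid HDFS input and output paths (the class name is hypothetical): JobClient.runJob() submits the job to the JobTracker over JobSubmissionProtocol and then polls and prints its progress until the job finishes.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitWithJobClient {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitWithJobClient.class);
        conf.setJobName("jobclient-demo");
        // Identity map/reduce by default; only the paths are required.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // runJob() submits the job to the JobTracker and polls its
        // progress, printing status lines until the job completes.
        RunningJob job = JobClient.runJob(conf);
        System.out.println("Succeeded: " + job.isSuccessful());
    }
}
```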

# Execution flow

The diagram of the execution flow would not upload here because it is larger than 2 MB. Frustrating...


[妳那伊抹微笑](http://user.qzone.qq.com/1042658081)

[The you smile until forever 、、、、、、、、、、、、、、、、、、、、、](http://user.qzone.qq.com/1042658081)

