Hadoop Code Analysis (3)

Latest recommended article published 2017-08-07 23:55:00

yhy19910223

Category: hadoop    Tags: Hadoop

Article link: https://blog.csdn.net/yhy19910223/article/details/83810195

Below is the code of the nextKeyValue() method of LineRecordReader:

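public boolean nextKeyValue() throws IOException {
  // Reuse the key/value objects across calls; create them on first use.
  if (key == null) {
    key = new LongWritable();
  }
  // The key is the byte offset at which this call starts reading.
  key.set(pos);
  if (value == null) {
    value = new Text();
  }
  int newSize = 0;
  // Read lines until one fits within maxLineLength or the split's end is reached.
  while (pos < end) {
    // readLine() stores at most maxLineLength bytes in value and returns the
    // number of bytes consumed, including the line terminator (0 at end of stream).
    newSize = in.readLine(value, maxLineLength,
                          Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
                                   maxLineLength));
    if (newSize == 0) {
      break;  // end of stream
    }
    pos += newSize;
    if (newSize < maxLineLength) {
      break;  // the line fits: return it
    }

    // line too long. try again
    LOG.info("Skipped line of size " + newSize + " at pos " +
             (pos - newSize));
  }
  if (newSize == 0) {
    // Nothing was read: there are no more records in this split.
    key = null;
    value = null;
    return false;
  } else {
    return true;
  }
}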
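To follow the control flow of nextKeyValue() without a cluster, here is a simplified plain-Java sketch that mirrors the loop over an in-memory byte array. It is not Hadoop code: the class SimpleLineRecordReader and its readAll() helper are hypothetical, lines are assumed to end in a single '\n', the text is assumed to be ASCII, and the stand-in readLine() always consumes the whole line (the real LineReader.readLine() also caps how much it consumes via its third argument).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirror of LineRecordReader.nextKeyValue() over an
// in-memory byte array. Not Hadoop code: assumes '\n' terminators and ASCII text.
class SimpleLineRecordReader {
  private final byte[] data;
  private final int maxLineLength;
  private final long end;   // the real reader stops at the end of its split
  private long pos = 0;     // offset of the next unread byte
  private Long key;         // stands in for the LongWritable key
  private String value;     // stands in for the Text value

  SimpleLineRecordReader(byte[] data, int maxLineLength) {
    this.data = data;
    this.maxLineLength = maxLineLength;
    this.end = data.length;
  }

  // Stand-in for in.readLine(): consumes one whole line starting at pos,
  // stores at most maxLineLength bytes of it in value, and returns the number
  // of bytes consumed including the '\n'.
  private int readLine() {
    int start = (int) pos;
    int i = start;
    while (i < data.length && data[i] != '\n') {
      i++;
    }
    int contentLength = i - start;
    value = new String(data, start, Math.min(contentLength, maxLineLength),
        StandardCharsets.US_ASCII);
    return contentLength + (i < data.length ? 1 : 0);
  }

  boolean nextKeyValue() {
    key = pos;              // like key.set(pos): set before any line is read
    int newSize = 0;
    while (pos < end) {
      newSize = readLine();
      if (newSize == 0) {
        break;
      }
      pos += newSize;
      if (newSize < maxLineLength) {
        break;              // the line fits: return it
      }
      // line too long: skip it and try the next one
    }
    if (newSize == 0) {
      key = null;
      value = null;
      return false;
    }
    return true;
  }

  // Collects every record as a "(key, value)" string.
  static List<String> readAll(String text, int maxLineLength) {
    SimpleLineRecordReader reader = new SimpleLineRecordReader(
        text.getBytes(StandardCharsets.US_ASCII), maxLineLength);
    List<String> records = new ArrayList<>();
    while (reader.nextKeyValue()) {
      records.add("(" + reader.key + ", " + reader.value + ")");
    }
    return records;
  }

  public static void main(String[] args) {
    String poem = "On the top of the Crumpetty Tree\n"
        + "The Quangle Wangle sat,\n"
        + "But his face you could not see,\n"
        + "On account of his Beaver Hat.\n";
    for (String record : readAll(poem, Integer.MAX_VALUE)) {
      System.out.println(record);
    }
  }
}
```

With the Definitive Guide poem and an effectively unlimited maxLineLength, readAll() produces the keys 0, 33, 57 and 89. With a small limit, readAll("short\n" + "x".repeat(50) + "\nok\n", 20) skips the 50-byte line and returns (0, short) and (6, ok): since key.set(pos) runs before the loop, the key is the offset where the call started reading, which in this case is the offset of the skipped line.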

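In key.set(pos), pos is the byte offset of the start of the line within the file, and value holds the contents of the line. Here is an example from Hadoop: The Definitive Guide: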

On the top of the Crumpetty Tree

The Quangle Wangle sat,

But his face you could not see,

On account of his Beaver Hat.

LineRecordReader turns this input into four key/value pairs:

(0, On the top of the Crumpetty Tree)

(33, The Quangle Wangle sat,)

(57, But his face you could not see,)

(89, On account of his Beaver Hat.)
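The keys follow directly from the byte lengths of the lines. Assuming each line ends in a single '\n' byte and the text is ASCII (one byte per character), the key $k_i$ of line $i$ is the previous key plus the previous line's length $\ell_{i-1}$ plus one byte for the terminator, with $\ell_1 = 32$, $\ell_2 = 23$ and $\ell_3 = 31$:

```latex
\begin{aligned}
k_1 &= 0, \qquad k_{i+1} = k_i + \ell_i + 1 \\
k_2 &= 0 + 32 + 1 = 33 \\
k_3 &= 33 + 23 + 1 = 57 \\
k_4 &= 57 + 31 + 1 = 89
\end{aligned}
```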

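Take the WordCount example: the mapper is called once for each key/value pair and processes the value. StringTokenizer itr = new StringTokenizer(value.toString()) splits the value into individual tokens, and the mapper turns them into intermediate pairs of the following form: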

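(On,1), (the,1), (top,1), (of,1), ... The framework then sorts these intermediate pairs by key during the shuffle and hands them to the reducer, which sums the counts for each word.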
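The WordCount data flow just described can be imitated in plain Java without a cluster. This is a hypothetical sketch, not Hadoop's API: map(), shuffle(), reduce() and wordCount() are ordinary static methods rather than Mapper/Reducer implementations, and a TreeMap stands in for the framework's sort-by-key shuffle. Only the tokenizing step uses the same StringTokenizer call as WordCount.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of the WordCount data flow (no Hadoop classes).
// The method names are illustrative, not Hadoop APIs.
class WordCountSketch {

  // Map phase: like the WordCount mapper, ignore the key (the byte offset),
  // tokenize the value on whitespace and emit (word, 1) for every token.
  static List<Map.Entry<String, Integer>> map(long key, String value) {
    List<Map.Entry<String, Integer>> out = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(value);
    while (itr.hasMoreTokens()) {
      out.add(Map.entry(itr.nextToken(), 1));
    }
    return out;
  }

  // Shuffle/sort: group the values by key; the TreeMap keeps keys sorted,
  // standing in for the framework's sort by key.
  static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
    TreeMap<String, List<Integer>> grouped = new TreeMap<>();
    for (Map.Entry<String, Integer> p : pairs) {
      grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    return grouped;
  }

  // Reduce phase: sum the 1s collected for each word.
  static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
      int sum = 0;
      for (int one : e.getValue()) {
        sum += one;
      }
      counts.put(e.getKey(), sum);
    }
    return counts;
  }

  // Runs map over every (offset, line) record, then shuffle and reduce.
  static TreeMap<String, Integer> wordCount(long[] keys, String[] values) {
    List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
    for (int i = 0; i < keys.length; i++) {
      intermediate.addAll(map(keys[i], values[i]));
    }
    return reduce(shuffle(intermediate));
  }

  public static void main(String[] args) {
    long[] keys = {0, 33, 57, 89};
    String[] values = {
        "On the top of the Crumpetty Tree",
        "The Quangle Wangle sat,",
        "But his face you could not see,",
        "On account of his Beaver Hat."
    };
    System.out.println(map(keys[0], values[0]));  // intermediate pairs of line 1
    System.out.println(wordCount(keys, values));  // final counts, sorted by word
  }
}
```

Because StringTokenizer splits only on whitespace by default, punctuation stays attached to words and case matters: on the poem, "the" is counted 2 times, "The" once, and "sat," is its own key.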

So, from the job's input to the mapper's output, the flow is roughly as follows:

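The input path is registered with FileInputFormat.addInputPath(job, path), and FileInputFormat's getSplits() divides the input into splits. In this example, TextInputFormat creates a LineRecordReader for each split, and the LineRecordReader reads key/value pairs from that split. LineRecordReader is essentially the reader: it does the actual work of pulling data out of the input stream. Finally, the key/value pairs it reads are handed to the mapper for processing.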
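The division of labor between getSplits() and LineRecordReader can be illustrated with a simplified model. This sketch rests on assumptions: splits are fixed-size byte ranges (the real FileInputFormat derives the split size from the block size and configuration), lines end in '\n', the text is ASCII, and SplitSketch with its methods is hypothetical. The boundary rule it applies follows Hadoop's LineRecordReader as I understand it: every split except the first discards its partial first line (in initialize()), and a line that starts before the split's end is read to completion (the pos < end loop of nextKeyValue()), so each line is read exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of fixed-size splits plus line-oriented reading.
// Assumes '\n' line terminators and ASCII text; not Hadoop's implementation.
class SplitSketch {

  // Simplified stand-in for getSplits(): cut [0, length) into byte ranges
  // {start, end} of at most splitSize bytes.
  static List<long[]> getSplits(long length, long splitSize) {
    List<long[]> splits = new ArrayList<>();
    for (long start = 0; start < length; start += splitSize) {
      splits.add(new long[] {start, Math.min(start + splitSize, length)});
    }
    return splits;
  }

  // Returns the "(offset, line)" records owned by the split [start, end).
  static List<String> readSplit(byte[] data, long start, long end) {
    int pos = (int) start;
    if (start != 0) {
      // Back up one byte and discard through the next '\n': the line in
      // progress at `start` belongs to the previous split. If the byte before
      // `start` is '\n', only that terminator is discarded.
      pos = (int) start - 1;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // step past the '\n'
    }
    List<String> records = new ArrayList<>();
    // A line that starts before `end` is read fully, even if it runs past `end`.
    while (pos < end) {
      int lineStart = pos;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      String line = new String(data, lineStart, pos - lineStart, StandardCharsets.US_ASCII);
      records.add("(" + lineStart + ", " + line + ")");
      pos++; // step past the '\n'
    }
    return records;
  }

  // Reads every split in order and concatenates their records.
  static List<String> readAllSplits(String text, long splitSize) {
    byte[] data = text.getBytes(StandardCharsets.US_ASCII);
    List<String> all = new ArrayList<>();
    for (long[] split : getSplits(data.length, splitSize)) {
      all.addAll(readSplit(data, split[0], split[1]));
    }
    return all;
  }

  public static void main(String[] args) {
    String poem = "On the top of the Crumpetty Tree\n"
        + "The Quangle Wangle sat,\n"
        + "But his face you could not see,\n"
        + "On account of his Beaver Hat.\n";
    byte[] data = poem.getBytes(StandardCharsets.US_ASCII);
    for (long[] split : getSplits(data.length, 40)) {
      System.out.println("split [" + split[0] + ", " + split[1] + "): "
          + readSplit(data, split[0], split[1]));
    }
  }
}
```

With 40-byte splits, the 119-byte poem yields three splits: the first owns the lines at offsets 0 and 33, the second the line at 57, and the third the line at 89. Any other split size produces the same four records in total.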
