Hadoop---MapReduce

MapReduce

1. What is MapReduce

---- A parallel computing framework

Hadoop MapReduce is a software framework that makes it easy to write applications which run on large clusters of thousands of commodity machines and process multi-terabyte datasets in parallel in a reliable, fault-tolerant way. The definition contains these keywords:
a software framework, parallel processing, reliability and fault tolerance, large-scale clusters, and massive datasets.

MapReduce excels at processing big data. Where does that ability come from? It follows directly from MapReduce's design philosophy.
The core idea of MapReduce is "divide and conquer".

  • The Mapper is responsible for "dividing", i.e. breaking a complex task down into a number of "simple tasks". "Simple" carries three implications:

    • the scale of the data or computation is greatly reduced compared with the original task;
    • computation happens close to the data, i.e. each task is assigned to a node that stores the data it needs;
    • these small tasks can run in parallel, with almost no dependencies among them.
  • The Reducer is responsible for aggregating the results of the map phase.
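
For instance, a word count over the two input lines "hello hadoop" and "hello yarn" moves through the phases like this (the same flow as the WordCount code in section 5; the input lines are made up for illustration):

    map input:      "hello hadoop", "hello yarn"
    map output:     (hello,1) (hadoop,1) (hello,1) (yarn,1)
    shuffle/group:  hello -> [1,1]   hadoop -> [1]   yarn -> [1]
    reduce output:  hello 2, hadoop 1, yarn 1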


2. What is YARN

----- The resource management and scheduling platform of a distributed cluster

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
Apache Hadoop YARN (Yet Another Resource Negotiator) is a new Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications; its introduction brings major benefits to the cluster in terms of utilization, unified resource management, and data sharing. HBase, Hive, Spark on YARN, and MapReduce can all run on this framework.


  • ResourceManager: responsible for cluster management and resource scheduling; it receives reports from the NodeManagers and monitors them.

  • NodeManager: the per-machine framework agent; it is responsible for containers, monitors their resource usage (CPU, memory, disk, network), and reports it to the ResourceManager / Scheduler.

  • ApplicationMaster: responsible for task monitoring and failover during the computation; there is exactly one per job, and it manages that single MR job.

  • **Container:** represents a compute container (a bundle of computing resources); the default size is 1 GB of memory.

Details:
2.1 ResourceManager
The ResourceManager has the final say over all resource allocation in the system. It is responsible for allocating resources to every application in the cluster and holds the primary, global view of the cluster's resources, so it can offer fair, capacity-based, locality-aware scheduling. Based on an application's demands, its scheduling priority, and the available resources, it dynamically assigns specific nodes to run the application. It coordinates with the NodeManager on every node and with the ApplicationMaster of every application.

The ResourceManager's main responsibility is scheduling, i.e. allocating the system's available resources among competing applications; it does not manage the internal state of each application.

The ResourceManager consists of two main components, the Scheduler and the ApplicationsManager. The Scheduler coordinates resource allocation among the applications in the cluster and keeps the whole cluster running efficiently. It is a pure scheduler: it only schedules containers and does not care about application monitoring or running state, nor does it restart tasks that failed because of application errors or hardware faults.

2.1.1 Scheduler
The Scheduler is a pluggable component that allocates resources to the running applications, subject to capacity limits, queues, and other constraints. It is a pure scheduler: it does not monitor or track application status and makes no guarantees about restarting tasks after application or hardware failures. It schedules purely on the applications' resource requirements, expressed through the notion of a resource container, which bundles resources such as CPU, memory, disk, and network. The Hadoop MapReduce framework ships with three main schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.

FIFO Scheduler: first in, first out; ignores job priority and scope; suited to lightly loaded clusters.
Capacity Scheduler: splits resources into multiple queues, allows the cluster to be shared, and guarantees each queue a minimum share of resources.
Fair Scheduler: shares resources fairly among applications, so that on average every application receives an equal share over time.
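
Which scheduler is active is selected with the yarn.resourcemanager.scheduler.class property in yarn-site.xml. As an illustration, the snippet below picks the Capacity Scheduler (the class name is the standard implementation shipped with Hadoop):

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>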

2.1.2 ApplicationsManager
The ApplicationsManager accepts job submissions, negotiates the first container in which each application's ApplicationMaster runs, and monitors the ApplicationMaster, restarting its container if it fails.

2.2 NodeManager
The NodeManager is the per-node "worker" agent of YARN. It manages an individual compute node in the Hadoop cluster: it communicates with the ResourceManager, starts application containers and manages their life cycle, monitors their resource usage (CPU and memory), tracks the health of the node, manages logs, and reports all of this to the ResourceManager.

On startup, the NodeManager registers with the ResourceManager and then sends heartbeats while waiting for instructions; its main purpose is to manage the application containers that the ResourceManager assigns to it. The NodeManager only manages its own containers and knows nothing about the applications running inside them. At runtime the NodeManager and ResourceManager work together to keep this information up to date and the cluster in its best possible state.

Main responsibilities:
1. Accept requests from the ResourceManager and allocate containers to an application's tasks.
2. Exchange information with the ResourceManager to keep the whole cluster running smoothly; the ResourceManager tracks overall cluster health from the reports of every NodeManager, while each NodeManager monitors its own health.
3. Manage the life cycle of each container.
4. Manage the logs on its node.
5. Run auxiliary services for YARN applications, such as the shuffle service for MapReduce.

2.2.1 Container
A Container is YARN's unit of computation and the basic unit in which an application task (such as a map task or a reduce task) actually executes. The relationship between containers and cluster nodes is: one node runs many containers, but a container never spans nodes.

A Container is a bundle of allocated system resources; at present it covers only two resource types, memory and CPU (disk, network, GPU, and others may be added later). It is monitored by the NodeManager and scheduled by the ResourceManager.

Every application starts from its ApplicationMaster, which itself runs in a container (container 0). Once started, the ApplicationMaster negotiates further containers with the ResourceManager according to the needs of its tasks, and containers can be requested and released dynamically while the application runs.
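
As an illustration of how container sizes are controlled (these are standard Hadoop properties; the values below are only examples, not recommendations), the memory requested per map/reduce task and the bounds the scheduler will grant can be tuned like this:

<!-- mapred-site.xml: container size requested for each task -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>

<!-- yarn-site.xml: smallest and largest container the scheduler will hand out -->
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
</property>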

2.3 ApplicationMaster
The ApplicationMaster negotiates suitable containers with the Scheduler, tracks the application's status, and monitors its progress; it is the process that coordinates the execution of an application in the cluster. Each application has its own ApplicationMaster, which negotiates resources (containers) with the ResourceManager and works with the NodeManagers to execute and monitor the tasks.

Once started, the ApplicationMaster periodically heartbeats to the ResourceManager to confirm that it is healthy and to report what resources it needs. It encodes its preferences and constraints in the resource-request model carried in those heartbeats, and in later heartbeats it receives leases on containers bound to particular resources on particular nodes of the cluster. Based on the containers the ResourceManager returns, the ApplicationMaster can adjust its execution plan to cope with a shortage or surplus of resources; containers can be allocated and released dynamically.

3. Architecture

An application executes in YARN through the following steps:


1. The client submits the application to the ResourceManager and requests an ApplicationMaster instance; in its answer the ResourceManager returns an applicationID together with information about the cluster's resource capacity that helps the client request resources.

2. The ResourceManager finds a NodeManager that can run a container and launches the ApplicationMaster instance in that container.

An Application Submission Context is sent with the request; it contains the applicationID, the user name, the queue, and other information needed to launch the ApplicationMaster.
A Container Launch Context (CLC) is also sent to the ResourceManager; the CLC provides the resource requirements, job files, security tokens, and everything else needed to launch the ApplicationMaster on a node.
When the ResourceManager receives the context submitted by the client, it schedules an available container (usually called container 0) for the ApplicationMaster. It then contacts the NodeManager to start the ApplicationMaster, and establishes the ApplicationMaster's RPC port and a tracking URL that is used to monitor the application's status.

3. The ApplicationMaster registers with the ResourceManager. After registration the client can query the ResourceManager for the details of its ApplicationMaster and from then on talk to the ApplicationMaster directly. In the registration response, the ResourceManager sends information about the cluster's maximum and minimum resource capacities.

4. During normal operation, the ApplicationMaster sends resource-request messages to the ResourceManager; the ResourceManager allocates containers to the ApplicationMaster as optimally as its scheduling policy allows and returns them as the answer to the resource request.

5. Once a container has been allocated, the ApplicationMaster starts it by sending a container-launch-specification to the NodeManager; the specification contains everything the container needs in order to communicate with the ApplicationMaster. Once the containers are running, the ApplicationMaster can check their status. The ResourceManager no longer takes part in the program's execution; it only handles scheduling and monitoring of other resources, and it can instruct a NodeManager to kill a container.

6. The application code runs inside the launched containers and reports its progress and status to the ApplicationMaster over an application-specific protocol. As the job executes, the ApplicationMaster sends heartbeats and progress information to the ResourceManager; in these heartbeats it can also request and release containers.

7. While the application is running, the client that submitted it talks to the ApplicationMaster directly to obtain the application's running status, progress updates, and so on, again over an application-specific protocol.

8. Once the application has finished and all related work is complete, the ApplicationMaster deregisters from the ResourceManager and shuts down, and all of its containers are returned to the system. When a container is killed or reclaimed, the ResourceManager tells the NodeManager to aggregate the logs and clean up the container-specific files.
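
The first two steps above can be seen directly in YARN's client API. The following is a minimal, illustrative sketch (not part of the original example code; the application name, queue, and launch command are placeholders) of asking the ResourceManager for a new application, filling in the submission context and the Container Launch Context, and submitting it:

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class TinyYarnClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // step 1: ask the ResourceManager for a new application (this returns the applicationID)
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");   // placeholder name
        appContext.setQueue("default");

        // Container Launch Context: the command and environment used to start the AM (container 0)
        ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);
        clc.setCommands(Collections.singletonList("echo hello-from-am"));  // placeholder AM command
        appContext.setAMContainerSpec(clc);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM container

        // step 2: the ResourceManager finds a NodeManager and launches the AM in container 0
        yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appContext.getApplicationId());
    }
}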

MapReduce workflow

1. run job
2. get new application
3. copy job resources
4. submit job
5. init container
6. init MRAppMaster
7. retrieve input splits
8. allocate resources
9. init container (compute containers)
10. retrieve job resources (fetch the job's code, configuration, and data)
11. run the map or reduce tasks
12. result


4. Environment setup

These changes build on an existing HDFS setup.

  • Edit etc/hadoop/mapred-site.xml
    [root@node1 hadoop-2.6.0]# mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
  • Edit etc/hadoop/yarn-site.xml
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value> <!-- your ResourceManager's hostname -->
    </property>
    
  • Start the services

    The HDFS services need to be started as well.

    [root@hadoop ~]# hdfs namenode -format
    # formatting the namenode is only needed the first time Hadoop is used; there is no need to run it on every start
    [root@hadoop ~]# start-dfs.sh
    # start HDFS
    
    # start YARN
    [root@hadoop hadoop-2.6.0]# start-yarn.sh
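
    # To verify that YARN is up, jps should now also show a ResourceManager and a NodeManager.
    # As a smoke test, the example jar bundled with the distribution can be run
    # (the jar path below assumes the standard hadoop-2.6.0 layout):
    [root@hadoop hadoop-2.6.0]# jps
    [root@hadoop hadoop-2.6.0]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10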
    

5. Development example

  • Maven dependencies
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>
How to use MapReduce:

1. Create a Maven project

2. Write the Mapper

package com.baizhi.yarn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;


// keyIn: the byte offset of the line, valueIn: the line of text
// keyOut: a word, valueOut: the count 1
public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // split the line on spaces and emit (word, 1) for every word
        String[] str = value.toString().split(" ");
        for (String s : str) {
            context.write(new Text(s), new IntWritable(1));
        }
    }
}

3. Write the Reducer

package com.baizhi.yarn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

// keyIn: a word, valuesIn: all the 1s emitted for that word
// keyOut: the word, valueOut: its total count
public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();   // accumulate the counts for this word
        }
        context.write(key, new IntWritable(sum));
    }
}
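
Since this Reducer simply sums its values, the same class can also be registered as a combiner, so counts are pre-aggregated on the map side and less data crosses the shuffle. In the driver below that would be one extra, optional line:

job.setCombinerClass(MyReduce.class);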

4. Write the driver (entry) class

package com.baizhi.yarn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public class InitMR {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. create and configure the MR job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration, "Word COUNT");
        job.setJarByClass(InitMR.class);
        // 2. set the input and output formats
        // the InputFormat decides how the dataset is split and how each split is read
        // the OutputFormat decides how the results are written
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // 3. set where the input data comes from and where the results go
        // note: the output directory must not already exist, otherwise the job fails to start
        /**
         * Option 1: run on the cluster (inside the virtual machine)
         */
        /*TextInputFormat.addInputPath(job,new Path("hdfs://hadoop:9000/WordCount.txt"));
        TextOutputFormat.setOutputPath(job,new Path("hdfs://hadoop:9000/result1"));*/

        /**
         * Option 2: run the main method from the IDE
         * local computation (using the local Hadoop) + local files
         */
        /*TextInputFormat.addInputPath(job,new Path("file:///E://WordCount.txt"));
        TextOutputFormat.setOutputPath(job,new Path("file:///E://result"));
        */
        /**
         * Option 3: local computation + remote HDFS files
         */
        TextInputFormat.addInputPath(job,new Path("hdfs://hadoop:9000/WordCount.txt"));
        TextOutputFormat.setOutputPath(job,new Path("hdfs://hadoop:9000/result2"));

        // 4. set the output key and value types of the map and reduce phases
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 5. set the Mapper and Reducer implementation classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);

        // 6. submit the MR job and wait for it to finish
        job.waitForCompletion(true);


    }
}

5. Run the job using one of the four execution modes below

–1. Test the MapReduce program by running the jar in the Hadoop environment

   Package the code into a jar and copy it onto the virtual machine to run.
 // 3. set where the input data comes from and where the results go
        /**
         *  Option 1: run on the cluster (inside the virtual machine)
         */
        TextInputFormat.addInputPath(job,new Path("hdfs://hadoop:9000/WordCount.txt"));
        TextOutputFormat.setOutputPath(job,new Path("hdfs://hadoop:9000/result1"));
[root@node1 hadoop-2.6.0] bin/hadoop jar <path-to-jar> <main-class>
e.g. [root@node1 hadoop-2.6.0] bin/hadoop jar /mr_demo-1.0-SNAPSHOT.jar com.baizhi.yarn.InitMR
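
Once the job completes, the output can be checked directly on HDFS; TextOutputFormat writes one part-r-NNNNN file per reduce task:

[root@node1 hadoop-2.6.0] bin/hdfs dfs -cat /result1/part-r-00000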

  • –2. Run via the main method
    • Local computation + local files

      Add the following code to the InitMR.java class

      /**
               *  Option 2: local computation (using the local Hadoop) + local files
               */
            TextInputFormat.addInputPath(job,new Path("file:///E://WordCount.txt"));
            TextOutputFormat.setOutputPath(job,new Path("file:///E://result"));
             
      

      Hadoop's NativeIO source needs to be patched (this is typically needed when running the job locally on Windows).

      Create an org.apache.hadoop.io.nativeio package in the project and put a NativeIO class there so that it shadows Hadoop's own NativeIO class.
      At line 279 of the NativeIO source, change return access0(path, desiredAccess.accessRight()); to return true;

      //
      // Source code recreated from a .class file by IntelliJ IDEA
      // (powered by Fernflower decompiler)
      //
      
      package org.apache.hadoop.io.nativeio;
      
      import com.google.common.annotations.VisibleForTesting;
      import java.io.Closeable;
      import java.io.File;
      import java.io.FileDescriptor;
      import java.io.FileInputStream;
      import java.io.FileOutputStream;
      import java.io.IOException;
      import java.io.RandomAccessFile;
      import java.lang.reflect.Field;
      import java.nio.ByteBuffer;
      import java.nio.MappedByteBuffer;
      import java.nio.channels.FileChannel;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;
      import org.apache.commons.logging.Log;
      import org.apache.commons.logging.LogFactory;
      import org.apache.hadoop.classification.InterfaceAudience.Private;
      import org.apache.hadoop.classification.InterfaceStability.Unstable;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.HardLink;
      import org.apache.hadoop.io.IOUtils;
      import org.apache.hadoop.io.SecureIOUtils.AlreadyExistsException;
      import org.apache.hadoop.util.NativeCodeLoader;
      import org.apache.hadoop.util.PerformanceAdvisory;
      import org.apache.hadoop.util.Shell;
      import sun.misc.Cleaner;
      import sun.misc.Unsafe;
      import sun.nio.ch.DirectBuffer;
      
      @Private
      @Unstable
      public class NativeIO {
          private static boolean workaroundNonThreadSafePasswdCalls = false;
          private static final Log LOG = LogFactory.getLog(NativeIO.class);
          private static boolean nativeLoaded = false;
          private static final Map<Long, NativeIO.CachedUid> uidCache;
          private static long cacheTimeout;
          private static boolean initialized;
      
          public NativeIO() {
          }
      
          public static boolean isAvailable() {
              return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
          }
      
          private static native void initNative();
      
          static long getMemlockLimit() {
              return isAvailable() ? getMemlockLimit0() : 0L;
          }
      
          private static native long getMemlockLimit0();
      
          static long getOperatingSystemPageSize() {
              try {
                  Field f = Unsafe.class.getDeclaredField("theUnsafe");
                  f.setAccessible(true);
                  Unsafe unsafe = (Unsafe)f.get((Object)null);
                  return (long)unsafe.pageSize();
              } catch (Throwable var2) {
                  LOG.warn("Unable to get operating system page size.  Guessing 4096.", var2);
                  return 4096L;
              }
          }
      
          private static String stripDomain(String name) {
              int i = name.indexOf(92);
              if (i != -1) {
                  name = name.substring(i + 1);
              }
      
              return name;
          }
      
          public static String getOwner(FileDescriptor fd) throws IOException {
              ensureInitialized();
              if (Shell.WINDOWS) {
                  String owner = NativeIO.Windows.getOwner(fd);
                  owner = stripDomain(owner);
                  return owner;
              } else {
                  long uid = NativeIO.POSIX.getUIDforFDOwnerforOwner(fd);
                  NativeIO.CachedUid cUid = (NativeIO.CachedUid)uidCache.get(uid);
                  long now = System.currentTimeMillis();
                  if (cUid != null && cUid.timestamp + cacheTimeout > now) {
                      return cUid.username;
                  } else {
                      String user = NativeIO.POSIX.getUserName(uid);
                      LOG.info("Got UserName " + user + " for UID " + uid + " from the native implementation");
                      cUid = new NativeIO.CachedUid(user, now);
                      uidCache.put(uid, cUid);
                      return user;
                  }
              }
          }
      
          public static FileInputStream getShareDeleteFileInputStream(File f) throws IOException {
              if (!Shell.WINDOWS) {
                  return new FileInputStream(f);
              } else {
                  FileDescriptor fd = NativeIO.Windows.createFile(f.getAbsolutePath(), 2147483648L, 7L, 3L);
                  return new FileInputStream(fd);
              }
          }
      
          public static FileInputStream getShareDeleteFileInputStream(File f, long seekOffset) throws IOException {
              if (!Shell.WINDOWS) {
                  RandomAccessFile rf = new RandomAccessFile(f, "r");
                  if (seekOffset > 0L) {
                      rf.seek(seekOffset);
                  }
      
                  return new FileInputStream(rf.getFD());
              } else {
                  FileDescriptor fd = NativeIO.Windows.createFile(f.getAbsolutePath(), 2147483648L, 7L, 3L);
                  if (seekOffset > 0L) {
                      NativeIO.Windows.setFilePointer(fd, seekOffset, 0L);
                  }
      
                  return new FileInputStream(fd);
              }
          }
      
          public static FileOutputStream getCreateForWriteFileOutputStream(File f, int permissions) throws IOException {
              FileDescriptor fd;
              if (!Shell.WINDOWS) {
                  try {
                      fd = NativeIO.POSIX.open(f.getAbsolutePath(), 193, permissions);
                      return new FileOutputStream(fd);
                  } catch (NativeIOException var3) {
                      if (var3.getErrno() == Errno.EEXIST) {
                          throw new AlreadyExistsException(var3);
                      } else {
                          throw var3;
                      }
                  }
              } else {
                  try {
                      fd = NativeIO.Windows.createFile(f.getCanonicalPath(), 1073741824L, 7L, 1L);
                      NativeIO.POSIX.chmod(f.getCanonicalPath(), permissions);
                      return new FileOutputStream(fd);
                  } catch (NativeIOException var4) {
                      if (var4.getErrorCode() == 80L) {
                          throw new AlreadyExistsException(var4);
                      } else {
                          throw var4;
                      }
                  }
              }
          }
      
          private static synchronized void ensureInitialized() {
              if (!initialized) {
                  cacheTimeout = (new Configuration()).getLong("hadoop.security.uid.cache.secs", 14400L) * 1000L;
                  LOG.info("Initialized cache for UID to User mapping with a cache timeout of " + cacheTimeout / 1000L + " seconds.");
                  initialized = true;
              }
      
          }
      
          public static void renameTo(File src, File dst) throws IOException {
              if (!nativeLoaded) {
                  if (!src.renameTo(dst)) {
                      throw new IOException("renameTo(src=" + src + ", dst=" + dst + ") failed.");
                  }
              } else {
                  renameTo0(src.getAbsolutePath(), dst.getAbsolutePath());
              }
      
          }
      
          public static void link(File src, File dst) throws IOException {
              if (!nativeLoaded) {
                  HardLink.createHardLink(src, dst);
              } else {
                  link0(src.getAbsolutePath(), dst.getAbsolutePath());
              }
      
          }
      
          private static native void renameTo0(String var0, String var1) throws NativeIOException;
      
          private static native void link0(String var0, String var1) throws NativeIOException;
      
          public static void copyFileUnbuffered(File src, File dst) throws IOException {
              if (nativeLoaded && Shell.WINDOWS) {
                  copyFileUnbuffered0(src.getAbsolutePath(), dst.getAbsolutePath());
              } else {
                  FileInputStream fis = null;
                  FileOutputStream fos = null;
                  FileChannel input = null;
                  FileChannel output = null;
      
                  try {
                      fis = new FileInputStream(src);
                      fos = new FileOutputStream(dst);
                      input = fis.getChannel();
                      output = fos.getChannel();
                      long remaining = input.size();
                      long position = 0L;
      
                      for(long transferred = 0L; remaining > 0L; position += transferred) {
                          transferred = input.transferTo(position, remaining, output);
                          remaining -= transferred;
                      }
                  } finally {
                      IOUtils.cleanup(LOG, new Closeable[]{output});
                      IOUtils.cleanup(LOG, new Closeable[]{fos});
                      IOUtils.cleanup(LOG, new Closeable[]{input});
                      IOUtils.cleanup(LOG, new Closeable[]{fis});
                  }
              }
      
          }
      
          private static native void copyFileUnbuffered0(String var0, String var1) throws NativeIOException;
      
          static {
              if (NativeCodeLoader.isNativeCodeLoaded()) {
                  try {
                      initNative();
                      nativeLoaded = true;
                  } catch (Throwable var1) {
                      PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", var1);
                  }
              }
      
              uidCache = new ConcurrentHashMap();
              initialized = false;
          }
      
          private static class CachedUid {
              final long timestamp;
              final String username;
      
              public CachedUid(String username, long timestamp) {
                  this.timestamp = timestamp;
                  this.username = username;
              }
          }
      
          public static class Windows {
              public static final long GENERIC_READ = 2147483648L;
              public static final long GENERIC_WRITE = 1073741824L;
              public static final long FILE_SHARE_READ = 1L;
              public static final long FILE_SHARE_WRITE = 2L;
              public static final long FILE_SHARE_DELETE = 4L;
              public static final long CREATE_NEW = 1L;
              public static final long CREATE_ALWAYS = 2L;
              public static final long OPEN_EXISTING = 3L;
              public static final long OPEN_ALWAYS = 4L;
              public static final long TRUNCATE_EXISTING = 5L;
              public static final long FILE_BEGIN = 0L;
              public static final long FILE_CURRENT = 1L;
              public static final long FILE_END = 2L;
              public static final long FILE_ATTRIBUTE_NORMAL = 128L;
      
              public Windows() {
              }
      
              public static native FileDescriptor createFile(String var0, long var1, long var3, long var5) throws IOException;
      
              public static native long setFilePointer(FileDescriptor var0, long var1, long var3) throws IOException;
      
              private static native String getOwner(FileDescriptor var0) throws IOException;
      
              private static native boolean access0(String var0, int var1);
      
              public static boolean access(String path, NativeIO.Windows.AccessRight desiredAccess) throws IOException {
                  // patched: skip the native Windows access check and always allow access
                  return true;
              }
      
              public static native void extendWorkingSetSize(long var0) throws IOException;
      
              static {
                  if (NativeCodeLoader.isNativeCodeLoaded()) {
                      try {
                          NativeIO.initNative();
                          NativeIO.nativeLoaded = true;
                      } catch (Throwable var1) {
                          PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", var1);
                      }
                  }
      
              }
      
              public static enum AccessRight {
                  ACCESS_READ(1),
                  ACCESS_WRITE(2),
                  ACCESS_EXECUTE(32);
      
                  private final int accessRight;
      
                  private AccessRight(int access) {
                      this.accessRight = access;
                  }
      
                  public int accessRight() {
                      return this.accessRight;
                  }
              }
          }
      
          public static class POSIX {
              public static final int O_RDONLY = 0;
              public static final int O_WRONLY = 1;
              public static final int O_RDWR = 2;
              public static final int O_CREAT = 64;
              public static final int O_EXCL = 128;
              public static final int O_NOCTTY = 256;
              public static final int O_TRUNC = 512;
              public static final int O_APPEND = 1024;
              public static final int O_NONBLOCK = 2048;
              public static final int O_SYNC = 4096;
              public static final int O_ASYNC = 8192;
              public static final int O_FSYNC = 4096;
              public static final int O_NDELAY = 2048;
              public static final int POSIX_FADV_NORMAL = 0;
              public static final int POSIX_FADV_RANDOM = 1;
              public static final int POSIX_FADV_SEQUENTIAL = 2;
              public static final int POSIX_FADV_WILLNEED = 3;
              public static final int POSIX_FADV_DONTNEED = 4;
              public static final int POSIX_FADV_NOREUSE = 5;
              public static final int SYNC_FILE_RANGE_WAIT_BEFORE = 1;
              public static final int SYNC_FILE_RANGE_WRITE = 2;
              public static final int SYNC_FILE_RANGE_WAIT_AFTER = 4;
              private static final Log LOG = LogFactory.getLog(NativeIO.class);
              private static boolean nativeLoaded = false;
              private static boolean fadvisePossible = true;
              private static boolean syncFileRangePossible = true;
              static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY = "hadoop.workaround.non.threadsafe.getpwuid";
              static final boolean WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT = true;
              private static long cacheTimeout = -1L;
              private static NativeIO.POSIX.CacheManipulator cacheManipulator = new NativeIO.POSIX.CacheManipulator();
              private static final Map<Integer, NativeIO.POSIX.CachedName> USER_ID_NAME_CACHE;
              private static final Map<Integer, NativeIO.POSIX.CachedName> GROUP_ID_NAME_CACHE;
              public static final int MMAP_PROT_READ = 1;
              public static final int MMAP_PROT_WRITE = 2;
              public static final int MMAP_PROT_EXEC = 4;
      
              public POSIX() {
              }
      
              public static NativeIO.POSIX.CacheManipulator getCacheManipulator() {
                  return cacheManipulator;
              }
      
              public static void setCacheManipulator(NativeIO.POSIX.CacheManipulator cacheManipulator) {
                  POSIX.cacheManipulator = cacheManipulator;
              }
      
              public static boolean isAvailable() {
                  return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
              }
      
              private static void assertCodeLoaded() throws IOException {
                  if (!isAvailable()) {
                      throw new IOException("NativeIO was not loaded");
                  }
              }
      
              public static native FileDescriptor open(String var0, int var1, int var2) throws IOException;
      
              private static native NativeIO.POSIX.Stat fstat(FileDescriptor var0) throws IOException;
      
              private static native void chmodImpl(String var0, int var1) throws IOException;
      
              public static void chmod(String path, int mode) throws IOException {
                  if (!Shell.WINDOWS) {
                      chmodImpl(path, mode);
                  } else {
                      try {
                          chmodImpl(path, mode);
                      } catch (NativeIOException var3) {
                          if (var3.getErrorCode() == 3L) {
                              throw new NativeIOException("No such file or directory", Errno.ENOENT);
                          }
      
                          LOG.warn(String.format("NativeIO.chmod error (%d): %s", var3.getErrorCode(), var3.getMessage()));
                          throw new NativeIOException("Unknown error", Errno.UNKNOWN);
                      }
                  }
      
              }
      
              static native void posix_fadvise(FileDescriptor var0, long var1, long var3, int var5) throws NativeIOException;
      
              static native void sync_file_range(FileDescriptor var0, long var1, long var3, int var5) throws NativeIOException;
      
              static void posixFadviseIfPossible(String identifier, FileDescriptor fd, long offset, long len, int flags) throws NativeIOException {
                  if (nativeLoaded && fadvisePossible) {
                      try {
                          posix_fadvise(fd, offset, len, flags);
                      } catch (UnsupportedOperationException var8) {
                          fadvisePossible = false;
                      } catch (UnsatisfiedLinkError var9) {
                          fadvisePossible = false;
                      }
                  }
      
              }
      
              public static void syncFileRangeIfPossible(FileDescriptor fd, long offset, long nbytes, int flags) throws NativeIOException {
                  if (nativeLoaded && syncFileRangePossible) {
                      try {
                          sync_file_range(fd, offset, nbytes, flags);
                      } catch (UnsupportedOperationException var7) {
                          syncFileRangePossible = false;
                      } catch (UnsatisfiedLinkError var8) {
                          syncFileRangePossible = false;
                      }
                  }
      
              }
      
              static native void mlock_native(ByteBuffer var0, long var1) throws NativeIOException;
      
              static void mlock(ByteBuffer buffer, long len) throws IOException {
                  assertCodeLoaded();
                  if (!buffer.isDirect()) {
                      throw new IOException("Cannot mlock a non-direct ByteBuffer");
                  } else {
                      mlock_native(buffer, len);
                  }
              }
      
              public static void munmap(MappedByteBuffer buffer) {
                  if (buffer instanceof DirectBuffer) {
                      Cleaner cleaner = ((DirectBuffer)buffer).cleaner();
                      cleaner.clean();
                  }
      
              }
      
              private static native long getUIDforFDOwnerforOwner(FileDescriptor var0) throws IOException;
      
              private static native String getUserName(long var0) throws IOException;
      
              public static NativeIO.POSIX.Stat getFstat(FileDescriptor fd) throws IOException {
                  NativeIO.POSIX.Stat stat = null;
                  if (!Shell.WINDOWS) {
                      stat = fstat(fd);
                      stat.owner = getName(NativeIO.POSIX.IdCache.USER, stat.ownerId);
                      stat.group = getName(NativeIO.POSIX.IdCache.GROUP, stat.groupId);
                  } else {
                      try {
                          stat = fstat(fd);
                      } catch (NativeIOException var3) {
                          if (var3.getErrorCode() == 6L) {
                              throw new NativeIOException("The handle is invalid.", Errno.EBADF);
                          }
      
                          LOG.warn(String.format("NativeIO.getFstat error (%d): %s", var3.getErrorCode(), var3.getMessage()));
                          throw new NativeIOException("Unknown error", Errno.UNKNOWN);
                      }
                  }
      
                  return stat;
              }
      
              private static String getName(NativeIO.POSIX.IdCache domain, int id) throws IOException {
                  Map<Integer, NativeIO.POSIX.CachedName> idNameCache = domain == NativeIO.POSIX.IdCache.USER ? USER_ID_NAME_CACHE : GROUP_ID_NAME_CACHE;
                  NativeIO.POSIX.CachedName cachedName = (NativeIO.POSIX.CachedName)idNameCache.get(id);
                  long now = System.currentTimeMillis();
                  String name;
                  if (cachedName != null && cachedName.timestamp + cacheTimeout > now) {
                      name = cachedName.name;
                  } else {
                      name = domain == NativeIO.POSIX.IdCache.USER ? getUserName(id) : getGroupName(id);
                      if (LOG.isDebugEnabled()) {
                          String type = domain == NativeIO.POSIX.IdCache.USER ? "UserName" : "GroupName";
                          LOG.debug("Got " + type + " " + name + " for ID " + id + " from the native implementation");
                      }
      
                      cachedName = new NativeIO.POSIX.CachedName(name, now);
                      idNameCache.put(id, cachedName);
                  }
      
                  return name;
              }
      
              static native String getUserName(int var0) throws IOException;
      
              static native String getGroupName(int var0) throws IOException;
      
              public static native long mmap(FileDescriptor var0, int var1, boolean var2, long var3) throws IOException;
      
              public static native void munmap(long var0, long var2) throws IOException;
      
              static {
                  if (NativeCodeLoader.isNativeCodeLoaded()) {
                      try {
                          Configuration conf = new Configuration();
                          NativeIO.workaroundNonThreadSafePasswdCalls = conf.getBoolean("hadoop.workaround.non.threadsafe.getpwuid", true);
                          NativeIO.initNative();
                          nativeLoaded = true;
                          cacheTimeout = conf.getLong("hadoop.security.uid.cache.secs", 14400L) * 1000L;
                          LOG.debug("Initialized cache for IDs to User/Group mapping with a  cache timeout of " + cacheTimeout / 1000L + " seconds.");
                      } catch (Throwable var1) {
                          PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", var1);
                      }
                  }
      
                  USER_ID_NAME_CACHE = new ConcurrentHashMap();
                  GROUP_ID_NAME_CACHE = new ConcurrentHashMap();
              }
      
              private static enum IdCache {
                  USER,
                  GROUP;
      
                  private IdCache() {
                  }
              }
      
              private static class CachedName {
                  final long timestamp;
                  final String name;
      
                  public CachedName(String name, long timestamp) {
                      this.name = name;
                      this.timestamp = timestamp;
                  }
              }
      
              public static class Stat {
                  private int ownerId;
                  private int groupId;
                  private String owner;
                  private String group;
                  private int mode;
                  public static final int S_IFMT = 61440;
                  public static final int S_IFIFO = 4096;
                  public static final int S_IFCHR = 8192;
                  public static final int S_IFDIR = 16384;
                  public static final int S_IFBLK = 24576;
                  public static final int S_IFREG = 32768;
                  public static final int S_IFLNK = 40960;
                  public static final int S_IFSOCK = 49152;
                  public static final int S_IFWHT = 57344;
                  public static final int S_ISUID = 2048;
                  public static final int S_ISGID = 1024;
                  public static final int S_ISVTX = 512;
                  public static final int S_IRUSR = 256;
                  public static final int S_IWUSR = 128;
                  public static final int S_IXUSR = 64;
      
                  Stat(int ownerId, int groupId, int mode) {
                      this.ownerId = ownerId;
                      this.groupId = groupId;
                      this.mode = mode;
                  }
      
                  Stat(String owner, String group, int mode) {
                      if (!Shell.WINDOWS) {
                          this.owner = owner;
                      } else {
                          this.owner = NativeIO.stripDomain(owner);
                      }
      
                      if (!Shell.WINDOWS) {
                          this.group = group;
                      } else {
                          this.group = NativeIO.stripDomain(group);
                      }
      
                      this.mode = mode;
                  }
      
                  public String toString() {
                      return "Stat(owner='" + this.owner + "', group='" + this.group + "'" + ", mode=" + this.mode + ")";
                  }
      
                  public String getOwner() {
                      return this.owner;
                  }
      
                  public String getGroup() {
                      return this.group;
                  }
      
                  public int getMode() {
                      return this.mode;
                  }
              }
      
              @VisibleForTesting
              public static class NoMlockCacheManipulator extends NativeIO.POSIX.CacheManipulator {
                  public NoMlockCacheManipulator() {
                  }
      
                  public void mlock(String identifier, ByteBuffer buffer, long len) throws IOException {
                      NativeIO.POSIX.LOG.info("mlocking " + identifier);
                  }
      
                  public long getMemlockLimit() {
                      return 1125899906842624L;
                  }
      
                  public long getOperatingSystemPageSize() {
                      return 4096L;
                  }
      
                  public boolean verifyCanMlock() {
                      return true;
                  }
              }
      
              @VisibleForTesting
              public static class CacheManipulator {
                  public CacheManipulator() {
                  }
      
                  public void mlock(String identifier, ByteBuffer buffer, long len) throws IOException {
                      NativeIO.POSIX.mlock(buffer, len);
                  }
      
                  public long getMemlockLimit() {
                      return NativeIO.getMemlockLimit();
                  }
      
                  public long getOperatingSystemPageSize() {
                      return NativeIO.getOperatingSystemPageSize();
                  }
      
                  public void posixFadviseIfPossible(String identifier, FileDescriptor fd, long offset, long len, int flags) throws NativeIOException {
                      NativeIO.POSIX.posixFadviseIfPossible(identifier, fd, offset, len, flags);
                  }
      
                  public boolean verifyCanMlock() {
                      return NativeIO.isAvailable();
                  }
              }
          }
      }
      
      
      

      Right-click the main method and run it to test the MapReduce program.

    • Local computation + remote HDFS files

      Add the following code to the InitMR.java class

       /**
               *  Option 3: local computation + remote HDFS files
               */
              TextInputFormat.addInputPath(job,new Path("hdfs://hadoop:9000/WordCount.txt"));
              TextOutputFormat.setOutputPath(job,new Path("hdfs://hadoop:9000/result2"));
      

      Permission errors may occur.

      • Disable HDFS permission checking
        • Edit hdfs-site.xml
          <property>
              <name>dfs.permissions.enabled</name>
              <value>false</value>
          </property>
          
        • Or pass a JVM option (VM argument) when running the main method:
      -DHADOOP_USER_NAME=root
      

  • Remote computation + remote HDFS files

  • Add the following code to InitMR.java, package the project into a jar, and run the main method

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    //===============================================================
    conf.set("fs.defaultFS", "hdfs://hadoop:9000/");
    conf.set("mapreduce.job.jar",
            "file:///E:\\训练营备课\\20180313_hadoop\\mr_demo\\target\\mr_demo-1.0-SNAPSHOT.jar");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.hostname", "hadoop");
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    conf.set("mapreduce.app-submission.cross-platform", "true");
    conf.set("dfs.replication", "1");
    //===============================================================
    // ......
    // Option 4: remote computation + remote HDFS files
    FileInputFormat.setInputPaths(job, "/user/word.txt");
    FileOutputFormat.setOutputPath(job, new Path("/user/result"));

}

Practice exercises

wordCount: count how often each word appears
flow: a traffic-statistics example
custom Writable (a sketch follows below)
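
For the custom Writable exercise, the sketch below shows the general shape such a class takes, assuming each record carries an upstream and a downstream byte count; the class name and fields are illustrative, not taken from the original material:

package com.baizhi.yarn;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class FlowWritable implements Writable {
    private long upFlow;    // upstream traffic in bytes
    private long downFlow;  // downstream traffic in bytes

    public FlowWritable() { }   // Hadoop needs a no-arg constructor for deserialization

    public FlowWritable(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
    }

    @Override
    public void write(DataOutput out) throws IOException {    // serialization
        out.writeLong(upFlow);
        out.writeLong(downFlow);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialization: same field order as write()
        upFlow = in.readLong();
        downFlow = in.readLong();
    }

    public long getUpFlow() { return upFlow; }
    public long getDownFlow() { return downFlow; }

    @Override
    public String toString() {   // what TextOutputFormat writes for the value
        return upFlow + "\t" + downFlow + "\t" + (upFlow + downFlow);
    }
}

It would typically be used as the value type (job.setOutputValueClass(FlowWritable.class)) with a Text key such as a phone number; a type that is used as a key would additionally need to implement WritableComparable.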
