The MoveTask strategy in Hive jobs

Hive version: 1.2.1

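Hive compiles SQL into MapReduce jobs. When a job finishes, Hive may still have to move its temporary output files into the target directory.

In Hive 1.2.1, these temporary files may be stored inside the current Hive workspace.

For example: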

         insert overwrite table  temp.test select id,name from dwd.tk ;

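This statement creates a session directory under the directory of the test table, writes the results into that directory, and finally copies the result files into the target directory.

The class in Hive that moves the files is MoveTask. Its strategy is described below.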

         

  private void moveFile(Path sourcePath, Path targetPath, boolean isDfsDir)
      throws Exception {
    FileSystem fs = sourcePath.getFileSystem(conf);
    if (isDfsDir) {
      // Just do a rename on the URIs, they belong to the same FS
      String mesg = "Moving data to: " + targetPath.toString();
      String mesg_detail = " from " + sourcePath.toString();
      console.printInfo(mesg, mesg_detail);

      // if source exists, rename. Otherwise, create a empty directory
      if (fs.exists(sourcePath)) {
        Path deletePath = null;
        // If it multiple level of folder are there fs.rename is failing so first
        // create the targetpath.getParent() if it not exist
        if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_INSERT_INTO_MULTILEVEL_DIRS)) {
          deletePath = createTargetPath(targetPath, fs);
        }
        if (!Hive.moveFile(conf, sourcePath, targetPath, fs, true, false)) {
          try {
            if (deletePath != null) {
              fs.delete(deletePath, true);
            }
          } catch (IOException e) {
            LOG.info("Unable to delete the path created for facilitating rename"
                + deletePath);
          }
          throw new HiveException("Unable to rename: " + sourcePath
              + " to: " + targetPath);
        }
      } else if (!fs.mkdirs(targetPath)) {
        throw new HiveException("Unable to make directory: " + targetPath);
      }
    } else {
      // This is a local file
      String mesg = "Copying data to local directory " + targetPath.toString();
      String mesg_detail = " from " + sourcePath.toString();
      console.printInfo(mesg, mesg_detail);

      // delete the existing dest directory
      LocalFileSystem dstFs = FileSystem.getLocal(conf);

      if (dstFs.delete(targetPath, true) || !dstFs.exists(targetPath)) {
        console.printInfo(mesg, mesg_detail);
        // if source exists, rename. Otherwise, create a empty directory
        if (fs.exists(sourcePath)) {
          fs.copyToLocalFile(sourcePath, targetPath);
        } else {
          if (!dstFs.mkdirs(targetPath)) {
            throw new HiveException("Unable to make local directory: "
                + targetPath);
          }
        }
      } else {
        throw new AccessControlException(
            "Unable to delete the existing destination directory: "
            + targetPath);
      }
    }
  }
   

As the code shows, the actual work is done by the Hive.moveFile method, listed below:

  //it is assumed that parent directory of the destf should already exist when this
  //method is called. when the replace value is true, this method works a little different
  //from mv command if the destf is a directory, it replaces the destf instead of moving under
  //the destf. in this case, the replaced destf still preserves the original destf's permission
  public static boolean moveFile(HiveConf conf, Path srcf, Path destf,
      FileSystem fs, boolean replace, boolean isSrcLocal) throws HiveException {
    boolean success = false;

    //needed for perm inheritance.
    boolean inheritPerms = HiveConf.getBoolVar(conf,
        HiveConf.ConfVars.HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS);
    HadoopShims shims = ShimLoader.getHadoopShims();
    HadoopShims.HdfsFileStatus destStatus = null;
    HadoopShims.HdfsEncryptionShim hdfsEncryptionShim = SessionState.get().getHdfsEncryptionShim();

    // If source path is a subdirectory of the destination path:
    //   ex: INSERT OVERWRITE DIRECTORY 'target/warehouse/dest4.out' SELECT src.value WHERE src.key >= 300;
    //   where the staging directory is a subdirectory of the destination directory
    // (1) Do not delete the dest dir before doing the move operation.
    // (2) It is assumed that subdir and dir are in same encryption zone.
    // (3) Move individual files from scr dir to dest dir.
    boolean destIsSubDir = isSubDir(srcf, destf, fs, isSrcLocal);
    try {
      if (inheritPerms || replace) {
        try{
          destStatus = shims.getFullFileStatus(conf, fs, destf.getParent());
          //if destf is an existing directory:
          //if replace is true, delete followed by rename(mv) is equivalent to replace
          //if replace is false, rename (mv) actually move the src under dest dir
          //if destf is an existing file, rename is actually a replace, and do not need
          // to delete the file first
          if (replace && !destIsSubDir) {
            LOG.debug("The path " + destf.toString() + " is deleted");
            fs.delete(destf, true);
          }
        } catch (FileNotFoundException ignore) {
          //if dest dir does not exist, any re
          if (inheritPerms) {
            destStatus = shims.getFullFileStatus(conf, fs, destf.getParent());
          }
        }
      }
      if (!isSrcLocal) {
        // For NOT local src file, rename the file
        if (hdfsEncryptionShim != null && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
            && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf))
        {
          LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
          success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
              true,    // delete source
              replace, // overwrite destination
              conf);
        } else {
          if (destIsSubDir) {
            FileStatus[] srcs = fs.listStatus(srcf, FileUtils.HIDDEN_FILES_PATH_FILTER);
            if (srcs.length == 0) {
              success = true; // Nothing to move.
            }
            for (FileStatus status : srcs) {
              success = FileUtils.copy(srcf.getFileSystem(conf), status.getPath(), destf.getFileSystem(conf), destf,
                  true,     // delete source
                  replace,  // overwrite destination
                  conf);

              if (!success) {
                throw new HiveException("Unable to move source " + status.getPath() + " to destination " + destf);
              }
            }
          } else {
            success = fs.rename(srcf, destf);
          }
        }
      } else {
        // For local src file, copy to hdfs
        fs.copyFromLocalFile(srcf, destf);
        success = true;
      }

      LOG.info((replace ? "Replacing src:" : "Renaming src: ") + srcf.toString()
          + ", dest: " + destf.toString()  + ", Status:" + success);
    } catch (IOException ioe) {
      throw new HiveException("Unable to move source " + srcf + " to destination " + destf, ioe);
    }

    if (success && inheritPerms) {
      try {
        ShimLoader.getHadoopShims().setFullFileStatus(conf, destStatus, fs, destf);
      } catch (IOException e) {
        LOG.warn("Error setting permission of file " + destf + ": "+ e.getMessage(), e);
      }
    }
    return success;
  }

 

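The code above shows the following strategy:

  1. The source is not on HDFS (isSrcLocal is true): copyFromLocalFile.
  2. The source is on HDFS:
     2.1 Encrypted case (the source or the destination is in an HDFS encryption zone, and the two are not in the same zone):
         a copy; if a file is larger than the default threshold (32 MB), distcp is used instead.
     2.2 Otherwise (no encryption, or both paths are in the same encryption zone):
         (1) The source directory is a subdirectory of the destination directory: every file under the source directory is copied individually; if a file is larger than the default threshold (32 MB), distcp is used instead.
         (2) All other cases: a rename (mv).

So in Hive 1.2, when the source directory is a subdirectory of the destination directory, Hive loops over every file and runs a copy or a distcp for each one. This generates disk I/O and network traffic, wastes a lot of time, and makes the job slow.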
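
The branching summarized above can be sketched in plain Java. This is only an illustration of the decision order, not Hive code: the class MoveStrategySketch, the Strategy enum, and the boolean parameters are hypothetical stand-ins for the checks made in Hive.moveFile, and the 32 MB constant is the default threshold quoted in the text.

```java
public class MoveStrategySketch {
    // Hypothetical labels for the four outcomes listed in the summary.
    enum Strategy { COPY_FROM_LOCAL, COPY_ACROSS_ZONES, COPY_EACH_FILE, RENAME }

    // 32 MB, the default copy threshold quoted in the text (assumed here as a constant).
    static final long COPY_MAX_SIZE = 32L * 1024 * 1024;

    // Mirrors the order of the checks: local source first, then encryption
    // zones, then the "source is under destination" case, then plain rename.
    static Strategy choose(boolean isSrcLocal, boolean eitherPathEncrypted,
                           boolean sameEncryptionZone, boolean srcIsSubDirOfDest) {
        if (isSrcLocal) {
            return Strategy.COPY_FROM_LOCAL;
        }
        if (eitherPathEncrypted && !sameEncryptionZone) {
            return Strategy.COPY_ACROSS_ZONES;
        }
        if (srcIsSubDirOfDest) {
            return Strategy.COPY_EACH_FILE;
        }
        return Strategy.RENAME;
    }

    // For the copy branches: files strictly larger than the threshold go through distcp.
    static boolean usesDistCp(long fileBytes) {
        return fileBytes > COPY_MAX_SIZE;
    }

    public static void main(String[] args) {
        // Staging directory under the target table: per-file copy.
        System.out.println(choose(false, false, true, true));   // COPY_EACH_FILE
        // Staging directory elsewhere on the same file system: rename.
        System.out.println(choose(false, false, true, false));  // RENAME
        // A 64 MB file in a copy branch would be handed to distcp.
        System.out.println(usesDistCp(64L * 1024 * 1024));      // true
    }
}
```

The sketch makes the cost difference visible: only the last branch avoids touching the data, which is why the location of the staging directory matters.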
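Solution:

Change the Hive 1.2.1 default setting:

  <property>
    <name>hive.exec.stagingdir</name>
    <value>.hive-staging</value>
  </property>

to:

  <property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive/.hive-staging</value>
  </property>

With this setting the staging directory is no longer a subdirectory of the target directory, so the results can be moved with a plain mv (rename) and no additional I/O traffic is generated.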
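
To illustrate why the setting changes the outcome, here is a small sketch of the sub-directory test. It is a simplified string-based stand-in, not the isSubDir method from Hive: the class StagingDirSketch, its helpers, and the warehouse path are all hypothetical, and it assumes that a relative staging value is resolved against the target directory while an absolute value is used as given.

```java
public class StagingDirSketch {
    // Ensures a directory string ends with "/" so prefix checks match whole components.
    static String withSlash(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    // Assumption: a relative setting lands under the target directory,
    // an absolute setting is taken as-is.
    static String resolveStaging(String setting, String targetDir) {
        return setting.startsWith("/") ? setting : withSlash(targetDir) + setting;
    }

    // Simplified check: true when path lies below dir.
    static boolean isUnder(String path, String dir) {
        return path.startsWith(withSlash(dir));
    }

    public static void main(String[] args) {
        // Hypothetical location of the temp.test table.
        String target = "/user/hive/warehouse/temp.db/test";
        String[] settings = {".hive-staging", "/tmp/hive/.hive-staging"};
        for (String setting : settings) {
            String staging = resolveStaging(setting, target);
            String action = isUnder(staging, target) ? "copy each file" : "rename";
            System.out.println(staging + " -> " + action);
        }
    }
}
```

With the relative default the staging path falls under the table directory and the per-file copy branch is taken; with the absolute path it does not, so the move can be a rename, provided both paths are on the same file system and not in different encryption zones.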

               

           
