Hadoop -- Distributed File System HDFS -- Java Client Usage

Start the HDFS cluster. In this example hadoop-01 acts as the NameNode and also as a DataNode, and hadoop-02 acts as a DataNode.


To manage the HDFS file system from a Java client, first create a project and add the required JARs, mainly those under the common and hdfs directories of hadoop/share/hadoop.

The Java client operates on HDFS mainly through the FileSystem object.

1. Obtain a Configuration object and apply the client settings

FileSystem fs = null;

	@Before
	public void init() throws Exception {
		Configuration conf = new Configuration();
		// replication factor
		conf.set("dfs.replication", "2");
		// block size
		conf.set("dfs.blocksize", "64m");
		fs = FileSystem.get(new URI("hdfs://hadoop-01:9000/"), conf, "root");
	}

The Configuration object builds the final configuration by loading core-default.xml first, then core-site.xml, and finally applying any options set through the set() method, with later values overriding earlier ones. The old hadoop-site.xml is deprecated; if it is found on the classpath, only a warning is logged.
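A minimal sketch of this precedence (dfs.replication is just an example property): a value set programmatically wins over whatever the XML resources contain.

Configuration conf = new Configuration();        // loads core-default.xml, then core-site.xml
// a value set in code overrides the XML resources
conf.set("dfs.replication", "2");
// prints "2" (unless the property is marked final in an XML resource)
System.out.println(conf.get("dfs.replication"));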

The default HDFS configuration parameters are documented here:

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

The static initializer block of the Configuration class:

static{
    //print deprecation warning if hadoop-site.xml is found in classpath
    ClassLoader cL = Thread.currentThread().getContextClassLoader();
    if (cL == null) {
      cL = Configuration.class.getClassLoader();
    }
    if(cL.getResource("hadoop-site.xml")!=null) {
      LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
          "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
          + "mapred-site.xml and hdfs-site.xml to override properties of " +
          "core-default.xml, mapred-default.xml and hdfs-default.xml " +
          "respectively");
    }
    addDefaultResource("core-default.xml");
    addDefaultResource("core-site.xml");
  }

2. Obtain a FileSystem object through the get method

The URI is the NameNode's RPC address, conf is the configuration object, and the third argument is the HDFS user.

FileSystem fs = FileSystem.get(new URI("hdfs://hadoop-01:9000/"), conf, "root");

3. Manage the file system with the FileSystem API

Upload a local file to HDFS

/**
	 * Upload a file to HDFS
	 */
	@Test
	public void testPut() throws Exception {
		// upload a local file to HDFS
		fs.copyFromLocalFile(new Path("F:/hadoop-2.8.1/file.txt"), new Path("/chen/wen"));
		System.out.println("put success");
		fs.close();
	}

Download a file from HDFS to the local file system

/**
	 * Download a file from HDFS to the client's local disk
	 * 
	 * @throws IOException
	 * @throws IllegalArgumentException
	 */
	@Test
	public void testGet() throws IllegalArgumentException, IOException {
		fs.copyToLocalFile(new Path("/chen/wen/spring-data-jpa-reference-documentation.pdf"), new Path("F:/"));
		fs.close();
	}

Move a file within HDFS

/**
	 * Move or rename a file inside HDFS
	 */
	@Test
	public void testRename() throws Exception {
		fs.rename(new Path("/chen/wen/spring-data-jpa-reference-documentation.pdf"), new Path("/chen/spring-data-jpa-reference-documentation.pdf"));
		fs.close();
	}

Create and delete files/directories in HDFS

/**
	 * Create a directory in HDFS
	 */
	@Test
	public void testMkdir() throws Exception {
		fs.mkdirs(new Path("/chen/wen/kkk"));
		fs.close();
	}

	/**
	 * Delete a file or directory in HDFS
	 */
	@Test
	public void testRm() throws Exception {
		fs.delete(new Path("/chen"), true);
		fs.close();
	}

List the files under a given directory

/**
	 * List the files under a given HDFS directory
	 */
	@Test
	public void testLs() throws Exception {
		// listFiles returns files only, not directories
		RemoteIterator<LocatedFileStatus> iter = fs.listFiles(new Path("/"), true);

		while (iter.hasNext()) {
			LocatedFileStatus status = iter.next();
			System.out.println("full path: " + status.getPath());
			System.out.println("block size: " + status.getBlockSize());
			System.out.println("file length: " + status.getLen());
			System.out.println("replication: " + status.getReplication());
			System.out.println("block locations: " + Arrays.toString(status.getBlockLocations()));
			System.out.println("--------------------------------");
		}
		fs.close();
	}

List both files and directories under a given directory

/**
	 * List the files and directories under a given HDFS directory
	 */
	@Test
	public void testLs2() throws Exception {
		FileStatus[] listStatus = fs.listStatus(new Path("/"));

		for (FileStatus status : listStatus) {
			System.out.println("full path: " + status.getPath());
			System.out.println(status.isDirectory() ? "this is a directory" : "this is a file");
			System.out.println("block size: " + status.getBlockSize());
			System.out.println("file length: " + status.getLen());
			System.out.println("replication: " + status.getReplication());
			System.out.println("--------------------------------");
		}
		fs.close();
	}

Read a file in HDFS through an input stream

/**
	 * Read the contents of a file in HDFS through a stream
	 * 
	 * @throws IOException
	 * @throws IllegalArgumentException
	 */
	@Test
	public void testReadData() throws IllegalArgumentException, IOException {
		FSDataInputStream in = fs.open(new Path("/chen/wen/file.txt"));
		BufferedReader br = new BufferedReader(new InputStreamReader(in, "utf-8"));
		String line = null;
		while ((line = br.readLine()) != null) {
			System.out.println(line);
		}
		br.close();
		in.close();
		fs.close();
	}

Read a range of bytes starting at a given offset through an input stream

/**
	 * Read a given byte range of a file in HDFS through a stream
	 * @throws IOException
	 * @throws IllegalArgumentException
	 */
	@Test
	public void testRandomReadData() throws IllegalArgumentException, IOException {
		FSDataInputStream in = fs.open(new Path("/chen/wen/file.txt"));
		// seek to the starting offset
		in.seek(10);
		// read up to 100 bytes; read() may return fewer, so keep the actual count
		byte[] buf = new byte[100];
		int n = in.read(buf);
		System.out.println(new String(buf, 0, n, "utf-8"));
		in.close();
		fs.close();
	}

Write to a file in HDFS through an output stream

/**
	 * Write content to a file in HDFS through a stream
	 * 
	 * @throws IOException
	 * @throws IllegalArgumentException
	 */
	@Test
	public void testWriteData() throws IllegalArgumentException, IOException {
		FSDataOutputStream out = fs.create(new Path("/chen/wen/test.xml"), false);
		FileInputStream in = new FileInputStream("F:/settings.xml");
		byte[] buf = new byte[1024];
		int read = 0;
		while ((read = in.read(buf)) != -1) {
			out.write(buf, 0, read);
		}
		in.close();
		out.close();
		fs.close();
	}

4. Source code walkthrough of uploading a file to HDFS

fs.copyFromLocalFile(new Path("F:/hadoop-2.8.1/file.txt"), new Path("/chen/wen"));

Stepping into the call chain:

/**
   * The src file is on the local disk.  Add it to FS at
   * the given dst name and the source is kept intact afterwards
   * @param src path
   * @param dst path
   */
  public void copyFromLocalFile(Path src, Path dst)
    throws IOException {
    copyFromLocalFile(false, src, dst);
  }
 public void copyFromLocalFile(boolean delSrc, Path src, Path dst)
    throws IOException {
    copyFromLocalFile(delSrc, true, src, dst);
  }
/**
   * The src file is on the local disk.  Add it to FS at
   * the given dst name.
   * delSrc indicates if the source should be removed
   * @param delSrc whether to delete the src
   * @param overwrite whether to overwrite an existing file
   * @param src path
   * @param dst path
   */
  public void copyFromLocalFile(boolean delSrc, boolean overwrite, 
                                Path src, Path dst)
    throws IOException {
    Configuration conf = getConf();
    FileUtil.copy(getLocal(conf), src, this, dst, delSrc, overwrite, conf);
  }

FileUtil.copy performs the actual upload:

/** Copy files between FileSystems. */
  public static boolean copy(FileSystem srcFS, Path src, 
                             FileSystem dstFS, Path dst, 
                             boolean deleteSource,
                             boolean overwrite,
                             Configuration conf) throws IOException {
    FileStatus fileStatus = srcFS.getFileStatus(src);
    return copy(srcFS, fileStatus, dstFS, dst, deleteSource, overwrite, conf);
  }

The core copy method:

/** Copy files between FileSystems. */
  public static boolean copy(FileSystem srcFS, FileStatus srcStatus,
                             FileSystem dstFS, Path dst,
                             boolean deleteSource,
                             boolean overwrite,
                             Configuration conf) throws IOException {
    Path src = srcStatus.getPath();
    dst = checkDest(src.getName(), dstFS, dst, overwrite);
    if (srcStatus.isDirectory()) {
      checkDependencies(srcFS, src, dstFS, dst);
      if (!dstFS.mkdirs(dst)) {
        return false;
      }
      FileStatus contents[] = srcFS.listStatus(src);
      for (int i = 0; i < contents.length; i++) {
        copy(srcFS, contents[i], dstFS,
             new Path(dst, contents[i].getPath().getName()),
             deleteSource, overwrite, conf);
      }
    } else {
      InputStream in=null;
      OutputStream out = null;
      try {
        in = srcFS.open(src);
        out = dstFS.create(dst, overwrite);
        IOUtils.copyBytes(in, out, conf, true);
      } catch (IOException e) {
        IOUtils.closeStream(out);
        IOUtils.closeStream(in);
        throw e;
      }
    }
    if (deleteSource) {
      return srcFS.delete(src, true);
    } else {
      return true;
    }
  
  }

If the source to upload is a directory, copy calls itself recursively for its contents; otherwise the file is streamed directly with IOUtils.copyBytes. A minimal sketch of the single-file case is shown below.
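The sketch below is not the library source; it simply strings together the same open / create / copyBytes calls that FileUtil.copy uses for a single file, reusing the example paths and the fs field initialized in init():

	@Test
	public void testManualCopy() throws Exception {
		// uses org.apache.hadoop.io.IOUtils
		FileSystem local = FileSystem.getLocal(fs.getConf());
		FSDataInputStream in = local.open(new Path("F:/hadoop-2.8.1/file.txt"));
		FSDataOutputStream out = fs.create(new Path("/chen/wen/file.txt"), true);
		// copyBytes streams the bytes; the final "true" closes both streams afterwards
		IOUtils.copyBytes(in, out, fs.getConf(), true);
		fs.close();
	}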

An input stream reads the file to be uploaded:

 /**
   * Opens an FSDataInputStream at the indicated Path.
   * @param f the file to open
   */
  public FSDataInputStream open(Path f) throws IOException {
    return open(f, getConf().getInt("io.file.buffer.size", 4096));
  }
/**
   * Opens an FSDataInputStream at the indicated Path.
   * @param f the file name to open
   * @param bufferSize the size of the buffer to be used.
   */
  public abstract FSDataInputStream open(Path f, int bufferSize)
    throws IOException;

An output stream writes the file to HDFS; the construction of this stream is where the replication factor and block size (and thus the splitting into blocks) are decided:

/**
   * Create an FSDataOutputStream at the indicated Path.
   * @param f the file to create
   * @param overwrite if a file with this name already exists, then if true,
   *   the file will be overwritten, and if false an exception will be thrown.
   */
  public FSDataOutputStream create(Path f, boolean overwrite)
      throws IOException {
    return create(f, overwrite, 
                  getConf().getInt("io.file.buffer.size", 4096),
                  getDefaultReplication(f),
                  getDefaultBlockSize(f));
  }
/**
   * Create an FSDataOutputStream at the indicated Path.
   * @param f the file name to open
   * @param overwrite if a file with this name already exists, then if true,
   *   the file will be overwritten, and if false an error will be thrown.
   * @param bufferSize the size of the buffer to be used.
   * @param replication required block replication for the file. 
   */
  public FSDataOutputStream create(Path f, 
                                   boolean overwrite,
                                   int bufferSize,
                                   short replication,
                                   long blockSize
                                   ) throws IOException {
    return create(f, overwrite, bufferSize, replication, blockSize, null);
  }
/**
   * Create an FSDataOutputStream at the indicated Path with write-progress
   * reporting.
   * @param f the file name to open
   * @param overwrite if a file with this name already exists, then if true,
   *   the file will be overwritten, and if false an error will be thrown.
   * @param bufferSize the size of the buffer to be used.
   * @param replication required block replication for the file. 
   */
  public FSDataOutputStream create(Path f,
                                            boolean overwrite,
                                            int bufferSize,
                                            short replication,
                                            long blockSize,
                                            Progressable progress
                                            ) throws IOException {
    return this.create(f, FsPermission.getFileDefault().applyUMask(
        FsPermission.getUMask(getConf())), overwrite, bufferSize,
        replication, blockSize, progress);
  }
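As the overloads above show, the replication factor and block size can also be passed explicitly when creating the output stream. A minimal sketch; the path, the 4096-byte buffer, the replication of 2 and the 128 MB block size are arbitrary example values:

	@Test
	public void testCreateWithExplicitBlockSize() throws Exception {
		// create(path, overwrite, bufferSize, replication, blockSize)
		FSDataOutputStream out = fs.create(new Path("/chen/wen/big.dat"), true,
				4096, (short) 2, 128 * 1024 * 1024L);
		out.write("hello hdfs".getBytes("utf-8"));
		out.close();
		fs.close();
	}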

 
