Reading and Writing HDFS Files from Eclipse

Han Han's Blog (韩韩的博客)

Category: hadoop

Article link: https://blog.csdn.net/qq_40605167/article/details/102728236

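Preparation:
1. Before creating the project, gather all of the Hadoop jars. Unpack the Hadoop distribution first.
[Figure: the unpacked Hadoop directory]
2. The distribution contains every jar we need, so we copy them all out. Type .jar into the search box, then copy all of the jars from the results.

3. Besides the jars the project needs, the results also include source jars and test jars. Create two folders, _source and _test, and cut the source jars and test jars into them.
[Figure: the _source and _test folders]

Copying is done!

Note: xxx-sources.jar is a source jar, used when attaching sources; xxx-tests.jar is a test jar.
5. Add the jars needed to run Hadoop to the project.

6. Add the JUnit unit-testing library.

7. Create the package com.sk.hadoop.hdfs and, inside it, the class TestHDFS.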
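The manual search-and-copy in steps 2 and 3 could also be scripted. The following is only a sketch under assumptions: the class name JarCollector, the method names, and the lib folder for ordinary jars are made up for illustration; only the _source and _test folder names and the -sources.jar / -tests.jar suffixes come from the steps above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JarCollector {

    /** Chooses the target subfolder for a jar from its file name. */
    static String targetFolder(String jarName) {
        if (jarName.endsWith("-sources.jar")) {
            return "_source"; // source jars, used when attaching sources
        }
        if (jarName.endsWith("-tests.jar")) {
            return "_test";   // test jars
        }
        return "lib";         // hypothetical folder for the jars the project needs
    }

    /** Copies every jar under hadoopHome into outputDir; returns the number of jars found. */
    static int collect(Path hadoopHome, Path outputDir) {
        try (Stream<Path> walk = Files.walk(hadoopHome)) {
            // Collect first, so files copied into outputDir are never walked again
            List<Path> jars = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .collect(Collectors.toList());
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                Path dir = outputDir.resolve(targetFolder(name));
                Files.createDirectories(dir);
                // The same jar can appear in several places; keep one copy
                Files.copy(jar, dir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
            return jars.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0]: unpacked Hadoop directory, args[1]: folder to collect the jars into
        int found = collect(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Jars found: " + found);
    }
}
```

Only the jars in the lib folder would then be added to the Eclipse project; the other two folders are kept aside, as in step 3.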
Source code:

package com.sk.hadoop.hdfs;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Test;

public class TestHDFS {

	/**
	 * Read a file stored on HDFS
	 */
	@Test
	public void readFileByJava() throws Exception {
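		// hdfs:// is not a scheme that java.net.URL understands by default, so it must be registered first.
		// Registration uses a class from the Hadoop jars.
		// Note: setURLStreamHandlerFactory can be called at most once per JVM.
		URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
		// hdfs://192.168.56.100:9000 is the root of the HDFS file system
		URL url = new URL("hdfs://192.168.56.100:9000/file/a.txt");
		// Open a connection
		URLConnection conn = url.openConnection();
		// Get an input stream from the connection
		InputStream is = conn.getInputStream();
		// Buffer for reading the file contents in bulk
		byte[] buff = new byte[1024];
		// The file is small, so a single read() of up to 1024 bytes is assumed to cover it.
		// read() returns the number of bytes actually read, or -1 if the file is empty.
		int length = is.read(buff);
		is.close();
		// Convert only the bytes that were read, not the unused tail of the buffer
		String str = new String(buff, 0, Math.max(length, 0));
		System.out.println(str);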
	}
	/**
	 * Read the contents of an HDFS file using the Hadoop API
	 */
	@Test
	public void readHDFSByHadoop() throws Exception {
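		// Load the HDFS-related configuration.
		// A freshly created conf is not an empty object: it holds the default configuration.
		Configuration conf = new Configuration();
		// Customize the configuration:
		// point it at the fully distributed file system
		conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
		// Get the file system object
		FileSystem fs = FileSystem.get(conf);
		// Path of the file to read
		Path path = new Path("/file/a.txt");
		// Get an input stream from the file system.
		// FSDataInputStream is Hadoop's input stream for reading the contents of HDFS files;
		// FSDataOutputStream is its counterpart for writing to HDFS files.
		FSDataInputStream fis = fs.open(path);
		// available() only estimates how many bytes can be read without blocking,
		// so sizing a buffer with it suits small files at best; a read loop is used here instead.
		byte[] buff = new byte[1024];
		// Number of bytes returned by the current read
		int length = -1;
		// read methods:
		//    1. read() reads a single byte and returns its value (0-255)
		//    2. read(buff) reads up to buff.length bytes and returns how many were read
		//  Both return -1 once there are no more bytes to read.
		ByteArrayOutputStream baos = new ByteArrayOutputStream();
		while ((length = fis.read(buff)) != -1) {
			// Write only the bytes just read into the output stream;
			// writing the whole buffer would append stale bytes.
			baos.write(buff, 0, length);
		}
		// toByteArray() returns the accumulated bytes as a byte array
		byte[] out = baos.toByteArray();
		// Convert the byte array to a string
		String str = new String(out);
		System.out.println(str);
		baos.close();
		fis.close();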
	}
	@Test
	public void readHDFSByHadoop2() throws Exception {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
		FileSystem fs = FileSystem.get(conf);
		Path path = new Path("/file/a.txt");
		FSDataInputStream fis = fs.open(path);
		// IOUtils: Hadoop's utility class for I/O handling
		ByteArrayOutputStream baos = new ByteArrayOutputStream();
		IOUtils.copyBytes(fis, baos, 1024);
		byte[] out = baos.toByteArray();
		String str = new String(out);
		System.out.println(str);
		baos.close();
		fis.close();
	}
	/**
	 * Note: before calling this method, grant permissions on the parent
	 * directory of the target path (for example, mode 777).
	 */
	@Test
	public void writeHdfs() throws Exception {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
		FileSystem fs = FileSystem.get(conf);
		FSDataOutputStream fos = fs.create(new Path("/file/hello.txt"));
		String txt = "hello world!";
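		// write() stores the raw text bytes; writeUTF() would prepend a 2-byte length prefix
		fos.write(txt.getBytes("UTF-8"));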
		fos.close();
	}
}
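
When a stream is copied through a fixed-size buffer, as readHDFSByHadoop does, only the bytes returned by each read() call should be passed on. This can be checked without a cluster. The sketch below is an illustration, not part of the original test class: it swaps the HDFS stream for a java.io.ByteArrayInputStream and uses a deliberately tiny 4-byte buffer; the class ReadLoopDemo and both helper methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadLoopDemo {

    /** Copies the stream, writing only the bytes each read() actually returned. */
    static byte[] copyExact(InputStream in, int bufferSize) {
        try {
            byte[] buff = new byte[bufferSize];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int length;
            while ((length = in.read(buff)) != -1) {
                out.write(buff, 0, length);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the stream, but writes the whole buffer on every pass. */
    static byte[] copyWholeBuffer(InputStream in, int bufferSize) {
        try {
            byte[] buff = new byte[bufferSize];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (in.read(buff) != -1) {
                out.write(buff); // the last pass re-writes stale bytes from the previous one
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8); // 10 bytes
        System.out.println(copyExact(new ByteArrayInputStream(data), 4).length);       // 10
        System.out.println(copyWholeBuffer(new ByteArrayInputStream(data), 4).length); // 12
    }
}
```

With a 1024-byte buffer and a small file, the whole-buffer variant pads the result up to 1024 bytes, which shows up as trailing NUL characters when the string is printed.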

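FSDataOutputStream extends java.io.DataOutputStream, so the effect of writeUTF() compared with a plain write() can be examined locally. The following is a minimal sketch under that assumption, writing into an in-memory stream instead of HDFS; the class WriteUtfDemo and its methods are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteUtfDemo {

    /** Bytes produced by DataOutputStream.writeUTF(): a 2-byte length, then the text. */
    static byte[] viaWriteUTF(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bytes);
            dos.writeUTF(text);
            dos.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes produced by writing the UTF-8 encoding of the text directly. */
    static byte[] viaRawBytes(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(bytes);
            dos.write(text.getBytes(StandardCharsets.UTF_8));
            dos.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String txt = "hello world!"; // 12 characters
        System.out.println(viaWriteUTF(txt).length); // 14: 2-byte length prefix + 12 bytes
        System.out.println(viaRawBytes(txt).length); // 12
    }
}
```

A file written with writeUTF() is meant to be read back with DataInputStream.readUTF(); if it is read as plain text instead, the two prefix bytes appear before the message.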