The HDFS API operations we covered above are all conveniences wrapped by the framework. What if we want to implement those same operations ourselves?
We can use raw IO streams to upload and download data directly.
1 HDFS File Upload
1. Requirement: upload the local file d:/xiaoyue.txt to the HDFS root directory
2. Code
// Requirement: upload the local file d:/xiaoyue.txt to the HDFS root directory
@Test
public void putFileToHDFS() throws URISyntaxException, IOException, InterruptedException {
    // 1. Get a FileSystem object
    System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.7.2"); // local Hadoop home (needed on Windows)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.88.102:9000"), conf, "hadoop");
    // 2. Open an input stream on the local file
    FileInputStream fis = new FileInputStream(new File("d:/xiaoyue.txt"));
    // 3. Create an output stream on HDFS
    FSDataOutputStream fos = fs.create(new Path("/lin.txt"));
    // 4. Copy the stream
    IOUtils.copyBytes(fis, fos, conf);
    // 5. Close resources
    IOUtils.closeStream(fos);
    IOUtils.closeStream(fis);
    fs.close();
}
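The heart of the upload is the stream copy that IOUtils.copyBytes performs: read a buffer's worth, write out only the bytes actually read, repeat until end of stream. A minimal sketch of that loop in plain Java (no Hadoop dependency; the class and method names here are illustrative, not part of the Hadoop API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copy everything from in to out with a fixed-size buffer,
    // mirroring what IOUtils.copyBytes does internally.
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // read may return fewer bytes than bufSize
            out.write(buf, 0, n);          // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(data), out, 4)); // prints 10
    }
}
```

Note that `write(buf, 0, n)` uses the actual read count `n`; writing the whole buffer would emit stale bytes on the final, partially filled read.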
2 HDFS File Download
1. Requirement: download lin.txt from HDFS to the local d: drive
2. Code
// Download lin.txt from HDFS to the local d: drive
@Test
public void getFileFromHDFS() throws URISyntaxException, IOException, InterruptedException {
    // 1. Get a FileSystem object
    System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.7.2"); // local Hadoop home (needed on Windows)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.88.102:9000"), conf, "hadoop");
    // 2. Open an input stream on HDFS
    FSDataInputStream fis = fs.open(new Path("/lin.txt"));
    // 3. Open an output stream on the local file
    FileOutputStream fos = new FileOutputStream(new File("D:/linlin.txt"));
    // 4. Copy the stream
    IOUtils.copyBytes(fis, fos, conf);
    // 5. Close resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}
3 Positioned File Reads
1. Requirement: read a large file on HDFS block by block, e.g. /hadoop-2.7.2.tar.gz in the root directory
2. Code
First put the archive on HDFS:
[hadoop@hadoop102 hadoop-2.7.2]$ hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz /
// (1) Download the first block
@Test
public void readFileSeek1() throws URISyntaxException, IOException, InterruptedException {
    // 1. Get a FileSystem object
    System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.7.2"); // local Hadoop home (needed on Windows)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.88.102:9000"), conf, "hadoop");
    // 2. Open an input stream on HDFS
    FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));
    // 3. Open an output stream on the local file
    FileOutputStream fos = new FileOutputStream(new File("d:/hadoop-2.7.2.tar.gz.part1"));
    // 4. Copy only the first 128 MB (one HDFS block); read() may return fewer
    //    bytes than requested, so track the byte count rather than loop 128K times
    byte[] buf = new byte[1024];
    long blockSize = 1024L * 1024 * 128;
    long copied = 0;
    while (copied < blockSize) {
        int len = fis.read(buf, 0, (int) Math.min(buf.length, blockSize - copied));
        if (len == -1) break;
        fos.write(buf, 0, len);
        copied += len;
    }
    // 5. Close resources
    IOUtils.closeStream(fos);
    IOUtils.closeStream(fis);
    fs.close();
}
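The bounded copy used for the first block can be isolated and checked in plain Java. The sketch below (no Hadoop dependency; `BoundedCopy` and `copyExactly` are illustrative names, not Hadoop API) copies exactly `limit` bytes while tolerating short reads, which is the bug-prone part of the block download:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {
    // Copy exactly limit bytes from in to out, handling short reads.
    static void copyExactly(InputStream in, OutputStream out, long limit) throws IOException {
        byte[] buf = new byte[1024];
        long remaining = limit;
        while (remaining > 0) {
            int want = (int) Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want); // may return fewer than want bytes
            if (n == -1) throw new EOFException("stream ended before limit was reached");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyExactly(new ByteArrayInputStream(new byte[300]), out, 128);
        System.out.println(out.size()); // prints 128
    }
}
```

A fixed-count loop like `for (int i = 0; i < 1024 * 128; i++)` over-copies or under-copies whenever `read` returns fewer than 1024 bytes, which is allowed by the InputStream contract; counting bytes avoids that.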
// (2) Download the second block
@Test
public void readFileSeek2() throws URISyntaxException, IOException, InterruptedException {
    // 1. Get a FileSystem object
    System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.7.2"); // local Hadoop home (needed on Windows)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.88.102:9000"), conf, "hadoop");
    // 2. Open an input stream on HDFS
    FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));
    // 3. Seek to the start of the second block (128 MB in)
    fis.seek(1024 * 1024 * 128);
    // 4. Open an output stream on the local file
    FileOutputStream fos = new FileOutputStream(new File("d:/hadoop-2.7.2.tar.gz.part2"));
    // 5. Copy the rest of the stream
    IOUtils.copyBytes(fis, fos, conf);
    // 6. Close resources
    IOUtils.closeStream(fos);
    IOUtils.closeStream(fis);
    fs.close();
}
Concatenate the two parts and the result is the original archive:
D:\>type hadoop-2.7.2.tar.gz.part2 >> hadoop-2.7.2.tar.gz.part1
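The same merge can be done programmatically. A minimal sketch in plain Java (the `MergeParts` class and `merge` method are illustrative names) that appends part files to a target in order, reproducing what `type part2 >> part1` does on Windows:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MergeParts {
    // Append each part file to target in order; equivalent to
    // `type part2 >> part1` on Windows or `cat part2 >> part1` on Linux.
    static void merge(Path target, Path... parts) throws IOException {
        for (Path part : parts) {
            // readAllBytes is fine for demo-sized parts; stream the copy for huge files
            Files.write(target, Files.readAllBytes(part),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p1 = Files.createTempFile("part", "1");
        Path p2 = Files.createTempFile("part", "2");
        Files.write(p1, "hello ".getBytes());
        Files.write(p2, "hdfs".getBytes());
        Path merged = Files.createTempFile("merged", "");
        merge(merged, p1, p2);
        System.out.println(new String(Files.readAllBytes(merged))); // prints "hello hdfs"
    }
}
```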