Operating HDFS through the API (1): preparing the HDFS client environment for a Hadoop cluster

白T

Article link: https://blog.csdn.net/zytmaster/article/details/102646062

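Preparing the jar files:

1. Extract hadoop-2.7.6.tar.gz to a directory whose path contains no Chinese characters.

2. Go into the share folder, search for all jar files, and copy them into a new _lib folder. (You can create it next to the extracted hadoop-2.7.6 directory to keep things together.)

3. Among those jars, find the ones ending in sources.jar and move them into a new _source folder (again next to the extracted directory).

4. Among those jars, find the ones ending in tests.jar and move them into a new _test folder (again next to the extracted directory).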
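
The sorting steps above can also be scripted. The following is a minimal sketch, not part of the original procedure: the class name JarSorter and both directory arguments are assumptions made for illustration, and the output directory is expected to lie outside the share folder. It copies each jar straight into _lib, _source or _test based on its file name, which gives the same result as copying everything and then moving the sources and tests jars out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: sorts the jars found under Hadoop's share folder.
final class JarSorter {

    /** Picks the target folder name for a jar based on its file name. */
    static String targetFolder(String jarName) {
        if (jarName.endsWith("sources.jar")) {
            return "_source";
        }
        if (jarName.endsWith("tests.jar")) {
            return "_test";
        }
        return "_lib";
    }

    /** Copies every jar under shareDir into the matching folder under outDir; returns the jar count. */
    static int sortJars(Path shareDir, Path outDir) throws IOException {
        List<Path> jars;
        // Collect the jar list first so that copied files are never walked again.
        try (Stream<Path> walk = Files.walk(shareDir)) {
            jars = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".jar"))
                       .collect(Collectors.toList());
        }
        for (Path jar : jars) {
            String name = jar.getFileName().toString();
            Path targetDir = outDir.resolve(targetFolder(name));
            Files.createDirectories(targetDir);
            // Jars with the same file name in different subfolders overwrite each other.
            Files.copy(jar, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        }
        return jars.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the extracted share folder, args[1]: where _lib, _source and _test are created
        int count = sortJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("sorted " + count + " jars");
    }
}
```

Run it with the share folder as the first argument and the parent folder for _lib, _source and _test as the second.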

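Preparing the Windows environment:

1. Pick the Hadoop build compiled for your operating system (what matters is the bin directory of hadoop2.7.2_win10; the rest is not important) and copy it to a path without Chinese characters (for example: D:\Hadoop\hadoop-2.7.2).

2. Configure the HADOOP_HOME environment variable.

New variable: HADOOP_HOME, value: the home directory of hadoop2.7.2_win10

Variable: Path, value to append: %HADOOP_HOME%\bin

Note: you can only confirm that the environment works once you run code against the HDFS API. This setup covers just part of the environment; it does not install a complete Hadoop environment on Windows.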
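
Although the real test is running the HDFS API, a quick pre-check of the variable can catch typos early. The sketch below is an illustration only: the class name HadoopHomeCheck is made up, and the check for winutils.exe is an assumption about what the Windows build's bin directory contains (that is the file Hadoop's Windows support normally looks for there; the original text does not name it).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check for the HADOOP_HOME setup described above.
final class HadoopHomeCheck {

    /** Returns the problems found; an empty list means the basic layout looks right. */
    static List<String> check(String hadoopHome) {
        List<String> problems = new ArrayList<>();
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            problems.add("HADOOP_HOME is not set");
            return problems;
        }
        Path bin = Paths.get(hadoopHome, "bin");
        if (!Files.isDirectory(bin)) {
            problems.add("missing directory: " + bin);
        } else if (!Files.isRegularFile(bin.resolve("winutils.exe"))) {
            // Assumption: the Windows build ships winutils.exe in bin.
            problems.add("missing file: " + bin.resolve("winutils.exe"));
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv("HADOOP_HOME"));
        if (problems.isEmpty()) {
            System.out.println("HADOOP_HOME looks usable");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

An empty result only means the folder layout is plausible; whether the client can actually talk to the cluster still has to be verified by running the HDFS code.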

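Writing code against the HDFS API

Create a new Java project named HdfsClientDemo1.

Create a lib folder in the project and add the jar files to it.

Create the package com.zyt.hdfs.

Create the class HdfsClient with the following content:

Note: the code uses the @Test annotation, so JUnit must be added to the project: right-click the project name --> Properties --> Java Build Path --> Libraries --> Add Library --> JUnit --> Finish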

package com.zyt.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.Test;

public class HdfsClient {

	public static void main(String[] args) throws IOException, InterruptedException, URISyntaxException {
		// 1 Get the file system
		Configuration conf = new Configuration();
//		conf.set("fs.defaultFS", "hdfs://hadoop101:9000");
//		FileSystem fs = FileSystem.get(conf);
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");

		// 2 Upload the file
		fs.copyFromLocalFile(new Path("D:\\hello.txt"), new Path("/hello.txt"));
		
		// 3 Close the connection
		fs.close();
		System.out.println("over");
	}
	
	/**
	 * Get the file system
	 * @throws IOException
	 * @throws InterruptedException
	 * @throws URISyntaxException
	 */
	@Test
	public void initHDFS() throws IOException, InterruptedException, URISyntaxException{
		// Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		System.out.println(fs.toString());
		// Close the connection
		fs.close();
	}
	
	/**
	 * Upload a file
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 */
	@Test
	public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		// Files written by this client are stored with 2 replicas
		conf.set("dfs.replication", "2");
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Upload the file
		fs.copyFromLocalFile(new Path("D:\\hello.txt"), new Path("/user/hadoop/test/hello8.txt"));
		// 3 Close the connection
		fs.close();
	}

	/**
	 * Download a file
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 */
	@Test
	public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Download the file
		
		// boolean delSrc: whether to delete the source file after copying
		// Path src: the HDFS path of the file to download
		// Path dst: the local path to download the file to
		// boolean useRawLocalFileSystem: whether to write through RawLocalFileSystem;
		//   true skips creating the local .crc checksum file
		fs.copyToLocalFile(false, new Path("/user/hadoop/test/lianxi.txt"), new Path("d:\\lianxi.txt"), true);
		// 3 Close the connection
		fs.close();
	}
	
	/**
	 * Create a directory
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 */
	@Test
	public void testMkdirs() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Create the directory (missing parent directories are created too)
		fs.mkdirs(new Path("/xmbzdx/1701"));
		// 3 Close the connection
		fs.close();
	}
	
	/**
	 * Delete a file or directory
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 */
	@Test
	public void testDelete() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Delete the path; the second argument enables recursive deletion of directories
//		boolean b = fs.delete(new Path("/user/bigdata"),true);
		boolean b = fs.delete(new Path("/user/hadoop/test/hello2.txt"),true);
		System.out.println(b);
		// 3 Close the connection
		fs.close();
	}
	
	/**
	 * Rename a file
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 */
	@Test
	public void testRename() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Rename the file
		fs.rename(new Path("/user/hadoop/test/hello.txt"), new Path("/user/hadoop/test/hello9.txt"));
		// 3 Close the connection
		fs.close();
	}
	
	/**
	 * View file details
	 * @throws URISyntaxException
	 * @throws InterruptedException
	 * @throws IOException
	 *
	 */
	@Test
	public void testListFiles() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Get the file details (true = list recursively)
		// Think about it: why does this return an iterator rather than a List?
		RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/"), true);
		while(files.hasNext()){
			LocatedFileStatus file = files.next();
			// File name
			System.out.println(file.getPath().getName());
			// File length in bytes
			System.out.println(file.getLen());
			// Owner
			System.out.println(file.getOwner());
			// Group
			System.out.println(file.getGroup());
			// Permissions
			System.out.println(file.getPermission());
			// Replication factor
			System.out.println(file.getReplication());
			// Block locations
			BlockLocation[] blocks = file.getBlockLocations();
			for(BlockLocation block : blocks){
				// Host names of the nodes that store this block
				String[] hosts = block.getHosts();
				for(String host : hosts){
					System.out.println(host);
				}
			}
			System.out.println("-----------------------------");
		}
		
		// 3 Close the connection
		fs.close();
	}
	
	/**
	 * Tell files from directories
	 */
	@Test
	public void testListStatus() throws IOException, InterruptedException, URISyntaxException{
		// 1 Get the file system
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "hadoop");
		// 2 Check whether each entry is a file or a directory
		FileStatus[] listStatus = fs.listStatus(new Path("/"));
		for(FileStatus status : listStatus){
			// Is it a directory?
			if(status.isDirectory()){
				System.out.println("d:"+status.getPath().getName());
			}
			// Is it a file?
			if(status.isFile()){
				System.out.println("f:"+status.getPath().getName());
			}
		}
		// 3 Close the connection
		fs.close();
	}
}
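
The testListFiles method above asks why listFiles returns an iterator instead of a List. One likely reason is that a recursive listing can be very large, so the client pulls entries from the server in batches rather than holding everything in memory. Another is that each step may fail with an IOException, which java.util.Iterator cannot declare. The following sketch illustrates the idea with plain Java and no Hadoop dependency; RemoteLikeIterator, FakeServer and the page size are all made up for this example and are not Hadoop's implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

final class PagedListingDemo {

    /** Same shape as Hadoop's RemoteIterator: both methods may throw IOException. */
    interface RemoteLikeIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical stand-in for a server that returns a listing one page at a time. */
    static final class FakeServer {
        private final int total;
        int requests = 0; // number of simulated round trips

        FakeServer(int total) {
            this.total = total;
        }

        List<String> fetchPage(int start, int pageSize) throws IOException {
            requests++;
            List<String> page = new ArrayList<>();
            for (int i = start; i < Math.min(start + pageSize, total); i++) {
                page.add("/file-" + i);
            }
            return page;
        }
    }

    /** Keeps one page in memory and requests the next page only when the current one is used up. */
    static RemoteLikeIterator<String> list(FakeServer server, int pageSize) {
        return new RemoteLikeIterator<String>() {
            private List<String> page = new ArrayList<>();
            private int posInPage = 0;
            private int nextStart = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() throws IOException {
                if (posInPage < page.size()) {
                    return true;
                }
                if (exhausted) {
                    return false;
                }
                page = server.fetchPage(nextStart, pageSize);
                nextStart += page.size();
                posInPage = 0;
                if (page.size() < pageSize) {
                    exhausted = true; // a short page means the listing has ended
                }
                return !page.isEmpty();
            }

            @Override
            public String next() throws IOException {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(posInPage++);
            }
        };
    }
}
```

A caller that stops after the first few entries triggers only one round trip, whereas building a full List would require fetching every page up front.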

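Running the program

A user name must be configured at run time, because every client operation on HDFS is performed under a user identity. By default, the HDFS client API reads that identity from a JVM parameter:

-DHADOOP_USER_NAME=hadoop, where hadoop is the user name.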
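
To make the lookup concrete, here is a small sketch of how such a user name could be resolved. The order shown (environment variable first, then the JVM system property, then the operating-system login name) reflects my understanding of Hadoop 2.x with simple authentication and should be treated as an assumption; the class HdfsUserResolver is invented for this example and is not Hadoop code.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of the user-name lookup; not Hadoop's implementation.
final class HdfsUserResolver {

    static final String KEY = "HADOOP_USER_NAME";

    /** Returns the first value found: environment variable, system property, OS user. */
    static String resolve(Map<String, String> env, Properties props, String osUser) {
        String user = env.get(KEY);
        if (user == null) {
            user = props.getProperty(KEY);
        }
        if (user == null) {
            user = osUser;
        }
        return user;
    }

    public static void main(String[] args) {
        // Has the same effect on this lookup as starting the JVM with -DHADOOP_USER_NAME=hadoop
        System.setProperty(KEY, "hadoop");
        String user = resolve(System.getenv(), System.getProperties(), System.getProperty("user.name"));
        System.out.println("HDFS user: " + user);
    }
}
```

The HdfsClient code above sidesteps this lookup by passing the user name as the third argument of FileSystem.get, so the JVM parameter matters mainly when the two-argument or one-argument form is used.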

Note: if Eclipse prints no log output and the console shows only

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

then create a new file named log4j.properties in the project's src directory and put the following in it:

log4j.rootLogger=INFO, stdout 
log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n 
log4j.appender.logfile=org.apache.log4j.FileAppender 
log4j.appender.logfile.File=target/spring.log 
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout 
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
