HDFS Operations

Latest recommended article published 2023-05-21 22:43:32

寂寞烟


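Programming with the HDFS Java API

This post shows how to create, delete, and query files and directories in HDFS.

1. An earlier post covered how to create a file; here is the code again, briefly: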


Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// create() returns an output stream; close it so the new empty file is finalized
fs.create(new Path(hdfsPath)).close();

The create() method has several overloads; see the API documentation for details.
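As an illustration of one of those overloads, the sketch below uses create(Path, boolean overwrite) to write a short string and closes the stream with try-with-resources. The class name WriteExample and the method writeText are hypothetical names chosen for this example, and the code assumes a Hadoop client library is on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {

    // Hypothetical helper: writes text to target, replacing any existing file.
    public static void writeText(FileSystem fs, Path target, String text) throws IOException {
        // The second argument (overwrite = true) replaces an existing file
        // instead of failing when the path is already present.
        try (FSDataOutputStream out = fs.create(target, true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // the stream is closed here, which completes the file
    }
}
```

A call such as WriteExample.writeText(fs, new Path(hdfsPath), "hello") would leave a file containing the five bytes of "hello" at hdfsPath.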

2. An example of creating a directory:


Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.mkdirs(new Path(hdfsPath));

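The mkdirs() method also has several overloads; see the API documentation for details. Like create() above, it builds the file or directory named by the path, and if parent directories are missing it creates them automatically. If that is not what you want, check each level of the path first.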
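One way to avoid the automatic creation of parents is to test the parent directory before calling mkdirs(). The following is a minimal sketch, not a definitive implementation: StrictMkdir and mkdirStrict are hypothetical names, and a Hadoop client library is assumed to be on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StrictMkdir {

    // Hypothetical helper: creates dir only when its parent already exists.
    // Returns false without touching the file system if the parent is missing.
    public static boolean mkdirStrict(FileSystem fs, Path dir) throws IOException {
        Path parent = dir.getParent();      // null when dir is the root
        if (parent != null && !fs.exists(parent)) {
            return false;                   // refuse to create missing ancestors
        }
        return fs.mkdirs(dir);
    }
}
```

The check and the mkdirs() call are two separate requests, so another client could still create or remove the parent in between; the sketch only guards against accidental deep creation from a mistyped path.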

3. Checking whether a directory or file exists:


Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// true if hdfsPath names an existing file or directory
boolean exists = fs.exists(new Path(hdfsPath));

4. Viewing file metadata in the file system, including file length, block size, replication, modification time, owner, and permissions:


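import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class getStatus {

	public static void main(String[] args) throws Exception {

		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(conf);
		// args[0] is the path to inspect; getFileStatus throws FileNotFoundException if it does not exist
		FileStatus stat = fs.getFileStatus(new Path(args[0]));
		// Printed in order: access time, block size, group, length,
		// modification time, owner, replication factor, permission
		System.out.print(stat.getAccessTime()+" "+stat.getBlockSize()+" "+stat.getGroup()
				+" "+stat.getLen()+" "+stat.getModificationTime()+" "+stat.getOwner()
				+" "+stat.getReplication()+" "+stat.getPermission()
				);
	}
}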

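FileStatus has an isDir() method (deprecated in later Hadoop releases in favor of isDirectory()) that tells you whether the path is a directory. It does not tell you whether the path exists: getFileStatus() throws FileNotFoundException for a missing path, so for a plain existence check the exists() method is more convenient.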
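When both the existence and the type of a path are needed, a single getFileStatus() call can answer both, because it throws FileNotFoundException for a missing path. The sketch below assumes a Hadoop release that provides FileStatus.isDirectory() (2.x or later); PathKind and describe are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathKind {

    // Hypothetical helper: classifies a path as "directory", "file", or "missing".
    public static String describe(FileSystem fs, Path p) throws IOException {
        try {
            FileStatus st = fs.getFileStatus(p);
            return st.isDirectory() ? "directory" : "file";
        } catch (FileNotFoundException e) {
            // getFileStatus signals a non-existent path with this exception
            return "missing";
        }
    }
}
```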

5. Listing a directory:


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
 
public class getPaths {
 
	public static void main(String[] args) throws Exception {
 
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(conf);
		FileStatus[] statu = fs.listStatus(new Path(args[0]));
		Path [] listPaths=FileUtil.stat2Paths(statu);
		for(Path p:listPaths){
			System.out.println(p);
		}
	}
}

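This mainly uses the FileSystem object's listStatus() method, which has several overloads; one of them accepts a Path array, so several given paths can be queried at once. To list the contents of subdirectories as well, you need to write a separate function that calls listStatus() recursively, which is straightforward.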
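One possible shape for such a recursive function is sketched below. It is an illustration rather than the author's code: RecursiveList and listAll are hypothetical names, and FileStatus.isDirectory() assumes Hadoop 2.x or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecursiveList {

    // Hypothetical helper: returns every file and directory below dir,
    // descending into subdirectories depth-first.
    public static List<Path> listAll(FileSystem fs, Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        for (FileStatus st : fs.listStatus(dir)) {
            result.add(st.getPath());
            if (st.isDirectory()) {
                // Recurse into the subdirectory and append its entries
                result.addAll(listAll(fs, st.getPath()));
            }
        }
        return result;
    }
}
```

For very large trees the recursion holds every path in memory at once. If only files are needed, FileSystem.listFiles(path, true) in newer releases returns them through an iterator instead.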

6. Deleting files and directories:
Use the FileSystem object's delete(Path f, boolean recursive) method. The boolean must be set to true to delete a directory that is not empty.
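The two values of the recursive flag can be wrapped in small helpers, as in the following sketch. DeleteExample, deleteShallow, and deleteTree are hypothetical names, and FileStatus.isDirectory() assumes Hadoop 2.x or later.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteExample {

    // Hypothetical helper: deletes a file or an empty directory,
    // and leaves a non-empty directory untouched (returns false).
    public static boolean deleteShallow(FileSystem fs, Path p) throws IOException {
        if (!fs.exists(p)) {
            return false;
        }
        if (fs.getFileStatus(p).isDirectory() && fs.listStatus(p).length > 0) {
            return false;               // would need recursive = true
        }
        return fs.delete(p, false);
    }

    // Hypothetical helper: deletes p and everything beneath it.
    public static boolean deleteTree(FileSystem fs, Path p) throws IOException {
        return fs.delete(p, true);      // returns false if p does not exist
    }
}
```

The explicit emptiness check in deleteShallow turns the non-empty case into a false return value rather than an exception, which is a design choice of this sketch and not something the API requires.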

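7. File patterns. Attentive readers may already have noticed that the programs above do not work with arguments containing wildcards such as * and []. The FileSystem object provides a globStatus() method that accepts paths containing wildcards.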


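import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Rejects every path whose last component matches the given regular expression
public class pathFilter implements PathFilter{
	private final String regex;
	public pathFilter (String regex){
		this.regex=regex;
	}
	public boolean accept(Path path) {
		// String.matches() must match the whole string, so test only the
		// last path component rather than the full URI
		return !path.getName().matches(regex);
	}
}
//---------------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class regxList{

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(conf);
		// args[0] may contain wildcards; entries whose name starts with "2007" are filtered out
		FileStatus[] statu = fs.globStatus(new Path(args[0]), new pathFilter ("^2007.*"));
		// globStatus can return null when a path without wildcards does not exist
		if (statu == null) {
			return;
		}
		Path [] listPaths=FileUtil.stat2Paths(statu);
		for(Path p:listPaths){
			System.out.println(p);
		}
	}
}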

PathFilter is used here as well; its main purpose is to filter out entries that the wildcard matches but that are not wanted.
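One detail to keep in mind when writing the regular expression for such a filter: String.matches() succeeds only if the entire string matches, whereas Matcher.find() succeeds if any part does. The plain-JDK sketch below shows the difference for a pattern anchored at the start; the class name RegexAnchorDemo and the sample string are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexAnchorDemo {

    // True only when the whole string matches the regex.
    public static boolean wholeMatch(String s, String regex) {
        return s.matches(regex);
    }

    // True when the regex matches anywhere, subject to its own anchors.
    public static boolean prefixFind(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String name = "2007-12-31";
        System.out.println(wholeMatch(name, "^2007"));    // false: "-12-31" is left over
        System.out.println(wholeMatch(name, "^2007.*"));  // true: .* covers the rest
        System.out.println(prefixFind(name, "^2007"));    // true: the prefix is enough
    }
}
```

A filter that calls matches() therefore needs a pattern covering the whole tested string, while one built on find() can use a short anchored prefix.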
