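HDFS Concepts
1 Data blocks*
An HDFS data block is 64 MB by default (128 MB in Hadoop 2.x and later) and is managed separately from the metadata.
Advantages:
Blocks are deliberately large, so seek time is a small fraction of transfer time, and the time to read a large file is governed almost entirely by the disk transfer rate.
Fixed-size blocks simplify management: remaining capacity is easy to compute and blocks are easy to replicate for redundancy (three replicas by default).
Because blocks are managed separately from the metadata, a block itself carries no file attributes such as permissions.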
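The block arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope calculation, not Hadoop code: the class `BlockMath`, the 200 MB file, the 10 ms seek time and the 100 MB/s transfer rate are all assumptions chosen for illustration.

```java
// Back-of-the-envelope block arithmetic; every number in main() is an illustrative assumption.
class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed to hold a file (the last block may be only partly filled).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes consumed across the cluster once every block is replicated.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    // Fraction of the total read time spent seeking when one whole block is read.
    static double seekFraction(double seekSeconds, double bytesPerSecond, long blockBytes) {
        double transferSeconds = blockBytes / bytesPerSecond;
        return seekSeconds / (seekSeconds + transferSeconds);
    }

    public static void main(String[] args) {
        long block = 64 * MB;   // the 64 MB default block size
        long file = 200 * MB;   // hypothetical file size
        System.out.println("blocks: " + blockCount(file, block));
        System.out.println("raw MB with 3 replicas: " + rawBytes(file, 3) / MB);
        // Assumed disk: 10 ms seek, 100 MB/s sustained transfer.
        System.out.printf("seek share: %.2f%%%n", 100 * seekFraction(0.010, 100.0 * MB, block));
    }
}
```

With these assumed numbers the seek share stays below 2%, which is the point of choosing a large block size.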
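2 NameNode and DataNode*
NameNode:
1 Manages the filesystem namespace
2 Maintains the filesystem tree and the metadata for every file and directory, persisted on local disk as two files: the namespace image and the edit log
3 Holds the metadata for every block
4 Manages the DataNodes
Backup strategies: also write the metadata to a remote disk, or run two NameNodes at the same time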
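One way to picture the namespace image and the edit log is as a checkpoint plus a journal: the current namespace is the image with every logged edit replayed on top. The sketch below is a toy model under that assumption, not Hadoop's on-disk format; `ToyNamespace`, `Edit` and `load` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Toy model: a checkpoint (image) plus a log of edits made since the checkpoint.
class ToyNamespace {
    // One logged namespace change.
    static final class Edit {
        final boolean add;   // true = create the path, false = delete it
        final String path;

        Edit(boolean add, String path) {
            this.add = add;
            this.path = path;
        }
    }

    // Rebuild the current namespace: start from the image, then replay the edits in order.
    static TreeSet<String> load(Collection<String> image, List<Edit> editLog) {
        TreeSet<String> paths = new TreeSet<>(image);
        for (Edit e : editLog) {
            if (e.add) {
                paths.add(e.path);
            } else {
                paths.remove(e.path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> image = Arrays.asList("/", "/user", "/user/sunfan");
        List<Edit> log = new ArrayList<>();
        log.add(new Edit(true, "/user/sunfan/input"));
        log.add(new Edit(true, "/user/sunfan/tmp"));
        log.add(new Edit(false, "/user/sunfan/tmp"));
        System.out.println(load(image, log));
    }
}
```

Losing either file loses the namespace, which is why the backup strategies above matter.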
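DataNode:
1 The worker nodes of the filesystem
2 Periodically send the NameNode a list of the blocks they store
3 Store and retrieve blocks as directed by the NameNode and by clients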
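The division of labor above, where DataNodes report their blocks and the NameNode tracks which node holds which block, can be sketched with an in-memory map. This is a toy model, not NameNode code; `ToyBlockMap` and its methods are hypothetical, and the node names `dn1` to `dn3` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the block-to-DataNode map that block reports keep up to date.
class ToyBlockMap {
    // blockId -> DataNodes that reported holding a replica
    private final Map<Long, Set<String>> locations = new TreeMap<>();

    // Process one DataNode's block report: it replaces whatever that node reported before.
    void blockReport(String dataNode, Collection<Long> blockIds) {
        for (Set<String> holders : locations.values()) {
            holders.remove(dataNode);
        }
        for (long id : blockIds) {
            locations.computeIfAbsent(id, k -> new TreeSet<>()).add(dataNode);
        }
    }

    // DataNodes currently known to hold the block.
    Set<String> locate(long blockId) {
        Set<String> holders = locations.get(blockId);
        return holders == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(holders);
    }

    // Blocks with fewer replicas than the target, in ascending block-id order.
    List<Long> underReplicated(int target) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> e : locations.entrySet()) {
            if (e.getValue().size() < target) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyBlockMap map = new ToyBlockMap();
        map.blockReport("dn1", Arrays.asList(1L, 2L));
        map.blockReport("dn2", Arrays.asList(1L));
        map.blockReport("dn3", Arrays.asList(1L, 2L));
        System.out.println("block 1 on " + map.locate(1L));
        System.out.println("under-replicated (target 3): " + map.underReplicated(3));
    }
}
```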
3 External interfaces
Thrift: the interface Hadoop exposes to languages other than Java
HTTP: web-based monitoring
FTP: file transfer
4 Java API
1 Reading through the URL API
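// Needs java.io.InputStream, java.net.URL and org.apache.hadoop.fs.FsUrlStreamHandlerFactory.
@Test
public void input1() throws IOException {
    // Teach java.net.URL the hdfs:// scheme; allowed only once per JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    // try-with-resources closes the stream even if a read fails.
    try (InputStream in = new URL("hdfs://192.168.1.100:9000/user/sunfan/input/file1.txt").openStream()) {
        byte[] buff = new byte[1024];
        int len;
        while (-1 != (len = in.read(buff))) {
            System.out.write(buff, 0, len); // raw bytes, so non-ASCII text survives
        }
        System.out.flush();
    }
}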
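2 Reading through the FileSystem API with FSDataInputStream (seek repositions the stream to any absolute offset, unlike InputStream.skip, which only moves forward from the current position) *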
// Needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.io.IOUtils,
// and FileSystem, FSDataInputStream, Path from org.apache.hadoop.fs.
@Test
public void input2() throws IOException {
    String uri = "hdfs://192.168.1.100:9000/user/sunfan/input/file1.txt";
    FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
    try (FSDataInputStream in = fs.open(new Path(uri))) {
        // First pass: print the whole file ('false' leaves the streams open).
        IOUtils.copyBytes(in, System.out, 4096, false);
        // seek() takes an absolute offset: go back to byte 3 and print the rest again.
        in.seek(3);
        IOUtils.copyBytes(in, System.out, 4096, false);
    }
}
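The difference between seek and skip noted above can be tried without a cluster. The sketch below is a stand-in, not HDFS code: it uses the JDK's `RandomAccessFile.seek`, which also takes an absolute offset, in place of `FSDataInputStream.seek`, and a `ByteArrayInputStream` for `skip`. The class `SeekVsSkip` and its helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Stand-in demo: RandomAccessFile.seek plays the role of FSDataInputStream.seek.
class SeekVsSkip {
    // seek() takes an absolute offset, so only the last call decides the position.
    static char readAfterSeeks(byte[] data, long first, long second) {
        try {
            File tmp = File.createTempFile("seek-demo", ".bin");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), data);
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
                raf.seek(first);
                raf.seek(second);   // may move backwards
                return (char) raf.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // skip() moves forward from the current position, so successive calls add up.
    static char readAfterSkips(byte[] data, long first, long second) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            in.skip(first);
            in.skip(second);
            return (char) in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAfterSeeks(data, 7, 3)); // reads the byte at offset 3
        System.out.println(readAfterSkips(data, 2, 3)); // reads the byte at offset 2 + 3
    }
}
```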
Writing data: FSDataOutputStream
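// Needs FSDataOutputStream from org.apache.hadoop.fs in addition to FileSystem and Path.
@Test
public void out3() throws IOException {
    String uri2 = "hdfs://192.168.1.100:9000/user/sunfan/input/file3.txt";
    FileSystem fs = FileSystem.get(URI.create(uri2), new Configuration());
    // create() makes the file, creating missing parent directories and overwriting by default.
    try (FSDataOutputStream out = fs.create(new Path(uri2))) {
        System.out.println(fs.exists(new Path(uri2)));
        out.write(97); // the single byte 'a'
    } // closing the stream is what flushes the data to HDFS
}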
Copying a local file: implement Progressable to print a progress indicator, and use IOUtils.copyBytes to do the copy
// Needs java.io.FileInputStream and org.apache.hadoop.util.Progressable.
@Test
public void out4() throws IOException {
    long start = System.currentTimeMillis();
    FileInputStream in = new FileInputStream("C:\\Users\\sunfan\\Desktop\\copy.pdf");
    String uri2 = "hdfs://192.168.1.100:9000/user/sunfan/input/file3.txt";
    FileSystem fs = FileSystem.get(URI.create(uri2), new Configuration());
    // Hadoop calls progress() periodically as data is written, so each dot marks progress.
    FSDataOutputStream out = fs.create(new Path(uri2), new Progressable() {
        @Override
        public void progress() {
            System.out.print(".");
        }
    });
    // The final argument 'true' closes both streams when the copy finishes.
    IOUtils.copyBytes(in, out, 4096, true);
    System.out.println(System.currentTimeMillis() - start);
}
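The copy-with-progress idea can be sketched with plain JDK streams. This is a simplified analogy, not Hadoop's implementation: in Hadoop the output stream drives the Progressable callback, whereas here the copy loop invokes it once per buffer. The class `CopyWithProgress`, the method `copy` and the use of `Runnable` as the callback type are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Buffered stream copy that reports progress through a callback.
class CopyWithProgress {
    // Copies everything from in to out and returns the number of bytes copied.
    // onProgress runs once after each buffer has been written.
    static long copy(InputStream in, OutputStream out, int bufferSize, Runnable onProgress)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
            onProgress.run();
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];   // stand-in for the local file
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4096, () -> System.out.print("."));
        System.out.println();
        System.out.println(copied + " bytes copied");
    }
}
```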
Reading file details: fs.getFileStatus returns a FileStatus
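// Needs FileStatus from org.apache.hadoop.fs in addition to FileSystem and Path.
@Test
public void showFilesystem() throws IOException {
    String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
    FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.100:9000"), new Configuration());
    // getFileStatus throws FileNotFoundException if the path does not exist.
    FileStatus status = fs.getFileStatus(new Path(dir));
    System.out.println(status.getPermission());
}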
Listing files: fs.listStatus returns an array of FileStatus
@Test
public void showFilesystem2() throws IOException {
    String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
    FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());
    // One FileStatus per entry directly under the directory.
    FileStatus[] status = fs.listStatus(new Path(dir));
    for (FileStatus fileStatus : status) {
        System.out.println(fileStatus.getPath());
    }
}
Matching files by pattern: fs.globStatus takes a glob pattern (wildcards such as * and ?), not a regular expression
@Test
public void showFilesystem3() throws IOException {
    String dir = "hdfs://192.168.1.100:9000/user/sunfan/input";
    FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());
    // "*" is a glob wildcard: it matches every entry directly under the directory.
    FileStatus[] status = fs.globStatus(new Path(dir + "/*"));
    for (FileStatus fileStatus : status) {
        System.out.println(fileStatus.getPath());
    }
}
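Since glob patterns and regular expressions are easy to confuse, the sketch below contrasts them using only the JDK. It relies on java.nio's glob matcher, whose syntax is close to Hadoop's but not guaranteed to be identical, so treat it as an illustration of glob semantics rather than of globStatus itself. The class `GlobVsRegex` and the file names are made up.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Contrasts glob wildcards with regular expressions on plain file names.
class GlobVsRegex {
    // True if the file name matches the glob pattern (java.nio glob syntax).
    static boolean globMatches(String glob, String name) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }

    // True if the whole file name matches the regular expression.
    static boolean regexMatches(String regex, String name) {
        return Pattern.matches(regex, name);
    }

    public static void main(String[] args) {
        // As a glob, * means "any run of characters".
        System.out.println(globMatches("file*.txt", "file1.txt"));
        // Read as a regex, the same text means "fil", zero or more "e", any one character, "txt".
        System.out.println(regexMatches("file*.txt", "file1.txt"));
        // The regex that expresses the same intent as the glob.
        System.out.println(regexMatches("file.*\\.txt", "file1.txt"));
        // ? matches exactly one character; {a,b} matches either alternative.
        System.out.println(globMatches("file?.txt", "file10.txt"));
        System.out.println(globMatches("file{1,3}.txt", "file3.txt"));
    }
}
```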