Uploading and Downloading Files with the Hadoop File System

How do you upload a local file to Hadoop, and how do you download a file from Hadoop to the local machine? Although an HDFS file can be read through an InputStream obtained from java.net.URL, on a Hadoop cluster you normally work with HDFS's own FileSystem abstraction, which must be created from a Configuration. There are several factory methods for this:

public static FileSystem get(Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf) throws IOException
public static FileSystem get(URI uri, Configuration conf, String user)
throws IOException
A Configuration instance picks up its settings from etc/hadoop/core-site.xml under the Hadoop installation directory.
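For example, a minimal sketch of the explicit-URI variant (the address hdfs://localhost:9000 is the one used by the code later in this post and is assumed to match fs.defaultFS in core-site.xml):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystem {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml (and the other *-site.xml files) from the classpath
        Configuration conf = new Configuration();
        // Explicit-URI variant; hdfs://localhost:9000 is an assumed address
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
        System.out.println(fs.getUri());
        fs.close();
    }
}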

The code below implements the equivalent of the Linux cat command:

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        // Obtain the FileSystem for the scheme in the URI (HDFS here)
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            // Copy the file to stdout in 4 KB chunks; don't close the streams here
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
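Assuming the class has been compiled and packaged into a jar (the jar name here is hypothetical), it can be run against a file on the cluster like this:

hadoop jar hdfs-examples.jar FileSystemCat hdfs://localhost:9000/user/hadoop/cloud.txt
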
The following prints the contents of a directory on HDFS:
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ListStatus {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        // Each command-line argument is a path to list
        Path[] paths = new Path[args.length];
        for (int i = 0; i < paths.length; i++) {
            paths[i] = new Path(args[i]);
        }
        FileStatus[] status = fs.listStatus(paths);
        // Convert the FileStatus objects back to Path objects for printing
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }
}
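listStatus also has an overload that takes a PathFilter to restrict the results. A minimal sketch, assuming the directory to list is passed as args[0] and filtering on the .txt suffix purely for illustration:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ListTxtFiles {
    public static void main(String[] args) throws Exception {
        String uri = args[0];
        FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
        // Keep only entries whose name ends in .txt
        FileStatus[] status = fs.listStatus(new Path(uri), new PathFilter() {
            @Override
            public boolean accept(Path path) {
                return path.getName().endsWith(".txt");
            }
        });
        for (FileStatus s : status) {
            System.out.println(s.getPath());
        }
    }
}
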
FileSystem provides two methods for wildcard (glob) matching on HDFS:

public FileStatus[] globStatus(Path pathPattern) throws IOException
public FileStatus[] globStatus(Path pathPattern, PathFilter filter)
throws IOException
Wildcards supported by Hadoop:

*         matches zero or more characters
?         matches a single character
[ab]      matches a single character in the set {a, b}
[^ab]     matches a single character not in the set {a, b}
[a-b]     matches a single character in the (closed) range a to b
[^a-b]    matches a single character not in the range a to b
{a,b}     matches either expression a or expression b
\c        matches character c when c is a metacharacter
For example, suppose the cluster holds two top-level directory trees laid out by date (the original images are not available, so this layout is an illustrative reconstruction):

/2007/12/30
/2007/12/31
/2008/01/01
/2008/01/02

Matching then works as follows, with the glob on the left and the matched paths on the right:

/*                /2007  /2008
/*/*              /2007/12  /2008/01
/*/12/*           /2007/12/30  /2007/12/31
/200?             /2007  /2008
/200[78]          /2007  /2008
/*/*/{31,01}      /2007/12/31  /2008/01/01

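A minimal sketch of globStatus against the layout above (again assuming hdfs://localhost:9000):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GlobDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), new Configuration());
        // Expand the glob /2007/*/* into the concrete paths it matches
        FileStatus[] status = fs.globStatus(new Path("/2007/*/*"));
        for (Path p : FileUtil.stat2Paths(status)) {
            System.out.println(p);
        }
    }
}
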
Hadoop's method for deleting data from HDFS:

public boolean delete(Path f, boolean recursive) throws IOException
If f is a file or an empty directory, it is deleted regardless of the value of recursive. A non-empty directory is deleted only when recursive is true; otherwise an IOException is thrown.
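For example (the path /user/hadoop/tmp is hypothetical):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000/"), new Configuration());
        // recursive = true, so the directory is removed even if it is not empty
        boolean deleted = fs.delete(new Path("/user/hadoop/tmp"), true);
        System.out.println("deleted: " + deleted);
    }
}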

Finally, here is code that uploads a file to HDFS and downloads a file from HDFS:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class UploadAndDown {

    public static void main(String[] args) {
        UploadAndDown uploadAndDown = new UploadAndDown();
        try {
            // Upload the local file local.txt to HDFS as cloud.txt
            uploadAndDown.upLoadToCloud("local.txt", "cloud.txt");
            // Download cloud.txt from HDFS to the local file cloudTolocal.txt
            uploadAndDown.downFromCloud("cloudTolocal.txt", "cloud.txt");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void upLoadToCloud(String srcFileName, String cloudFileName)
            throws FileNotFoundException, IOException {
        String localSrc = "/home/sina/hbase2/bin/" + srcFileName;
        String cloudDest = "hdfs://localhost:9000/user/hadoop/" + cloudFileName;
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(cloudDest), conf);
        // The Progressable callback is invoked periodically while data is written
        OutputStream out = fs.create(new Path(cloudDest), new Progressable() {
            @Override
            public void progress() {
                System.out.println("upload a file to HDFS");
            }
        });
        // Copy in 1 KB chunks and close both streams when done
        IOUtils.copyBytes(in, out, 1024, true);
    }

    private void downFromCloud(String localFileName, String cloudFileName)
            throws FileNotFoundException, IOException {
        String cloudSrc = "hdfs://localhost:9000/user/hadoop/" + cloudFileName;
        String localDest = "/home/hadoop/datasrc/" + localFileName;
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(cloudSrc), conf);
        FSDataInputStream in = fs.open(new Path(cloudSrc));
        OutputStream out = new FileOutputStream(localDest);
        // Copy in 1 KB chunks and close both streams when done
        IOUtils.copyBytes(in, out, 1024, true);
    }

}
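FileSystem also offers convenience methods that do each copy in a single call. A minimal sketch using the same local and HDFS paths as the code above:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://localhost:9000/"), new Configuration());
        // Upload: local source first, HDFS destination second
        fs.copyFromLocalFile(new Path("/home/sina/hbase2/bin/local.txt"),
                new Path("/user/hadoop/cloud.txt"));
        // Download: HDFS source first, local destination second
        fs.copyToLocalFile(new Path("/user/hadoop/cloud.txt"),
                new Path("/home/hadoop/datasrc/cloudTolocal.txt"));
    }
}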

