Writing Files to HDFS

weixin_33924312

Original post: https://my.oschina.net/u/3267050/blog/1822722

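```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileTools {
    private static final String HDFS_URL = "hdfs://SANDBOX-HADOOP-01.whh.net:8022";
    private static final String HDFS_USER = "bigdata";

    // Opens a FileSystem for `url`, acting as HDFS user `user`.
    private static FileSystem init(String url, String user) throws IOException {
        Configuration config = new Configuration();
        config.set("fs.defaultFS", url);
        try {
            // newInstance (not the cached FileSystem.get) so that closing it
            // cannot break other code in the JVM that shares the cached instance.
            return FileSystem.newInstance(URI.create(url), config, user);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fail instead of continuing with a null FileSystem.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while connecting to " + url, e);
        }
    }

    // Appends `contents` to the file at `pathString`, creating the file if it does not exist.
    public static void appendToHdfsFile(String pathString, String contents) throws IOException {
        FileSystem fs = init(HDFS_URL, HDFS_USER);
        Path path = new Path(pathString);
        // try-with-resources closes (and flushes) the stream before fs.close() runs in finally.
        try (FSDataOutputStream out = fs.exists(path) ? fs.append(path) : fs.create(path)) {
            // writeChars / writeBytes / writeUTF are avoided for Chinese text (see note 2 below).
            // Encode explicitly as UTF-8 instead of relying on the JVM's default charset.
            out.write(contents.getBytes(StandardCharsets.UTF_8));
        } finally {
            fs.close();
        }
    }
}
```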

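A few notes:

1. appendToHdfsFile writes to an HDFS file: if the file already exists, the contents are appended; if it does not, a new file is created. Watch out for file permissions: the HDFS user the code connects as (here bigdata) needs write access to the file when appending, and to its parent directory when creating it.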

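2. For Chinese text, writeUTF() is the only one of FSDataOutputStream's string methods (writeBytes, writeChars, writeUTF) that produces UTF-8, but it prepends 2 extra bytes to the text. That is why the code writes a byte array with out.write(...getBytes()) instead, ideally with an explicit charset such as StandardCharsets.UTF_8. Details: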

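2.1 The problem with writeBytes(String s)

A Java char is 16 bits, and a single char holds one Chinese character. writeBytes writes each char as a single byte by discarding its high 8 bits, so the information in those bits is lost and Chinese text comes out garbled.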
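To see the truncation concretely, here is a minimal sketch that uses only java.io. FSDataOutputStream extends java.io.DataOutputStream and inherits writeBytes unchanged, so an in-memory DataOutputStream is used here as a stand-in for the HDFS stream (an assumption made so the example runs without a cluster). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {
    // Bytes produced by DataOutputStream.writeBytes(s); FSDataOutputStream inherits this method.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s); // keeps only the low 8 bits of each char
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail; this only satisfies the compiler.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // "中文" (U+4E2D, U+6587)
        byte[] written = viaWriteBytes(text);
        // One byte per char: 0x2D (low byte of U+4E2D) and 0x87 (low byte of U+6587).
        System.out.println(written.length); // 2
        for (byte b : written) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // the loop above prints "2D 87 "
        // Decoding the truncated bytes as UTF-8 cannot recover the original text.
        System.out.println(new String(written, StandardCharsets.UTF_8).equals(text)); // false
    }
}
```

Two characters go in, two bytes come out, and the high byte of each char is gone for good.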

Solution:

First convert the string to a byte array, then write that array to the stream with write():

Before:

out.writeBytes(string);

After:

out.write(string.getBytes(StandardCharsets.UTF_8));
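The fix can be checked the same way. This sketch again uses an in-memory DataOutputStream as a stand-in for FSDataOutputStream, and assumes UTF-8 is the encoding you want in the file; the class and method names are hypothetical. It shows that write(byte[]) stores exactly the encoded bytes, with no truncation and no length prefix, and that decoding them as UTF-8 gives back the original string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtf8BytesDemo {
    // Encode to UTF-8 first, then write the raw bytes unchanged.
    static byte[] viaWrite(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            // Name the charset explicitly; plain getBytes() uses the JVM default charset,
            // which may not be UTF-8 on every machine.
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // "中文"
        byte[] written = viaWrite(text);
        System.out.println(written.length); // 6: three UTF-8 bytes per character
        // Reading the bytes back as UTF-8 recovers the original string.
        System.out.println(new String(written, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Naming the charset on both the write side and the read side is what keeps the round trip independent of each machine's default encoding.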

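2.2 writeUTF() writes the string in (modified) UTF-8, preceded by a 2-byte length field giving the number of encoded bytes that follow, so a reader knows how many bytes belong to that call. In a plain text file those 2 bytes show up as stray bytes in front of the text, and a single call cannot write more than 65535 encoded bytes.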
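A minimal sketch of the writeUTF layout, again using an in-memory DataOutputStream as a stand-in for FSDataOutputStream (which inherits writeUTF from it); class and method names are hypothetical. For "中文" the 6 UTF-8 bytes are preceded by the big-endian length 0x00 0x06, and DataInputStream.readUTF is the matching reader that consumes that prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WriteUtfDemo {
    // Bytes produced by DataOutputStream.writeUTF(s): 2-byte length, then the encoded text.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads one writeUTF record back: readUTF consumes the length prefix first.
    static String readBack(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // "中文"
        byte[] written = viaWriteUTF(text);
        System.out.println(written.length); // 8 = 2-byte prefix + 6 encoded bytes
        int prefix = ((written[0] & 0xFF) << 8) | (written[1] & 0xFF);
        System.out.println(prefix); // 6
        System.out.println(readBack(written).equals(text)); // true
    }
}
```

The prefix is harmless when the file is read back with readUTF, but a tool that reads the file as plain UTF-8 text will see the two extra bytes at the start.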
