HDFS file Chinese encoding: Hadoop reads a GBK-encoded file, help needed with garbled Chinese text

Latest recommended article published 2022-10-18 20:18:14

张程初号机

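Article tags: HDFS file Chinese encoding

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_42361175/article/details/113011668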

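When a Hadoop mapper processes a GBK-encoded HDFS file, the Chinese text comes out garbled. The mapper tries converting the string through several encodings, but the output is still garbled. In the main function the input path is set with FileInputFormat.addInputPath, and because FileInputFormat does not let the caller choose an encoding the way an IO stream does, the problem appears. The post asks how to handle GBK encoding correctly when processing HDFS files.

Summary generated by CSDN using AI technology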

The input file is GBK-encoded, and the mapper looks like this:

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Text.toString() always decodes the underlying bytes as UTF-8
    String str = value.toString();
    System.out.println(str);

    // Attempts to recover the text by re-encoding and re-decoding the string
    String str1 = new String(str.getBytes(), "GBK");
    System.out.println(str1);
    String str2 = new String(str1.getBytes("GBK"));
    System.out.println(str2);
    String str3 = new String(str.getBytes("GBK"));
    System.out.println(str3);
    String str4 = new String(str.getBytes(), "UTF-8");
    System.out.println(str4);
    String str5 = new String(str4.getBytes("UTF-8"));
    System.out.println(str5);
    String str6 = new String(str.getBytes("UTF-8"));
    System.out.println(str6);
    String str7 = new String(str.getBytes("UTF-8"), "GBK");
    System.out.println(str7);
    String str8 = new String(str.getBytes("GBK"), "UTF-8");
    System.out.println(str8);

    // info is defined in code that is not shown in the post
    output.collect(new Text(info[15]), new Text(str));
}

}
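
A likely reason every variant in the mapper above prints garbled text: Text.toString() decodes the record's bytes as UTF-8, and GBK byte sequences that are not valid UTF-8 are replaced with U+FFFD, so the original bytes are already lost before any getBytes() call runs. The sketch below reproduces this with JDK classes only. The class and method names are hypothetical, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyRoundTripSketch {
    // Assumption: the running JDK provides the GBK charset
    static final Charset GBK = Charset.forName("GBK");

    // Mimics value.toString(): decode raw bytes as UTF-8, replacing malformed input
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Mimics the str7 attempt: re-encode as UTF-8, then decode as GBK
    static String reinterpretAsGbk(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes to avoid source-encoding issues
        String original = "\u4e2d\u6587";
        byte[] raw = original.getBytes(GBK);

        String misdecoded = decodeAsUtf8(raw);
        // The replacement character shows that information was destroyed
        System.out.println(misdecoded.indexOf('\uFFFD') >= 0);
        // Converting the damaged string back does not restore the original
        System.out.println(reinterpretAsGbk(misdecoded).equals(original));
    }
}
```

Once the replacement characters are in the string, no combination of getBytes and new String can bring the GBK text back, which matches the behavior described in the post.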

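When I inspect the map output records, every character is garbled. The input in the main function is set up like this:

org.apache.hadoop.mapred.FileInputFormat.addInputPath(conf, new Path("<DFS file path>"));

Unlike an IO stream, FileInputFormat does not let me control the encoding when it reads the file. How should I solve this problem?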
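
One commonly used approach, offered here as a sketch and not as a confirmed answer from the thread, is to skip toString() and decode the record's raw bytes as GBK. Hadoop's Text exposes them through getBytes() and getLength(); the backing array can be longer than the valid data, so the length must be passed explicitly. The example below simulates that buffer with plain JDK classes so it runs without Hadoop. GbkDecodeSketch and decodeGbk are hypothetical names, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbkDecodeSketch {
    // Assumption: the running JDK provides the GBK charset
    static final Charset GBK = Charset.forName("GBK");

    // buffer: backing array that may hold extra trailing bytes
    // length: number of valid bytes at the start of the buffer
    static String decodeGbk(byte[] buffer, int length) {
        return new String(buffer, 0, length, GBK);
    }

    public static void main(String[] args) {
        // A GBK-encoded line: two Chinese characters followed by ASCII
        String expected = "\u4e2d\u6587,abc";
        byte[] line = expected.getBytes(GBK);

        // Simulate a reused buffer that is larger than the current record
        byte[] buffer = Arrays.copyOf(line, line.length + 8);

        String decoded = decodeGbk(buffer, line.length);
        System.out.println(decoded.equals(expected)); // prints "true"
    }
}
```

Inside the mapper, the equivalent call would be new String(value.getBytes(), 0, value.getLength(), "GBK"). Wrapping the resulting string in new Text(...) for output stores it as UTF-8, so downstream steps see correctly encoded text.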
