The input file is GBK-encoded, and the mapper looks like this:
    public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
        // Text.toString() decodes the record's bytes as UTF-8
        String str = value.toString();
        System.out.println(str);
        // Bytes in the platform default charset, decoded as GBK
        String str1 = new String(str.getBytes(), "GBK");
        System.out.println(str1);
        String str2 = new String(str1.getBytes("GBK"));
        System.out.println(str2);
        // GBK bytes, decoded with the platform default charset
        String str3 = new String(str.getBytes("GBK"));
        System.out.println(str3);
        String str4 = new String(str.getBytes(), "UTF-8");
        System.out.println(str4);
        String str5 = new String(str4.getBytes("UTF-8"));
        System.out.println(str5);
        String str6 = new String(str.getBytes("UTF-8"));
        System.out.println(str6);
        // UTF-8 bytes decoded as GBK, then GBK bytes decoded as UTF-8
        String str7 = new String(str.getBytes("UTF-8"), "GBK");
        System.out.println(str7);
        String str8 = new String(str.getBytes("GBK"), "UTF-8");
        System.out.println(str8);
        // info is not defined anywhere in the snippet as posted
        output.collect(new Text(info[15]), new Text(str));
    }
}
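When I inspect the records output by each map, all of the output characters are garbled. The input is set up in the main function like this:
org.apache.hadoop.mapred.FileInputFormat.addInputPath(conf, new Path(<DFS file path>));
When FileInputFormat reads the file, I cannot control the input the way I can with an IO stream. How should I solve this problem?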
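One likely cause, offered as a sketch rather than a confirmed diagnosis: Text treats its contents as UTF-8, so value.toString() decodes the GBK bytes as UTF-8 and replaces every invalid sequence before any of the conversions in the mapper run. None of the later getBytes / new String combinations can bring the lost bytes back. Decoding the record's raw bytes directly, for example new String(value.getBytes(), 0, value.getLength(), "GBK"), avoids that step. The JDK-only example below imitates the situation with a plain byte array standing in for Text's backing buffer. The class and method names are hypothetical, and it assumes the JVM provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GbkDecodeSketch {
    // Assumption: the running JVM ships the GBK charset.
    static final Charset GBK = Charset.forName("GBK");

    // Decode only the first `length` bytes as GBK. With Hadoop's Text this
    // corresponds to new String(value.getBytes(), 0, value.getLength(), "GBK"),
    // because the backing array may be longer than the valid data.
    static String decodeGbk(byte[] buffer, int length) {
        return new String(buffer, 0, length, GBK);
    }

    // Roughly what Text.toString() does: interpret the bytes as UTF-8.
    // Invalid sequences are replaced with U+FFFD, which loses information.
    static String decodeUtf8(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        byte[] raw = original.getBytes(GBK);

        // Simulate a reused buffer that is larger than the current record.
        byte[] buffer = Arrays.copyOf(raw, raw.length + 4);

        String good = decodeGbk(buffer, raw.length);
        String lossy = decodeUtf8(buffer, raw.length);

        System.out.println("GBK decode matches original: " + good.equals(original));
        System.out.println("UTF-8 decode matches original: " + lossy.equals(original));
    }
}
```

In the mapper, the same idea would replace value.toString() with the GBK decode of the raw bytes. If a key or value is written back out with new Text(someString), it is stored as UTF-8, so downstream readers would see UTF-8 rather than GBK.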