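Today, while testing index building with Lucene, I needed to read data from a text file, but the text I read through BufferedReader came back garbled. The code was: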
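import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// The class name is arbitrary; any name works.
public class ReadFileTest {
    public static void main(String[] args) {
        File file = new File("d:\\1.txt");
        System.out.println(fileToString(file));
    }

    public static String fileToString(File file) {
        StringBuilder out = new StringBuilder();
        // new FileReader(file) decodes with the JVM's default charset,
        // which is what garbles the text when the file uses another encoding.
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
            String str;
            while ((str = bufferedReader.readLine()) != null) {
                out.append(str).append('\n');
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return out.toString();
    }
}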
Contents of 1.txt:
我是一个中国人
我爱中国
Output: the Chinese text is printed as garbled characters.
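To see why the text gets garbled, here is a minimal sketch that reproduces the mismatch in memory, without any file. It is not part of the original test: the class name CharsetMismatchDemo and the helper decodeGb2312BytesAs are made up for illustration, and it assumes that 1.txt was saved as GB2312, that the JVM default charset was UTF-8, and that the JDK in use ships the GB2312 charset. The sample string is the second line of 1.txt, written with Unicode escapes so the source compiles the same way under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    // Encodes the text as GB2312 bytes (what is stored on disk under the
    // assumption above), then decodes those bytes with the given charset.
    static String decodeGb2312BytesAs(String text, Charset decodeWith) {
        byte[] bytes = text.getBytes(Charset.forName("GB2312"));
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Second line of 1.txt, as Unicode escapes.
        String original = "\u6211\u7231\u4e2d\u56fd";

        // Decoding with the same charset used for encoding restores the text.
        String matched = decodeGb2312BytesAs(original, Charset.forName("GB2312"));
        System.out.println(matched.equals(original)); // prints true

        // Decoding the same bytes as UTF-8 yields replacement characters.
        String mismatched = decodeGb2312BytesAs(original, StandardCharsets.UTF_8);
        System.out.println(mismatched.equals(original)); // prints false
    }
}
```

The bytes are identical in both calls; only the charset used to decode them differs, which is the same thing that happens when a reader picks the wrong encoding for a file.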
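Solution
Specify the file's encoding explicitly when reading it. new FileReader(file) decodes with the JVM's default charset, so a file saved in a different encoding (GB2312 here) comes out garbled: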
// Also needs: import java.io.FileInputStream; import java.io.InputStreamReader;
public static String fileToString(File file) {
    StringBuilder out = new StringBuilder();
    // Decode the bytes as GB2312 instead of relying on the JVM's default charset.
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), "GB2312"))) {
        String str;
        while ((str = bufferedReader.readLine()) != null) {
            out.append(str).append('\n');
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return out.toString();
}
Output: both lines of 1.txt are now printed correctly.
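As a possible alternative, the same idea can be sketched with java.nio.file.Files.newBufferedReader, which takes the charset as a parameter. This is a variation, not the code I actually ran: the class name CharsetFileReader and the extra charset parameter are made up for illustration, and it again assumes the JDK in use ships the GB2312 charset. One difference worth knowing is that a reader obtained this way throws an IOException on malformed bytes instead of silently substituting replacement characters, so a wrong encoding fails loudly.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

final class CharsetFileReader {
    // Reads the whole file line by line, decoding it with the given charset.
    // Each line is terminated with '\n' in the result, as in fileToString above.
    static String fileToString(File file, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file.toPath(), charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = new File("d:\\1.txt");
        // Assumes 1.txt was saved as GB2312.
        System.out.println(fileToString(file, Charset.forName("GB2312")));
    }
}
```

Passing the charset in as a parameter keeps the encoding decision with the caller, which helps when the files to be indexed do not all share one encoding.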