Scenario: we need to read the contents of a TXT file that was written in UTF-8, but the development environment's default charset is GBK.
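Why the default charset matters: `InputStreamReader` (and `FileReader`) without an explicit charset argument decode bytes with the platform default, GBK in this environment. Since JDK 18 the default is UTF-8 on all platforms (JEP 400), so the GBK default applies to older JDKs or to runs with `-Dfile.encoding=COMPAT`. The sketch below is illustrative only. `CharsetDemo` and `decode` are hypothetical names. It shows that the UTF-8 bytes of "中文" decode correctly only when the charset is named explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decode bytes with an explicitly named charset instead of the platform default.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is "中文"; escapes keep this source file encoding-independent.
        byte[] utf8Bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8Bytes.length); // 6: three UTF-8 bytes per CJK character
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals("\u4e2d\u6587")); // true
        System.out.println(Charset.defaultCharset()); // e.g. GBK on a Chinese-locale Windows JVM before JDK 18
        if (Charset.isSupported("GBK")) {
            // Reading the UTF-8 bytes as GBK gives garbled text, not the original string.
            System.out.println(decode(utf8Bytes, Charset.forName("GBK")).equals("\u4e2d\u6587")); // false
        }
    }
}
```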
Core code:
```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;

// AdjustPath, splitstr, Logger and UnicodeReader are the author's own helpers (not shown here).
public ArrayList<String[]> readData(String srcPath) {
    ArrayList<String[]> datalist = new ArrayList<String[]>();
    String errorPath = null;
    String backupPath = null;
    String receivedPath = null;
    File srcFile = null;
    try {
        backupPath = AdjustPath(srcPath, "backup");
        errorPath = AdjustPath(srcPath, "error");
        receivedPath = AdjustPath(srcPath, "received");
        String[] inFileNames = new File(receivedPath).list();
        if (null != inFileNames && inFileNames.length > 0) {
            for (String filename : inFileNames) { // If the folder holds several files, read them one by one
                String filePath = receivedPath + File.separator + filename;
                srcFile = new File(filePath);
                System.out.println(srcFile.length());
                // try-with-resources closes br, ur and fis even if reading throws
                try (InputStream fis = new FileInputStream(srcFile);
                     // UnicodeReader detects and skips a BOM, otherwise decodes as UTF-8 (not the GBK default)
                     UnicodeReader ur = new UnicodeReader(fis, "UTF-8");
                     BufferedReader br = new BufferedReader(ur)) {
                    String[] splitedData = null;
                    String str = null;
                    // Stops at end of file or at the first empty line
                    while ((str = br.readLine()) != null && str.length() > 0) {
                        splitedData = splitstr(str);
                        datalist.add(splitedData);
                    }
                }
            }
        }
    } catch (FileNotFoundException e) {
        Logger.error("Exception occurred: " + e.getMessage());
        e.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
        Logger.error("I/O error while reading file: " + ioe.getMessage());
    }
    return datalist;
}
```
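The key is the loop condition

`while ((str = br.readLine()) != null && str.length() > 0)`

It first assigns the result of `br.readLine()` to `str` and then tests that same value. Every call to `br.readLine()` reads (consumes) one line, so you must not write

`while (br.readLine() != null && br.readLine().length() > 0) { str = br.readLine(); }`

That loop calls `readLine()` three times per iteration. The first two lines are only tested and then lost, and `str` receives only every third line. Near the end of the file, `str` can be `null`, and the second call can throw a `NullPointerException`.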
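The difference between the two loops can be checked on an in-memory string. This is a minimal sketch: `ReadLineDemo`, `correctLoop` and `brokenLoop` are hypothetical names, and `StringReader` stands in for the file stream. It also shows that the original condition stops at the first empty line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {

    // The article's idiom: read once into str, then test that same value.
    public static List<String> correctLoop(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String str;
            while ((str = br.readLine()) != null && str.length() > 0) {
                lines.add(str);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // The broken variant: each readLine() call consumes another line.
    public static List<String> brokenLoop(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String str;
            while (br.readLine() != null && br.readLine().length() > 0) {
                str = br.readLine(); // only the third line of each group of three
                lines.add(str);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "1\n2\n3\n4\n5\n6\n";
        System.out.println(correctLoop(text));          // [1, 2, 3, 4, 5, 6]
        System.out.println(brokenLoop(text));           // [3, 6]
        System.out.println(correctLoop("a\n\nb\n"));    // [a] (stops at the empty line)
    }
}
```

With six lines the broken loop happens to end cleanly. With four lines, the second iteration would call `.length()` on `null` and throw.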
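In addition, a UTF-8 TXT file may start with a BOM (the bytes EF BB BF). Java's UTF-8 decoder does not strip it, so if the file is read with a plain `InputStreamReader(fis, "UTF-8")`, the first character of the first line comes out as ? (the BOM character U+FEFF as displayed under GBK).
The fix is a custom Reader subclass that detects and skips the BOM, which is the role of `UnicodeReader` in the code above. For reference, see: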
http://blog.csdn.net/jackpk/article/details/5702964/
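The sketch below reproduces the BOM problem and skips it without a Reader subclass, so it is a different technique from `UnicodeReader`. It uses `BufferedReader.mark`/`reset` to peek at the first character. `BomDemo`, `firstLinePlain`, `openUtf8SkippingBom` and `firstLineSkippingBom` are hypothetical names. Unlike the linked `UnicodeReader`, this version assumes the file is UTF-8 and does not detect UTF-16 or UTF-32 BOMs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomDemo {

    // Plain InputStreamReader: the BOM survives decoding as U+FEFF.
    public static String firstLinePlain(InputStream in) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Open a UTF-8 reader and drop a leading U+FEFF if present (UTF-8 only).
    public static BufferedReader openUtf8SkippingBom(InputStream in) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        br.mark(1);                  // remember the position before the first character
        if (br.read() != '\uFEFF') { // not a BOM, or an empty stream: rewind
            br.reset();
        }
        return br;
    }

    public static String firstLineSkippingBom(InputStream in) {
        try (BufferedReader br = openUtf8SkippingBom(in)) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c', '\n'};
        String plain = firstLinePlain(new ByteArrayInputStream(withBom));
        System.out.println(plain.length());              // 4: U+FEFF plus "abc"
        System.out.println(plain.charAt(0) == '\uFEFF'); // true
        System.out.println(firstLineSkippingBom(new ByteArrayInputStream(withBom))); // abc
    }
}
```

In the `readData` method, a helper like `openUtf8SkippingBom(fis)` could replace the `UnicodeReader`/`BufferedReader` pair if only UTF-8 input is expected.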