Problem: dom4j fails to load an XML file that contains Chinese text. The error message is: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0xdd26) was found in the element content of the document.
Take the simplest possible example: D:/log/test.xml is a GBK-encoded file, and it is loaded with the following class:
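import java.io.BufferedInputStream; // needed only for the commented-out alternative below
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class XmlTest {
    public static void main(String[] args) {
        SAXReader saxReader = new SAXReader();
        String fileName = "D:\\log\\test.xml";
        File file = new File(fileName);
        Document document = null;
        try {
            if (fileName.endsWith(".xml.gz")) {
                // InputStreamReader without a charset also decodes with the platform default
                document = saxReader.read(new InputStreamReader(new GZIPInputStream(new FileInputStream(file))));
            } else {
                // FileReader decodes the bytes with the platform default charset
                document = saxReader.read(new FileReader(file));
                //document = saxReader.read(new BufferedInputStream(new FileInputStream(file)));
            }
            Element root = document.getRootElement();
            System.out.println(root.asXML());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}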
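If the XmlTest class is set to UTF-8 encoding, loading fails with: An invalid XML character (Unicode: 0xdd26) was found in the element content of the document.
If the XmlTest class is set to GBK encoding, the file loads without any problem. (What matters at run time is the default charset of the JVM; an IDE typically launches the program with a default charset that matches the encoding configured for the source.)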
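The cause is that FileReader performs the byte-to-character conversion itself, and when no encoding is specified it uses the default encoding of the local environment. The GBK bytes are therefore decoded with the wrong charset before the parser ever sees them, and because the parser receives characters rather than bytes, it cannot apply the encoding declared in the XML file.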
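The mismatch can be reproduced without any XML at all. The following is a minimal sketch, assuming the JDK in use ships the GBK charset; the class name CharsetMismatchDemo and its helper methods are made up for illustration. It encodes two Chinese characters as GBK and decodes the same bytes once with GBK and once with UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when GBK bytes are decoded
// with a charset other than GBK, which is what FileReader does whenever
// the platform default charset is not GBK.
class CharsetMismatchDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Encode text as GBK bytes, the way the test file is stored on disk.
    public static byte[] encodeGbk(String text) {
        return text.getBytes(GBK);
    }

    // Decode bytes with an explicitly chosen charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, escaped to keep the source ASCII-only
        byte[] bytes = encodeGbk(text);

        // FileReader would use this charset when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("decoded as GBK matches:   " + text.equals(decode(bytes, GBK)));
        System.out.println("decoded as UTF-8 matches: " + text.equals(decode(bytes, StandardCharsets.UTF_8)));
    }
}
```

Only the decode that uses the file's real encoding returns the original text. The other one yields replacement or otherwise wrong characters, and that is the input the XML parser is handed.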
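So when loading an XML file with dom4j, prefer saxReader.read(new BufferedInputStream(new FileInputStream(file))); or saxReader.read(file); and avoid FileReader or BufferedReader. Given a byte stream or a File, the parser reads the bytes itself and can determine the encoding from the XML declaration.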
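The difference between a byte stream and a Reader can be sketched with the JDK's built-in parser standing in for dom4j, since SAXReader passes its input on to a SAX parser in the same way. This is an illustration under assumptions, not dom4j code: the class EncodingAwareXmlLoad, its methods, and the in-memory sample document are all hypothetical, and the sample assumes the XML declaration states encoding="GBK".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses the same GBK bytes from a byte stream
// and from a Reader, using the JDK parser as a stand-in for dom4j.
class EncodingAwareXmlLoad {
    static final Charset GBK = Charset.forName("GBK");

    // Build a small GBK-encoded document in memory (made-up sample data).
    public static byte[] sampleGbkXml(String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><root>" + text + "</root>";
        return xml.getBytes(GBK);
    }

    // Byte stream: the parser reads the XML declaration and decodes the bytes itself.
    public static String rootTextFromBytes(byte[] xml) {
        return rootText(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Character stream: the bytes are decoded with the given charset before parsing.
    public static String rootTextFromReader(byte[] xml, Charset charset) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        return rootText(new InputSource(reader));
    }

    private static String rootText(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, escaped to keep the source ASCII-only
        byte[] xml = sampleGbkXml(text);

        System.out.println("byte stream matches:        " + text.equals(rootTextFromBytes(xml)));
        System.out.println("Reader with GBK matches:    " + text.equals(rootTextFromReader(xml, GBK)));
        System.out.println("Reader with UTF-8 matches:  " + text.equals(rootTextFromReader(xml, StandardCharsets.UTF_8)));
    }
}
```

If a Reader cannot be avoided, passing the charset explicitly, as in new InputStreamReader(new FileInputStream(file), "GBK"), removes the dependence on the platform default, at the cost of hard-coding the file's encoding.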