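After generating an XML document with dom4j in UTF-8 encoding, I found that the parts of the document containing Chinese were corrupted and the file could not be read. Out of desperation I resorted to an ugly hack: re-encoding every string as UTF-8 before writing it into the XML: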
str = new String(str.getBytes(), "UTF8");
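At last the XML error went away and the document could be read normally. However, the Chinese text in it was now garbled. What a tragedy. Why did it turn out this way?
I finally found out: the problem was misuse of the FileWriter class. FileWriter encodes characters with the JVM's default charset and knows nothing about the encoding set on the OutputFormat, so the bytes on disk need not match the UTF-8 declared in the XML header. After replacing FileWriter with FileOutputStream, the problem was solved.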
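The mismatch can be reproduced with the JDK alone. The following is a minimal sketch, not the original code: the class and method names are made up for illustration, and UTF-16BE merely stands in for a non-UTF-8 platform default (on the original machine it was presumably something like GBK, which is an assumption). It encodes text with one charset and decodes the bytes as UTF-8, which is roughly what both the FileWriter path and the `new String(str.getBytes(), "UTF8")` hack amount to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of dom4j.
class CharsetMismatchDemo {

    /**
     * Encodes the text the way a Writer bound to writerCharset would,
     * then decodes the resulting bytes as UTF-8, as a reader that
     * trusts an encoding="UTF-8" declaration would do.
     */
    static String writeThenReadAsUtf8(String text, String writerCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, Charset.forName(writerCharset))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Tom Cat" in Chinese, written with Unicode escapes so that the
        // encoding of this source file cannot interfere with the demo.
        String name = "\u6c64\u59c6\u732b";

        // Writer charset matches the declared charset: text survives.
        System.out.println(name.equals(writeThenReadAsUtf8(name, "UTF-8")));

        // Writer charset differs from the declared charset: text is damaged.
        System.out.println(name.equals(writeThenReadAsUtf8(name, "UTF-16BE")));
    }
}
```

Once the bytes have been decoded with the wrong charset, the original characters are generally not recoverable, which would explain why the hack removed the parse error but left garbled text behind.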
1 How dom4j's XMLWriter handles the output file:
    public XMLWriter(OutputStream out) throws UnsupportedEncodingException
    {
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException
    {
        this.format = format;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException
    {
        return new BufferedWriter( new OutputStreamWriter( outStream, encoding ));
    }
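Conclusion: when generating an XML document with dom4j, construct the XMLWriter from an OutputStream, not from a FileWriter. Given an OutputStream, XMLWriter builds its own OutputStreamWriter with the encoding taken from the OutputFormat. XMLWriter also accepts a Writer, but in that case the characters are encoded with whatever charset that Writer already uses.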
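To see the effect of that wrapping in isolation, here is a small JDK-only sketch. The class name and the `writeAndReadBytes` helper are hypothetical; only `createWriter` mirrors the dom4j method quoted above. It writes a string to a temporary file through a writer built this way and returns the raw bytes so they can be compared with the expected encoding.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo class, not part of dom4j.
class ExplicitEncodingWriterDemo {

    // Same wrapping as the createWriter method quoted from XMLWriter.
    static Writer createWriter(OutputStream out, String encoding) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(out, encoding));
    }

    // Writes text to a temporary file in the given encoding and
    // returns the bytes that actually ended up on disk.
    static byte[] writeAndReadBytes(String text, String encoding) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".xml");
            try (OutputStream out = new FileOutputStream(file.toFile());
                 Writer w = createWriter(out, encoding)) {
                w.write(text);
            }
            byte[] bytes = Files.readAllBytes(file);
            Files.delete(file);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Tom Cat" in Chinese, as Unicode escapes.
        String text = "\u6c64\u59c6\u732b";
        byte[] written = writeAndReadBytes(text, "UTF-8");
        // Compare the file contents with the UTF-8 encoding of the text.
        System.out.println(Arrays.equals(written, text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the encoding is passed explicitly, the result does not depend on the platform default charset, which is the property the FileWriter version lacked.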
2 Example:
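    import java.io.FileOutputStream;

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.OutputFormat;
    import org.dom4j.io.XMLWriter;

    // The class name is illustrative; the original showed only the method.
    public class XmlCreator
    {
        public void createXML(String fileName)
        {
            // Build a small document containing Chinese text.
            Document doc = DocumentHelper.createDocument();
            Element rootElement = doc.addElement("animal");
            rootElement.addAttribute("name", "汤姆猫");
            Element ageElement = rootElement.addElement("age");
            ageElement.setText("3岁");
            Element colorElement = rootElement.addElement("color");
            colorElement.setText("黄色");

            OutputFormat format = OutputFormat.createPrettyPrint();
            format.setEncoding("UTF-8");

            // Wrong: FileWriter encodes with the default charset, not the format's encoding.
            //XMLWriter xmlWriter = new XMLWriter(new FileWriter(fileName), format);

            // Right: pass an OutputStream so XMLWriter applies the format's encoding itself.
            try (FileOutputStream out = new FileOutputStream(fileName))
            {
                XMLWriter xmlWriter = new XMLWriter(out, format);
                xmlWriter.write(doc);
                xmlWriter.close();
            }
            catch (Exception e)
            {
                System.out.println(e);
            }
        }
    }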
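One way to check the result is to read the output back with a parser and compare the Chinese text. The sketch below is only an illustration: it uses the JDK's built-in DOM parser instead of dom4j, the class and method names are made up, and the XML bytes are produced by hand as a stand-in for the file that createXML would write.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Hypothetical demo class, not part of dom4j.
class ReadBackDemo {

    // Parses the XML bytes and returns the "name" attribute of the root element.
    static String readNameAttribute(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // "Tom Cat" in Chinese, as Unicode escapes.
        String name = "\u6c64\u59c6\u732b";

        // Hand-written stand-in for the generated file: the header declares
        // UTF-8 and the bytes really are UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<animal name=\"" + name + "\"/>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        // The attribute read back should equal the original string.
        System.out.println(name.equals(readNameAttribute(bytes)));
    }
}
```

To check a real file, the same method could be fed the bytes of the generated file, for example from `java.nio.file.Files.readAllBytes`.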
Reposted from: http://blog.csdn.net/dancen/article/details/7044213