Solving dom4j's failure to parse xmlns, and garbled output caused by a non-UTF-8 charset

Latest recommended article published on 2023-12-02 10:55:01

shadowkiss

Article link: https://blog.csdn.net/shadowkiss/article/details/4269816

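I recently solved the problem of dom4j failing to select nodes from an XML file that declares a namespace. If the same problem is bothering you, the examples below may give you some ideas.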

<?xml version="1.0" encoding="UTF-8"?>
<MyXML xmlns="http://www.ttt.com/ttt-TrdInfo-1-0"
       xmlns:x="http://www.ttt.com/ttt/metadata.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="res286.xsd">
    <Hdr>
        <ReqId>001</ReqId>
        <Tid>1002</Tid>
        <Cid>500</Cid>
        <user>cuishen</user>
        <Mname>supermarket</Mname>
        <pwd>543200210</pwd>
    </Hdr>
    <Car>
        <Flg>T</Flg>
        <Cod>ccc</Cod>
        <Door>kkk</Door>
        <mktId>b01</mktId>
        <Key>
            <KeyID>t01</KeyID>
        </Key>
    </Car>
</MyXML>

Parsing code

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;

public class ReadMyXML {
    public static void main(String[] args) {
        File xmlFile = new File("c:/myXML.xml");
        SAXReader xmlReader = new SAXReader();
        try {
            Document document = xmlReader.read(xmlFile);

            // Test code: read an element from the XML.
            // Bind an arbitrary prefix ("mo") to the document's default namespace URI.
            Map<String, String> xmlMap = new HashMap<String, String>();
            xmlMap.put("mo", "http://www.ttt.com/ttt-TrdInfo-1-0");

            // The element name in the XPath must carry that prefix.
            XPath x = document.createXPath("//mo:ReqId");
            x.setNamespaceURIs(xmlMap);
            Element valueElement = (Element) x.selectSingleNode(document);
            System.out.println(valueElement.getText());
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}

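The code above shows how to use dom4j to read an element from an XML file that declares a namespace. All you need to do is register the document's default namespace on the XPath under a prefix, because an unprefixed name in an XPath expression never matches an element that belongs to a namespace. The file declares other namespaces as well, but none of its elements use them, so they can be ignored. The key in the HashMap is an arbitrary string of your choosing, and the value is the URI of the default namespace. The argument passed to document.createXPath() is the XPath of the node to read, that is, "//" + the key from the HashMap + ":" + the name of the node. Simple, isn't it? The rest should need no explanation ^_^
What if you need to read an attribute instead? No problem: the next example shows how. The principle is the same; when building the XPath string, just put "@" in front of the attribute name.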

XML

<?xml version="1.0" encoding="UTF-8"?>
<MyXML xmlns="http://www.ttt.com/ttt-TrdInfo-1-0"
       xmlns:x="http://www.ttt.com/ttt/metadata.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="res286.xsd">
    <Hdr ReqId="001" Tid="1002" Cid="500" user="cuishen" Mname="supermarket" pwd="543200210"/>
    <Car Flg="T" Cod="ccc" Door="kkk" mktId="b01">
        <Key KeyID="t01"/>
    </Car>
</MyXML>

Parsing code

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;

public class ReadMyXML2 {
    public static void main(String[] args) {
        File xmlFile = new File("c:/myXML2.xml");
        SAXReader xmlReader = new SAXReader();
        try {
            Document document = xmlReader.read(xmlFile);

            // Test code: read an attribute from the XML.
            // Bind an arbitrary prefix ("mo") to the document's default namespace URI.
            Map<String, String> xmlMap = new HashMap<String, String>();
            xmlMap.put("mo", "http://www.ttt.com/ttt-TrdInfo-1-0");

            // Only the element name takes the prefix; the attribute is selected with "@".
            XPath x = document.createXPath("//mo:Hdr/@ReqId");
            x.setNamespaceURIs(xmlMap);
            Attribute valueAttribute = (Attribute) x.selectSingleNode(document);
            System.out.println(valueAttribute.getText());
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}
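The rule behind both examples can be checked without dom4j on the classpath. The sketch below is only an illustration and swaps in the JDK's own javax.xml.xpath API in place of dom4j's XPath; the class name NamespaceXPathSketch, the helper query(), and the shortened inline XML are assumptions made for this example. It shows that an unprefixed element name finds nothing, a prefixed one finds the element, and an unprefixed attribute is still found because an attribute without a prefix does not belong to the default namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath API, not dom4j.
class NamespaceXPathSketch {
    static final String NS = "http://www.ttt.com/ttt-TrdInfo-1-0";

    // Shortened version of the article's XML, with one element and one attribute.
    static final String XML =
            "<MyXML xmlns=\"" + NS + "\">"
          + "<Hdr Tid=\"1002\"><ReqId>001</ReqId></Hdr>"
          + "</MyXML>";

    // Evaluates an XPath expression with the prefix "mo" bound to NS
    // and returns the string value of the first match ("" if none).
    static String query(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespaces are ignored
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "mo".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "mo" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singleton(prefix).iterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + query("//ReqId") + "]");       // [] : no prefix, no match
        System.out.println("[" + query("//mo:ReqId") + "]");    // [001]
        System.out.println("[" + query("//mo:Hdr/@Tid") + "]"); // [1002]
    }
}
```

The prefix "mo" has no relation to anything in the XML file; only the URI it is bound to has to match the xmlns value.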

Garbled characters appear when dom4j's XMLWriter is used to write a UTF-8 encoded XML file.

First, set the output encoding. Here we use UTF-8.

OutputFormat format = OutputFormat.createPrettyPrint();
format.setEncoding("utf-8");

Output code

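try {
    // Problematic version: FileWriter encodes with the platform default charset,
    // not with the encoding set on format.
    XMLWriter output = new XMLWriter(new FileWriter("entity.xml"), format);
    output.write(document);
    output.close();
} catch (IOException e) {
    e.printStackTrace();
}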

If the output contains Chinese characters, the code above may produce garbled text. Replacing the FileWriter with a FileOutputStream fixes it.

try {
    // Fixed version: given an OutputStream, XMLWriter encodes with format's encoding.
    XMLWriter output = new XMLWriter(new FileOutputStream("entity.xml"), format);
    output.write(document);
    output.close();
} catch (IOException e) {
    e.printStackTrace();
}

Appended: another write-up on the encoding fix

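I started learning dom4j a few days ago. I found an article online and got going, and it was very quick to pick up, but I ran into a problem: I could not save an XML file as UTF-8. Reading the saved file back failed with the error "Invalid byte 2 of 2-byte UTF-8 sequence." On inspection, the file generated by dom4j showed garbled Chinese in every editor that handles the XML encoding correctly, while Notepad displayed the Chinese without any garbling. This gave me quite a headache. When I generated the file with GBK or gb2312 as the encoding, it parsed normally. So I suspected that dom4j did not handle UTF-8 and started reading the dom4j source code. In the end I found the cause, and it was in my own program.
　　In the dom4j samples and in the popular online tutorial "DOM4J 使用简介" (A Brief Introduction to Using DOM4J), the code for creating a new XML document looks something like this: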
public void createXML(String fileName) {
    Document doc = org.dom4j.DocumentHelper.createDocument();
    Element root = doc.addElement("book");
    root.addAttribute("name", "我的图书");
    Element childTmp;
    childTmp = root.addElement("price");
    childTmp.setText("21.22");
    Element writer = root.addElement("author");
    writer.setText("李四");
    writer.addAttribute("ID", "001");
    try {
        // FileWriter encodes with the platform default charset.
        org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                new FileWriter(fileName));
        xmlWriter.write(doc);
        xmlWriter.close();
    }
    catch (Exception e) {
        System.out.println(e);
    }
}
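　　The code above writes the file through a FileWriter, and that is why the file is not encoded correctly. A FileWriter created this way always encodes characters with the platform default charset, and once dom4j has been handed a ready-made Writer it has no way to change how that writer turns characters into bytes. The file is therefore saved in the system default encoding. On Chinese-language Windows, Java's default encoding is GBK, which means that although we declared the XML as UTF-8, the file was actually saved as GBK. This is why files generated with GBK or GB2312 as the declared encoding parse correctly, while files declared as UTF-8 cannot be read by an XML parser.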
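The mismatch can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions, not what the XML parser does internally: the class name EncodingMismatchSketch and the helper isValidUtf8() are made up for this example, and the two bytes 0xCE 0xD2 (the GBK encoding of 我, the first character of the attribute value in the code above) are hardcoded so the example does not depend on the GBK charset being installed. The first byte looks like the start of a 2-byte UTF-8 sequence and the second is not a valid continuation byte, which is the situation the parser's error message describes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: strict UTF-8 validation of two byte sequences.
class EncodingMismatchSketch {
    // Assumption for this sketch: the GBK encoding of U+6211 (我), hardcoded.
    static final byte[] GBK_BYTES = {(byte) 0xCE, (byte) 0xD2};

    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same character, encoded the way the XML declaration promises.
        byte[] utf8 = "\u6211".getBytes(StandardCharsets.UTF_8);

        System.out.println(utf8.length);            // 3
        System.out.println(isValidUtf8(utf8));      // true
        System.out.println(isValidUtf8(GBK_BYTES)); // false
    }
}
```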
　　Now that we have found the cause, let's look for a fix. First, let's see how dom4j implements its encoding handling:
public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
    //System.out.println("In OutputStream");
    this.format = DEFAULT_FORMAT;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
    //System.out.println("In OutputStream,OutputFormat");
    this.format = format;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

/**
 * Get an OutputStreamWriter, use preferred encoding.
 */
protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException {
    return new BufferedWriter(
        new OutputStreamWriter( outStream, encoding )
    );
}
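　　The code above shows that dom4j does nothing complicated about encoding; it relies entirely on what Java itself provides. So when we use dom4j to generate an XML file, we should not hand the XMLWriter constructor a Writer directly, but build it from an OutputStream subclass instead. In other words, the code above should use a FileOutputStream, not a FileWriter, to write the XML document. The code is therefore modified as follows: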
public void createXML(String fileName) {
    Document doc = org.dom4j.DocumentHelper.createDocument();
    Element root = doc.addElement("book");
    root.addAttribute("name", "我的图书");
    Element childTmp;
    childTmp = root.addElement("price");
    childTmp.setText("21.22");
    Element writer = root.addElement("author");
    writer.setText("李四");
    writer.addAttribute("ID", "001");
    try {
        // Note the change here: an OutputStream lets XMLWriter choose the encoding.
        org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
                new FileOutputStream(fileName));
        xmlWriter.write(doc);
        xmlWriter.close();
    }
    catch (Exception e) {
        System.out.println(e);
    }
}
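The same idea can be tried with only the JDK. The sketch below is a rough illustration, not dom4j's implementation: like createWriter() it wraps an OutputStream in an OutputStreamWriter with an explicit charset, then parses the bytes back to confirm the Chinese attribute value survives. The class name Utf8RoundTripSketch and the helpers write() and readName() are assumptions made for this example, and the attribute value 我的图书 is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: write XML through an explicitly encoded writer, then re-parse it.
class Utf8RoundTripSketch {
    // 我的图书, spelled with Unicode escapes.
    static final String NAME = "\u6211\u7684\u56fe\u4e66";

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<book name=\"" + NAME + "\"/>";

    // Encodes the text with the given charset, the way an OutputStream-based writer would.
    static byte[] write(String text, Charset charset) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(out, charset);
            writer.write(text);
            writer.close();
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the bytes as XML and returns the root element's "name" attribute.
    static String readName(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The charset used for the bytes matches the one in the XML declaration.
        byte[] bytes = write(XML, StandardCharsets.UTF_8);
        System.out.println(NAME.equals(readName(bytes))); // true
    }
}
```

The point of the design is that the charset used to produce the bytes and the charset named in the XML declaration come from the same place, so they cannot drift apart.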
　　
　　至此DOM4J的问题编码问题算是告一段落，希望对此文章对其他朋友有用。