How do you get the encoding value when parsing an XML file?

jianghuxing

Category: Java Basics. Tags: encoding xml whitespace string interface import

Article link: https://blog.csdn.net/jianghuxing/article/details/357890

dom4j api:

org.dom4j
Interface Document

getXMLEncoding
public String getXMLEncoding()
Returns the encoding of this document, as stated in the XML declaration. This is null when unspecified, when it is not known (such as when the Document was created in memory), or when the implementation does not support this operation.
The way this encoding is retrieved also depends on the way the XML source is parsed. For instance, if the SAXReader is used and the underlying XMLReader implementation supports the org.xml.sax.ext.Locator2 interface, the result returned by this method is the one specified by the getEncoding() method of that interface.

Returns:
The encoding of this document, as stated in the XML declaration, or null if unknown.
Since:
1.5
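The Locator2 route mentioned above can be tried without dom4j. The following is only a minimal sketch using the SAX parser bundled with the JDK; the class name SaxEncodingSketch and the helper encodingOf are made up for illustration, and the sketch assumes the parser hands the handler a locator that implements org.xml.sax.ext.Locator2 (if it does not, the helper returns null).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of dom4j or SAX.
public class SaxEncodingSketch {

    /** Remembers the encoding reported by the parser's locator, if any. */
    static class EncodingHandler extends DefaultHandler {
        private Locator locator;
        String encoding; // stays null if the parser offers no Locator2

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Ask once, at the root element: by then the XML declaration has been read.
            if (encoding == null && locator instanceof Locator2) {
                encoding = ((Locator2) locator).getEncoding();
            }
        }
    }

    /** Returns the encoding reported for the document, or null if it is not available. */
    static String encodingOf(byte[] xml) {
        try {
            EncodingHandler handler = new EncodingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), handler);
            return handler.encoding;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version='1.0' encoding='ISO-8859-1'?><Message>Hi there</Message>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("The encoding is " + encodingOf(xml));
    }
}
```

The document is passed as bytes on purpose: when the parser is given a Reader or a String, the characters are already decoded and the reported encoding says little about the original file.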

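An important task completed for DOM Level 3 was aligning the DOM data model with the XML Information Set (Infoset) by adding new methods for querying Infoset information that was previously missing. For example, the information stored in an XML declaration, such as version, standalone, and encoding, can now be queried and modified through the Document interface (which maps to the Infoset document information item). Similarly, the base URI and declaration base URI properties are handled according to XML Base and are placed on the Node interface. You can also get the XML Infoset element content whitespace property, which indicates whether a Text node contains only ignorable whitespace. This property is available through the Text interface (which maps to the XML Infoset character information item). Listing 1 shows the actual method signatures on these interfaces in the Java language binding.

Listing 1. Method signatures in the Java language binding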

// XML Declaration information on
// the org.w3c.dom.Document interface
public String getXmlEncoding();
// Draft-era setter: the final org.w3c.dom.Document interface
// has no setXmlEncoding (xmlEncoding is read-only there).
public void setXmlEncoding(String xmlEncoding);
public boolean getXmlStandalone();
public void setXmlStandalone(boolean xmlStandalone)
throws DOMException;
public String getXmlVersion();
public void setXmlVersion(String xmlVersion)
throws DOMException;

// element content whitespace property on the Text
// interface (named isElementContentWhitespace() in the
// final DOM Level 3 Recommendation)
public boolean isWhitespaceInElementContent();

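Through the schemaTypeInfo attribute of the Attr interface, you can also get the value of the attribute type property of an attribute information item, that is, the type of an attribute. A later section covers this in more detail.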

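In addition, a new feature returns the Document in the form closest to the XML Infoset. Previously, editing operations such as inserting or removing nodes would typically make a document drift further from the XML Infoset. This is one possible result of a document normalization operation, which is described in the document normalization section below.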

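Finally, the new Appendix C provides a mapping between the XML Infoset model and the DOM. In this mapping, each XML Infoset information item maps to its corresponding Node and vice versa, and each property of an information item maps to a property of its corresponding Node. This appendix should give you a good overall understanding of the DOM data model and shows how to access the information you are looking for.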

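All of these features are provided in DOM Level 3; your implementation may only support DOM Level 1 or DOM Level 2.
If so, download a newer version from its official website.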
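To check what a DOM Level 3 capable parser reports, the XML declaration can be read back through the standard org.w3c.dom.Document interface. This is a minimal sketch that assumes the JDK's built-in DOM parser (Java 5 or later); the class name XmlDeclarationSketch and the helper parse are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration only.
public class XmlDeclarationSketch {

    /** Parses the bytes with the JDK's default DOM parser. */
    static Document parse(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<properties><comment>Vincent</comment></properties>")
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parse(xml);
        // All three values come from the XML declaration of the parsed document.
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("standalone: " + doc.getXmlStandalone());
        System.out.println("encoding:   " + doc.getXmlEncoding());
    }
}
```

If these methods are missing at compile time, the DOM API on the classpath is probably older than Level 3.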

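using JDOM (imports assume JDOM 1.x package names):

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

        SAXBuilder builder = new SAXBuilder();
        Document doc = builder.build(new FileInputStream("sample.xml"));
        XMLOutputter output = new XMLOutputter();
        output.output(doc, new FileOutputStream("out.xml"));
        // Prints the encoding the outputter writes with (UTF-8 by default),
        // not necessarily the one declared in sample.xml.
        System.out.println(output.getFormat().getEncoding());
so simple...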
------------------------------------------------------------->
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<comment>Vincent</comment>
</properties>
-------------------------------------------------------------->
Output:
UTF-8

import org.dom4j.Document;
import org.dom4j.DocumentHelper;

            String xml = "<?xml version='1.0' encoding='iso-8859-1'?><Message>Hi there</Message>";
            Document doc = DocumentHelper.parseText(xml);
            System.out.println("The encoding is " + doc.getXMLEncoding());
            System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is iso-8859-1
As XML: <?xml version="1.0" encoding="iso-8859-1"?>
<Message>Hi there</Message>

=================================

            String xml = "<?xml version='1.0' encoding='UTF-8'?><Message>Hi there</Message>";
            Document doc = DocumentHelper.parseText(xml);
            System.out.println("The encoding is " + doc.getXMLEncoding());
            System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is UTF-8

As XML: <?xml version="1.0" encoding="UTF-8"?>
<Message>Hi there</Message>

====================================
            String xml = "<?xml version='1.0' encoding='GBK'?><Message>Hi there</Message>";
            Document doc = DocumentHelper.parseText(xml);
            System.out.println("The encoding is " + doc.getXMLEncoding());
            System.out.println("As XML: " + doc.asXML());

The result is:

The encoding is GBK

As XML: <?xml version="1.0" encoding="GBK"?>
<Message>Hi there</Message>

import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xerces.dom.DocumentImpl;

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputStream in = new FileInputStream(args[0]);
            // The cast assumes the factory resolves to Apache Xerces
            // (xercesImpl on the classpath); other parsers return a different class.
            DocumentImpl doc = (DocumentImpl)builder.parse(in);
            System.out.println(doc.getXmlEncoding());
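
On a parser that implements DOM Level 3, the same value should be reachable without the Xerces-specific cast, because getXmlEncoding() is declared on org.w3c.dom.Document itself. The sketch below assumes the JDK's built-in parser; the class name DeclaredEncodingSketch, the helper declaredEncoding and its fallback parameter are hypothetical. It also handles the case where the document has no encoding declaration, in which getXmlEncoding() returns null.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration only.
public class DeclaredEncodingSketch {

    /**
     * Returns the encoding stated in the XML declaration,
     * or the given fallback when the declaration does not state one.
     */
    static String declaredEncoding(byte[] xml, String fallback) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            String declared = doc.getXmlEncoding(); // null when nothing is declared
            return declared != null ? declared : fallback;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        byte[] withDecl = "<?xml version='1.0' encoding='ISO-8859-1'?><Message>Hi there</Message>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] withoutDecl = "<Message>Hi there</Message>".getBytes(StandardCharsets.UTF_8);

        System.out.println("declared: " + declaredEncoding(withDecl, "UTF-8"));
        System.out.println("fallback: " + declaredEncoding(withoutDecl, "UTF-8"));
    }
}
```

UTF-8 is used as the fallback here because that is what XML assumes for a document with no declaration and no byte order mark; pass a different fallback if the input is known to come from elsewhere.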
