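XML Canonicalization (Part 1)
This article explains how to canonicalize an XML document. To keep the focus on the canonicalization rules themselves, I omitted some material when translating it (XML digital signatures, asymmetric key cryptography, and message digests).
Let us start by looking at the following two files (File 1 and File 2).
File 1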
<?xml version="1.0"?>
<rooms>
<room type="single" charge="50" currency="USD"/>
<room type="double" charge="70" currency="USD"/>
<room type="suite" charge="100" currency="USD"/>
</rooms>
File 2
<?xml version="1.0"?>
<rooms>
<room type="single" currency="USD" charge="50"/>
<room type="double" currency="USD" charge="70"/>
<room type="suite" currency="USD" charge="100"/>
</rooms>
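You will probably say that these two files are the same. That is right: they express the same information and use the same document structure, so they are logically equivalent. You may also have noticed a small difference between them: some content appears in a different order (the attributes of each room element).
In this example the attributes of the room elements are ordered differently in the two files, so the corresponding byte streams differ as well. There are many other reasons why logically equivalent XML documents can have different byte streams. The purpose of a canonical form is to make it possible to decide whether two XML documents are logically equivalent. The W3C has defined canonicalization rules such that canonicalizing two logically equivalent documents yields identical output. To check whether two XML documents are logically equivalent, we can canonicalize both, serialize them to byte streams, and compare: if the byte streams are identical, the documents are logically equivalent.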
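The comparison described above can be sketched in Python. This is only an illustration, assuming Python 3.8 or later, whose xml.etree.ElementTree.canonicalize function implements Canonical XML 2.0 (possibly not the same version of the rules this article walks through, although both sort attributes). The names FILE_1, FILE_2, canonical_1 and canonical_2 are made up for the example.

```python
import xml.etree.ElementTree as ET

# File 1 and File 2 from above: same content, different attribute order.
FILE_1 = """<?xml version="1.0"?>
<rooms>
<room type="single" charge="50" currency="USD"/>
<room type="double" charge="70" currency="USD"/>
<room type="suite" charge="100" currency="USD"/>
</rooms>"""

FILE_2 = """<?xml version="1.0"?>
<rooms>
<room type="single" currency="USD" charge="50"/>
<room type="double" currency="USD" charge="70"/>
<room type="suite" currency="USD" charge="100"/>
</rooms>"""

# Canonicalize each document, then turn the result into a UTF-8 byte stream.
canonical_1 = ET.canonicalize(FILE_1).encode("utf-8")
canonical_2 = ET.canonicalize(FILE_2).encode("utf-8")

raw_bytes_equal = FILE_1.encode("utf-8") == FILE_2.encode("utf-8")
canonical_bytes_equal = canonical_1 == canonical_2

print(raw_bytes_equal)        # False: the original byte streams differ
print(canonical_bytes_equal)  # True: the canonical byte streams match
```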
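The XML canonicalization specification defines a set of rules for producing the canonical form of an XML document. The following sections use one file (File 3) to walk through the process step by step.
File 3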
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name='rotating disc "Energymeter"'
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>
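1 Encoding
Encoding means representing characters as bytes according to some scheme. Clearly, documents with the same content but different encodings produce different byte streams.
The canonicalization specification requires the canonical form to be encoded in UTF-8. If the document to be canonicalized uses another encoding, it must first be converted to UTF-8.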
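As a rough illustration of the encoding rule, the sketch below feeds an ISO-8859-1 document to Python's xml.etree.ElementTree.canonicalize (Python 3.8+, an implementation of Canonical XML 2.0) and encodes the result as UTF-8. The sample document and the variable names are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# A small document stored as ISO-8859-1 bytes (hypothetical sample data).
latin1_document = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>café</name>'
).encode("iso-8859-1")

# The parser decodes the bytes according to the encoding declaration;
# canonicalize() returns text, which is then serialized as UTF-8.
canonical_text = ET.canonicalize(latin1_document)
canonical_utf8 = canonical_text.encode("utf-8")

print(latin1_document[-9:])  # the e-acute is the single byte 0xE9 here
print(canonical_utf8)        # the e-acute is the two bytes 0xC3 0xA9 here
```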
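2 Line breaks
Text files usually represent a line break with #xA, #xD (hexadecimal), or a combination of the two. An XML document is an ordinary text file, so it too uses #xA and #xD as line breaks. The canonical form requires every line break to be represented as #xA.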
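A minimal sketch of the line-break rule, again assuming Python 3.8+ and its xml.etree.ElementTree.canonicalize function (Canonical XML 2.0). The note element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Text content that mixes Windows (#xD #xA) and old Mac (#xD) line breaks.
mixed_breaks = "<note>line one\r\nline two\rline three</note>"

# The XML parser normalizes every line break to #xA before the
# canonical form is written out.
canonical_lf = ET.canonicalize(mixed_breaks)

print(repr(canonical_lf))
```

The printed representation should contain only \n line breaks.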
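3 Whitespace
Canonicalization requires whitespace characters in attribute values (such as tab and space) to be converted to a space (#x20); File 4 is the result. Note: in File 3, the line <sup:supplier id="S 1753"/> has a tab character between S and 1753.
File 4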
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name='rotating disc "Energymeter"'
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>
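The tab in the supplier id can be reproduced with a short sketch. It assumes Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0); the element text "a<TAB>b" is added here only to show that text content is treated differently from attribute values.

```python
import xml.etree.ElementTree as ET

# A literal tab inside the attribute value, and another inside the text.
with_tab = '<supplier id="S\t1753">a\tb</supplier>'

canonical = ET.canonicalize(with_tab)

# The tab in the attribute value becomes a space (#x20);
# the tab in the element text is kept as it is.
print(repr(canonical))
```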
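4 Double quotes around attribute values
In the canonical form, attribute values must be enclosed in double quotes. In File 4, the value of the name attribute on the product element is enclosed in single quotes and must be changed to double quotes. File 5 is the result.
File 5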
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc "Energymeter""
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>
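5 Special characters in attribute values
File 5 has a problem: the value of the name attribute itself contains double quotes. The canonicalization rules require special characters in attribute values (such as the double quote) to be replaced by the corresponding escape (for example, &quot; in place of a double quote). File 6 is the result.
File 6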
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;"
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>&testhistory;</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>&testhistory;</comments>
</part>
</parts>
</product>
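Steps 4 and 5 can be tried together in a small sketch, assuming Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0). Only the name attribute from File 3 is used; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The attribute value is single-quoted and contains double quotes,
# as in File 3.
single_quoted = "<product name='rotating disc \"Energymeter\"'/>"

canonical = ET.canonicalize(single_quoted)

# The value is now delimited by double quotes, and the double quotes
# inside it are written as &quot;.
print(canonical)
```

The empty-element tag is also expanded into a start and end tag pair, which is the subject of a later step.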
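6 Entity references
File 6 contains a DTD that declares an entity, testhistory, which is referenced by the comments elements. Canonicalization requires that no entity references remain in the document; each reference must be replaced by its content. File 7 is the canonicalized document.
File 7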
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;"
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
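Entity replacement can be sketched with the testhistory entity from File 3. The example assumes Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0), and relies on the underlying parser expanding entities declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# A cut-down document that declares and references the testhistory entity.
with_entity = """<!DOCTYPE comments [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<comments>&testhistory;</comments>"""

canonical = ET.canonicalize(with_entity)

# The reference has been replaced by the entity's content.
print(canonical)
```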
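7 Default attributes
The DTD in File 7 declares a default attribute, approved, for the part element. In the canonical form, default attributes must appear explicitly among the element's attributes. File 8 is the result.
File 8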
<?xml version="1.0"?>
<!DOCTYPE product [
<!ATTLIST part approved CDATA "yes">
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;"
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing"
approved="yes">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product"
approved="yes">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
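The following sketch shows a default attribute being made explicit. It assumes Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0), and it assumes that the underlying parser reports attribute defaults declared in the internal DTD subset, which is its default behavior as far as I know.

```python
import xml.etree.ElementTree as ET

# The ATTLIST declaration from File 3 applied to a single part element
# that does not specify the approved attribute itself.
with_default = """<!DOCTYPE part [
<!ATTLIST part approved CDATA "yes">
]>
<part id="P 184.675" name="bearing"/>"""

canonical = ET.canonicalize(with_default)

# The defaulted attribute should now be written out on the element.
print(canonical)
```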
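9 XML and DTD declarations
The canonical form must not contain an XML declaration or a DTD. File 9 is the file with the XML declaration and the DTD removed.
File 9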
<product xmlns="http://www.myFictitiousCompany.com/product"
xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;"
id="P 184.435"
classification = "MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675"
name="bearing"
approved="yes">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871"
name="magnet"
xmlns="http://www.myFictitiousCompany.com/product"
approved="yes">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
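10 Whitespace outside the document element
The canonical form must not contain whitespace outside the document element: the document starts with "<", and no whitespace may precede it. File 10 is the file after the whitespace before "<" has been removed.
File 10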
<product xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;" id="P 184.435"
classification="MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675" name="bearing" approved="yes">
<sup:supplier id="S 1753"/>
<sup:supplier id="S 2341"/>
<sup:supplier id="S 3276"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product" approved="yes">
<sup:supplier id="S 3908"/>
<sup:supplier id="S 4589"/>
<sup:supplier id="S 1098"/>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
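Removing the XML declaration, the DTD, and the whitespace around the document element can be observed in one short sketch. It assumes Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0); the trimmed-down product document is an illustration, not one of the files above.

```python
import xml.etree.ElementTree as ET

# XML declaration, DTD, and blank lines and spaces around the
# document element.
with_prolog = """<?xml version="1.0"?>
<!DOCTYPE product [
<!ENTITY testhistory "Part has been tested according to the specified standards.">
]>

   <product><parts></parts></product>

"""

canonical = ET.canonicalize(with_prolog)

# Only the document element is left, starting with "<" and ending with ">".
print(repr(canonical))
```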
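11 Whitespace in start and end tags
1) No whitespace may appear between "<" and the element name; the same applies to "</".
2) If the element has attributes, there is exactly one space between the element name and the first attribute.
3) There is no whitespace on either side of the equals sign between an attribute name and its value.
4) There is exactly one space between an attribute value and the next attribute.
5) No whitespace may appear before ">".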
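Rules 2 through 5 above can be checked with a small sketch, assuming Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0). Rule 1 cannot be demonstrated on input, because whitespace directly after "<" is not well-formed XML in the first place. The sample element is illustrative.

```python
import xml.etree.ElementTree as ET

# Extra spaces and a line break inside the start tag, spaces around "=",
# and spaces before ">" in both the start tag and the end tag.
loosely_spaced = '<part   id = "P 184.675"\n   name="bearing"   >tested</part  >'

canonical = ET.canonicalize(loosely_spaced)

# One space before each attribute, none around "=", none before ">".
print(canonical)
```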
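12 Empty elements
In the canonical form, an empty element must be written as a start tag and end tag pair, <...></...>. Converting each <emptyElement/> into <emptyElement></emptyElement> yields File 11.
File 11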
<product xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;" id="P 184.435" classification="MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675" name="bearing" approved="yes">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871" name="magnet" xmlns="http://www.myFictitiousCompany.com/product" approved="yes">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
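The empty-element rule is easy to try out. The sketch assumes Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0); the namespace prefix is left out of the sample to keep it short.

```python
import xml.etree.ElementTree as ET

# Two empty-element tags, one with an attribute and one without.
with_empty_tags = '<parts><supplier id="S 1753"/><comments/></parts>'

canonical = ET.canonicalize(with_empty_tags)

# Each empty-element tag is written as a start tag followed by an end tag.
print(canonical)
```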
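13 Namespace declarations
Canonicalization keeps every namespace declaration except superfluous ones. In File 11, the namespace declaration on the second part element is superfluous: removing it does not change the namespace context of any node in the document. File 12 is the result.
File 12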
<product xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier"
name="rotating disc &quot;Energymeter&quot;" id="P 184.435" classification="MeasuringInstruments/Electrical/Energy/">
<parts>
<part id="P 184.675" name="bearing" approved="yes">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part id="P 184.871" name="magnet" approved="yes">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
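A repeated default namespace declaration, like the one on the second part element, can be sketched as follows. The example assumes Python 3.8+ and xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0. That version writes a namespace declaration on the element where the namespace is first used, so for documents with unused or prefixed declarations its output may place declarations differently from File 12. For this simple case the two agree.

```python
import xml.etree.ElementTree as ET

# The inner element repeats the default namespace already in scope.
repeated_namespace = (
    '<parts xmlns="http://www.myFictitiousCompany.com/product">'
    '<part xmlns="http://www.myFictitiousCompany.com/product"/>'
    '</parts>'
)

canonical = ET.canonicalize(repeated_namespace)

# The superfluous declaration on the inner element is gone.
print(canonical)
```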
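14 Ordering of attributes
Canonicalization requires an element's attributes to be sorted in ascending alphabetical order. Within an element, namespace declarations come first, followed by the attributes (name and value). File 13 is the sorted file.
File 13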
<product xmlns="http://www.myFictitiousCompany.com/product" xmlns:sup="http://www.myFictitiousCompany.com/supplier"
classification="MeasuringInstruments/Electrical/Energy/" id="P 184.435" name="rotating disc &quot;Energymeter&quot;">
<parts>
<part approved="yes" id="P 184.675" name="bearing">
<sup:supplier id="S 1753"></sup:supplier>
<sup:supplier id="S 2341"></sup:supplier>
<sup:supplier id="S 3276"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
<part approved="yes" id="P 184.871" name="magnet">
<sup:supplier id="S 3908"></sup:supplier>
<sup:supplier id="S 4589"></sup:supplier>
<sup:supplier id="S 1098"></sup:supplier>
<comments>Part has been tested according to the specified standards.</comments>
</part>
</parts>
</product>
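The ordering rule can be sketched on a single part element, assuming Python 3.8+ and xml.etree.ElementTree.canonicalize (Canonical XML 2.0). The attributes are deliberately written out of order, with the namespace declaration in the middle.

```python
import xml.etree.ElementTree as ET

# Attributes in arbitrary order, namespace declaration in the middle.
unsorted_attributes = (
    '<part name="bearing" '
    'xmlns="http://www.myFictitiousCompany.com/product" '
    'id="P 184.675" approved="yes"/>'
)

canonical = ET.canonicalize(unsorted_attributes)

# The namespace declaration comes first, then approved, id, name.
print(canonical)
```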
(To be continued)
XML canonicalization rules (defined by the W3C)