Basic Syntax for an XML Quick Start

[1]   element   ::=   EmptyElemTag | STag content ETag

[2]   EmptyElemTag   ::=   '<' Name (S Attribute)* S? '/>'

[3]   STag   ::=   '<' Name (S Attribute)* S? '>'

[4]   ETag   ::=   '</' Name S? '>'

[5]   content   ::=   CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

[6]   Name   ::=   NameStartChar (NameChar)*

[7]   NameStartChar   ::=   ":" | [A-Z] | "_" | [a-z]

[8]   NameChar   ::=   NameStartChar | "-" | "." | [0-9]
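
The three Name rules above translate almost directly into a regular expression. The following is a minimal Python sketch, assuming the simplified ASCII-only character classes given here (the full XML 1.0 specification allows many more characters). The names `NAME_RE` and `is_name` are illustrative and do not come from the original text.

```python
import re

# NameStartChar ::= ":" | [A-Z] | "_" | [a-z]
NAME_START_CHAR = r"[:A-Z_a-z]"
# NameChar ::= NameStartChar | "-" | "." | [0-9]
NAME_CHAR = r"[:A-Z_a-z\-.0-9]"
# Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile(NAME_START_CHAR + NAME_CHAR + "*")


def is_name(text: str) -> bool:
    """Return True if the whole string matches the simplified Name rule."""
    return NAME_RE.fullmatch(text) is not None
```

For example, `is_name("CarNo.")` holds because `.` is a NameChar, while `is_name("1abc")` does not because a digit is not a NameStartChar.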

 

[9]   S   ::=   (#x20 | #x9 | #xD | #xA)+

 

[10]   Attribute   ::=   Name Eq AttValue

[11]   Eq   ::=   S? '=' S?

[12]   AttValue   ::=   '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

[13]   CharData   ::=   [^<&]* - ([^<&]* ']]>' [^<&]*)

[14]   Reference   ::=   EntityRef | CharRef

[15]   EntityRef   ::=   '&' Name ';'

[16]   CharRef   ::=   '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
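
The Reference rules can be checked the same way. The sketch below is one possible Python rendering of EntityRef and CharRef, assuming the simplified Name rule of this document. The function `classify_reference` is a hypothetical helper introduced only for illustration.

```python
import re

# EntityRef ::= '&' Name ';'   (Name as in the simplified rules of this document)
ENTITY_REF_RE = re.compile(r"&[:A-Z_a-z][:A-Z_a-z\-.0-9]*;")
# CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF_RE = re.compile(r"&#[0-9]+;|&#x[0-9a-fA-F]+;")


def classify_reference(text: str) -> str:
    """Return 'EntityRef', 'CharRef', or 'invalid' for the whole string."""
    if ENTITY_REF_RE.fullmatch(text):
        return "EntityRef"
    if CHAR_REF_RE.fullmatch(text):
        return "CharRef"
    return "invalid"
```

The two patterns cannot both match the same string, because `#` is not a NameStartChar, so the order of the two checks does not matter.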

 

[17]   CDSect   ::=   CDStart CData CDEnd

[18]   CDStart   ::=   '<![CDATA['

[19]   CData   ::=   (Char* - (Char* ']]>' Char*))

[20]   CDEnd   ::=   ']]>'

 

[21]   PI   ::=   '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

[22]   PITarget   ::=   Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

[23]   Comment   ::=   '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

[24]  Char   ::=   #x9 | #xA | #xD | [#x20-#x7F]

 

 

 

Abstract

1. This document contains a simplified version of the XML 1.0 specification and the dependencies among its syntactic constructs.

2. Our goal is to parse an XML document with as much parallelism as possible on an FPGA. In parallel processing, multiple flat rules can be checked at the same time.

3. To recognize the characters that match each syntactic construct correctly, we first need to consider the dependencies among the rules above. If a character or string that matches one syntactic construct can also form part of another, we define the two constructs as dependent on each other. We therefore collect all such dependencies and potential interrelations in order to improve parallel processing.

 

Dependencies

4 key tips:

1) Tag ‘<’

It can be part of the text inside a CDSect, PI, or Comment.

It is the first character of an STag, EmptyElemTag, ETag, CDSect, PI, or Comment.

Every “element” therefore also begins with ‘<’.
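
Because ‘<’ opens so many constructs, a recognizer cannot decide which rule applies until it has seen more characters. The sketch below illustrates this narrowing in Python. The table `DELIMS` and the function `candidates` are hypothetical names, and the code only compares opening delimiters without validating the rest of the construct.

```python
import re

# Opening delimiters of the constructs that start with '<' followed by punctuation
DELIMS = {
    "ETag": "</",
    "PI": "<?",
    "Comment": "<!--",
    "CDSect": "<![CDATA[",
}


def candidates(prefix: str) -> list:
    """List the constructs still compatible with the characters seen so far."""
    result = []
    # STag / EmptyElemTag: '<' must be followed by a NameStartChar
    if prefix == "<" or re.match(r"<[:A-Z_a-z]", prefix):
        result.append("STag or EmptyElemTag")
    for kind, opener in DELIMS.items():
        n = min(len(prefix), len(opener))
        # Compatible if the shared leading characters agree
        if prefix[:n] == opener[:n]:
            result.append(kind)
    return result
```

With this sketch, `candidates("<")` returns all five entries, `candidates("<!")` keeps only Comment and CDSect, and `candidates("<!-")` keeps only Comment. A parallel design could run all matchers at once and discard the ones that drop out.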

 

2) CDSect, PI, or Comment

These three constructs are special. Their text may consist of any characters in Char, except for their own closing delimiter. In addition, rule [23] does not allow ‘--’ inside the text of a Comment.

Illegal examples:

<![CDATA[ Hello ]]> world!]]>

<?xml version="1.0" ?>encoding="ISO-8859-1" ?>

<!--CDATA[ Hello --> world!-->

<!--CDATA[ Hello, world!--->
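
The four illegal examples above can be checked mechanically. The following Python sketch encodes rules [17] to [24] as regular expressions, assuming the simplified Char and Name definitions of this document. The names `RULES` and `is_well_formed` are illustrative. Note that rule [22] also excludes the target `xml`, so the second example fails for two reasons under this grammar.

```python
import re

# Char ::= #x9 | #xA | #xD | [#x20-#x7F]   (rule [24], simplified)
CHAR = r"[\x09\x0A\x0D\x20-\x7F]"
# The same set without '-' (#x2D), needed by the Comment rule
CHAR_NO_DASH = r"[\x09\x0A\x0D\x20-\x2C\x2E-\x7F]"
NAME = r"[:A-Z_a-z][:A-Z_a-z\-.0-9]*"
S = r"[\x20\x09\x0D\x0A]+"

RULES = {
    # CDSect: any Char sequence that does not contain ']]>'
    "CDSect": re.compile(r"<!\[CDATA\[(?:(?!\]\]>)" + CHAR + r")*\]\]>"),
    # PI: target is a Name other than 'xml' (any case); the text has no '?>'
    "PI": re.compile(
        r"<\?(?![Xx][Mm][Ll](?![:A-Z_a-z\-.0-9]))" + NAME
        + r"(?:" + S + r"(?:(?!\?>)" + CHAR + r")*)?\?>"
    ),
    # Comment: every '-' in the text must be followed by a non-'-' character
    "Comment": re.compile(
        r"<!--(?:" + CHAR_NO_DASH + r"|-" + CHAR_NO_DASH + r")*-->"
    ),
}


def is_well_formed(kind: str, text: str) -> bool:
    """Return True if the whole string matches the rule named by `kind`."""
    return RULES[kind].fullmatch(text) is not None
```

Each illegal example is rejected because the first closing delimiter ends the construct and leaves trailing characters, or, in the last case, because the comment text ends with ‘-’ directly before ‘-->’.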

 

3) Tag ‘>’

It can be part of the text inside a CDSect, PI, or Comment.

It is the last character of an STag, EmptyElemTag, ETag, CDSect, PI, or Comment.

It may occur anywhere in the text of any construct except Name.

 

4) Tag ‘</’

It can be part of the text inside a CDSect, PI, or Comment.

It is the first two characters of an ETag.

 

Others

 

XML documents consist of many tags. The opening delimiters ‘<’, ‘<?’, ‘<!--’, ‘</’, and ‘<![CDATA[’ must be paired with the closing delimiters ‘>’, ‘?>’, ‘-->’, ‘>’, and ‘]]>’, respectively.
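
The pairing rule can be sketched as a simple sequential scanner: find an opening delimiter, then search for its matching closing delimiter and ignore everything in between. This is only an illustration in Python, not the parallel FPGA design. The names `PAIRS` and `split_markup` are hypothetical, and the sketch assumes that attribute values contain no ‘>’, which the grammar above does not guarantee.

```python
# Opening delimiter, closing delimiter, construct name; longest openers first
PAIRS = [
    ("<![CDATA[", "]]>", "CDSect"),
    ("<!--", "-->", "Comment"),
    ("<?", "?>", "PI"),
    ("</", ">", "ETag"),
    ("<", ">", "STag"),
]


def split_markup(doc: str) -> list:
    """Split doc into (kind, text) pieces; raise ValueError on an unpaired opener."""
    pieces = []
    pos = 0
    while pos < len(doc):
        lt = doc.find("<", pos)
        if lt == -1:
            pieces.append(("CharData", doc[pos:]))
            break
        if lt > pos:
            pieces.append(("CharData", doc[pos:lt]))
        for opener, closer, kind in PAIRS:
            if doc.startswith(opener, lt):
                close_at = doc.find(closer, lt + len(opener))
                if close_at == -1:
                    raise ValueError(
                        f"{opener!r} at offset {lt} has no matching {closer!r}"
                    )
                end = close_at + len(closer)
                piece = doc[lt:end]
                # A tag that ends with '/>' is an EmptyElemTag, not an STag
                if kind == "STag" and piece.endswith("/>"):
                    kind = "EmptyElemTag"
                pieces.append((kind, piece))
                pos = end
                break
    return pieces
```

Because the scanner jumps straight to the closing delimiter, a ‘<’ or ‘</’ inside a Comment, CDSect, or PI is never mistaken for the start of a tag.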

 

5) Tag ‘<?’

It can be part of the text inside a CDSect, PI, or Comment.

When it forms the first two characters of a PI, it must be paired with ‘?>’.

 

Tag ‘<!’

It can be part of the text inside a CDSect, PI, or Comment.

It is the first two characters of both CDSect and Comment.

 

6) Tag ‘<!--’

It can be part of the text inside a CDSect or PI. It cannot appear inside a Comment, because rule [23] does not allow ‘--’ in comment text.

When it forms the first four characters of a Comment, it must be paired with ‘-->’.

 

7) Tag ‘-->’

It can be part of the text inside a CDSect or PI.

It can be the closing delimiter of a Comment.

 

8) ‘<![CDATA[’

It can be part of the text inside a CDSect, PI, or Comment.

When it forms the first nine characters of a CDSect, it must be paired with ‘]]>’.

 

9) ‘]]>’

It can be part of the text inside a PI or Comment.

It can be the closing delimiter of a CDSect.

It cannot appear inside the text of a CDSect.

It cannot occur in CharData.
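
The last restriction comes from the set difference in rule [13]: CharData is any run of characters without ‘<’ or ‘&’, minus every such run that contains ‘]]>’. A minimal Python sketch of that reading follows. The function name `is_char_data` is illustrative.

```python
def is_char_data(text: str) -> bool:
    """CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)"""
    # [^<&]* : neither '<' nor '&' may appear
    if "<" in text or "&" in text:
        return False
    # minus any string that contains the CDSect closing delimiter
    return "]]>" not in text
```

Under this check, `"a ]] > b"` is still CharData, while `"a ]]> b"` is not.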

 

10) Tag ‘/>’

It can be part of the text inside a CDSect, PI, or Comment.

It can be the closing delimiter of an EmptyElemTag.

 

CDSect, PI, or Comment

The character sequence of any one of these three can also appear inside the text of another.

11) CDSect and PI

When a string matching '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' is part of a CDSect, it is no longer a processing instruction and loses its PI function, because the text in a CDSect is not parsed by the parser.

e.g. <![CDATA[ HelloWorld!<?xml version="1.0"?>]]>

A string matching CDStart CData CDEnd can be part of a PI. It still matches the CDSect rule character by character, but a parser treats it only as text of the PI.

e.g. <?xml version="1.0"<![CDATA[SSPKU]]> ?>

 

12) PI and Comment

A string matching '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->' can be part of a PI. It still matches the Comment rule character by character, but a parser treats it only as text of the PI.

e.g. <?xml version="1.0" <!--encoding="ISO-8859-1-->"?>

Likewise, when a string matching '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' is part of a Comment, it is no longer a processing instruction and loses its PI function.

e.g. <!-- HelloWorld!<?xml version="1.0"?>-->

 

13) CDSect and Comment

When a string matching CDStart CData CDEnd is part of a Comment, it is no longer a CDATA section and loses its CDSect function.

e.g. <!-- HelloWorld!<![CDATA[SSPKU]]>  -->

When a string matching '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->' is part of a CDSect, it is not parsed by the parser.

e.g. <![CDATA[ HelloWorld!<!--xml --> ]]>
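
The nesting behaviour described in items 11) to 13) can be observed with an existing parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which implements full XML 1.0 rather than the simplified grammar of this document. The wrapper element `<r>`, the PI target `app`, and the helper `parse_summary` are assumptions made for illustration (the target `xml` is reserved, so a different target is used).

```python
import xml.etree.ElementTree as ET


def parse_summary(fragment: str):
    """Parse `fragment` inside a hypothetical wrapper element <r>.

    Returns (text of <r>, number of child elements of <r>).
    """
    root = ET.fromstring("<r>" + fragment + "</r>")
    return root.text, len(root)


# A comment-like or PI-like string inside a CDSect comes back as plain text
cdata_with_comment = parse_summary("<![CDATA[ HelloWorld!<!--xml --> ]]>")
cdata_with_pi = parse_summary('<![CDATA[ HelloWorld!<?xml version="1.0"?>]]>')

# A CDSect-like string inside a Comment or PI produces no text and no elements
comment_with_cdata = parse_summary("<!-- HelloWorld!<![CDATA[SSPKU]]>  -->")
pi_with_cdata = parse_summary("<?app <![CDATA[SSPKU]]> ?>")
```

In the first two calls the inner markup is returned unchanged as the text of `<r>`. In the last two calls the default tree builder discards the comment and the processing instruction, so the inner CDATA-like string never reaches the element text.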

 

 

14) Attribute

A string matching '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" can itself be part of an AttValue, provided it uses the other kind of quotation mark.

e.g. CarNo.="PKU99 CarNo.= 'PKU99'"
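
The example can be verified against rule [12]. The following Python sketch builds a regular expression for AttValue from the simplified Reference and Name rules of this document. The helpers `is_att_value` and `nested_att_values` are hypothetical names used only here.

```python
import re

# Reference ::= EntityRef | CharRef   (rules [14]-[16], simplified Name)
REFERENCE = r"&(?:[:A-Z_a-z][:A-Z_a-z\-.0-9]*|#[0-9]+|#x[0-9a-fA-F]+);"
# AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
DOUBLE_QUOTED = r'"(?:[^<&"]|' + REFERENCE + r')*"'
SINGLE_QUOTED = r"'(?:[^<&']|" + REFERENCE + r")*'"
ATT_VALUE_RE = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def is_att_value(text: str) -> bool:
    """Return True if the whole string matches the AttValue rule."""
    return ATT_VALUE_RE.fullmatch(text) is not None


def nested_att_values(value: str) -> list:
    """Find AttValue-shaped substrings between the outer quotation marks."""
    return ATT_VALUE_RE.findall(value[1:-1])
```

For the value in the example, the whole double-quoted string is a valid AttValue, and the single-quoted `'PKU99'` inside it also matches the rule on its own. A recognizer therefore has to track which quotation mark opened the value.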

 

 

  

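‘<’ can appear in the content of a PI, CDSect, or Comment,

or serve as the first character of an STag, ETag, PI, CDSect, or Comment, and thus as the start of an element.

The content of a PI, CDSect, or Comment may contain any characters other than its own closing delimiter,

such as ‘<’, ‘>’, and so on.

‘>’ can appear in the content of a PI, CDSect, or Comment.

It is the last character of an STag, ETag, PI, CDSect, or Comment.

It may appear in the content of any construct other than Name.

‘</’ can appear in the content of a PI, CDSect, or Comment.

It is the first two characters of an ETag.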

 

 

 
