Understanding XML

 
English original:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/UnderstXML.asp
 
Understanding XML
 
Dare Obasanjo
Microsoft Corporation
July 2003
 
 
Summary: Learn how the Extensible Markup Language (XML) facilitates universal data access. XML is a plain-text, Unicode-based meta-language: a language for defining markup languages. It is not tied to any programming language, operating system, or software vendor. XML provides access to a plethora of technologies for manipulating, structuring, transforming and querying data. (14 printed pages)
 
Introduction
XML Everywhere
The XML 1.0 Syntax
The Infoset and the XML Family of Technologies
Conclusion
Further Reading
 
Introduction
 
The Extensible Markup Language (XML) was originally envisioned as a language for defining new document formats for the World Wide Web. XML is derived from the Standard Generalized Markup Language (SGML), and can be considered to be a meta-language: a language for defining markup languages. SGML and XML are text-based formats that provide mechanisms for describing document structures using markup tags (words surrounded by '<' and '>'). Web developers may notice some similarity between HTML and XML, which is due to the fact that they are both derived from SGML.
 
As the use of XML has grown, it is now generally accepted that XML is not only useful for describing new document formats for the Web but is also suitable for describing structured data. Examples of structured data include information that is typically contained in spreadsheets, program configuration files, and network protocols.
 
XML is preferable to previous data formats because it can easily represent both tabular data (such as relational data from a database or spreadsheets) and semi-structured data (such as a Web page or business document). Popular pre-existing formats either work well for tabular data but handle semi-structured data poorly, as comma-separated value (CSV) files do, or, like RTF, are too specialized for semi-structured text documents. This has led to the widespread adoption of XML as the lingua franca of information interchange.
 
XML Everywhere
 
Besides being able to represent both structured and semi-structured data, XML has a number of characteristics that have caused it to be widely adopted as a data representation format. XML is extensible, platform-independent, and supports internationalization by being fully Unicode compliant. The fact that XML is a text-based format means that when the need arises, one can read and edit XML documents using standard text-editing tools.
 
XML's extensibility manifests itself in a number of ways. First of all, unlike HTML it does not have a fixed vocabulary. Instead, one can define vocabularies specific to particular applications or industries using XML. Secondly, applications that process or consume XML formats are more resistant to changes in the structure of the XML being provided to them than applications that use other formats, as long as such changes are additive. For instance, an application that depends on processing a <Customer> element with a customer-id attribute typically would not break if another attribute, such as last-purchase-date, was added to the <Customer> element. Such flexibility is uncommon in other data formats and is a significant benefit of using XML.
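The additive-change point can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name read_customer_id and the date value are hypothetical, chosen only for illustration; the element and attribute names come from the paragraph above.

```python
import xml.etree.ElementTree as ET

def read_customer_id(xml_text):
    """Return the customer-id attribute of the root <Customer> element."""
    root = ET.fromstring(xml_text)
    return root.get("customer-id")

# The format the application was originally written against.
original = '<Customer customer-id="cust0921"/>'

# The same element after an additive change (the date value is made up).
extended = '<Customer customer-id="cust0921" last-purchase-date="2003-07-01"/>'

# The reader only asks for the attribute it knows about, so the extra
# attribute does not affect it.
print(read_customer_id(original))
print(read_customer_id(extended))
```

Because the reader looks attributes up by name rather than by position, the new attribute is simply never consulted. A positional format such as CSV would typically need every consumer updated when a column is inserted.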
 
XML is not tied to any programming language, operating system or software vendor. In fact, it is fairly straightforward to produce or consume XML using a variety of programming languages. Platform independence makes XML very useful as a means for achieving interoperability between different programming platforms and operating systems.
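As a rough illustration of how little is needed to produce and consume XML from a general-purpose language, the following Python sketch builds a small document and reads it back using only the standard library. The function name build_order is hypothetical; the element names and identifiers are borrowed from the sample order used later in this article.

```python
import xml.etree.ElementTree as ET

def build_order(order_id, customer_id):
    """Produce a small <order> document as a text string."""
    order = ET.Element("order", {"id": order_id})
    ET.SubElement(order, "customer", {"id": customer_id})
    return ET.tostring(order, encoding="unicode")

# Producer side: serialize to plain text.
xml_text = build_order("ord123456", "cust0921")
print(xml_text)

# Consumer side: any XML parser, on any platform, can read the text back.
parsed = ET.fromstring(xml_text)
print(parsed.get("id"))
print(parsed.find("customer").get("id"))
```

The text produced here could equally be consumed by a parser written in another language on another operating system, which is the interoperability the paragraph above describes.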
 
The benefits of exposing data as XML have been acknowledged by many, and have led to a proliferation of XML data sources. Business documents, databases and inter-business communication are all examples of information sources that are moving or have moved to using XML as a representation format. Microsoft products such as Microsoft Office, Microsoft SQL Server and the Microsoft .NET Framework enable end users and developers to produce and consume documents, network messages and other data as XML.
 
The XML 1.0 Syntax
As mentioned earlier, the W3C XML 1.0 recommendation describes a text-based format for describing structured and semi-structured data using syntax similar to HTML.
XML and HTML Compared
Both HTML and XML documents are made up of elements, each of which consists of a "start tag" (such as <order>), an "end tag" (such as </order>), and the information between the two tags (referred to as the contents of the element). Elements can be annotated with attributes that contain metadata about the element and its contents.
However, there are significant differences between HTML and XML. XML is case sensitive while HTML is not. This means that in XML the start tags <Table> and <table> are different, while in HTML they are the same. Another difference between HTML and XML is that XML introduces the concept of well-formedness. The well-formedness rules of XML remove some of the ambiguity inherent in processing markup languages like HTML by enforcing rules such as mandating that all attribute values must be in quotes, and that all elements must either have both a start tag and an end tag or explicitly indicate that they are empty elements. A succinct description of well-formedness is given in section D.2 of the XML FAQ.
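The well-formedness rules mentioned above can be checked with any conforming XML parser. The following is a minimal Python sketch, assuming the standard library's expat-based xml.etree.ElementTree parser; the helper name is_well_formed and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "quoted attribute": '<order id="ord123456"></order>',
    "unquoted attribute": '<order id=ord123456></order>',
    "mismatched case": '<Table></table>',
    "missing end tag": '<order><item></order>',
    "explicit empty element": '<order><item/></order>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first and last samples are accepted. An HTML parser would usually tolerate all five, which is the ambiguity the well-formedness rules are meant to remove.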
The most significant difference between HTML and XML is that HTML has predefined elements and attributes whose behavior is well specified, while XML does not. Instead, document authors can create their own XML vocabularies that are specific to their application or business needs. XML vocabularies currently exist for a large number of industries and applications from financial filings (XBRL) and financial services (FpML) to Web documents (XHTML) and network protocols (SOAP). The lack of emphasis on predefined elements and attributes that specify how an XML document is rendered or displayed enables document authors to focus on creating documents that contain only relevant semantic information for their particular problem domain. The separation of content from presentation enabled by XML vocabularies allows for greater reuse of information and content repurposing.
The Anatomy of an XML Document
Below is a sample XML document that represents a customer order for a music store. Note how the document easily represents both the rigidly structured data that describes the compact discs and the semi-structured data containing special instructions and comments about a specific customer.
<?xml version="1.0" encoding="iso-8859-1" ?>
<?xml-stylesheet href="orders.xsl" ?>
<order id="ord123456">
    <customer id="cust0921">
        <first-name>Dare</first-name>
        <last-name>Obasanjo</last-name>
        <address>
            <street>One Microsoft Way</street>
            <city>Redmond</city>
            <state>WA</state>
            <zip>98052</zip>
        </address>
    </customer>
    <items>
        <compact-disc>
            <price>16.95</price>
            <artist>Nelly</artist>
            <title>Nellyville</title>
        </compact-disc>
        <compact-disc>
            <price>17.55</price>
            <artist>Baby D</artist>
            <title>Lil Chopper Toy</title>
        </compact-disc>
    </items>
    <!-- Always go the extra mile for the customer -->
    <special-instructions xmlns:html="http://www.w3.org/1999/xhtml/">
        <html:p>If customer is not available at the address then attempt
            leave package at one of the following locations listed in order of
            which should be attempted first
            <html:ol>
                <html:li>Next Door</html:li>
                <html:li>Front Desk</html:li>
                <html:li>On Doorstep</html:li>
            </html:ol>
            <html:b>Note</html:b> Remember to leave a note detailing where
            to pick up the package.
        </html:p>
    </special-instructions>
</order>
The document begins with the optional XML declaration, which specifies which version of XML is being used and the character encoding used by the document. This is followed by the xml-stylesheet processing instruction, which binds a style sheet containing formatting instructions to the XML document so that user applications such as Web browsers can render it in a more attractive manner. Processing instructions are generally used to embed application-specific information in an XML document. For instance, most applications that process the contents of the above document would ignore the xml-stylesheet processing instruction. On the other hand, applications used for displaying XML documents, such as a Web browser, would use the information in the processing instruction to locate the style sheet that contains special instructions for displaying the document.
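The two behaviours described above, ignoring a processing instruction versus acting on it, can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the shortened document doc_text and the helper name processing_instructions are hypothetical, and a real browser does considerably more than read the instruction's data.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# A shortened stand-in for the order document, given as bytes so that the
# parser honours the encoding named in the XML declaration.
doc_text = (
    b'<?xml version="1.0" encoding="iso-8859-1" ?>'
    b'<?xml-stylesheet href="orders.xsl" ?>'
    b'<order id="ord123456"/>'
)

def processing_instructions(xml_bytes):
    """Return (target, data) pairs for processing instructions before the root."""
    doc = minidom.parseString(xml_bytes)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]

# A display-oriented application would look for the style sheet binding.
pis = processing_instructions(doc_text)
print(pis)

# A data-oriented application can ignore it: ElementTree's default parser
# does not surface processing instructions and returns only the element tree.
root = ET.fromstring(doc_text)
print(root.tag, root.get("id"))
```

The XML declaration itself is not reported as a processing instruction even though it uses similar syntax, so only the xml-stylesheet entry appears in the list.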
Unicode + Angle Brackets = Interoperability