A WordprocessingML document is a package containing a number of different parts, mostly XML files. However, most of the actual content is found within the main document part. And that content is mostly composed of paragraphs and tables.
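As a rough illustration of the package idea, the sketch below lists the parts of a package using Python's standard zipfile module. It relies on the fact that the package is stored as a ZIP archive, which is not stated above. The toy archive and its part names (word/document.xml is the usual name of the main document part) are assumptions made for the example, and the archive is not a complete, valid document.

```python
import io
import zipfile


def list_parts(package_bytes):
    """Return the names of all parts stored in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


# Build a toy package in memory. The part names mimic a typical layout,
# but the contents are placeholders, not valid WordprocessingML.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<document/>")
    package.writestr("word/styles.xml", "<styles/>")

parts = list_parts(buffer.getvalue())
xml_parts = [name for name in parts if name.endswith(".xml")]
```

The same list_parts function could be pointed at the bytes of a real .docx file to see which parts it contains.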
Paragraphs
A paragraph (<w:p>) is the basic unit of block-level content. That is, it’s a division of content that begins on a new line. It typically has two pieces. The formatting (or properties) for the paragraph is declared first, followed by the content.
The formatting can be declared directly (“this paragraph shall be centered”), indirectly by referencing a style (“this paragraph shall use the X style, which centers paragraphs”), or through a combination of both. Paragraph formatting is declared within a <w:pPr>.
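To make the difference between direct and indirect formatting concrete, here is a minimal sketch that reads both from a paragraph using Python's xml.etree.ElementTree. The sample paragraph and the style name "Title" are hypothetical. The <w:pStyle> element (the style reference) is not mentioned above, and the function name paragraph_formatting is made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical paragraph: it references a style (indirect formatting)
# and also centers itself (direct formatting).
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:pStyle w:val="Title"/>
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r><w:t>This is text.</w:t></w:r>
</w:p>
""".strip()


def paragraph_formatting(p):
    """Return (style id, direct justification) for a w:p element.

    Either value is None when the paragraph does not declare it.
    """
    ppr = p.find("w:pPr", NS)
    if ppr is None:
        return None, None
    val = f"{{{W}}}val"  # attributes are namespace-qualified
    style = ppr.find("w:pStyle", NS)
    jc = ppr.find("w:jc", NS)
    return (
        style.get(val) if style is not None else None,
        jc.get(val) if jc is not None else None,
    )


style_id, justification = paragraph_formatting(ET.fromstring(SAMPLE))
```

Here style_id holds the referenced style and justification holds the directly declared alignment. Working out what the style itself specifies would require reading the style definitions, which this sketch does not attempt.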
The content of the paragraph is contained in one or more runs (<w:r>). Runs are non-block content; they define regions of text that do not necessarily begin on a new line. Like paragraphs, they are composed of formatting/property definitions followed by content. The formatting is specified within a <w:rPr> and can be direct formatting, indirect formatting through a style reference, or both.
A run can be divided into smaller runs, and adjacent runs can be combined if they have the same properties. So, for example, if a sentence contains one word that is bold, then the sentence must be broken up into multiple runs to account for the bold and non-bold components of the sentence.
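The bold-word case can be sketched as follows, again with Python's xml.etree.ElementTree. The sample sentence and the function name describe_runs are made up for illustration. The sketch treats the mere presence of <w:b> as bold and ignores any w:val attribute on it, which is a simplification.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# One sentence, three runs: only the middle run carries the bold property.
# xml:space="preserve" keeps the leading/trailing spaces significant.
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">This is </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>
  <w:r><w:t xml:space="preserve"> text.</w:t></w:r>
</w:p>
""".strip()


def describe_runs(p):
    """Return a list of (text, is_bold) tuples, one per run in the paragraph."""
    runs = []
    for r in p.findall("w:r", NS):
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        # Simplification: any <w:b> inside <w:rPr> counts as bold.
        bold = r.find("w:rPr/w:b", NS) is not None
        runs.append((text, bold))
    return runs


runs = describe_runs(ET.fromstring(SAMPLE))
sentence = "".join(text for text, _ in runs)
```

Joining the run texts recovers the full sentence, while the per-run flags show where the formatting changes.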
The content of a run consists mostly of text elements (<w:t>), which contain the actual character data that makes up the readable content. A run might also contain breaks, tabs, symbols, images, and fields. Below is a sample of a very simple paragraph.
<w:p>
  <w:pPr>
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:t>This is text.</w:t>
  </w:r>
</w:p>
Omitted from the above example, and from nearly all sample XML you’ll see on this site, is the optional information that can be added to track editing sessions. Such information, typically in the form of attributes, clutters the XML underlying Word documents. It is omitted here for the sake of clarity. An example is shown below.
<w:p w:rsidR="00D57EDE" w:rsidRDefault="00D57EDE">
. . .
</w:p>
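When inspecting the XML of a real document, it can help to hide these editing-session attributes. The sketch below removes every attribute in the WordprocessingML namespace whose name starts with "rsid". The function name strip_rsid_attributes and the sample markup are assumptions for this example, and it is meant for viewing a copy of the XML, not for rewriting a document.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

SAMPLE = f"""
<w:p xmlns:w="{W}" w:rsidR="00D57EDE" w:rsidRDefault="00D57EDE">
  <w:r w:rsidRPr="00D57EDE"><w:t>This is text.</w:t></w:r>
</w:p>
""".strip()


def strip_rsid_attributes(root):
    """Delete all w:rsid* attributes in place and return how many were removed."""
    prefix = f"{{{W}}}rsid"
    removed = 0
    for element in root.iter():
        # Collect names first so the dict is not modified while iterating.
        for name in [n for n in element.attrib if n.startswith(prefix)]:
            del element.attrib[name]
            removed += 1
    return removed


paragraph = ET.fromstring(SAMPLE)
removed_count = strip_rsid_attributes(paragraph)
remaining_text = paragraph.find("w:r/w:t", NS).text
```

Only the attributes are dropped; the elements and their text are left untouched.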
Tables
Tables are another type of block-level content. A table consists of rows and columns. The specification for a table (<w:tbl>) can be broken up into three parts. As with paragraphs and runs, the properties come first; for tables they are defined within a <w:tblPr>.
Unlike paragraphs and runs, however, a table divides the content into rows, and no two rows need to have the same number of columns. This adds a level of complexity to the definition of a table. WordprocessingML addresses this challenge by defining a “grid” for the table within a <w:tblGrid>. This table grid definition is the second part of the table definition.
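As a small illustration of the first two parts, the sketch below reads the column widths from a table grid. The sample table is hypothetical and leaves out the rows entirely. The <w:gridCol> and <w:tblW> elements are not mentioned above; widths in w:w are expressed in twentieths of a point, so 2880 corresponds to two inches. The function name grid_widths is made up for this example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical table showing only the properties and the grid.
# The rows, which would follow <w:tblGrid>, are omitted here.
SAMPLE = f"""
<w:tbl xmlns:w="{W}">
  <w:tblPr>
    <w:tblW w:w="0" w:type="auto"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2880"/>
    <w:gridCol w:w="2880"/>
    <w:gridCol w:w="2880"/>
  </w:tblGrid>
</w:tbl>
""".strip()


def grid_widths(tbl):
    """Return the width of each grid column, in twentieths of a point."""
    width_attr = f"{{{W}}}w"
    return [
        int(col.get(width_attr))
        for col in tbl.findall("w:tblGrid/w:gridCol", NS)
    ]


widths = grid_widths(ET.fromstring(SAMPLE))
column_count = len(widths)
```

The number of <w:gridCol> entries gives the number of columns in the grid, regardless of how many cells any individual row ends up containing.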