WordprocessingML (docx)

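A WordprocessingML or docx file is a zip file (a package) containing a number of "parts". These are typically UTF-8 or UTF-16 encoded XML files, though strictly speaking a part is just a stream of bytes. The package may also contain other media files, such as images and video. The structure is organized according to the Open Packaging Conventions (OPC).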

You can inspect the file structure and the individual files by renaming any docx file to a zip file and unzipping it.

[Figure: WordprocessingML file structure]
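
The same inspection can be done without renaming anything, because any zip library can open the package directly. The sketch below uses Python's standard `zipfile` module. The helper `build_sample_package` is hypothetical and only creates a tiny stand-in package in memory so the example runs on its own; it is not a complete, valid docx.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny illustrative package in memory (not a complete docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_parts(package_bytes: bytes) -> list:
    """Return the names of all zip items stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


# Print every item in the sample package, in the order it was stored.
for name in list_parts(build_sample_package()):
    print(name)
```

For a real file, `zipfile.ZipFile("some-document.docx")` (a placeholder file name) can be opened the same way.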

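Content Types
Every package must have a [Content_Types].xml part at the root of the package. This file lists the content types of all the parts in the package. Every part's content type must be declared there, either through a Default entry for its file extension or through an Override entry for the specific part name. The following is the content type for the main document part: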

![](https://img-blog.csdnimg.cn/d7e4b065c8674653abbf858ae47d55fc.png)

Keep this in mind when adding new parts to the package: each new part also needs a content type entry.
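
As a rough illustration of how a content type lookup might work, the sketch below parses a small, hand-written [Content_Types].xml and resolves a part name to its content type. The sample XML and the function name `content_type_of` are assumptions made for this example; a real package usually declares many more entries.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Hand-written sample; real files typically list more parts and extensions.
SAMPLE_CONTENT_TYPES = f"""<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_of(part_name: str, types_xml: str) -> Optional[str]:
    """Look up a part's content type: Override first, then Default."""
    root = ET.fromstring(types_xml)
    # An Override for the exact part name takes precedence.
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    # Otherwise fall back to the Default for the file extension.
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None  # the part has no declared content type


print(content_type_of("/word/document.xml", SAMPLE_CONTENT_TYPES))
```

In this sample, a part such as /word/media/image1.png would come back as `None`, which is the situation to avoid when adding parts.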

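Relationships
Every package contains a relationships part that defines the relationships between the other parts and to resources outside of the package. This separates the relationships from the content and makes it easy to change a relationship without changing the sources that reference the target.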

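[Figure: package relationships part]

For an OOXML package, there is always a relationships part (.rels) within the _rels folder at the package root. It holds the package relationships, which identify the starting parts of the package. For example, it contains a relationship that identifies the start part for the content, typically word/document.xml.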

The .rels part typically also contains relationships for app.xml and core.xml.
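
To make the package relationships more concrete, here is a minimal sketch that parses a hand-written _rels/.rels sample and finds the start part. The sample XML and the function name `find_start_part` are assumptions for this example. The relationship type URIs are the ones used by the Transitional flavor of the format; Strict documents use different URIs.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)

# Hand-written sample of a package-level relationships part.
SAMPLE_PACKAGE_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="word/document.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
</Relationships>"""


def find_start_part(rels_xml: str):
    """Return the Target of the officeDocument relationship, or None."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            return rel.get("Target")
    return None


print(find_start_part(SAMPLE_PACKAGE_RELS))
```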

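In addition to the relationships part for the package, each part that is the source of one or more relationships has its own relationships part. Each such relationships part is found in a _rels sub-folder next to the source part and is named by appending '.rels' to the name of the part. Typically the main content part (document.xml) has its own relationships part (document.xml.rels). It contains relationships to the other parts of the content, such as styles.xml, themes.xml, and footer.xml, as well as the URIs for external links.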
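
The naming rule above is mechanical, so it can be written as a small function. This is only a sketch; the function name `relationships_part_for` is made up for this example.

```python
import posixpath


def relationships_part_for(part_name: str) -> str:
    """Return the name of the relationships part for a given source part.

    The rule: put the file in a "_rels" folder next to the source part
    and append ".rels" to the source part's file name.
    """
    folder, file_name = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", file_name + ".rels")


print(relationships_part_for("word/document.xml"))
# An empty source name stands for the package itself.
print(relationships_part_for(""))
```

Treating the package itself as an empty source name gives _rels/.rels, which matches the package relationships part.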

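[Figure: document relationships part]

A relationship can be either explicit or implicit. For an explicit relationship, a resource is referenced using the Id attribute of a <Relationship> element. That is, the Id in the source maps directly to the Id of a relationship item, which explicitly references the target.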

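For example, a document might contain a hyperlink such as this:

<w:hyperlink r:id="rId4">
The r:id="rId4" attribute references the relationship with Id="rId4" in the relationships part for the document (document.xml.rels), and that relationship supplies the link target.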
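
A sketch of how an explicit relationship might be resolved in code is shown below. Both XML samples are hand-written for illustration, the URL https://example.com/ is a placeholder, and `hyperlink_targets` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hand-written stand-in for word/document.xml.
SAMPLE_DOCUMENT = f"""<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}">
  <w:body><w:p>
    <w:hyperlink r:id="rId4"><w:r><w:t>Example link</w:t></w:r></w:hyperlink>
  </w:p></w:body>
</w:document>"""

# Hand-written stand-in for word/_rels/document.xml.rels.
SAMPLE_DOCUMENT_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId4" Type="{R_NS}/hyperlink" Target="https://example.com/" TargetMode="External"/>
</Relationships>"""


def hyperlink_targets(document_xml: str, rels_xml: str) -> list:
    """Pair each hyperlink's r:id with the Target of the matching relationship."""
    targets_by_id = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship")
    }
    pairs = []
    for link in ET.fromstring(document_xml).iter(f"{{{W_NS}}}hyperlink"):
        rel_id = link.get(f"{{{R_NS}}}id")  # the r:id attribute
        pairs.append((rel_id, targets_by_id.get(rel_id)))
    return pairs


print(hyperlink_targets(SAMPLE_DOCUMENT, SAMPLE_DOCUMENT_RELS))
```

Because the target lives only in the relationships part, changing the link means editing document.xml.rels and leaving document.xml untouched.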

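For an implicit relationship, there is no such direct reference to an Id. Instead, the reference is understood. For example, a document might contain a reference to a footnote as shown below.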

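<w:footnoteReference w:id="2">
In this case, the reference to the footnote with w:id="2" is understood to point into the Footnotes part, which exists whenever the document has footnotes. In the Footnotes part we will see the following.

<w:footnote w:id="2">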
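
The implicit lookup can be sketched as follows: no relationship Id is involved, and the w:id value alone is matched against the Footnotes part. The XML samples and the helper name `footnote_texts` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written stand-in for word/document.xml.
SAMPLE_DOCUMENT = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
  <w:r><w:footnoteReference w:id="2"/></w:r>
</w:p></w:body></w:document>"""

# Hand-written stand-in for word/footnotes.xml.
SAMPLE_FOOTNOTES = f"""<w:footnotes xmlns:w="{W_NS}">
  <w:footnote w:id="2"><w:p><w:r><w:t>Sample footnote text.</w:t></w:r></w:p></w:footnote>
</w:footnotes>"""


def footnote_texts(document_xml: str, footnotes_xml: str) -> list:
    """Match each footnoteReference to the footnote with the same w:id."""
    id_attr = f"{{{W_NS}}}id"
    text_by_id = {}
    for note in ET.fromstring(footnotes_xml).iter(f"{{{W_NS}}}footnote"):
        # Concatenate all text runs inside the footnote.
        text = "".join(t.text or "" for t in note.iter(f"{{{W_NS}}}t"))
        text_by_id[note.get(id_attr)] = text
    references = ET.fromstring(document_xml).iter(f"{{{W_NS}}}footnoteReference")
    return [(ref.get(id_attr), text_by_id.get(ref.get(id_attr))) for ref in references]


print(footnote_texts(SAMPLE_DOCUMENT, SAMPLE_FOOTNOTES))
```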
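
Parts Specific to WordprocessingML Documents
Below is a list of the possible parts of a WordprocessingML package that are specific to WordprocessingML documents. Keep in mind that a document may have only a few of these parts. For example, if a document has no footnotes, the package will not include a footnotes part.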

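Part and description
Comments
Contains the comments in the document. There may be a comments part for the main document and one for the glossary, if there is a glossary.

Document Settings
Specifies the settings for the document, such as whether to hide spelling and grammatical errors, whether to track revisions, and write protection. There may be a document settings part for the main document and one for the glossary, if there is a glossary.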

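Endnotes
Contains the endnotes for the document. There may be an endnotes part for the main document and one for the glossary, if there is a glossary.

Font Table
Specifies information about the fonts used in the document. When a specified font is not available on the system, the application uses the information in this part to determine which font to use to display the document. There may be a font table for the main document and one for the glossary, if there is a glossary.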

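Footer
Contains the information for a footer. Note that each section of a document may have a footer for the first page, for odd pages, and for even pages. So there may be multiple footer parts, depending on how many sections the document has and which types of footers those sections use.

Footnotes
Contains the footnotes for the document. There may be a footnotes part for the main document and one for the glossary, if there is a glossary.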

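Glossary
A supplementary document storage location that may contain content carried with the document but not visible from the main document contents. It is intended for storage of optional document fragments. Only one is permitted.

Header
Contains the information for a header. Note that each section of a document may have a header for the first page, for odd pages, and for even pages. So there may be multiple header parts, depending on how many sections the document has and which types of headers those sections use.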

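Main Document
Contains the body of the document.

Numbering Definitions
Contains the structure of each numbering definition in the document. There may be a numbering definitions part for the main document and one for the glossary, if there is a glossary.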

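Style Definitions
Contains the definitions for the set of styles used by the document. There may be a style definitions part for the main document and one for the glossary, if there is a glossary.

Web Settings
Contains the definitions for web-specific settings used by the document. These settings fall into two categories: settings related to HTML documents (that is, frameset definitions) that can be used in WordprocessingML documents, and settings that affect how the document is handled when saved as HTML. There may be a web settings part for the main document and one for the glossary, if there is a glossary.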
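
One way to see which of these parts a given document actually has is to look at the relationship types in the document's relationships part. The sketch below assumes the Transitional relationship type URIs, which end in a short name such as styles or footer. The sample XML, the `PART_KINDS` table, and the function name `parts_by_kind` are made up for this example.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_REL_PREFIX = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Last segment of the relationship type -> part name used in the list above.
PART_KINDS = {
    "comments": "Comments",
    "settings": "Document Settings",
    "endnotes": "Endnotes",
    "fontTable": "Font Table",
    "footer": "Footer",
    "footnotes": "Footnotes",
    "glossaryDocument": "Glossary",
    "header": "Header",
    "numbering": "Numbering Definitions",
    "styles": "Style Definitions",
    "webSettings": "Web Settings",
}

# Hand-written stand-in for word/_rels/document.xml.rels.
SAMPLE_DOCUMENT_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_REL_PREFIX}styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="{OFFICE_REL_PREFIX}settings" Target="settings.xml"/>
  <Relationship Id="rId3" Type="{OFFICE_REL_PREFIX}footer" Target="footer1.xml"/>
  <Relationship Id="rId4" Type="{OFFICE_REL_PREFIX}footer" Target="footer2.xml"/>
</Relationships>"""


def parts_by_kind(rels_xml: str) -> dict:
    """Group relationship targets by the kind of part they point to."""
    found = {}
    for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship"):
        rel_type = rel.get("Type", "")
        if not rel_type.startswith(OFFICE_REL_PREFIX):
            continue  # not one of the relationship types handled here
        kind = PART_KINDS.get(rel_type[len(OFFICE_REL_PREFIX):])
        if kind is not None:
            found.setdefault(kind, []).append(rel.get("Target"))
    return found


print(parts_by_kind(SAMPLE_DOCUMENT_RELS))
```

The sample has two footer parts and no footnotes part, which mirrors the point that a document carries only the parts it needs.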

Parts Shared by Other OOXML Documents
A number of part types may appear in any OOXML package. Below are some of the more relevant parts for WordprocessingML documents.

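Part and description
Embedded package
Contains a complete package, either internal or external to the referencing package. For example, a WordprocessingML document might contain a spreadsheet or presentation document.

Extended File Properties (often found at docProps/app.xml)
Contains properties specific to an OOXML document, such as the template used, the number of pages and words, and the application name and version.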

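File Properties, Core
Core file properties enable the user to discover and set common properties within a package, such as creator name, creation date, and title. Dublin Core properties (a set of metadata terms used to describe resources) are used whenever possible.

Font
Contains a font embedded directly into the document. Fonts can be stored either as bitmapped fonts, in which each glyph is stored as a raster image, or in a format conforming to ISO/IEC 14496-22:2007.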

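Image
Documents often contain images. An image can be stored in a package as a zip item. The item must be identified by an image part relationship and the appropriate content type.

Theme
DrawingML is a shared language across the OOXML document types. It includes a theme part that is included in WordprocessingML documents when the document uses a theme. The theme part contains information about the document's theme, such as the color scheme, fonts, and format scheme.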
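
As an illustration of the core file properties described above, the sketch below reads a few Dublin Core values from a hand-written stand-in for docProps/core.xml. The sample values and the function name `read_core_properties` are assumptions for this example.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hand-written stand-in for docProps/core.xml with made-up values.
SAMPLE_CORE = f"""<cp:coreProperties
    xmlns:cp="{NAMESPACES['cp']}"
    xmlns:dc="{NAMESPACES['dc']}"
    xmlns:dcterms="{NAMESPACES['dcterms']}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Sample title</dc:title>
  <dc:creator>Sample Author</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2020-01-01T00:00:00Z</dcterms:created>
</cp:coreProperties>"""


def read_core_properties(core_xml: str) -> dict:
    """Return title, creator, and creation date; missing values are None."""
    root = ET.fromstring(core_xml)
    wanted = {
        "title": "dc:title",
        "creator": "dc:creator",
        "created": "dcterms:created",
    }
    return {
        key: root.findtext(path, default=None, namespaces=NAMESPACES)
        for key, path in wanted.items()
    }


print(read_core_properties(SAMPLE_CORE))
```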

