http://www.xmlw.ie/aboutxml/wordml.htm


WordML

Word 2003 Beta 2 has been released. We have installed a copy and saved a Word file as XML for you to examine: we started from a native binary Word document and used Save as XML. The resulting Word XML document is well-formed and conforms to the XML Schema called WordML. Below is a slightly annotated version of the WordML mark-up.

Some more complex structures are included in a second sample: wordsample2.doc, wordsample2.xml.

XML and namespace declarations

Here are the top-level XML and namespace declarations, which are similar to those used by Word 2000 and XP.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<?mso-application progid="Word.Document"?>

<w:wordDocument

xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml"

xmlns:v="urn:schemas-microsoft-com:vml"

xmlns:w10="urn:schemas-microsoft-com:office:word"

xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core"

xmlns:aml="http://schemas.microsoft.com/aml/2001/core"

xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint"

xmlns:o="urn:schemas-microsoft-com:office:office"

xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xml:space="preserve">
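The prefixes above (w, o, wx and so on) are only shorthand, so a program reading WordML should match on namespace URIs rather than prefixes. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the Beta 2 WordML URI shown above (other Word builds may declare a different URI); the sample document and the helper names are hypothetical:

```python
import io
import xml.etree.ElementTree as ET

# WordML namespace URI as declared in the Word 2003 Beta 2 sample above.
W_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"

# Trimmed, hypothetical sample: a real file carries many more declarations.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xml:space="preserve">
  <o:DocumentProperties><o:Title>Sample</o:Title></o:DocumentProperties>
</w:wordDocument>"""


def declared_namespaces(xml_text: str) -> dict:
    """Map each declared prefix to its namespace URI via 'start-ns' events."""
    data = io.BytesIO(xml_text.encode("utf-8"))
    return {prefix: uri
            for _event, (prefix, uri) in ET.iterparse(data, events=("start-ns",))}


def is_wordml(xml_text: str) -> bool:
    """True if the root element is wordDocument in the WordML namespace."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # ElementTree reports names in Clark notation: {uri}localname.
    return root.tag == f"{{{W_NS}}}wordDocument"


print(declared_namespaces(SAMPLE))
print(is_wordml(SAMPLE))
```

Matching on the URI means a document that happened to use a different prefix for the same namespace would still be recognised.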

Custom Properties

Here is the document property section, which includes both built-in properties (o:DocumentProperties) and user-defined properties (o:CustomDocumentProperties). Note that each user-defined property is given its own element name (here o:DCIdentifier), which is a bit silly, as it makes validation more difficult.

<o:DocumentProperties>

  <o:Title>Sample Word file encoded in XML</o:Title>

  <o:Subject>XML, XHTML, Word</o:Subject>

  <o:Author>Eoin Campbell</o:Author>

  <o:LastAuthor>Eoin Campbell</o:LastAuthor>

  <o:Revision>2</o:Revision>

  <o:TotalTime>0</o:TotalTime>

  <o:Created>2003-03-27T14:35:00Z</o:Created>

  <o:LastSaved>2003-03-27T14:35:00Z</o:LastSaved>

  <o:Pages>1</o:Pages>

  <o:Words>103</o:Words>

  <o:Characters>588</o:Characters>

  <o:Company>XML Workshop Ltd.</o:Company>

  <o:Lines>4</o:Lines>

  <o:Paragraphs>1</o:Paragraphs>

  <o:CharactersWithSpaces>690</o:CharactersWithSpaces>

  <o:Version>11.4920</o:Version>

</o:DocumentProperties>

<o:CustomDocumentProperties>

  <o:DCIdentifier dt:dt="string">http://www.xmlw.ie/xml2word/xml2word.xml</o:DCIdentifier>

</o:CustomDocumentProperties>
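Because each custom property is its own element name, a reader cannot look it up by a fixed tag and has to iterate over the children instead. A minimal sketch, assuming the property blocks sit directly under the root element (the dummy root, the sample and the helper names are hypothetical):

```python
import xml.etree.ElementTree as ET

NS = {
    "o": "urn:schemas-microsoft-com:office:office",
    "dt": "uuid:C2F41010-65B3-11d1-A29F-00AA00C14882",
}
DT_ATTR = "{%s}dt" % NS["dt"]  # the dt:dt type attribute in Clark notation

# Trimmed copy of the property sections shown above, wrapped in a dummy root.
SAMPLE = """<root xmlns:o="urn:schemas-microsoft-com:office:office"
      xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882">
  <o:DocumentProperties>
    <o:Title>Sample Word file encoded in XML</o:Title>
    <o:Author>Eoin Campbell</o:Author>
    <o:Words>103</o:Words>
  </o:DocumentProperties>
  <o:CustomDocumentProperties>
    <o:DCIdentifier dt:dt="string">http://www.xmlw.ie/xml2word/xml2word.xml</o:DCIdentifier>
  </o:CustomDocumentProperties>
</root>"""


def local_name(tag: str) -> str:
    """Strip the {uri} prefix that ElementTree puts on namespaced names."""
    return tag.rsplit("}", 1)[-1]


def read_properties(root: ET.Element):
    builtin, custom = {}, {}
    for prop in root.iterfind("o:DocumentProperties/*", NS):
        builtin[local_name(prop.tag)] = prop.text
    # Custom property names are element names, so iterate the children
    # and keep each value together with its dt:dt type hint.
    for prop in root.iterfind("o:CustomDocumentProperties/*", NS):
        custom[local_name(prop.tag)] = (prop.get(DT_ATTR), prop.text)
    return builtin, custom


builtin, custom = read_properties(ET.fromstring(SAMPLE))
print(builtin["Title"], custom)
```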

Style information

Here is a chunk of the font, list and style definitions; the full section is very long.

<w:fonts>

<w:defaultFonts w:ascii="Times New Roman"

w:fareast="Times New Roman" w:h-ansi="Times New Roman"

w:cs="Times New Roman"/>

<w:font w:name="Tahoma">

<w:panose-1 w:val="020B0604030504040204"/>

<w:charset w:val="00"/>

<w:family w:val="Swiss"/>

<w:pitch w:val="variable"/>

<w:sig w:usb-0="21007A87" w:usb-1="80000000" w:usb-2="00000008" w:usb-3="00000000" w:csb-0="000101FF" w:csb-1="00000000"/>

</w:font>

</w:fonts>

<w:lists>

<w:listDef w:listDefId="0">

  <w:lsid w:val="FFFFFF7F"/>

  <w:plt w:val="SingleLevel"/>

  <w:tmpl w:val="5A5E4FAC"/>

  <w:lvl w:ilvl="0">

    <w:start w:val="1"/>

    <w:pStyle w:val="ListNumber2"/>

    <w:lvlText w:val="%1."/>

    <w:lvlJc w:val="left"/>

    <w:pPr>

    <w:tabs>

      <w:tab w:val="list" w:pos="643"/>

    </w:tabs>

    <w:ind w:left="643" w:hanging="360"/></w:pPr>

  </w:lvl>

</w:listDef>

</w:lists>

<w:styles>

  <w:versionOfBuiltInStylenames w:val="3"/>

  <w:latentStyles w:defLockedState="off" w:latentStyleCount="156"/>

  <w:style w:type="paragraph" w:default="on" w:styleId="Normal">

    <w:name w:val="Normal"/>

    <w:pPr>

       <w:spacing w:before="60" w:after="60"/>

    </w:pPr>

    <w:rPr>

      <wx:font wx:val="Times New Roman"/>

      <w:lang w:val="EN-IE" w:fareast="EN-US" w:bidi="AR-SA"/>

    </w:rPr>

  </w:style>

  <w:style w:type="paragraph" w:styleId="Heading1">

    <w:name w:val="heading 1"/>

    <wx:uiName wx:val="Heading 1"/>

    <w:basedOn w:val="Normal"/>

    <w:next w:val="Normal"/>

    <w:pPr>

      <w:pStyle w:val="Heading1"/>

      <w:keepNext/>

      <w:spacing w:before="240"/>

      <w:outlineLvl w:val="0"/>

    </w:pPr>

    <w:rPr>

      <w:rFonts w:ascii="Arial" w:h-ansi="Arial"/>

      <wx:font wx:val="Arial"/><w:b/>

      <w:kern w:val="28"/>

      <w:sz w:val="28"/>

      <w:lang w:val="EN-GB"/>

    </w:rPr>

  </w:style>
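Styles form an inheritance chain through w:basedOn, and sizes are stored in Word's internal units: w:sz is in half-points and paragraph spacing is in twentieths of a point. A minimal sketch that follows Heading1's chain and converts its units, assuming a trimmed copy of the styles above (helper names are hypothetical):

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"

# Trimmed copy of the w:styles chunk above.
STYLES = """<w:styles xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:style w:type="paragraph" w:default="on" w:styleId="Normal">
    <w:name w:val="Normal"/>
    <w:pPr><w:spacing w:before="60" w:after="60"/></w:pPr>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Heading1">
    <w:name w:val="heading 1"/>
    <w:basedOn w:val="Normal"/>
    <w:pPr><w:spacing w:before="240"/></w:pPr>
    <w:rPr><w:b/><w:sz w:val="28"/></w:rPr>
  </w:style>
</w:styles>"""


def index_styles(styles: ET.Element) -> dict:
    """Map w:styleId -> w:style element."""
    return {s.get(W + "styleId"): s for s in styles.iter(W + "style")}


def based_on_chain(by_id: dict, style_id: str) -> list:
    """Follow w:basedOn links from a style up to its root style."""
    chain = []
    while style_id is not None and style_id not in chain:  # guard against cycles
        chain.append(style_id)
        parent = by_id[style_id].find(W + "basedOn")
        style_id = parent.get(W + "val") if parent is not None else None
    return chain


def half_points_to_pt(value: str) -> float:
    return int(value) / 2   # w:sz is measured in half-points


def twips_to_pt(value: str) -> float:
    return int(value) / 20  # spacing is in twentieths of a point


by_id = index_styles(ET.fromstring(STYLES))
heading = by_id["Heading1"]
size_pt = half_points_to_pt(heading.find(f"{W}rPr/{W}sz").get(W + "val"))
before_pt = twips_to_pt(heading.find(f"{W}pPr/{W}spacing").get(W + "before"))
print(based_on_chain(by_id, "Heading1"), size_pt, before_pt)
```

So Heading 1 is 14pt text with 12pt of space before it, layered on top of the Normal style.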

Headings

Here are heading levels 1 to 5. Word deduces the hierarchy from the heading styles and adds wrapper elements (wx:sub-section) in the appropriate places, so that each heading is grouped with the text that follows it and with any lower-level sub-sections. This is really useful, as the document becomes hierarchical rather than linear.

<wx:sub-section>

  <w:p>

    <w:pPr><w:pStyle w:val="Title"/></w:pPr>

    <w:r><w:t>Sample Word file</w:t></w:r>

  </w:p>

  <w:p>

    <w:r><w:t>This file contains various paragraph and character styles, custom properties, tables and images.</w:t></w:r>

  </w:p>

</wx:sub-section>

<wx:sub-section>

  <w:p>

    <w:pPr>

      <w:pStyle w:val="Heading1"/>

    </w:pPr>

    <w:r>

      <w:t>Heading Level 1</w:t>

    </w:r>

  </w:p>

  <w:p>

    <w:r>

      <w:t>Normal paragraph</w:t>

    </w:r>

  </w:p>

  <wx:sub-section>

    <w:p>

      <w:pPr>

        <w:pStyle w:val="Heading2"/>

      </w:pPr>

      <w:r>

        <w:t>Heading Level 2</w:t>

      </w:r>

    </w:p>

    <w:p>

      <w:r>

        <w:t>Normal paragraph</w:t>

      </w:r>

    </w:p>

    <wx:sub-section>

      <w:p>

        <w:pPr>

          <w:pStyle w:val="Heading3"/>

        </w:pPr>

        <w:r>

          <w:t>Heading Level 3</w:t>

        </w:r>

      </w:p>

      <w:p>

        <w:r>

          <w:t>Normal paragraph</w:t>

        </w:r>

      </w:p>

      <wx:sub-section>

        <w:p>

          <w:pPr>

            <w:pStyle w:val="Heading4"/>

          </w:pPr>

          <w:r>

            <w:t>Heading Level 4</w:t>

          </w:r>

        </w:p>

        <w:p>

          <w:r>

            <w:t>Normal paragraph</w:t>

          </w:r>

        </w:p>

        <wx:sub-section>

          <w:p>

            <w:pPr>

              <w:pStyle w:val="Heading5"/>

            </w:pPr>

            <w:r>

              <w:t>Heading Level 5</w:t>

            </w:r>

          </w:p>

        </wx:sub-section>

      </wx:sub-section>

    </wx:sub-section>

  </wx:sub-section>

</wx:sub-section>
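Because the sub-sections nest, a document outline can be read straight off the XML. A minimal sketch, assuming (as in the sample above) that the first w:p inside each wx:sub-section is its heading; the sample is a trimmed copy and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"
WX = "{http://schemas.microsoft.com/office/word/2003/2/auxHint}"

# Trimmed copy of the nested wx:sub-section markup above (levels 1 to 3 only).
BODY = """<body xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml"
      xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint">
  <wx:sub-section>
    <w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>Sample Word file</w:t></w:r></w:p>
  </wx:sub-section>
  <wx:sub-section>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Heading Level 1</w:t></w:r></w:p>
    <w:p><w:r><w:t>Normal paragraph</w:t></w:r></w:p>
    <wx:sub-section>
      <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Heading Level 2</w:t></w:r></w:p>
      <wx:sub-section>
        <w:p><w:pPr><w:pStyle w:val="Heading3"/></w:pPr><w:r><w:t>Heading Level 3</w:t></w:r></w:p>
      </wx:sub-section>
    </wx:sub-section>
  </wx:sub-section>
</body>"""


def paragraph_text(p: ET.Element) -> str:
    """Concatenate every w:t inside a paragraph."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def outline(parent: ET.Element, depth: int = 1) -> list:
    """Return (depth, heading text) pairs, one per wx:sub-section."""
    result = []
    for section in parent.findall(WX + "sub-section"):
        first_p = section.find(W + "p")
        if first_p is not None:
            result.append((depth, paragraph_text(first_p)))
        result.extend(outline(section, depth + 1))  # nested sections go one level deeper
    return result


for depth, text in outline(ET.fromstring(BODY)):
    print("  " * (depth - 1) + text)
```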

Lists

Here is the XML mark-up for bulleted and numbered lists. All hierarchy is lost, because the generated mark-up contains no nested structures: each list item is a separate w:p paragraph, distinguished only by its style (ListNumber, ListBullet) and by the computed number or bullet in wx:t. This is not too surprising, as Word doesn't have the concept of nested lists anyway, but hierarchy is deduced for headings, so why not for lists too? The hierarchy could be reinstated on export by post-processing with a very clever piece of XSLT, but why should you have to?

Perhaps if an XML Schema is used, you can assign hierarchical levels to lists.

<w:p>

  <w:pPr>

    <w:pStyle w:val="ListNumber"/>

    <w:listPr>

      <wx:t wx:val="1." wx:wTabBefore="0" wx:wTabAfter="225"/>

      <wx:font wx:val="Times New Roman"/>

    </w:listPr>

  </w:pPr>

  <w:r>

    <w:t>This is a numbered list item </w:t>

  </w:r>

</w:p>

<w:p>

  <w:pPr>

    <w:pStyle w:val="ListNumber"/>

    <w:listPr>

      <wx:t wx:val="2." wx:wTabBefore="0" wx:wTabAfter="225"/>

      <wx:font wx:val="Times New Roman"/>

    </w:listPr>

  </w:pPr>

  <w:r>

    <w:t>Item 2.</w:t>

  </w:r>

</w:p>

<w:p>

  <w:pPr>

    <w:pStyle w:val="ListBullet"/>

    <w:listPr>

      <wx:t wx:val="·" wx:wTabBefore="0" wx:wTabAfter="270"/>

      <wx:font wx:val="Symbol"/>

    </w:listPr>

  </w:pPr>

  <w:r>

    <w:t>This is a bullet list item </w:t>

  </w:r>

</w:p>

<w:p>

  <w:pPr>

    <w:pStyle w:val="ListBullet"/>

    <w:listPr>

      <wx:t wx:val="·" wx:wTabBefore="0" wx:wTabAfter="270"/>

      <wx:font wx:val="Symbol"/>

    </w:listPr>

  </w:pPr>

  <w:r>

    <w:t>Item 2.</w:t>

  </w:r>

</w:p>
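The post-processing idea can be sketched without XSLT as well. The following is a minimal Python alternative, not the author's method: it only groups consecutive list paragraphs into flat lists and does not recover nesting. It assumes a hypothetical rule that any paragraph whose style name starts with ListNumber or ListBullet is a list item:

```python
import itertools
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"
LIST_STYLES = ("ListNumber", "ListBullet")  # hypothetical rule for spotting list items

# Trimmed copy of the list paragraphs above, plus one ordinary paragraph.
BODY = """<body xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:p><w:pPr><w:pStyle w:val="ListNumber"/></w:pPr><w:r><w:t>This is a numbered list item </w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="ListNumber"/></w:pPr><w:r><w:t>Item 2.</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="ListBullet"/></w:pPr><w:r><w:t>This is a bullet list item </w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="ListBullet"/></w:pPr><w:r><w:t>Item 2.</w:t></w:r></w:p>
  <w:p><w:r><w:t>Normal paragraph</w:t></w:r></w:p>
</body>"""


def list_style(p):
    """Return the paragraph's list style name, or None if it is not a list item."""
    style = p.find(f"{W}pPr/{W}pStyle")
    name = style.get(W + "val") if style is not None else None
    return name if name is not None and name.startswith(LIST_STYLES) else None


def group_lists(body):
    """Collapse runs of consecutive same-style list paragraphs into (style, items) blocks."""
    blocks = []
    for style, paras in itertools.groupby(body.findall(W + "p"), key=list_style):
        if style is not None:
            items = ["".join(t.text or "" for t in p.iter(W + "t")) for p in paras]
            blocks.append((style, items))
    return blocks


print(group_lists(ET.fromstring(BODY)))
```

Recovering real nesting would need an extra rule, for example treating a numbered style suffix such as ListNumber2 as a second level, which is again only an assumption about how a given document was authored.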

Character level mark-up

Here are unnamed character formats such as bold and italic, followed by named character styles and hyperlinks. Formatting is not applied by wrapping the text in an element. Instead, the text is split into runs (w:r), and each run's properties container (w:rPr) holds empty elements that switch on the formatting required for that run. This mirrors how RTF handles this type of inline formatting.

Inline elements

Element      Meaning
<w:p>        Paragraph
<w:r>        Text run container
<w:rPr>      Text run properties container
  <w:u>      Underline property (w:val gives the underline type, e.g. single)
  <w:i>      Italic property flag
  <w:b>      Bold property flag
<w:t>        Text container

<w:p>

  <w:r>

    <w:t>Some unnamed character level styles </w:t>

  </w:r>

  <w:r>

    <w:rPr>

      <w:u w:val="single"/>

    </w:rPr>

    <w:t>underline</w:t>

  </w:r>

  <w:r>

    <w:t>, </w:t>

  </w:r>

  <w:r>

    <w:rPr>

      <w:i/>

      <w:i-cs/>

    </w:rPr>

    <w:t>italic</w:t>

  </w:r>

  <w:r>

    <w:t>, </w:t>

  </w:r>

  <w:r>

    <w:rPr>

      <w:b/>

    </w:rPr>

    <w:t>bold</w:t>

  </w:r>

  <w:r>

    <w:t>, </w:t>

  </w:r>

  <w:hlink w:dest="http://www.xmlw.ie/">

  <w:r>

    <w:rPr>

      <w:rStyle w:val="Hyperlink"/>

    </w:rPr>

    <w:t>Hyperlink</w:t>

  </w:r>

  </w:hlink>

  <w:r>

    <w:t>. </w:t>

  </w:r>

</w:p>

<w:p>

  <w:r>

    <w:t>Named character styles: </w:t>

  </w:r>

  <w:r>

    <w:rPr>

      <w:rStyle w:val="EIRORef"/>

    </w:rPr>

    <w:t>EIRORef</w:t>

  </w:r>

</w:p>
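To see how run-level flags map onto conventional wrapping mark-up, here is a minimal sketch that turns a WordML paragraph into HTML-style tags. It treats the mere presence of w:b or w:i as switching the format on and handles only b, i, u and w:hlink (simplifying assumptions); the sample is a trimmed copy of the paragraph above and the helper names are hypothetical:

```python
import html
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"

PARA = """<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:r><w:t>Some unnamed character level styles </w:t></w:r>
  <w:r><w:rPr><w:u w:val="single"/></w:rPr><w:t>underline</w:t></w:r>
  <w:r><w:t>, </w:t></w:r>
  <w:r><w:rPr><w:i/><w:i-cs/></w:rPr><w:t>italic</w:t></w:r>
  <w:r><w:t>, </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>
  <w:r><w:t>, </w:t></w:r>
  <w:hlink w:dest="http://www.xmlw.ie/"><w:r><w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr><w:t>Hyperlink</w:t></w:r></w:hlink>
  <w:r><w:t>. </w:t></w:r>
</w:p>"""


def run_to_html(r):
    """Wrap a run's text in tags according to the empty flags in its w:rPr."""
    out = html.escape("".join(t.text or "" for t in r.iter(W + "t")))
    rpr = r.find(W + "rPr")
    if rpr is not None:
        u = rpr.find(W + "u")
        if u is not None and u.get(W + "val", "single") != "none":
            out = f"<u>{out}</u>"
        if rpr.find(W + "i") is not None:
            out = f"<i>{out}</i>"
        if rpr.find(W + "b") is not None:
            out = f"<b>{out}</b>"
    return out


def paragraph_to_html(p):
    parts = []
    for child in p:
        if child.tag == W + "r":
            parts.append(run_to_html(child))
        elif child.tag == W + "hlink":
            inner = "".join(run_to_html(r) for r in child.findall(W + "r"))
            href = html.escape(child.get(W + "dest", ""), quote=True)
            parts.append(f'<a href="{href}">{inner}</a>')
    return "".join(parts)


print(paragraph_to_html(ET.fromstring(PARA)))
```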

Tables

This is a table with 5 columns and 2 rows, with a spanned row and a spanned column.

<w:tbl>
  <w:tblPr>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblBorders>
      <w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/>
      <w:left w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/>
      <w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/>
      <w:right w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/>
      <w:insideH w:val="single" w:sz="6" wx:bdrwidth="15" w:space="0" w:color="000000"/>
      <w:insideV w:val="single" w:sz="6" wx:bdrwidth="15" w:space="0" w:color="000000"/>
    </w:tblBorders>
    <w:tblLook w:val="0000003F"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="1704"/>
    <w:gridCol w:w="1704"/>
    <w:gridCol w:w="1704"/>
    <w:gridCol w:w="1705"/>
    <w:gridCol w:w="1705"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1705" w:type="dxa"/>
        <w:tcBorders><w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1705" w:type="dxa"/>
        <w:tcBorders><w:bottom w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1704" w:type="dxa"/>
        <w:tcBorders><w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1705" w:type="dxa"/>
        <w:tcBorders><w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1705" w:type="dxa"/>
        <w:tcBorders><w:top w:val="single" w:sz="12" wx:bdrwidth="30" w:space="0" w:color="000000"/></w:tcBorders>
      </w:tcPr>
      <w:p/>
    </w:tc>
  </w:tr>
</w:tbl>
<w:p/>
<w:sectPr>
  <w:pgSz w:w="11906" w:h="16838"/>
  <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="708" w:footer="708" w:gutter="0"/>
  <w:cols w:space="708"/>
  <w:docGrid w:line-pitch="360"/>
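The widths in this mark-up (w:w on w:gridCol, w:tcW with w:type="dxa", and the page size and margins) are in twips, twentieths of a point, so 1440 make an inch. A minimal sketch that summarises the table grid and converts the page geometry, assuming a trimmed copy of the mark-up above with the borders left out (helper names are hypothetical):

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"

# Trimmed copy of the table and section mark-up above (borders omitted).
DOC = """<body xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:tbl>
    <w:tblGrid>
      <w:gridCol w:w="1704"/><w:gridCol w:w="1704"/><w:gridCol w:w="1704"/>
      <w:gridCol w:w="1705"/><w:gridCol w:w="1705"/>
    </w:tblGrid>
    <w:tr><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr>
    <w:tr><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr>
  </w:tbl>
  <w:sectPr>
    <w:pgSz w:w="11906" w:h="16838"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"/>
  </w:sectPr>
</body>"""

TWIPS_PER_INCH = 1440


def twips_to_cm(twips: int) -> float:
    return twips / TWIPS_PER_INCH * 2.54


def table_summary(tbl):
    """Grid column widths (twips) and the number of w:tc cells in each row."""
    widths = [int(c.get(W + "w")) for c in tbl.iterfind(f"{W}tblGrid/{W}gridCol")]
    cells_per_row = [len(tr.findall(W + "tc")) for tr in tbl.findall(W + "tr")]
    return widths, cells_per_row


def text_width_twips(sect):
    """Page width minus the left and right margins."""
    size, margins = sect.find(W + "pgSz"), sect.find(W + "pgMar")
    return int(size.get(W + "w")) - int(margins.get(W + "left")) - int(margins.get(W + "right"))


body = ET.fromstring(DOC)
widths, cells = table_summary(body.find(W + "tbl"))
sect = body.find(W + "sectPr")
print(len(cells), "rows x", len(widths), "grid columns")
print("table width %.2f cm" % twips_to_cm(sum(widths)))
print("page width %.1f cm" % twips_to_cm(int(sect.find(W + "pgSz").get(W + "w"))))
print("text width %.2f cm" % twips_to_cm(text_width_twips(sect)))
```

The page size of 11906 x 16838 twips works out to A4 (21.0 x 29.7 cm).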

Images

Here is the mark-up for a linked image. It uses Vector Markup Language (VML), Microsoft's non-standard alternative to SVG, which is a pity.

Embedded images are stored inside the XML file in an encoded format rather than in an external file. This means a single XML file represents a complete Word document, which is an improvement on Word 2000/XP, whose Save as HTML function creates multiple files.

<w:pict>

  <v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75"

      o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe"

      filled="f" stroked="f">

    <v:stroke joinstyle="miter"/>

    <v:formulas>

      <v:f eqn="if lineDrawn pixelLineWidth 0"/>

      <v:f eqn="sum @0 1 0"/><v:f eqn="sum 0 0 @1"/>

      <v:f eqn="prod @2 1 2"/>

      <v:f eqn="prod @3 21600 pixelWidth"/>

      <v:f eqn="prod @3 21600 pixelHeight"/>

      <v:f eqn="sum @0 0 1"/><v:f eqn="prod @6 1 2"/>

      <v:f eqn="prod @7 21600 pixelWidth"/>

      <v:f eqn="sum @8 21600 0"/>

      <v:f eqn="prod @7 21600 pixelHeight"/>

      <v:f eqn="sum @10 21600 0"/>

    </v:formulas>

    <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>

    <o:lock v:ext="edit" aspectratio="t"/>

  </v:shapetype>

  <v:shape id="_x0000_i1025" type="#_x0000_t75"

      alt="XML Workshop Ltd." style="width:150pt;height:75pt">

    <v:imagedata src="D:/yawconline/test/xmlw.gif"/>

  </v:shape>

</w:pict>
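A consumer therefore has to handle two cases: a v:imagedata src that points at an external file, as above, and image data carried inside the document itself. A minimal sketch, assuming the embedded data is Base64 text in a w:binData element named by w:name and referenced from the same src attribute; that element, its name value and the sample are assumptions, since this sample only shows a linked image:

```python
import base64
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"
V = "{urn:schemas-microsoft-com:vml}"

# Hypothetical sample: the linked image from above plus one embedded image.
# w:binData and the "wordml://..." name are assumptions for illustration.
DOC = """<body xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml"
      xmlns:v="urn:schemas-microsoft-com:vml">
  <w:pict><v:shape><v:imagedata src="D:/yawconline/test/xmlw.gif"/></v:shape></w:pict>
  <w:pict>
    <w:binData w:name="wordml://02000001.gif">R0lG
ODlh</w:binData>
    <v:shape><v:imagedata src="wordml://02000001.gif"/></v:shape>
  </w:pict>
</body>"""


def image_sources(root):
    """Every v:imagedata src, whether it names a file or embedded data."""
    return [img.get("src") for img in root.iter(V + "imagedata")]


def embedded_images(root):
    """Decode each w:binData payload; b64decode discards the line break."""
    return {b.get(W + "name"): base64.b64decode(b.text or "")
            for b in root.iter(W + "binData")}


root = ET.fromstring(DOC)
print(image_sources(root))
print(embedded_images(root))
```

The embedded payload here is only the six-byte GIF signature, enough to show that the decoded bytes are the raw image file.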

Footnotes

This is how a footnote is encoded. The footnote text is embedded inline, inside the run (w:r) that carries the footnote reference mark, which seems like a sensible option.

<w:p>
  <w:r><w:t>This paragraph contains a footnote</w:t></w:r>
  <w:r>
    <w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr>
    <w:footnote>
      <w:p>
        <w:pPr><w:pStyle w:val="FootnoteText"/></w:pPr>
        <w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr><w:footnoteRef/></w:r>
        <w:r><w:t> This is footnote text</w:t></w:r>
      </w:p>
    </w:footnote>
  </w:r>
  <w:r><w:t>.</w:t></w:r>
</w:p>
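Because the footnote lives inside a run, extracting plain text needs care so that the footnote text does not leak into the body text. A minimal sketch that replaces each footnote with a numbered marker and collects the footnote texts separately, assuming a trimmed copy of the paragraph above (the [n] marker format and the helper name are hypothetical):

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.microsoft.com/office/word/2003/2/wordml}"

PARA = """<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml">
  <w:r><w:t>This paragraph contains a footnote</w:t></w:r>
  <w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr>
    <w:footnote><w:p><w:pPr><w:pStyle w:val="FootnoteText"/></w:pPr>
      <w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr><w:footnoteRef/></w:r>
      <w:r><w:t> This is footnote text</w:t></w:r></w:p></w:footnote>
  </w:r>
  <w:r><w:t>.</w:t></w:r>
</w:p>"""


def split_footnotes(p):
    """Return (body text with [n] markers, list of footnote texts)."""
    body, notes = [], []
    for r in p.findall(W + "r"):           # direct child runs only
        fn = r.find(W + "footnote")
        if fn is not None:
            # The footnote's own runs are nested inside it, so gather them here.
            notes.append("".join(t.text or "" for t in fn.iter(W + "t")).strip())
            body.append(f"[{len(notes)}]")
        else:
            body.append("".join(t.text or "" for t in r.findall(W + "t")))
    return "".join(body), notes


text, notes = split_footnotes(ET.fromstring(PARA))
print(text)
print(notes)
```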
