9 Text

The following sections discuss issues surrounding the structuring of text. Elements that present text (alignment elements, font elements, style sheets, etc.) are discussed elsewhere in the specification. For information about characters, please consult the section on the document character set.

9.1 White space

The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:

  • ASCII space (&#x0020;)
  • ASCII tab (&#x0009;)
  • ASCII form feed (&#x000C;)
  • Zero-width space (&#x200B;)

Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.

This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters.

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.

This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.

Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.).

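As a rough illustration of the collapsing rule, the following Python sketch treats only the characters this section defines as white space (space, tab, form feed, zero-width space, and line breaks, taken here to be CR and LF) and replaces each run with a single ASCII space. The function names are hypothetical, and using an ASCII space as the output inter-word separator is a Latin-script assumption; content of a PRE element would bypass this step.

```python
import re

# White space per this section: space, tab, form feed, zero-width space,
# plus line breaks (assumed here to be CR and LF).
_WHITE_SPACE_RUN = re.compile(r"[ \t\f\u200b\r\n]+")


def split_words(text):
    """Return the "words": maximal runs of non-white-space characters."""
    return [word for word in _WHITE_SPACE_RUN.split(text) if word]


def collapse_white_space(text):
    """Replace every run of white space with a single ASCII space."""
    return _WHITE_SPACE_RUN.sub(" ", text)
```

Characters outside the list above, such as the no-break space, are left untouched by this sketch, which matches the statement that the specification does not define their behavior.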

The PRE element is used for preformatted text, where white space is significant.

In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:

  <P>We offer free <A>technical support</A> for subscribers.</P>

and not:

  <P>We offer free<A> technical support </A>for subscribers.</P>
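
An authoring tool could check for this pattern mechanically. The sketch below is one possible approach, not part of the specification: a hypothetical helper, find_edge_white_space, uses a simple regular expression to report start tags that are immediately followed by white space and end tags that are immediately preceded by it. It assumes well-formed tags without ">" inside attribute values and does not special-case PRE.

```python
import re

# The white space characters defined in section 9.1 (line breaks taken as CR, LF).
_WS = r"[ \t\f\u200b\r\n]"

_EDGE = re.compile(
    # A start tag immediately followed by white space ...
    rf"(?P<start><[A-Za-z][^<>]*>)(?={_WS})"
    # ... or an end tag immediately preceded by white space.
    rf"|(?<={_WS})(?P<end></[A-Za-z][^<>]*>)"
)


def find_edge_white_space(markup):
    """Return the offending tags in document order (empty list if none)."""
    return [match.group(0) for match in _EDGE.finditer(markup)]
```

Applied to the two paragraphs above, the first form yields an empty list, while the second reports both the A start tag and the A end tag.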

9.2 Structured text

9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
                   SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
<!ATTLIST (%fontstyle;|%phrase;)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Phrase elements add structural information to text fragments. The usual meanings of phrase elements are as follows:

EM:
Indicates emphasis.
STRONG:
Indicates stronger emphasis.
CITE:
Contains a citation or a reference to other sources.
DFN:
Indicates that this is the defining instance of the enclosed term.
CODE:
Designates a fragment of computer code.
SAMP:
Designates sample output from programs, scripts, etc.
KBD:
Indicates text to be entered by the user.
VAR:
Indicates an instance of a variable or program argument.
ABBR:
Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.).
ACRONYM:
Indicates an acronym (e.g., WAC, radar, etc.).

EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents. These examples illustrate some of the phrase elements:

As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>

More information can be found in <CITE>[ISO-0000]</CITE>.

Please refer to the following reference number in future
correspondence: <STRONG>1-234-55</STRONG>

The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font. Speech synthesizer user agents may change synthesis parameters such as volume, pitch, and rate accordingly.

The ABBR and ACRONYM elements allow authors to clearly indicate occurrences of abbreviations and acronyms. Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "M.", "Inc.", "et al.", "etc.". Both Chinese and Japanese use analogous abbreviation mechanisms, wherein a long name is referred to subsequently with a subset of the Han characters from the original occurrence. Marking up these constructs provides useful information to user agents and tools such as spell checkers, speech synthesizers, translation systems and search-engine indexers.

The content of the ABBR and ACRONYM elements specifies the abbreviated expression itself, as it would normally appear in running text. The title attribute of these elements may be used to provide the full or expanded form of the expression.

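To show how a tool such as an indexer might use this convention, here is a minimal Python sketch built on the standard library's html.parser module. The class and function names are hypothetical; the sketch simply maps each abbreviated form (the element content) to its expansion (the title attribute) and ignores elements that carry no title.

```python
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Collect {abbreviated form: expansion} from ABBR and ACRONYM elements."""

    def __init__(self):
        super().__init__()
        self.expansions = {}
        self._title = None   # title of the element being read, if any
        self._text = []      # content fragments of that element

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag in ("abbr", "acronym"):
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self._title is not None:
            # Normalize surrounding and internal white space in the content.
            short = " ".join("".join(self._text).split())
            self.expansions[short] = self._title
            self._title = None


def collect_abbreviations(markup):
    collector = AbbrCollector()
    collector.feed(markup)
    collector.close()
    return collector.expansions
```

Character references in both the content and the title value are decoded by the parser, so an expansion written with &eacute; is returned with the accented character.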

Here are some sample uses of ABBR:

  <P>
  <ABBR title="World Wide Web">WWW</ABBR>
  <ABBR lang="fr"
        title="Soci&eacute;t&eacute; Nationale des Chemins de Fer">
     SNCF
  </ABBR>
  <ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>
  <ABBR title="Abbreviation">abbr.</ABBR>

Note that abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.

9.2.2 Quotations: The BLOCKQUOTE and Q elements

<!ELEMENT BLOCKQUOTE - - (%block;|SCRIPT)+ -- long quotation -->
<!ATTLIST BLOCKQUOTE
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

<!ELEMENT Q - - (%inline;)*            -- short inline quotation -->
<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.

These two elements designate quoted text. BLOCKQUOTE is for long quotations (block-level content) and Q is intended for short quotations (inline content) that don't require paragraph breaks.

This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.

<BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">

<P>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</P>

</BLOCKQUOTE>

Rendering quotations

Visual user agents generally render BLOCKQUOTE as an indented block.

Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user agents.

The following example illustrates nested quotations with the Q element.

John said, <Q lang="en-us">I saw Lucy at lunch, she told me
<Q lang="en-us">Mary wants you
to get some ice cream on your way home.</Q> I think I will get
some at Ben and Jerry's, on Gloucester Road.</Q>

Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:

  John said, "I saw Lucy at lunch, she told me 'Mary wants you
  to get some ice cream on your way home.' I think I will get some
  at Ben and Jerry's, on Gloucester Road."
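
The nesting behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any user agent works: the QUOTE_MARKS table is hypothetical and contains just the American English convention described above (ASCII double marks outside, single marks inside), unknown languages fall back to it, and white space is collapsed as for any non-PRE content.

```python
from html.parser import HTMLParser

# Hypothetical table: (open, close) marks per nesting level, keyed by language.
QUOTE_MARKS = {"en-us": [('"', '"'), ("'", "'")]}
FALLBACK_LANG = "en-us"


class QuoteRenderer(HTMLParser):
    """Render Q elements as text with delimiting quotation marks."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._closers = []   # closing marks of the Q elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "q":
            lang = (dict(attrs).get("lang") or FALLBACK_LANG).lower()
            levels = QUOTE_MARKS.get(lang, QUOTE_MARKS[FALLBACK_LANG])
            # Alternate between the levels as the nesting depth grows.
            open_mark, close_mark = levels[len(self._closers) % len(levels)]
            self.out.append(open_mark)
            self._closers.append(close_mark)

    def handle_endtag(self, tag):
        if tag == "q" and self._closers:
            self.out.append(self._closers.pop())

    def handle_data(self, data):
        self.out.append(data)


def render_quotes(markup):
    renderer = QuoteRenderer()
    renderer.feed(markup)
    renderer.close()
    # Collapse white space runs, as for any element other than PRE.
    return " ".join("".join(renderer.out).split())
```

Because the marks come from the renderer, the source markup contains no quotation marks of its own, which is why authors should not add them inside a Q element.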