6 Basic HTML data types

This section of the specification describes the basic data types that may appear as an element's content or an attribute's value.

For introductory information about reading the HTML DTD, please consult the SGML tutorial.

6.1 Case information

Each attribute definition includes information about the case-sensitivity of its values. The case information is presented with the following keys:

CS
The value is case-sensitive (i.e., user agents interpret "a" and "A" differently).
CI
The value is case-insensitive (i.e., user agents interpret "a" and "A" as the same).
CN
The value is not subject to case changes, e.g., because it is a number or a character from the document character set.
CA
The element or attribute definition itself gives case information.
CT
Consult the type definition for details about case-sensitivity.

If an attribute value is a list, the keys apply to every value in the list, unless otherwise indicated.
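To make the keys concrete, the following Python sketch compares two attribute values under a given case key. The function name `values_equal`, its `is_list` flag, and its way of handling list values (split on white space, then compare item by item) are illustrative assumptions, not part of this specification. CA and CT are rejected because, for those keys, the answer depends on the element, attribute, or type definition.

```python
def values_equal(a: str, b: str, key: str, is_list: bool = False) -> bool:
    """Compare two attribute values under a case key (hypothetical helper)."""
    if key in ("CA", "CT"):
        # The element/attribute or type definition decides; no generic rule.
        raise ValueError(f"{key}: consult the relevant definition")
    if is_list:
        # The key applies to every value in the list (split on white space here).
        xs, ys = a.split(), b.split()
        return len(xs) == len(ys) and all(
            values_equal(x, y, key) for x, y in zip(xs, ys)
        )
    if key == "CI":
        # Case-insensitive: "a" and "A" are treated as the same.
        return a.lower() == b.lower()
    if key in ("CS", "CN"):
        # CS: case matters. CN: no case changes apply, so compare as-is.
        return a == b
    raise ValueError(f"unknown case key: {key}")


print(values_equal("a", "A", "CS"))                              # False
print(values_equal("a", "A", "CI"))                              # True
print(values_equal("left top", "LEFT Top", "CI", is_list=True))  # True
```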

6.2 SGML basic types

The document type definition specifies the syntax of HTML element content and attribute values using SGML tokens (e.g., PCDATA, CDATA, NAME, ID, etc.). See [ISO8879] for their full definitions. The following is a summary of key information:

  • CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
    • Replace character entities with characters,
    • Ignore line feeds,
    • Replace each carriage return or tab with a single space.

    User agents may ignore leading and trailing white space in CDATA attribute values (e.g., "   myval   " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.

    For some HTML 4 attributes with CDATA attribute values, the specification imposes further constraints on the set of legal values that cannot be expressed by the DTD.

    Although the STYLE and SCRIPT elements use CDATA for their data model, user agents must handle CDATA differently for these elements. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (end-tag open delimiter) is treated as the end of the element's content. In valid documents, this would be the end tag for the element.

  • ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
  • IDREF and IDREFS are references to ID tokens defined by other attributes. IDREF is a single token and IDREFS is a space-separated list of tokens.
  • NUMBER tokens must contain at least one digit ([0-9]).
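The CDATA, ID/NAME, and IDREFS rules above can be sketched in Python as follows. All function names are hypothetical. `html.unescape` is the standard-library helper, and it recognizes the larger HTML5 set of named references, which goes beyond HTML 4's. The CDATA steps are applied literally in the order listed, so a line feed produced by a reference such as `&#10;` is also dropped. A real user agent may not behave this way. `split_idrefs` splits on any white space, which is an assumption.

```python
import html
import re

# ID and NAME tokens: a letter, then letters, digits, "-", "_", ":" or ".".
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def normalize_cdata(value: str, strip: bool = False) -> str:
    """Interpret a CDATA attribute value in the order the list gives."""
    value = html.unescape(value)       # 1. replace character entities
    value = value.replace("\n", "")    # 2. ignore line feeds
    # 3. each carriage return or tab becomes a single space
    value = value.replace("\r", " ").replace("\t", " ")
    # Optional: user agents *may* drop leading and trailing spaces.
    return value.strip(" ") if strip else value


def raw_text_end(content: str) -> int:
    """Index where STYLE/SCRIPT content ends: the first "</" (ETAGO)."""
    pos = content.find("</")
    return len(content) if pos == -1 else pos


def is_valid_name(token: str) -> bool:
    """True if `token` has the form required of ID and NAME tokens."""
    return NAME_RE.fullmatch(token) is not None


def split_idrefs(value: str) -> list:
    """IDREFS: a space-separated list of references to ID tokens."""
    refs = value.split()
    for ref in refs:
        if not is_valid_name(ref):
            raise ValueError(f"invalid IDREF: {ref!r}")
    return refs


print(normalize_cdata("a\r\nb\tc"))                    # a b c
print(normalize_cdata("   myval   ", strip=True))      # myval
script = "if (a < b) { }</SCRIPT>"
print(script[:raw_text_end(script)])                   # if (a < b) { }
print(is_valid_name("sec-1.2"), is_valid_name("1st"))  # True False
print(split_idrefs("h1 h2  h3"))                       # ['h1', 'h2', 'h3']
```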

6.3 Text strings

A number of attributes (%Text; in the DTD) take text that is meant to be "human readable". For introductory information about attributes, please consult the tutorial discussion of attributes.

6.4 URIs

This specification uses the term URI as defined in [URI] (see also [RFC1630]).

Note that URIs include URLs (as defined in [RFC1738] and [RFC1808]).

Relative URIs are resolved to full URIs using a base URI. [RFC1808], section 3, defines the normative algorithm for this process. For more information about base URIs, please consult the section on base URIs in the chapter on links.

URIs are represented in the DTD by the parameter entity %URI;.

URIs in general are case-sensitive. There may be URIs, or parts of URIs, where case doesn't matter (e.g., machine names), but identifying these may not be easy. To be on the safe side, users should always treat URIs as case-sensitive.

Please consult the appendix for information about non-ASCII characters in URI attribute values.
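As an illustration of relative URI resolution and of the conservative case rule, here is a short Python sketch. The base URI is a made-up example, and `same_uri` is a hypothetical helper. Python's `urllib.parse.urljoin` follows RFC 3986, which obsoletes the RFC 1808 algorithm cited above, so results may differ from RFC 1808 in some corner cases.

```python
from urllib.parse import urljoin, urlsplit

base = "http://www.example.com/docs/guide/intro.html"  # example base URI

# Resolve relative URIs against the base URI.
print(urljoin(base, "images/logo.png"))
# http://www.example.com/docs/guide/images/logo.png
print(urljoin(base, "../faq.html"))
# http://www.example.com/docs/faq.html
print(urljoin(base, "/index.html"))
# http://www.example.com/index.html


def same_uri(a: str, b: str) -> bool:
    """Conservative comparison: treat the whole URI as case-sensitive."""
    return a == b


# Host (machine) names are one part where case does not matter;
# urlsplit(...).hostname reports them in lower case.
print(urlsplit("http://WWW.Example.COM/Page").hostname)  # www.example.com
print(same_uri("http://example.com/Page", "http://example.com/page"))  # False
```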

6.5 Colors

The attribute value type "color" (%Color;) refers to color definitions as specified in [SRGB]. A color value may either be a hexadecimal number (prefixed by a hash mark) or one of the following sixteen color names. The color names are case-insensitive.

Color names and sRGB values

  Black   = "#000000"    Green  = "#008000"
  Silver  = "#C0C0C0"    Lime   = "#00FF00"
  Gray    = "#808080"    Olive  = "#808000"
  White   = "#FFFFFF"    Yellow = "#FFFF00"
  Maroon  = "#800000"    Navy   = "#000080"
  Red     = "#FF0000"    Blue   = "#0000FF"
  Purple  = "#800080"    Teal   = "#008080"
  Fuchsia = "#FF00FF"    Aqua   = "#00FFFF"


Thus, the color values "#800080" and "Purple" both refer to the color purple.
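A minimal Python sketch of reading a %Color; value follows. It assumes the six-digit "#RRGGBB" form used in the table (other hexadecimal forms are not handled), and `parse_color` is a hypothetical helper. Names are matched case-insensitively, as required above.

```python
# The sixteen color names from the table above, keyed in lower case.
COLOR_NAMES = {
    "black": "#000000", "green": "#008000",
    "silver": "#C0C0C0", "lime": "#00FF00",
    "gray": "#808080", "olive": "#808000",
    "white": "#FFFFFF", "yellow": "#FFFF00",
    "maroon": "#800000", "navy": "#000080",
    "red": "#FF0000", "blue": "#0000FF",
    "purple": "#800080", "teal": "#008080",
    "fuchsia": "#FF00FF", "aqua": "#00FFFF",
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_color(value: str) -> tuple:
    """Return an (r, g, b) tuple for a %Color; value (hypothetical helper)."""
    # Color names are case-insensitive: look them up in lower case.
    hex_value = COLOR_NAMES.get(value.lower(), value)
    digits = hex_value[1:]
    if not hex_value.startswith("#") or len(digits) != 6 or any(
        c not in HEX_DIGITS for c in digits
    ):
        raise ValueError(f"not a color: {value!r}")
    # Two hexadecimal digits each for red, green and blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(parse_color("Purple"))   # (128, 0, 128)
print(parse_color("#800080"))  # (128, 0, 128)
```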

6.5.1 Notes on using colors

Although colors can add significant amounts of information to documents and make them more readable, please consider the following guidelines when including color in your documents:

  • The use of HTML elements and attributes for specifying color is deprecated. You are encouraged to use style sheets instead.
  • Don't use color combinations that cause problems for people with color blindness in its various forms.
  • If you use a background image or set the background color, then be sure to set the various text colors as well.
  • Colors specified with the BODY and FONT elements and bgcolor on tables look different on different platforms (e.g., workstations, Macs, Windows, and LCD panels vs. CRTs), so you shouldn't rely entirely on a specific effect. In the future, support for the [SRGB] color model together with ICC color profiles should mitigate this problem.
  • When practical, adopt common conventions to minimize user confusion.

6.6 Lengths

HTML specifies three types of length values for attributes:

  1. Pixels: The value (%Pixels; in the DTD) is an integer that represents the number of pixels of the canvas (screen, paper). Thus, the value "50" means fifty pixels. For normative information about the definition of a pixel, please consult [CSS1].
  2. Length: The value (%Length; in the DTD) may be either a %Pixels; or a percentage of the available horizontal or vertical space. Thus, the value "50%" means half of the available space.
  3. MultiLength: The value (%MultiLength; in the DTD) may be a %Length; or a relative length. A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be allotted 10 pixels, the 2* will be allotted 20 pixels, and the 3* will be allotted 30 pixels.
Length values are case-neutral.
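The MultiLength allotment procedure can be sketched in Python as follows. `allot` is a hypothetical helper. It assumes that percentages refer to the total available space, that results are rounded down (so a few pixels may go unused), and that relative lengths receive nothing if pixel and percentage lengths already use up the space.

```python
def allot(available: int, lengths: list) -> list:
    """Split `available` pixels among MultiLength values (hypothetical helper)."""
    sizes = [0] * len(lengths)
    weights = {}  # index -> integer preceding "*"
    for i, raw in enumerate(lengths):
        value = raw.strip()
        if value.endswith("*"):
            prefix = value[:-1]
            weights[i] = int(prefix) if prefix else 1  # "*" is "1*"
        elif value.endswith("%"):
            # Percentage of the available space (rounded down here).
            sizes[i] = available * int(value[:-1]) // 100
        else:
            sizes[i] = int(value)  # plain pixels
    # Pixel and percentage lengths are allotted first ...
    remaining = max(available - sum(sizes), 0)
    total_weight = sum(weights.values())
    # ... then the rest is shared in proportion to each relative weight.
    for i, weight in weights.items():
        if total_weight:
            sizes[i] = remaining * weight // total_weight
    return sizes


print(allot(60, ["1*", "2*", "3*"]))                # [10, 20, 30]
print(allot(200, ["100", "20%", "*", "2*", "3*"]))  # [100, 40, 10, 20, 30]
```

The first call reproduces the worked example in the text. In the second call, 100 + 40 pixels are allotted first, and the remaining 60 are split 1:2:3.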

6.7 Content types (MIME types)

Note. A "media type" (defined in [RFC2045] and [RFC2046]) specifies the nature of a linked resource. This specification employs the term "content type" rather than "media type" in accordance with current usage. Furthermore, in this specification, "media type" may refer to the media where a user agent renders a document.

This type is represented in the DTD by %ContentType;.

Content types are case-insensitive.

Examples of content types include "text/html", "image/png", "image/gif", "video/mpeg", "text/css", and "audio/basic". For the current list of registered MIME types, please consult [MIMETYPES].
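Because content types are case-insensitive, a program comparing them might first reduce them to a canonical lower-case form. The Python sketch below does this. `normalize_content_type` is hypothetical, and it handles only the bare "type/subtype" shape seen in the examples above, not any trailing parameters.

```python
def normalize_content_type(value: str) -> str:
    """Lower-case a "type/subtype" content type for comparison (hypothetical)."""
    main, sep, sub = value.strip().partition("/")
    if not sep or not main or not sub:
        raise ValueError(f"not of the form type/subtype: {value!r}")
    # Content types are case-insensitive, so compare a lower-case form.
    return f"{main.lower()}/{sub.lower()}"


print(normalize_content_type("Text/HTML"))  # text/html
print(normalize_content_type("image/PNG") == normalize_content_type("IMAGE/png"))  # True
```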

6.8 Language codes

The value of attributes whose type is a language code (%LanguageCode in the DTD) refers to a language code as specified by [RFC1766], section 2. For information on specifying language codes in HTML, please consult the section on language codes. Whitespace is not allowed within the language code.

Language codes are case-insensitive.
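The Python sketch below checks the shape of a language code and compares two codes without regard to case. It encodes the RFC 1766 section 2 grammar (a primary tag followed by optional "-"-separated subtags, each 1 to 8 letters). The helper names are hypothetical.

```python
import re

# RFC 1766, section 2: Language-Tag = Primary-tag *( "-" Subtag ),
# where each tag is 1 to 8 ASCII letters.
LANG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_code(value: str) -> bool:
    """fullmatch also rejects any white space inside or around the code."""
    return LANG_RE.fullmatch(value) is not None


def same_language(a: str, b: str) -> bool:
    """Language codes are case-insensitive."""
    return a.lower() == b.lower()


print(is_language_code("en-US"), is_language_code("en US"))  # True False
print(same_language("EN-us", "en-US"))                       # True
```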

6.9 Character encodings

The "charset" attributes (%Charset in the DTD) refer to a character encoding as described in the section on character encodings. Values must be strings (e.g., "euc-jp") from the IANA registry (see [CHARSETS] for a complete list).

Names of character encodings are case-insensitive.

User agents must follow the steps set out in the section on specifying character encodings in order to determine the character encoding of an external resource.
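As a small illustration of case-insensitive charset names, the Python sketch below defines a hypothetical `same_charset` helper and shows that Python's codec registry also accepts names regardless of case. That registry is Python's own and is not the IANA registry, so a name it accepts is not necessarily a registered IANA charset.

```python
import codecs


def same_charset(a: str, b: str) -> bool:
    """Names of character encodings are case-insensitive (hypothetical helper)."""
    return a.lower() == b.lower()


print(same_charset("EUC-JP", "euc-jp"))  # True

# Python's codec lookup also ignores case in encoding names.
data = "日本語".encode("euc-jp")
print(data.decode("EUC-JP"))  # 日本語
print(codecs.lookup("EUC-JP").name == codecs.lookup("euc-jp").name)  # True
```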

6.10 Single characters

Certain attributes call for a single character from the document character set. These attributes take the %Character type in the DTD.

Single characters may be specified with character references (e.g., "&amp;").
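A Python sketch of reading a %Character value is shown below. It resolves any references and checks that exactly one character remains. `single_character` is a hypothetical helper. `html.unescape` is the standard-library function, and it accepts the HTML5 set of named references, which is larger than HTML 4's.

```python
import html


def single_character(value: str) -> str:
    """Resolve references in a %Character value and require one character."""
    ch = html.unescape(value)
    if len(ch) != 1:
        raise ValueError(f"expected a single character, got {value!r}")
    return ch


print(single_character("&amp;"))  # &
print(single_character("&#38;"))  # &
print(single_character("x"))      # x
```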