Introducing Character Sets and Encodings(字符集与编码介绍)

翻译 2006年06月24日 00:32:00

on this page:  What? - Choosing - Using - Escapes - Web addresses - Validating & troubleshooting

Intended audience: anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site.

This page introduces you to key internationalization topics and tasks, and directs you towards articles or resources that will take you on the next step of your journey. After reading these resources, you can find more detailed information using the topic index.

This page is not yet stable, and has not gone through wide review. It will be added to and improved over time. Please send comments to the www-international list.

What is it?(什么是字符集和编码)

A character set is a collection of letters and symbols used in a writing system, eg. the ASCII character set covers letters and symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and Unicode contains characters for most of the living languages and scripts in the world.

字符集就是书写系统中字符与符号的集合。例如ASCII字符集包括了英文中的字母和符号,ISO-8859-6包括了许多基于阿拉伯语的语言的字符和符号,Unicode包括世界上大部分可用的字符。

Characters in a character set are stored as one or more bytes in a computer. Each byte or sequence of bytes represents a given character. A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.

在计算机中字符集中的字符是按一个或两个字节存储的。每个字节或字节序列代表一个给定的字符。字符编码是将一个特定字节或字节序列映射到字符,其中字符可以由对应的字体来表现成文本。

There are many different character encodings. If the wrong encoding is applied to the bytes in memory, the result will be unintelligible text. It is therefore important that the character encoding used for content is correctly labelled if you want people to be able to read it.

有许多不同的字符编码。如果把错误的编码应用到内容中的字节的话,结果会出现“乱码”。因此,如果你想让别人能够读出你的文字内容,是用正确的字符编码是很重要的。

Essential definitions. Unicode, character sets, coded character sets, character encodings, the document character set, and character escapes.

Choosing an encoding(选择编码)

Everyone developing content, whether content authors or programmers, must decide what character encoding to use. UTF-8 is a popular recommendation these days, but there may still be things you should consider before using it.

每个开发内容的人,无论他是内容的作者还是程序员,都必须决定用哪种字符编码。如今UTF-8是一种推荐的编码格式,但在您是用之前还是要有一些事情值得考虑。

v      Choosing an encoding. Advice on choosing encodings.

v      Upgrading from language-specific legacy encoding to Unicode encoding. What you should consider when upgrading my web pages from legacy encoding to a Unicode encoding.

Using an encoding(使用编码)

Once it has been decided what encoding to use, content developers and programmers must ensure that it is declared in the right way.

一旦编码被确定,内容开发者和程序员必须保证它能被正确的声明。

v      Character sets & encodings in XHTML, HTML and CSS. How to declare encodings in these languages.

v      CSS character encoding declarations. How to declare encoding in CSS style sheets.

With a technology such as XHTML, encoding declarations are not always straightforward; they require an understanding of 'standards' vs. 'quirks' modes, and the impact of the XML declaration.

XHTML这样的技术,编码声明不总是很明确的;我们需要了解标准模式和非标准模式的差异,并且明白XML声明的影响。

v      Serving XHTML 1.0. How do XHTML & MIME types, 'Standards' vs 'Quirks' modes, and the XML declaration influence encoding declarations?

You must also ensure that your data is saved in the encoding you have chosen, it is not sufficient to just label it.

您也必须确保您的数据是按您选择的编码格式存储的。

v      Setting encoding in web authoring applications. How to set character encoding in my web authoring application.

v      Changing (X)HTML page encoding to UTF-8. How to change the encoding of my (X)HTML pages to UTF-8.

Content developers and webmasters may also need to ensure that the server delivers content with the correct character encoding declarations.

内容开发者和站长也需要确保服务器按照声明的字符编码来传递数据内容。

v      The HTTP charset parameter. How to send encoding information in the HTTP header.

v      Setting 'charset' information in .htaccess. How to use .htaccess directives on an Apache server to serve files with a specific encoding.

Escapes(转义)

Escapes are a way of representing a character using only ASCII text. They provide a way of representing characters that are not available in the character encoding you are using, or a way of avoiding the use of the character for other reasons (such as when they may conflict with syntax). You should be clear on when and how these escapes should be used.

转义是一种只使用ASCII文本表现字符的方式。它们提供了一种在字符编码中无法使用的字符表现的方法,或是为了某种原因避免使用字符(例如语法冲突)的方法。您应该对于什么时候,如何使用转义很了解。

v      Using character entities and NCRs. What are character entities and NCRs, and when to use them.

Web addresses(Web地址)

These days Web addresses can also include non-ASCII characters. The user does little other than click on the appropriate link or enter the text as they see it, the heavy lifting is done by the user agent, but you may be interested to know how this works.

目前Web地址也可包含非ASCII字符。用户只需点击合适的链接或键入他们想看到的文字,其他都由代理完成,但您也许对于其中的工作原理感兴趣。

Positively Must Know About Unicode and Character Sets (No Excuses!)

by Joel Spolsky Wednesday, October 08, 2003 Ever wonder about that mysterious Content-Type tag? Yo...

Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Cha...
  • bxlsky
  • bxlsky
  • 2013年07月02日 14:33
  • 564

1.1介绍线程和运行(Introducing Thread and Runnable)

线程类提供了线程结构,其中包含了潜在地去操作系统的接口。(这个操作通常是请求创建和管理线程。)一个单一的操作系统线程也连接着一个Thread对象。 Runnable接口提供通过线程执行的代码,它也连接...

[Introducing Ethereum and Solidity]以太坊和solidity介绍----第一章-连接区块链知识的断点

1连接区块链知识的断点 为高速发展的区块链世界的欢呼是具有挑战的。这本书将会是你的指引。在开始之前,让我们定义一下之后将要用到的一些术语。 “区块链”是一种完全分布式的,点对点的软件网络,这个软件网络...

类型编码(Type Encodings)

为了协助运行时系统,编译器用字符串为每个方法的返回值和参数类型和方法选择器编码。使用的编码方案在其他情况下也很有用,所以它是public 的,可用于@encode() 编译器指令。当给定一个类型参数,...

Introducing Character Animation with Blender

  • 2010年06月22日 22:09
  • 24.06MB
  • 下载

iOS 类型编码(Type Encodings)

我们可以通过编译器指令 @encode() 来获取一个给定类型的编码字符串, Code Meaning c A char i An int ...

转换字符编码(Converting Between String Encodings)CFString

字符串对象给你大量的字符串编码转换的工具。一些常规做真实的转换,其他的显示哪些编码是可用的,帮助你选择在当前情形下得最佳编码。 如果你想要在任何两个non-unicode编码之间转换,你可以使用CF...
  • hkfn123
  • hkfn123
  • 2013年09月01日 21:59
  • 1447
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:Introducing Character Sets and Encodings(字符集与编码介绍)
举报原因:
原因补充:

(最多只允许输入30个字)