Introducing Character Sets and Encodings

Translated article, 2006-06-24 00:32:00

on this page:  What? - Choosing - Using - Escapes - Web addresses - Validating & troubleshooting

Intended audience: anyone who is new to internationalization and needs guidance on topics to consider and ways to get into the material on the site.

This page introduces you to key internationalization topics and tasks, and directs you towards articles or resources that will take you on the next step of your journey. After reading these resources, you can find more detailed information using the topic index.

This page is not yet stable, and has not gone through wide review. It will be added to and improved over time. Please send comments to the www-international list.

What is it?

A character set is a collection of letters and symbols used in a writing system. For example, the ASCII character set covers letters and symbols for English text, ISO-8859-6 covers letters and symbols needed for many languages based on the Arabic script, and Unicode contains characters for most of the living languages and scripts in the world.


Characters in a character set are stored as one or more bytes in a computer. Each byte or sequence of bytes represents a given character. A character encoding is the key that maps a particular byte or sequence of bytes to particular characters that the font renders as text.
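As a rough illustration of this mapping, the sketch below encodes the same characters under several encodings and prints the resulting bytes. The sample characters are arbitrary choices for the example, not taken from the original page.

```python
# The same character maps to different byte sequences under different encodings.
latin_letter = "é"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
arabic_letter = "ب"  # U+0628 ARABIC LETTER BEH

latin_utf8 = latin_letter.encode("utf-8")          # two bytes
latin_8859_1 = latin_letter.encode("iso-8859-1")   # one byte
arabic_utf8 = arabic_letter.encode("utf-8")        # two bytes
arabic_8859_6 = arabic_letter.encode("iso-8859-6") # one byte

print(latin_utf8)     # b'\xc3\xa9'
print(latin_8859_1)   # b'\xe9'
print(arabic_utf8)    # b'\xd8\xa8'
print(arabic_8859_6)  # b'\xc8'

# ASCII only covers a small repertoire, so this character cannot be encoded in it.
try:
    latin_letter.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)       # False
```

Note that the single byte 0xC8 means "ب" in ISO-8859-6 but a different character in other single-byte encodings, which is why the encoding has to be known before the bytes can be read.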


There are many different character encodings. If the wrong encoding is applied to the bytes in memory, the result will be unintelligible text. It is therefore important that the character encoding used for content is correctly labelled if you want people to be able to read it.
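A minimal sketch of what a wrong label does: the text below is stored as UTF-8 but then read as if it were ISO-8859-1. The sample string is an arbitrary choice for the example.

```python
original = "café"

# The content is saved as UTF-8 ...
stored = original.encode("utf-8")        # b'caf\xc3\xa9'

# ... but a reader applies the wrong encoding to the same bytes.
misread = stored.decode("iso-8859-1")
print(misread)                           # cafÃ©

# Applying the encoding that was actually used recovers the text.
correct = stored.decode("utf-8")
print(correct)                           # café
```

The bytes themselves never changed; only the key used to interpret them did.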


• Essential definitions. Unicode, character sets, coded character sets, character encodings, the document character set, and character escapes.

Choosing an encoding

Everyone developing content, whether content authors or programmers, must decide what character encoding to use. UTF-8 is a popular recommendation these days, but there may still be things you should consider before using it.


• Choosing an encoding. Advice on choosing encodings.

• Upgrading from language-specific legacy encoding to Unicode encoding. What you should consider when upgrading your web pages from a legacy encoding to a Unicode encoding.

Using an encoding

Once it has been decided what encoding to use, content developers and programmers must ensure that it is declared in the right way.


• Character sets & encodings in XHTML, HTML and CSS. How to declare encodings in these languages.

• CSS character encoding declarations. How to declare encoding in CSS style sheets.

With a technology such as XHTML, encoding declarations are not always straightforward; they require an understanding of 'standards' vs. 'quirks' modes, and the impact of the XML declaration.


• Serving XHTML 1.0. How do XHTML & MIME types, 'Standards' vs 'Quirks' modes, and the XML declaration influence encoding declarations?

You must also ensure that your data is actually saved in the encoding you have chosen; it is not sufficient just to label it.


• Setting encoding in web authoring applications. How to set the character encoding in your web authoring application.

• Changing (X)HTML page encoding to UTF-8. How to change the encoding of your (X)HTML pages to UTF-8.
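The difference between labelling and saving can be sketched as follows. This example writes a small page that declares UTF-8 in a meta element and explicitly saves the file as UTF-8, then checks that the bytes on disk really decode with the declared encoding. The file name index.html and the page content are hypothetical.

```python
import os
import tempfile

ENCODING = "utf-8"  # the encoding chosen for the content

# The label: a meta element declaring the encoding inside the page.
page = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "</head><body><p>Grüße, 你好</p></body></html>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")  # hypothetical file name

    # The save: pass the encoding explicitly instead of relying on the
    # platform default, which may not match the declaration.
    with open(path, "w", encoding=ENCODING) as f:
        f.write(page)

    # Read the raw bytes back, as a browser or server would see them.
    with open(path, "rb") as f:
        raw = f.read()

# If the file had been saved in another encoding, this decode would fail
# or produce different text.
round_trip = raw.decode(ENCODING)
print(round_trip == page)
```

An editor or script that saves in its default encoding while the page declares UTF-8 would produce a mislabelled file, which is the situation the paragraph above warns about.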

Content developers and webmasters may also need to ensure that the server delivers content with the correct character encoding declarations.


• The HTTP charset parameter. How to send encoding information in the HTTP header.

• Setting 'charset' information in .htaccess. How to use .htaccess directives on an Apache server to serve files with a specific encoding.
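On the receiving side, the charset parameter of the HTTP Content-Type header tells a client how to decode the body. A minimal sketch using only the Python standard library follows; the helper names and the UTF-8 fallback are assumptions made for this example, not something the original page prescribes.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lowercased, None if absent


def decode_body(body, header_value, fallback="utf-8"):
    """Decode response bytes using the declared charset.

    The fallback is an assumption for this sketch; real user agents
    apply more elaborate detection rules when no charset is declared.
    """
    charset = charset_from_content_type(header_value) or fallback
    return body.decode(charset)


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
print(decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1"))  # café
```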


Escapes

Escapes are a way of representing a character using only ASCII text. They let you represent characters that are not available in the character encoding you are using, or avoid using a character directly for other reasons (for example, when it would conflict with markup syntax). You should be clear on when and how these escapes should be used.


• Using character entities and NCRs. What are character entities and NCRs, and when to use them.
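Both uses of escapes can be sketched with the Python standard library. The sample strings are arbitrary choices for the example.

```python
import html

text = "naïve café"

# Characters outside the target encoding (here ASCII) are replaced by
# numeric character references (NCRs).
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)                      # na&#239;ve caf&#233;

# A named entity, a decimal NCR and a hexadecimal NCR for the same character.
unescaped = html.unescape("&eacute; &#233; &#xE9;")
print(unescaped)                    # é é é

# Escaping characters that would otherwise conflict with markup syntax.
safe = html.escape("a < b & c")
print(safe)                         # a &lt; b &amp; c
```

Converting the escaped form back with html.unescape gives the original text, so no information is lost by escaping.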

Web addresses

These days Web addresses can also include non-ASCII characters. The user does little other than click on the appropriate link or enter the text as they see it; the heavy lifting is done by the user agent. Even so, you may be interested to know how this works.
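A simplified sketch of the kind of conversion a user agent performs: the host name is converted to an ASCII form (Punycode), and non-ASCII characters in the path are encoded as UTF-8 bytes and percent-escaped. The host bücher.example and the path are made up for the example, and Python's built-in idna codec implements the older IDNA 2003 rules, so real browsers may differ in edge cases.

```python
from urllib.parse import quote, unquote

host = "bücher.example"   # hypothetical internationalized domain name
path = "/résumé.html"     # hypothetical path with non-ASCII characters

# Host: each non-ASCII label is converted to an ASCII "xn--" form.
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)         # xn--bcher-kva.example

# Path: characters are encoded as UTF-8, then each byte is percent-escaped.
ascii_path = quote(path)
print(ascii_path)         # /r%C3%A9sum%C3%A9.html

url = "http://" + ascii_host + ascii_path
print(url)

# Both conversions are reversible, so the readable form can be shown to users.
print(ascii_host.encode("ascii").decode("idna"))  # bücher.example
print(unquote(ascii_path))                        # /résumé.html
```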

