Emacs File Encoding Settings

****

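Command to view the current file's encoding:

M-x describe-coding-system <RET>, then <RET> again at the prompt: the default is to describe the coding settings of the current document.

This opens a new buffer window whose contents look roughly like this: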

Coding system for saving this buffer:

U -- utf-8-dos (alias: mule-utf-8-dos)

Default coding system (for new files):

c -- chinese-iso-8bit-dos (alias: cn-gb-2312-dos euc-china-dos euc-cn-dos cn-gb-dos gb2312-dos)

and so on.

U = utf-8 is the coding system of the current buffer, and c = gb2312 means that new buffers will use the gb2312 coding system.
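The same two values can also be read from Lisp, for example in M-x ielm or with M-: (eval-expression). This is a minimal sketch that only uses the standard variable buffer-file-coding-system and the functions coding-system-base and coding-system-eol-type; the message texts are illustrative.

```elisp
;; Coding system used when saving the current buffer
;; (the "Coding system for saving this buffer" line above).
(message "This buffer saves as: %s" buffer-file-coding-system)

;; Default for newly created buffers
;; (the "Default coding system (for new files)" line above).
(message "New buffers default to: %s"
         (default-value 'buffer-file-coding-system))

;; Base coding system without the end-of-line suffix,
;; e.g. utf-8 for utf-8-dos.
(coding-system-base buffer-file-coding-system)

;; End-of-line convention: 0 = unix (LF), 1 = dos (CRLF), 2 = mac (CR).
(coding-system-eol-type buffer-file-coding-system)
```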

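The above covers the coding system used for saving files and creating new ones. What about reading? Anyone who has used Vim knows that it keeps a list of character encodings to try when reading a document: if the document matches one of them, Vim uses it to decode the text, so the file does not come out garbled. Emacs works the same way.

Below is the priority order of coding systems that Emacs uses when reading files.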

Priority order for recognizing coding systems when reading files:

1. chinese-iso-8bit (alias: cn-gb-2312 euc-china euc-cn cn-gb gb2312)

2. chinese-big5 (alias: big5 cn-big5 cp950)

3. iso-2022-cn (alias: chinese-iso-7bit)

4. utf-8 (alias: mule-utf-8)

5. iso-2022-7bit

6. iso-latin-1 (alias: iso-8859-1 latin-1)

7. iso-2022-8bit-ss2

8. emacs-mule

9. raw-text

10. iso-2022-jp (alias: junet)

11. in-is13194-devanagari (alias: devanagari)

12. utf-8-auto

13. utf-8-with-signature

14. utf-16

15. utf-16be-with-signature (alias: utf-16-be)

16. utf-16le-with-signature (alias: utf-16-le)

17. utf-16be

18. utf-16le

19. japanese-shift-jis (alias: shift_jis sjis)

20. undecided

This is the order in which Emacs tries coding systems to decode a buffer when reading a document.
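The priority list can also be inspected from Lisp. The sketch below uses the built-in functions coding-system-priority-list and detect-coding-region; it only queries state and changes nothing. The result depends on your language environment, so no particular output is assumed here.

```elisp
;; Full detection priority, as a list of coding-system symbols
;; in the same order as the numbered list above.
(coding-system-priority-list)

;; Only the highest-priority coding system.
(coding-system-priority-list t)

;; Ask Emacs which coding system it considers most likely for the
;; text of the current buffer (the final t means "best guess only").
(detect-coding-region (point-min) (point-max) t)
```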


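Encoding configuration commands (in the file ~/.emacs):

(setq default-buffer-file-coding-system 'utf-8) sets the "Default coding system (for new files)": new buffers are written as utf-8. This variable is obsolete in newer Emacs releases, where (setq-default buffer-file-coding-system 'utf-8) is the equivalent form.

(prefer-coding-system 'utf-8) sets the preferred coding system: utf-8 moves to the top of the priority list for reading files and also becomes the default for new buffers. You can also run M-x prefer-coding-system interactively, which applies the setting once, for the current session only.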
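Put together, a ~/.emacs fragment might look like the sketch below. It assumes a reasonably recent Emacs, which is why it uses setq-default on buffer-file-coding-system rather than the older default-buffer-file-coding-system variable; the choice of utf-8-unix (LF line endings) is an illustrative assumption, not something the settings above require.

```elisp
;; ~/.emacs

;; Try utf-8 first when detecting the encoding of a file being read.
;; This also makes utf-8 the default for new buffers.
(prefer-coding-system 'utf-8)

;; Optional: state the default for new buffers explicitly.
;; Use 'utf-8-unix to also force LF line endings, or plain 'utf-8
;; to keep the platform's usual end-of-line convention.
(setq-default buffer-file-coding-system 'utf-8-unix)
```

After restarting Emacs, the "Default coding system (for new files)" and "Priority order" sections of the describe-coding-system output should reflect the change.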


Reference, from the Emacs manual (see the manual for full details):

Specifying a Coding System for File Text

In cases where Emacs does not automatically choose the right coding system for a file's contents, you can use these commands to specify one:

C-x <RET> f coding <RET>
    Use coding system coding for saving or revisiting the visited file in the current buffer.

C-x <RET> c coding <RET>
    Specify coding system coding for the immediately following command.

C-x <RET> r coding <RET>
    Revisit the current file using the coding system coding.

M-x recode-region <RET> right <RET> wrong <RET>
    Convert a region that was decoded using coding system wrong, decoding it using coding system right instead.
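These key bindings run ordinary commands that can also be called from Lisp. The sketch below wraps two of them; the function names my/reread-as-gb2312 and my/save-as-utf-8 and the file name "notes.txt" are hypothetical, chosen only for illustration, while revert-buffer-with-coding-system, set-buffer-file-coding-system, coding-system-for-read and find-file-noselect are standard Emacs.

```elisp
;; Equivalent of C-x <RET> r gb2312 <RET>:
;; reread the visited file, decoding it as gb2312.
(defun my/reread-as-gb2312 ()
  "Revisit the current file, decoding it as gb2312 (hypothetical helper)."
  (interactive)
  (revert-buffer-with-coding-system 'gb2312))

;; Equivalent of C-x <RET> f utf-8 <RET>:
;; the next save writes the buffer as utf-8.
(defun my/save-as-utf-8 ()
  "Save the current buffer as utf-8 from now on (hypothetical helper)."
  (interactive)
  (set-buffer-file-coding-system 'utf-8))

;; Rough Lisp analogue of C-x <RET> c gb2312 <RET> C-x C-f:
;; force the coding system for a single read operation.
;; "notes.txt" is a placeholder file name.
(let ((coding-system-for-read 'gb2312))
  (find-file-noselect "notes.txt"))
```

Binding coding-system-for-read with let keeps the override local to that one call, so automatic detection is unaffected for every other file.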