Recipe 1.5. Representing Unprintable Characters
Problem
You need to make reference to a control character, a strange UTF-8 character, or some other character that's not on your
keyboard.
Solution
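Ruby gives you several escaping mechanisms for referring to unprintable characters. By using one of them within a
double-quoted string, you can put any binary character into the string.
You can reference any binary character by writing its octal representation in the format "\000", or its hexadecimal
representation in the format "\x00".
octal = "\000\001\010\020"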
octal.each_byte { |x| puts x }
# 0
# 1
# 8
# 16
hexadecimal = "\x00\x01\x10\x20"
hexadecimal.each_byte { |x| puts x }
# 0
# 1
# 16
# 32
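As a small sketch of how the two notations relate, the snippet below builds the octal and hexadecimal escape text for a byte value and checks that both escapes name the same byte. The variable names are illustrative only and are not part of the recipe.

```ruby
byte = 16

# Build the escape text itself (a backslash followed by digits), not the byte.
octal_escape = format('\\%03o', byte)   # the four characters \020
hex_escape   = format('\\x%02x', byte)  # the four characters \x10

# Inside double quotes, both escapes produce the same single byte.
from_octal = "\020"
from_hex   = "\x10"
same_byte  = (byte.chr == from_octal) && (from_octal == from_hex)  # => true
```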
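This makes it possible to represent UTF-8 characters even when you can't type them or display them in your terminal.
Try running this program, and then opening the generated file smiley.html in your web browser:
open('smiley.html', 'wb') do |f|
  f << '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
  f << "\xe2\x98\xBA"
end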
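The three bytes written above are the UTF-8 encoding of the code point U+263A. The sketch below checks that, assuming Ruby 1.9 or later, where strings carry an encoding and the "\u" escape is available; neither is part of the original recipe.

```ruby
# Assumes Ruby 1.9 or later (encoding-aware strings, "\u" escapes).
smiley = "\xe2\x98\xBA".force_encoding("UTF-8")

smiley.bytes            # => [226, 152, 186]
smiley.length           # => 1 (one character made of three bytes)
smiley == "\u263A"      # => true
smiley.valid_encoding?  # => true
```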
The most common unprintable characters (such as newline) have special mnemonic aliases, each consisting of a backslash
followed by a letter.
"\a" == "\x07" # => true # ASCII 0x07 = BEL (Sound system bell)
"\b" == "\x08" # => true # ASCII 0x08 = BS (Backspace)
"\e" == "\x1b" # => true # ASCII 0x1B = ESC (Escape)
"\f" == "\x0c" # => true # ASCII 0x0C = FF (Form feed)
"\n" == "\x0a" # => true # ASCII 0x0A = LF (Newline/line feed)
"\r" == "\x0d" # => true # ASCII 0x0D = CR (Carriage return)
"\t" == "\x09" # => true # ASCII 0x09 = HT (Tab/horizontal tab)
"\v" == "\x0b" # => true # ASCII 0x0B = VT (Vertical tab)
Discussion
Ruby stores a string as a sequence of bytes. It makes no difference whether those bytes are printable ASCII characters,
binary characters, or a mix of the two.
When Ruby prints out a human-readable string representation of a binary character, it uses the character's \xxx octal
representation. Characters with special \x mnemonics are printed as the mnemonic. Printable characters are output as
their printable representation, even if another representation was used to create the string.
"\x10\x11\xfe\xff" # => "\020\021\376\377"
"\x48\145\x6c\x6c\157\x0a" # => "Hello\n"
To avoid confusion with the mnemonic characters, a literal backslash in a string is represented by two backslashes.
For instance, the two-character string consisting of a backslash and the 14th letter of the alphabet is represented
as "\\n".
"\\".size # => 1
"\\" == "\x5c" # => true
"\\n"[0] == ?\\ # => true
"\\n"[1] == ?n # => true
"\\n" =~ /\n/ # => nil
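Ruby also provides special shortcuts for representing keyboard sequences like Control-C. "\C-_x_" represents the
sequence you get by holding down the Control key and hitting the x key, and "\M-_x_" represents the sequence you get by
holding down the Alt (or Meta) key and hitting the x key.
"\C-a\C-b\C-c" # => "\001\002\003"
"\M-a\M-b\M-c" # => "\341\342\343"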
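These shorthand representations of binary characters can be used wherever Ruby expects a character. For instance, you
can get the decimal byte value of a special character by prefixing it with ?, and you can use shorthand representations
in regular expression ranges.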
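The sample output in this recipe reflects an older interpreter in which ?x evaluates to an Integer. The sketch below assumes Ruby 1.9 or later, where ?x is a one-character String, and reads byte values with ord and getbyte instead. The bit arithmetic shown is the conventional ASCII relationship for Control and Meta. The helper contains_upper_bytes? is a hypothetical name; it inspects raw bytes, because how a regex range over high bytes behaves depends on the encodings involved.

```ruby
# Assumes Ruby 1.9 or later.
control_a      = ?\C-a                 # a one-character String here, not the Integer 1
control_a_code = control_a.ord         # => 1   (same as "a".ord & 0x1f)
meta_z_code    = "\M-z".getbyte(0)     # => 250 (same as "z".ord | 0x80)

# Same range as \C-a through \C-^, written with hex escapes.
contains_control_chars = /[\x01-\x1e]/
plain_match   = ('Foobar' =~ contains_control_chars)        # => nil
control_match = ("Foo\C-zbar" =~ contains_control_chars)    # => 3

# Hypothetical helper: true if any byte has its high bit set.
def contains_upper_bytes?(string)
  string.each_byte.any? { |b| b >= 0x80 }
end

upper_plain  = contains_upper_bytes?('Foobar')       # => false
upper_binary = contains_upper_bytes?("Foo\212bar")   # => true
```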
?\C-a # => 1
?\M-z # => 250
contains_control_chars = /[\C-a-\C-^]/
'Foobar' =~ contains_control_chars # => nil
"Foo\C-zbar" =~ contains_control_chars # => 3
contains_upper_chars = /[\x80-\xff]/
'Foobar' =~ contains_upper_chars # => nil
"Foo\212bar" =~ contains_upper_chars # => 3
Here's a sinister application that scans logged keystrokes for special characters:
def snoop_on_keylog(input)
  input.each_byte do |b|
    case b
    when ?\C-c; puts 'Control-C: stopped a process?'
    when ?\C-z; puts 'Control-Z: suspended a process?'
    when ?\n; puts 'Newline.'
    when ?\M-x; puts 'Meta-x: using Emacs?'
    end
  end
end
snoop_on_keylog("ls -ltR\003emacsHello\012\370rot13-other-window\012\032")
# Control-C: stopped a process?
# Newline.
# Meta-x: using Emacs?
# Newline.
# Control-Z: suspended a process?
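On Ruby 1.9 and later, a character literal such as ?\C-c is a String, so a case statement comparing it with the Integer bytes from each_byte would never match. Here is a minimal sketch of the same scan under that assumption. The names KEY_MESSAGES and describe_keylog are hypothetical, and the method returns the messages instead of printing them.

```ruby
# Assumes Ruby 1.9 or later. Keys are the byte values the escapes stand for.
KEY_MESSAGES = {
  "\C-c".getbyte(0) => 'Control-C: stopped a process?',   # byte 3
  "\C-z".getbyte(0) => 'Control-Z: suspended a process?', # byte 26
  "\n".getbyte(0)   => 'Newline.',                        # byte 10
  "\M-x".getbyte(0) => 'Meta-x: using Emacs?'             # byte 248
}

# Returns one message per recognized byte, in input order.
def describe_keylog(input)
  input.each_byte.map { |b| KEY_MESSAGES[b] }.compact
end

messages = describe_keylog("ls -ltR\003emacsHello\012\370rot13-other-window\012\032")
puts messages
```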
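Special characters are interpreted only in strings delimited by double quotes, or in strings created with %{} or %Q{}.
They are not interpreted in strings delimited by single quotes, or in strings created with %q{}. You can take advantage
of this when you need to display special characters to the end user, or create a string containing a lot of backslashes.
puts "foo\tbar"
# foo     bar
puts %{foo\tbar}
# foo     bar
puts %Q{foo\tbar}
# foo     bar
puts 'foo\tbar'
# foo\tbar
puts %q{foo\tbar}
# foo\tbar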
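The "lot of backslashes" case can be sketched with a Windows-style path. The path and variable names below are made up for illustration.

```ruby
single  = 'C:\new\table.txt'     # backslashes kept as-is
double  = "C:\\new\\table.txt"   # each backslash must be doubled
mangled = "C:\new\table.txt"     # \n and \t are interpreted as newline and tab

single == double          # => true
single.bytes.count(92)    # => 2 (92 is the byte value of a backslash)
mangled.include?("\n")    # => true
mangled.include?('\\')    # => false
```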
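If you come to Ruby from Python, this feature may catch you off guard and leave you wondering why the special characters
in your single-quoted strings aren't treated as special. If you need to create a string that contains special characters
and a lot of embedded double quotes, use the %{} construct.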
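A short sketch of that last point, with illustrative strings: %{} interprets escapes like a double-quoted string but needs no escaping for embedded double quotes, while %q{} leaves the escapes alone.

```ruby
quoted = %{She said, "Use a tab:\t" and then "a newline:\n"}
raw    = %q{She said, "Use a tab:\t"}

quoted.count('"')       # => 4
quoted.include?("\t")   # => true  (escape interpreted)
raw.include?("\t")      # => false (escape left alone)
raw.include?('\t')      # => true  (literal backslash followed by t)
```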