Recipe 1.5. Representing Unprintable Characters

Problem
You need to make reference to a control character, a strange UTF-8 character, or some other character that's not on your
keyboard.

Solution
Ruby gives you a number of escaping mechanisms to refer to unprintable characters. By using one of these mechanisms
within a double-quoted string, you can put any binary character into the string.

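You can reference any binary character by writing its octal representation in the format "\000", or its hexadecimal
representation in the format "\x00".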

    octal = "\000\001\010\020"
    octal.each_byte { |x| puts x }
    # 0
    # 1
    # 8
    # 16

    hexadecimal = "\x00\x01\x10\x20"
    hexadecimal.each_byte { |x| puts x }
    # 0
    # 1
    # 16
    # 32
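
Octal and hexadecimal escapes are just two spellings of the same byte values. A minimal sketch of that equivalence,
assuming Ruby 1.9 or later; the variable names are illustrative only:

```ruby
# Two spellings of the same four bytes: octal escapes and hexadecimal escapes.
from_octal = "\000\001\010\020"
from_hex   = "\x00\x01\x08\x10"

# String#bytes yields each byte as an integer.
octal_bytes = from_octal.bytes.to_a   # [0, 1, 8, 16]
hex_bytes   = from_hex.bytes.to_a     # [0, 1, 8, 16]

puts octal_bytes == hex_bytes         # prints "true"

# A printable character can be written the same way: 0x41 == 0101 == "A".
puts "\x41" == "\101"                 # prints "true"
puts "\x41" == "A"                    # prints "true"
```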


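This makes it possible to represent UTF-8 characters even when you can't type them or display them in your terminal.
Try running this program, and then opening the generated file smiley.html in your web browser:

    open('smiley.html', 'wb') do |f|
      f << '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
      f << "\xe2\x98\xBA"
    end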
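
To see why those three bytes show up as one character in the browser, the sketch below treats them as UTF-8 and
inspects the result. It assumes Ruby 1.9 or later, where strings carry an encoding; the variable names are
illustrative only.

```ruby
# The three bytes from the smiley.html example, written as hex escapes.
# dup gives a mutable copy so the encoding can be set explicitly.
smiley = "\xe2\x98\xBA".dup.force_encoding("UTF-8")

# Interpreted as UTF-8, those three bytes form a single character.
byte_count = smiley.bytesize          # 3
char_count = smiley.length            # 1
code_point = smiley.unpack("U*")      # [9786], that is, U+263A

puts byte_count
puts char_count
puts code_point.first.to_s(16)        # prints "263a"
```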



The most common unprintable characters (such as newline) have special mnemonic aliases consisting of a backslash and a
letter.

    "\a" == "\x07" # => true # ASCII 0x07 = BEL (Sound system bell)
    "\b" == "\x08" # => true # ASCII 0x08 = BS (Backspace)
    "\e" == "\x1b" # => true # ASCII 0x1B = ESC (Escape)
    "\f" == "\x0c" # => true # ASCII 0x0C = FF (Form feed)
    "\n" == "\x0a" # => true # ASCII 0x0A = LF (Newline/line feed)
    "\r" == "\x0d" # => true # ASCII 0x0D = CR (Carriage return)
    "\t" == "\x09" # => true # ASCII 0x09 = HT (Tab/horizontal tab)
    "\v" == "\x0b" # => true # ASCII 0x0B = VT (Vertical tab)



Discussion
Ruby stores a string as a sequence of bytes. It makes no difference whether those bytes are printable ASCII characters,
binary characters, or a mix of the two.

When Ruby prints out a human-readable string representation of a binary character, it uses the character's \xxx octal
representation. Characters with special backslash mnemonics (such as \n) are printed as the mnemonic. Printable
characters are output as their printable representation, even if another representation was used to create the string.

    "\x10\x11\xfe\xff"             # => "\020\021\376\377"
    "\x48\145\x6c\x6c\157\x0a"     # => "Hello\n"

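However a character is written, the resulting string is the same. A small sketch of that point, assuming Ruby 1.9 or
later. Newer interpreters generally display unprintable bytes with hexadecimal or \u escapes rather than the octal
form shown above, so the second inspect result is printed without claiming its exact shape:

```ruby
# Mixed hex and octal escapes that happen to spell out printable characters.
greeting = "\x48\145\x6c\x6c\157\x0a"

puts greeting == "Hello\n"     # prints "true"

# inspect returns the human-readable form; printable characters and
# mnemonics such as \n are shown as themselves.
puts greeting.inspect          # prints "Hello\n" (with the quotes)

# The escape style used for unprintable bytes depends on the Ruby version.
puts "\x10\x11".inspect
```
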

To avoid confusion, a real backslash in a string must be represented by two backslashes.
For instance, the two-character string consisting of a backslash and the 14th letter of the alphabet is represented
as "\\n".

    "\\".size                      # => 1
    "\\" == "\x5c"                 # => true
    "\\n"[0] == ?\\                # => true
    "\\n"[1] == ?n                 # => true
    "\\n" =~ /\n/                  # => nil



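Ruby also provides special shortcuts for representing keyboard sequences such as Control-C. "\C-x" represents the
sequence you get by holding down the Control key and pressing the x key, and "\M-x" represents the sequence you get by
holding down the Alt (or Meta) key and pressing the x key.

    "\C-a\C-b\C-c"                 # => "\001\002\003"
    "\M-a\M-b\M-c"                 # => "\341\342\343"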

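For letters, the two shortcuts correspond to simple bit operations on the byte value: Control keeps only the low five
bits, and Meta sets the high bit. A sketch assuming Ruby 1.9 or later; the helper names control_byte and meta_byte are
hypothetical and not part of Ruby:

```ruby
# Hypothetical helpers that compute the byte behind "\C-x" and "\M-x" for a letter.
def control_byte(letter)
  letter.ord & 0x1f    # keep the low five bits: "a" (97) becomes 1
end

def meta_byte(letter)
  letter.ord | 0x80    # set the high bit: "a" (97) becomes 225 (octal 341)
end

puts control_byte("a") == "\C-a".getbyte(0)   # prints "true"
puts meta_byte("a")    == "\M-a".getbyte(0)   # prints "true"
puts meta_byte("a").to_s(8)                   # prints "341"
```
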

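Shorthand representations of binary characters can be used wherever Ruby expects a character. For instance, you can
get the decimal byte value of a special character by prefixing it with ?, and you can use shorthand representations in
regular expression character ranges.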

    ?\C-a                                    # => 1
    ?\M-z                                    # => 250

    contains_control_chars = /[\C-a-\C-^]/
    'Foobar' =~ contains_control_chars       # => nil
    "Foo\C-zbar" =~ contains_control_chars   # => 3

    contains_upper_chars = /[\x80-\xff]/
    'Foobar' =~ contains_upper_chars         # => nil
    "Foo\212bar" =~ contains_upper_chars     # => 3

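The integer results above reflect older interpreters. On Ruby 1.9 and later, a ?-literal evaluates to a one-character
String, so a sketch for newer versions might call ord and use a binary (/n) regular expression for high bytes. This
assumes Ruby 2.0 or later for String#b; the variable names are illustrative only:

```ruby
# On newer Rubies a ?-literal is a String; ord recovers the numeric value.
control_a = ?\C-a.ord                     # 1

# Character class covering Control-A (0x01) through Control-^ (0x1e).
contains_control_chars = /[\x01-\x1e]/

plain_match   = ("Foobar" =~ contains_control_chars)        # nil
control_match = ("Foo\C-zbar" =~ contains_control_chars)    # 3

# For bytes 0x80-0xff, match a binary copy of the string against a /n regexp.
contains_upper_bytes = /[\x80-\xff]/n
upper_match = ("Foo\212bar".b =~ contains_upper_bytes)      # 3

puts control_a
p plain_match
puts control_match
puts upper_match
```
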


Here's a sinister application that scans logged keystrokes for special characters:

    def snoop_on_keylog(input)
      input.each_byte do |b|
        case b
          when ?\C-c; puts 'Control-C: stopped a process?'
          when ?\C-z; puts 'Control-Z: suspended a process?'
          when ?\n; puts 'Newline.'
          when ?\M-x; puts 'Meta-x: using Emacs?'
        end
      end
    end

    snoop_on_keylog("ls -ltR\003emacsHello\012\370rot13-other-window\012\032")
    # Control-C: stopped a process?
    # Newline.
    # Meta-x: using Emacs?
    # Newline.
    # Control-Z: suspended a process?



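Special characters are only interpreted in strings delimited by double quotes, or in strings created with %{} or %Q{}.
They are not interpreted in strings delimited by single quotes, or in strings created with %q{}. You can take advantage
of this when you need to display special characters literally, or create a string containing a lot of backslashes.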

    puts "foo\tbar"
    # foo     bar
    puts %{foo\tbar}
    # foo     bar
    puts %Q{foo\tbar}
    # foo     bar

    puts 'foo\tbar'
    # foo\tbar
    puts %q{foo\tbar}
    # foo\tbar
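
One way to see the difference is to compare lengths: the double-quoted form contains a single tab character, while the
single-quoted form contains a literal backslash followed by the letter t. A minimal sketch with illustrative variable
names:

```ruby
interpreted = "foo\tbar"    # \t becomes one tab character
literal     = 'foo\tbar'    # the backslash and "t" stay as two characters

puts interpreted.size       # prints 7
puts literal.size           # prints 8

# %Q{} behaves like double quotes; %q{} behaves like single quotes.
puts %Q{foo\tbar} == interpreted   # prints "true"
puts %q{foo\tbar} == literal       # prints "true"
```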



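If you come to Ruby from Python, this behavior may catch you off guard, leaving you wondering why the special
characters in your single-quoted strings aren't treated as special. If you need to create a string that contains
special characters and a lot of embedded double quotes, use the %{} construct.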
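
As an illustration of that last point, %{} lets a string contain double quotes without escaping them while still
interpreting backslash escapes. A small sketch; the sample text is made up:

```ruby
# Double-quoted: every embedded double quote needs a backslash.
escaped = "She said \"hi\",\tthen \"bye\".\n"

# %{}: the same string, with the embedded quotes written as-is.
unescaped = %{She said "hi",\tthen "bye".\n}

puts escaped == unescaped   # prints "true"
```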

 

   
 