Ruby Study Notes (4): String

Listing all methods of a class or object

    String.methods.sort

This lists the methods that the class object String itself responds to.

    String.instance_methods.sort

This lists all the instance methods available to instances of String.

    String.instance_methods(false).sort

This lists only the instance methods defined in String itself, without those inherited from the class's ancestors.
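
To make the difference between the three calls concrete, here is a minimal sketch. The anonymous class stored in `greeter` and its `hello` method are hypothetical, introduced only for this illustration.

```ruby
# greeter is a hypothetical class with exactly one method of its own.
greeter = Class.new do
  def hello
    'hello'
  end
end

own = greeter.instance_methods(false) # only methods defined in the class itself
all = greeter.instance_methods        # also includes inherited methods

p own                  # => [:hello]
p all.include?(:to_s)  # => true, inherited from Object
p own.include?(:to_s)  # => false

# The same distinction applies to String:
p String.instance_methods(false).include?(:upcase) # => true
p String.methods.include?(:upcase)                 # => false, upcase is not a class-level method
p String.methods.include?(:try_convert)            # => true, String.try_convert is a class-level method
```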

Comparing two strings for equality

Strings have several methods for testing equality. The most common one is == (double equals sign). Another equality-test instance method, String#eql?, tests two strings for identical content; for strings it returns the same result as ==. A third instance method, String#equal?, tests whether two strings are the same object. The example p013strcmp.rb illustrates this:

    # p013strcmp.rb 
    # String#eql?, tests two strings for identical content. 
    # It returns the same result as == 
    # String#equal?, tests whether two strings are the same object 
    s1 = 'Jonathan' 
    s2 = 'Jonathan' 
    s3 = s1 
    if s1 == s2 
      puts 'Both Strings have identical content' 
    else 
      puts 'Both Strings do not have identical content' 
    end 
    if s1.eql?(s2) 
      puts 'Both Strings have identical content' 
    else 
      puts 'Both Strings do not have identical content' 
    end 
    if s1.equal?(s2) 
      puts 'Two Strings are identical objects' 
    else 
      puts 'Two Strings are not identical objects' 
    end 
    if s1.equal?(s3) 
      puts 'Two Strings are identical objects' 
    else 
      puts 'Two Strings are not identical objects' 
    end 
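
One practical consequence of object identity is worth seeing: a change made through one variable is visible through every other variable that refers to the same object. The sketch below reuses the variable names from the example above; the appended `' Smith'` is arbitrary sample data, and `dup` is used only so the strings are mutable even if string literals are frozen.

```ruby
s1 = 'Jonathan'.dup # mutable copy
s2 = 'Jonathan'.dup # same content, different object
s3 = s1             # same object as s1

s1 << ' Smith'      # modify the object in place

puts s3             # Jonathan Smith (s3 is the same object as s1)
puts s2             # Jonathan (s2 is unaffected)
puts s1.equal?(s3)  # true
puts s1 == s2       # false, the contents now differ
```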

Using %w

Sometimes creating arrays of words can be a pain, what with all the quotes and commas. Fortunately, Ruby has a shortcut: %w does just what we want.

    names1 = [ 'ann', 'richard', 'william', 'susan', 'pat' ] 
    puts names1[0] # ann 
    puts names1[3] # susan 
    # this is the same: 
    names2 = %w{  ann richard william susan pat } 
    puts names2[0] # ann 
    puts names2[3] # susan 

Character Set

A character set, or more specifically, a coded character set is a set of character symbols, each of which has a unique numerical ID, which is called the character's code point.

An example of a character set is the 128-character ASCII character set, which is mostly made up of the letters, numbers, and punctuation used in the English language. The most expansive character set in common use is the Universal Character Set (UCS), as defined in the Unicode standard, whose code space has room for over 1.1 million code points (1,114,112 in total).

The letter A, for example, is assigned a magic number by the Unicode consortium which is written like this: U+0041. The string "Hello" corresponds, in Unicode, to these five code points:

    U+0048 U+0065 U+006C U+006C U+006F 

Just a bunch of code points. Numbers, really. We haven't yet said anything about how to store this in memory. That's where encodings come in.
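
Ruby can show these numbers directly. A minimal sketch using String#codepoints; the variable names are only for illustration:

```ruby
word = 'Hello'

# codepoints returns the code point of each character as an Integer.
labels = word.codepoints.map { |cp| format('U+%04X', cp) }
puts labels.join(' ') # U+0048 U+0065 U+006C U+006C U+006F

# Going the other way: build a string from a list of code points.
rebuilt = [0x48, 0x65, 0x6C, 0x6C, 0x6F].pack('U*')
puts rebuilt          # Hello
```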

Character Encoding

UTF-8 can be used for storing your string of Unicode code points, those magic U+ numbers, in memory using 8-bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Code points 128 and above are stored using 2, 3, or 4 bytes. (The original UTF-8 design allowed sequences of up to 6 bytes, but the current definition in RFC 3629 limits it to 4.) This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII.
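
The variable-length storage can be observed with String#bytesize and String#bytes. In this sketch the four sample characters are chosen because they need 1, 2, 3, and 4 bytes respectively; they are written as \u escapes so the example does not depend on the encoding of the source file.

```ruby
# A, e with acute accent, euro sign, and an emoji
chars = ["A", "\u00E9", "\u20AC", "\u{1F600}"]

chars.each do |ch|
  hex = ch.bytes.map { |b| format('%02X', b) }.join(' ')
  puts format('U+%04X -> %d byte(s): %s', ch.ord, ch.bytesize, hex)
end
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): C3 A9
# U+20AC -> 3 byte(s): E2 82 AC
# U+1F600 -> 4 byte(s): F0 9F 98 80

# Pure English text has the same bytes in US-ASCII and in UTF-8.
same = 'Hello'.encode('US-ASCII').bytes == 'Hello'.encode('UTF-8').bytes
puts same # true
```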

It does not make sense to have a string without knowing what encoding it uses. Thus, if you have a string, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
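
A small sketch of why this matters: the same two bytes yield different text depending on which encoding they are assumed to be in. The choice of ISO-8859-1 as the "wrong" encoding is only an example.

```ruby
raw = "\xCE\xBB".b # two raw bytes, tagged as binary

# force_encoding changes the label on the bytes, not the bytes themselves.
as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

puts as_utf8.length   # 1 (one character: the Greek letter lambda)
puts as_latin1.length # 2 (two characters: I with circumflex, then a right guillemet)
puts as_utf8.bytes == as_latin1.bytes # true, identical bytes

# encode, by contrast, converts the characters to a new byte sequence.
converted = as_latin1.encode(Encoding::UTF_8)
puts converted.bytesize # 4
```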

Ruby supports the idea of character encodings.

Encoding class

Objects of class Encoding each represent a different character encoding. The Encoding.list method returns a list of the built-in encodings.
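
A short sketch of working with Encoding objects. The exact number of built-in encodings varies between Ruby versions, so no specific count is shown.

```ruby
encodings = Encoding.list
puts encodings.size                       # varies by Ruby version
puts encodings.include?(Encoding::UTF_8)  # true

# Encoding.find looks an encoding up by name, ignoring case.
utf8 = Encoding.find('utf-8')
puts utf8.name                            # UTF-8
puts utf8 == Encoding::UTF_8              # true

# Encoding.name_list returns the known names, including aliases, as strings.
puts Encoding.name_list.include?('US-ASCII') # true
```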

Ruby has a way of setting the encoding on a file-by-file basis using a new magic comment. If the first line of a file is a comment (or the second line if the first line is a #! shebang line), Ruby scans it looking for the string coding:. If it finds it, Ruby then skips any spaces and looks for the (case-insensitive) name of an encoding. Thus, to specify that a source file is in UTF-8 encoding, you can write this:

    # coding: utf-8 

As Ruby is just scanning for coding:, you could also write the following:

    # encoding: utf-8 

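Note: Some editors write the byte sequence \xEF\xBB\xBF (the UTF-8 byte order mark) at the start of a file saved as UTF-8. Ruby does not write it itself, but it accepts a source file that starts with this sequence and treats the file as UTF-8.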

If nothing overrides the setting, the default encoding for source is US-ASCII in Ruby 1.9 and UTF-8 in Ruby 2.0 and later.

Here's some example code:

    # encoding: utf-8

    # λ is the Greek character Lambda here
    puts "λ".length # => 1
    puts "λ".bytesize # => 2
    puts "λ".encoding # => UTF-8

 
