Chapter 1. Strings
Ruby is a programmer-friendly language. If you are already familiar with object-oriented programming, Ruby should quickly become second nature. If you've struggled to learn object-oriented programming or are not familiar with it, Ruby should make more sense to you than other object-oriented languages because Ruby's methods are consistently named, concise, and generally act the way you expect.
Throughout this book, we demonstrate concepts through interactive Ruby sessions. Strings are a good place to start because not only are they a useful data type, they're easy to create and use. They provide a simple introduction to Ruby, a point of comparison between Ruby and other languages you might know, and an approachable way to introduce important Ruby concepts like duck typing (see Recipe 1.12), open classes (demonstrated in Recipe 1.10), symbols (Recipe 1.7), and even Ruby gems (Recipe 1.20).
If you use Mac OS X or a Unix environment with Ruby installed, go to your command line right now and type irb. If you're using Windows, you can download and install the One-Click Installer from http://rubyforge.org/projects/rubyinstaller/, and do the same from a command prompt (you can also run the fxri program, if that's more comfortable for you). You've now entered an interactive Ruby shell, and you can follow along with the code samples in most of this book's recipes.
Strings in Ruby are much like strings in other dynamic languages such as Perl, Python, and PHP. They're not too different from strings in Java and C. Ruby strings are dynamic, mutable, and flexible. Get started with strings by typing this line into your interactive Ruby session:
string = "My first string"
You should see some output that looks like this:
=> "My first string"
You typed in a Ruby expression that created a string "My first string", and assigned it to the variable string. The value of that expression is just the new value of string, which is what your interactive Ruby session printed out on the right side of the arrow. Throughout this book, we'll represent this kind of interaction in the following form:[1]
string = "My first string" # => "My first string"
[1] Yes, this was covered in the Preface, but not everyone reads the Preface.
In Ruby, everything that can be assigned to a variable is an object. Here, the variable string points to an object of class String. That class defines over a hundred built-in methods: named pieces of code that examine and manipulate the string. We'll explore some of these throughout the chapter, and indeed the entire book. Let's try out one now: String#length, which returns the number of bytes in a string. Here's a Ruby method call:
string.length # => 15
Many programming languages make you put parentheses after a method call:
string.length() # => 15
In Ruby, parentheses are almost always optional. They're especially optional in this case, since we're not passing any arguments into String#length. If you're passing arguments into a method, it's often more readable to enclose the argument list in parentheses:
string.count 'i' # => 2 # "i" occurs twice.
string.count('i') # => 2
The return value of a method call is itself an object. In the case of String#length, the return value is the number 15, an instance of the Fixnum class. We can call a method on this object as well:
string.length.next # => 16
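The points above about optional parentheses and chained calls can be tried together in one short script. This is only a minimal sketch: the variable names are ours, and the class check uses is_a?(Integer) because, as far as we know, Ruby 2.4 and later report the class of 15 as Integer rather than Fixnum.

```ruby
greeting = "My first string"

# Parentheses are optional when the call is unambiguous.
without_parens = greeting.count 'i'
with_parens    = greeting.count('i')
puts without_parens == with_parens        # true

# The return value of a method is an object, so calls can be chained.
length = greeting.length
puts length.next                          # 16
puts greeting.length.next == length + 1   # true

# Objects can report on themselves.
puts length.is_a?(Integer)                # true
puts greeting.class                       # String
```

Either call style produces the same result; the choice is purely about readability.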
Let's take a more complicated case: a string that contains non-ASCII characters. This string contains the French phrase "il était une fois," encoded as UTF-8:[2]
french_string = "il \xc3\xa9tait une fois" # => "il \303\251tait une fois"
[2] "\xc3\xa9" is a Ruby string representation of the UTF-8 encoding of the Unicode character é.
Many programming languages (notably Java) treat a string as a series of characters. Ruby treats a string as a series of bytes. The French string contains 14 letters and 3 spaces, so you might think Ruby would say the length of the string is 17. But one of the letters (the e with acute accent) is represented as two bytes, and that's what Ruby counts:
french_string.length # => 18
For more on handling different encodings, see Recipe 1.14 and Recipe 11.12. For more on this specific problem, see Recipe 1.8.
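The byte count shown above reflects the Ruby version used in this book. As a hedged aside, assuming Ruby 1.9 or later, String#length counts characters and String#bytesize counts bytes, so the two views of the French string can be compared side by side. The call to force_encoding is there only to make the UTF-8 assumption explicit.

```ruby
# Assumption: Ruby 1.9 or later, where every string carries an encoding.
french_string = "il \xc3\xa9tait une fois".dup.force_encoding(Encoding::UTF_8)

puts french_string.length     # 17 characters
puts french_string.bytesize   # 18 bytes

# The accented letter is the only character stored as two bytes.
accented = french_string[3]
puts accented.bytesize        # 2
p accented.bytes              # [195, 169], that is 0xc3 and 0xa9
```

On such a Ruby, bytesize is the method to reach for when the number of bytes is what matters, for instance when sizing a buffer.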
You can represent special characters in strings (like the binary data in the French string) with string escaping. Ruby does different types of string escaping depending on how you create the string. When you enclose a string in double quotes, you can encode binary data into the string (as in the French example above), and you can encode newlines with the code "\n", as in other programming languages:
puts "This string\ncontains a newline"
# This string
# contains a newline
When you enclose a string in single quotes, the only special codes you can use are "\'" to get a literal single quote, and "\\" to get a literal backslash:
puts 'it may look like this string contains a newline\nbut it doesn\'t'
# it may look like this string contains a newline\nbut it doesn't
puts 'Here is a backslash: \\'
# Here is a backslash: \
This is covered in more detail in Recipe 1.5. Also see Recipes 1.2 and 1.3 for more examples of the more spectacular substitutions double-quoted strings can do.
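One way to see the difference between the two quoting styles is to compare string lengths. The following is a small sketch with variable names of our own choosing.

```ruby
double_quoted = "line one\nline two"   # \n becomes one newline character
single_quoted = 'line one\nline two'   # backslash and n stay as two characters

puts double_quoted.length        # 17
puts single_quoted.length        # 18
puts double_quoted.lines.count   # 2

# The two escapes that single quotes do understand:
puts 'doesn\'t' == "doesn't"     # true
puts '\\'.length                 # 1, a single backslash
```

The one-character difference in length is the newline: in the single-quoted string it is stored as a backslash followed by the letter n.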
Another useful way to initialize strings is with the "here document" style:
long_string = <<EOF
Here is a long string
With many paragraphs
EOF
# => "Here is a long string\nWith many paragraphs\n"
puts long_string
# Here is a long string
# With many paragraphs
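To confirm what a here document actually builds, it can be compared with an ordinary quoted literal. This sketch also shows that a bare <<EOF here document processes escapes the way a double-quoted string does; the second variable, tabbed, is our own illustration.

```ruby
long_string = <<EOF
Here is a long string
With many paragraphs
EOF

# The here document is equal to the quoted form, trailing newline included.
puts long_string == "Here is a long string\nWith many paragraphs\n"   # true
puts long_string.lines.count                                          # 2

# Escapes are processed, as in a double-quoted string.
tabbed = <<EOF
left\tright
EOF
puts tabbed == "left\tright\n"   # true
```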
Like most of Ruby's built-in classes, Ruby's strings define the same functionality in several different ways, so that you can use the idiom you prefer. Say you want to get a substring of a larger string (as in Recipe 1.13). If you're an object-oriented programming purist, you can use the String#slice method:
string # => "My first string"
string.slice(3, 5) # => "first"
But if you're coming from C, and you think of a string as an array of bytes, Ruby can accommodate you. Selecting a single byte from a string returns that byte as a number, and chr turns that number back into a one-character string:
string[3].chr + string[4].chr + string[5].chr + string[6].chr + string[7].chr
# => "first"
And if you come from Python, and you like that language's slice notation, you can just as easily chop up the string that way:
string[3, 5] # => "first"
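The approaches above can be placed side by side to check that they agree. This is a sketch rather than a recommendation: the range form string[3..7] is an extra we add for comparison, and getbyte assumes Ruby 1.9 or later, where it returns a byte as a number.

```ruby
string = "My first string"

by_method = string.slice(3, 5)   # start at index 3, take 5 characters
by_index  = string[3, 5]         # same arguments, bracket notation
by_range  = string[3..7]         # inclusive range of indexes

# Byte at a time, in the C spirit: getbyte(3) is 102, the code for "f".
by_bytes = (3..7).map { |i| string.getbyte(i).chr }.join

p [by_method, by_index, by_range, by_bytes]
```

All four expressions produce "first" for this ASCII-only string; the byte-based version would differ from the others on text that contains multibyte characters.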
Unlike strings in many programming languages (Java and Python, for example), Ruby strings are mutable: you can change them after they are created. Below we see the difference between the methods String#upcase and String#upcase!:
string.upcase # => "MY FIRST STRING"
string # => "My first string"
string.upcase! # => "MY FIRST STRING"
string # => "MY FIRST STRING"
This is one of Ruby's syntactical conventions. "Dangerous" methods (generally those that modify their object in place) usually have an exclamation mark at the end of their name. Another syntactical convention is that predicates, methods that return a true/false value, have a question mark at the end of their name (as in some varieties of Lisp):
string.empty? # => false
string.include? 'MY' # => true
This use of English punctuation to provide the programmer with information is an example of Matz's design philosophy: that Ruby is a language primarily for humans to read and write, and secondarily for computers to interpret.
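The naming conventions described above are open to your own methods too, since the punctuation is simply part of the method name. The sketch below uses two hypothetical helpers, shouting? and shout!, which are not part of Ruby, and it also shows what in-place modification means when two variables refer to the same string.

```ruby
# Hypothetical helpers (not part of Ruby) that follow the naming conventions.
def shouting?(text)        # predicate: answers true or false
  text == text.upcase
end

def shout!(text)           # "dangerous": modifies its argument in place
  text.upcase!
  text
end

original = "My first string".dup   # dup guarantees a mutable string
other_name = original              # a second name for the same object

copy = original.upcase             # non-bang: returns a new string
puts shouting?(copy)               # true
puts shouting?(original)           # false, the original is untouched

shout!(original)                   # bang: changes the object itself
puts other_name                    # MY FIRST STRING
puts other_name.equal?(original)   # true, both names refer to one object

# When there is nothing left to change, String#upcase! returns nil.
p original.upcase!                 # nil
```

Because upcase! can return nil, it is usually safer to call it for its effect than to chain further calls onto its return value.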
An interactive Ruby session is an indispensable tool for learning and experimenting with these methods. Again, we encourage you to type the sample code shown in these recipes into an irb or fxri session, and try to build upon the examples as your knowledge of Ruby grows.
Here are some extra resources for using strings in Ruby:
- You can get information about any built-in Ruby method with the ri command; for instance, to see more about the String#upcase! method, issue the command ri "String#upcase!" from the command line.
- "why the lucky stiff" has written an excellent introduction to installing Ruby, and using irb and ri: http://poignantguide.net/ruby/expansion-pak-1.html
- For more information about the design philosophy behind Ruby, read an interview with Yukihiro "Matz" Matsumoto, creator of Ruby: http://www.artima.com/intv/ruby.html