Strings

字符串

Conceptually, Java strings are sequences of Unicode characters. For example, the string "Java\u2122" consists of the five Unicode characters J, a, v, a, and ™. Java does not have a built-in string type. Instead, the standard Java library contains a predefined class called, naturally enough, String. Each quoted string is an instance of the String class:

顾名思义,Java字符串就是Unicode的字符序列。例如,字符串"Java/u2122"Java 五个Unicode字符串组成。Java没有内建的字符串类型。替代的,标准Java库包含一个预定义类调用,自然足够,String(译者:水平有限,暂译如此)。每一个引用的字符串都是String类的一个实例。

 

String e = ""; // an empty string
String greeting = "Hello";
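As a quick check of the five-character claim above, here is a minimal sketch. The class name UnicodeEscapeDemo and the method trademarked are made up for illustration; the string is written with the \u2122 escape for the trademark sign.

```java
public class UnicodeEscapeDemo {
    // Builds the example string, using a Unicode escape for the trademark sign.
    static String trademarked() {
        return "Java\u2122";
    }

    public static void main(String[] args) {
        String s = trademarked();
        System.out.println(s.length());               // number of char values: 5
        System.out.println(s.charAt(4) == '\u2122');  // the last char is the trademark sign
    }
}
```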

 

Code Points and Code Units

代码点和代码单位

Java strings are implemented as sequences of char values. As we discussed on page 41, the char data type is a code unit for representing Unicode code points in the UTF-16 encoding. The most commonly used Unicode characters can be represented with a single code unit. The supplementary characters require a pair of code units.

Java字符串依靠char的值的序列来实现。正如我们在第41页讨论过的,char数据类型是表示基于UTF-16编码的Unicode代码点的代码单位。最常用的Unicode字符被表示为一个单一的代码单位。增补的字符需要一对代码单位。

The length method yields the number of code units required for a given string in the UTF-16 encoding. For example:

length方法得到一个给定的UTF-16编码字符串所需的代码单位的数目。例如:

 

String greeting = "Hello";
int n = greeting.length(); // n is 5

 

To get the true length, that is, the number of code points, call

要得到真正的长度,换句话说,代码点的数目,调用

 

int cpCount = greeting.codePointCount(0, greeting.length());

 

The call s.charAt(n) returns the code unit at position n, where n is between 0 and s.length() – 1. For example,

调用s.charAt(n)返回位置n的代码单位,n介于0s.length() – 1之间。例如,

 

char first = greeting.charAt(0); // first is 'H'
char last = greeting.charAt(4); // last is 'o'

 

To get at the ith code point, use the statements

要得到第i个代码点,用语句

 

int index = greeting.offsetByCodePoints(0, i);
int cp = greeting.codePointAt(index);
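The difference between code units and code points only shows up once a string contains a supplementary character. The sketch below assumes a test string of our own choosing, "a", then U+1D56B (written as its surrogate pair), then "b"; the class name CodePointDemo and the helper codePointAtIndex are illustrative, not part of the Java library.

```java
public class CodePointDemo {
    // 'a', the supplementary character U+1D56B (as a surrogate pair), then 'b'.
    static final String TEXT = "a\uD835\uDD6Bb";

    // Returns the i-th code point (counting from 0) of s.
    static int codePointAtIndex(String s, int i) {
        int index = s.offsetByCodePoints(0, i); // convert code point offset to code unit index
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());                          // code units: 4
        System.out.println(TEXT.codePointCount(0, TEXT.length()));  // code points: 3
        System.out.println(Integer.toHexString(codePointAtIndex(TEXT, 1))); // 1d56b
        System.out.println((char) codePointAtIndex(TEXT, 2));       // b
    }
}
```

Note that charAt(2) on this string would return the second half of the surrogate pair, not 'b', which is why the index is computed with offsetByCodePoints first.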

 

NOTE

注释

Java counts the code units in strings in a peculiar fashion: the first code unit in a string has position 0. This convention originated in C, where there was a technical reason for counting positions starting at 0. That reason has long gone away and only the nuisance remains. However, so many programmers are used to this convention that the Java designers decided to keep it.

Java用一种特殊的方式计算字符串中代码单位的数量:字符串中第一个代码单位的位置是0。这个约定来源于C,从0开始计算位置是基于技术原因。这个原因已经过去很长时间,仅仅是个讨厌的历史遗留。然而,相当多的程序员已经习惯了这个约定,以至于Java的设计者决定将它保留下来。

 

Why are we making a fuss about code units? Consider the sentence

为什么我们对代码单位要“小题大做”?考虑句子

 

𝕫 is the set of integers

 

The character 𝕫 (U+1D56B) requires two code units in the UTF-16 encoding. Calling

UTF-16编码的字符需要两个代码单位。调用

 

char ch = sentence.charAt(1);

 

doesn't return a space but the second code unit of 𝕫. To avoid this problem, you should not use the char type. It is too low-level.

并不返回空白,但第二个代码单位确实是。为了避免这个问题,你最好不要用char类型。它太低级。

If your code traverses a string, and you want to look at each code point in turn, use these statements:

如果你的代码是旋转一个字符串,你要看到的是反转每一个代码点,用这些语句:

 

int cp = sentence.codePointAt(i);
if (Character.isSupplementaryCodePoint(cp)) i += 2;
else i++;

 

Moving backwards needs a little more care. The codePointAt method returns a complete supplementary code point only when the index refers to the first code unit of a surrogate pair; at the second code unit it returns just that surrogate. The codePointBefore method handles this case: it returns the code point that ends immediately before the given index. You can move backwards with the following statements:

int cp = sentence.codePointBefore(i);
i -= Character.charCount(cp);
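To see a traversal as a complete loop, here is a sketch that walks a string code point by code point in both directions. The class and method names (TraverseDemo, forward, backward) are invented for this example. The forward loop uses Character.charCount, which is equivalent to the isSupplementaryCodePoint test shown earlier; the backward loop uses codePointBefore, which reads the code point ending just before a given index.

```java
import java.util.ArrayList;
import java.util.List;

public class TraverseDemo {
    // Collects the code points of s from first to last.
    static List<Integer> forward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a supplementary character, otherwise 1
        }
        return result;
    }

    // Collects the code points of s from last to first.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i); // the code point that ends just before index i
            result.add(cp);
            i -= Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "a\uD835\uDD6Bb"; // 'a', U+1D56B, 'b'
        System.out.println(forward(sentence));
        System.out.println(backward(sentence));
    }
}
```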

 

Substrings

子串

You extract a substring from a larger string with the substring method of the String class. For example,

你可以用String类的substring方法在一个较大的字符串中提取一个子串。例如,

 

String greeting = "Hello";
String s = greeting.substring(0, 3);

 

creates a string consisting of the characters "Hel".

建立一个由字符"Hel"组成的字符串。

The second parameter of substring is the first code unit that you do not want to copy. In our case, we want to copy the code units in positions 0, 1, and 2 (from position 0 to position 2 inclusive). As substring counts it, this means from position 0 inclusive to position 3 exclusive.

substring的第二个参数是你不想复制的第一个代码单位。在我们的例子里,我们要复制位置012(从位置1到位置2,含位置2)的代码单位。当substring计算它时,意思就是从位置0(含)到位置3(不含)。

There is one advantage to the way substring works: Computing the number of code units in the substring is easy. The string s.substring(a, b) always has b - a code units. For example, the substring "Hel" has 3 – 0 = 3 code units.

substring的工作方式有一个优势:在子串中计算代码单位的数量要容易。字符串s.substring(a, b)总是有b - a个代码单位。例如,子串"Hel"3 – 0 = 3个代码单位。
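The half-open convention described above can be checked with a short sketch. The class name SubstringDemo and the helper slice are illustrative only.

```java
public class SubstringDemo {
    // Returns the code units of s from position a (inclusive) to position b (exclusive).
    static String slice(String s, int a, int b) {
        return s.substring(a, b);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String s = slice(greeting, 0, 3);
        System.out.println(s);                      // Hel
        System.out.println(s.length() == 3 - 0);    // true: the length is always b - a
        System.out.println(greeting.substring(3));  // lo (from position 3 to the end)
    }
}
```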

String Editing

串编辑

The String class gives no methods that let you change a character in an existing string. If you want to turn greeting into "Help!", you cannot directly change the last positions of greeting into 'p' and '!'. If you are a C programmer, this will make you feel pretty helpless. How are you going to modify the string? In Java, it is quite easy: concatenate the substring that you want to keep with the characters that you want to replace.

String类并没有提供让你在已经存在的字符串中更改字符的方法。如果打算把字符串greeting变成"Help!",你无法直接把字符串greeting的后半部分变成'p''!'。如果你是一个C程序员,这会使你感到相当的无助。那么你要如何修改这个字符串呢?在Java中,非常简单:将你想保留的子串和你想替换的字符连接起来就行了。

 

greeting = greeting.substring(0, 3) + "p!";

 

This assignment changes the current value of the greeting variable to "Help!".

这个表达式将变量greeting的当前值变为"Help!"

Because you cannot change the individual characters in a Java string, the documentation refers to the objects of the String class as being immutable. Just as the number 3 is always 3, the string "Hello" will always contain the code unit sequence describing the characters H, e, l, l, o. You cannot change these values. As you just saw, however, you can change the contents of the string variable greeting and make it refer to a different string, just as you can make a numeric variable currently holding the value 3 hold the value 4.

因为你不能改变Java字符串中的个别字符,所以这个文档将String类的对象作为不可变对象引用。就好比数字3总是3一样,字符串也将总是包含描述字符Hello的代码单位序列。你不能改变这些值。然而,正如你刚才所见,你可以改变字符串变量greeting的内容,让他引用不同的字符串,就像你通常可以将值为3数字变量赋值为4一样。

Isn't that a lot less efficient? It would seem simpler to change the code units than to build up a whole new string from scratch. Well, yes and no. Indeed, it isn't efficient to generate a new string that holds the concatenation of "Hel" and "p!". But immutable strings have one great advantage: the compiler can arrange that strings are shared.

那不是非常低效吗?看起来改变代码单位要比重新建立一个完整的新字符串要简单得多。那么,既是也不是。确实,产生一个连接"Hel""p!"的新字符串并没有效率。但不可变的字符串有一个非常大的优点:编译器会将字符串作为共享资源编排。

To understand how this works, think of the various strings as sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. Overall, the designers of Java decided that the efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.

要明白这是怎么回事,只要想象各种各样的字符串就像坐在一个共同的池中。字符串变量则指向池中的位置。如果你复制字符串变量,原字符串和复件都将共享相同的字符。总的来说,Java的设计者坚信共享的效率要远比通过提取子串和连接的方式编辑字符串的效率来得重要。

Look at your own programs; we suspect that most of the time, you don't change strings—you just compare them. Of course, in some cases, direct manipulation of strings is more efficient. (One example is assembling strings from individual characters that come from a file or the keyboard.) For these situations, Java provides a separate StringBuilder class that we describe in Chapter 12. If you are not concerned with the efficiency of string handling, you can ignore StringBuilder and just use String.

看看你自己的程序;我们猜想大部分时间你并没有去改变字符串——你仅仅是对它们进行比较。当然,在有些例子中,直接处理字符串更有效率。(有个例子是将来自文件或键盘的零散的字符组合成字符串。)为了应付这些情况,Java提供了一个单独的StringBuilder类,我们将在第12章讨论。如果你不关心字符串处理的效率,你可以忽略StringBuilder而只用String
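The following sketch illustrates both points from this section: "editing" a string really builds a new one and leaves the original untouched, and StringBuilder is the usual tool when a string is assembled piece by piece. The names EditDemo, replaceTail, and assemble are hypothetical helpers written for this example.

```java
public class EditDemo {
    // Keeps the first 'keep' code units of s and appends tail, producing a new string.
    static String replaceTail(String s, int keep, String tail) {
        return s.substring(0, keep) + tail;
    }

    // Assembles a string from individual characters using a StringBuilder.
    static String assemble(char[] chars) {
        StringBuilder builder = new StringBuilder();
        for (char c : chars) {
            builder.append(c); // appends in place, no intermediate String objects
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String original = greeting;              // second reference to the same string
        greeting = replaceTail(greeting, 3, "p!");
        System.out.println(greeting);            // Help!
        System.out.println(original);            // Hello (the old string is unchanged)
        System.out.println(assemble(new char[] {'H', 'e', 'l', 'p', '!'}));
    }
}
```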

C++ NOTE

C++注释

C programmers generally are bewildered when they see Java strings for the first time because they think of strings as arrays of characters:

C程序员通常在第一次看到Java的字符串时都会感到迷惑,因为你们把字符串当成是字符的数组开看待。

 

char greeting[] = "Hello";

 

That is the wrong analogy: a Java string is roughly analogous to a char* pointer,

这有个不恰当的类比:Java字符串可以粗略的看作是char*指针。

 

char* greeting = "Hello";

 

When you replace greeting with another string, the Java code does roughly the following:

当你用另一个字符串替换greeting时,Java代码大致相当于:

 

char* temp = malloc(6);
strncpy(temp, greeting, 3);
strncpy(temp + 3, "p!", 3);
greeting = temp;

 

Sure, now greeting points to the string "Help!". And even the most hardened C programmer must admit that the Java syntax is more pleasant than a sequence of strncpy calls. But what if we make another assignment to greeting?

当然,greeting现在指向了字符串"Help!"。甚至连最顽固的C程序员都一定会承认Java的语法比一连串的strncpy调用令人愉快得多。但是我们给greeting赋另外一个值会怎样呢?

 

greeting = "Howdy";

 

Don't we have a memory leak? After all, the original string was allocated on the heap. Fortunately, Java does automatic garbage collection. If a block of memory is no longer needed, it will eventually be recycled.

我们没有存储漏洞吗?毕竟,原字符串已经被保留在堆上了。幸运的是,Java自动进行碎片收集。如果某一块内存区块不再被使用了,它将最终被收回。

If you are a C++ programmer and use the string class defined by ANSI C++, you will be much more comfortable with the Java String type. C++ string objects also perform automatic allocation and deallocation of memory. The memory management is performed explicitly by constructors, assignment operators, and destructors. However, C++ strings are mutable—you can modify individual characters in a string.

如果你是一个使用ANSI C++定义的String类的C++程序员,你在使用Java String类型时将会感到非常舒适。C++ string对象也自动执行内存的分配和解除分配。内存管理通过构造程序、分配操作符和解除程序明确的执行。然而,C++字符串是可变的——你可以在串中修改个别的字符。

 

Concatenation

连接字符串

Java, like most programming languages, allows you to use the + sign to join (concatenate) two strings.

Java,像大多数编程语言一样,允许你使用+号连接两个字符串。

 

String expletive = "Expletive";
String PG13 = "deleted";
String message = expletive + PG13;

 

The above code sets the variable message to the string "Expletivedeleted". (Note the lack of a space between the words: the + sign joins two strings in the order received, exactly as they are given.)

上述代码将变量message赋值成了字符串"Expletivedeleted"。(注意在词与词之间没有空格:+号严格按照接收到的顺序连接两个字符串。)

When you concatenate a string with a value that is not a string, the latter is converted to a string. (As you will see in Chapter 5, every Java object can be converted to a string.) For example:

当你用字符串连接非字符串值的时候,后者将被转换成字符串。(在第五章你将看到,每一个Java对象都可以被转换成字符串。)例如:

 

int age = 13;
String rating = "PG" + age;

 

sets rating to the string "PG13".

rating赋值为字符串"PG13"

This feature is commonly used in output statements. For example,

这个特性通常用于输出语句,例如,

 

System.out.println("The answer is " + answer);

 

is perfectly acceptable and will print what one would want (and with the correct spacing because of the space after the word is).

是完全允许的,并且会输出想要得到的东西(以及恰当的空格,因为在单词is的后边有个空格)。
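Here is a small runnable sketch of the concatenation rules described above. The class name ConcatDemo and the method rating are invented for illustration. The last two lines go slightly beyond the text: because + is evaluated from left to right, the position of the first string operand decides whether earlier numbers are added or concatenated.

```java
public class ConcatDemo {
    // Joins a label and a number; the int is converted to its decimal string form.
    static String rating(int age) {
        return "PG" + age;
    }

    public static void main(String[] args) {
        String expletive = "Expletive";
        String pg13 = "deleted";
        System.out.println(expletive + pg13);        // Expletivedeleted (no space is inserted)
        System.out.println(expletive + " " + pg13);  // add the space yourself if you want one
        System.out.println(rating(13));              // PG13

        // Evaluation runs left to right, so the position of the string matters.
        System.out.println(1 + 2 + "x");  // 3x
        System.out.println("x" + 1 + 2);  // x12
    }
}
```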

Testing Strings for Equality

检验相等字符串

To test whether two strings are equal, use the equals method. The expression

检验两个字符串是否相等,用equals方法。表达式

 

s.equals(t)

 

returns true if the strings s and t are equal, false otherwise. Note that s and t can be string variables or string constants. For example, the expression

如果字符串st相当则返回TRue,否则返回false。注意st可以是字符串变量也可以是字符串常量。例如,表达式

 

"Hello".equals(greeting)

 

is perfectly legal. To test whether two strings are identical except for the upper/lowercase letter distinction, use the equalsIgnoreCase method.

是完全合法的。检验两个字符串除大小写外相同,用equalsIgnoreCase方法。

 

"Hello".equalsIgnoreCase("hello")

 

Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.

千万不要用==运算符检验两个字符串是否相同!它仅仅是用来确定字符串是否存储在同一个位置的。的确,如果字符串在同一个位置,他们必定相同。但完全可能把同一个字符串的多个复件存储在不同的地方。

 

String greeting = "Hello"; // initialize greeting to a string
if (greeting == "Hello") . . .
   // probably true
if (greeting.substring(0, 3) == "Hel") . . .
   // probably false

 

If the virtual machine always arranged for equal strings to be shared, then you could use the == operator for testing equality. But only string constants are shared, not strings that are the result of operations like + or substring. Therefore, never use == to compare strings, lest you end up with a program with the worst kind of bug: an intermittent one that seems to occur randomly.

如果虚拟机总是安排相同的字符串被共享,那你就可以用==运算符来检验相等。但仅仅是字符串常量被共享,像+substring的操作结果字符串并不被共享。因此,永远都不要用==来比较字符串,以免你的程序以糟糕的bug而告终——一个间歇性的,看起来随机出现的bug
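The sketch below contrasts the two kinds of comparison. The class name EqualityDemo and the helper sameContents are made up for this example. The result of the == line is deliberately left without an expected value, since it depends on whether the two references happen to point at the same object.

```java
public class EqualityDemo {
    // Compares the contents of two strings, which is almost always what you want.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String prefix = greeting.substring(0, 3);

        System.out.println(sameContents(prefix, "Hel"));        // true: same characters
        System.out.println("Hello".equalsIgnoreCase("hello"));  // true: case is ignored

        // Compares references, not contents; do not rely on the result.
        System.out.println(prefix == "Hel");
    }
}
```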

C++ NOTE

C++注释

If you are used to the C++ string class, you have to be particularly careful about equality testing. The C++ string class does overload the == operator to test for equality of the string contents. It is perhaps unfortunate that Java goes out of its way to give strings the same "look and feel" as numeric values but then makes strings behave like pointers for equality testing. The language designers could have redefined == for strings, just as they made a special arrangement for +. Oh well, every language has its share of inconsistencies.

如果你习惯了C++string类,你就不得不在做相等检验的时候格外小心。C++ string类确实把检测字符串内容相等的重担都交给了==操作符。Java用过时的办法让字符串“看起来和感觉上”就像数值一样,而另一方面却使字符串的行为在相等检验时像指针,这也许是不幸的(译者:水平有限,暂译如此)。语言的设计者可以为了字符串重新定义==,就像他们对+有个特殊的安排一样。那么好,每种语言都有它的不兼容部分。

C programmers never use == to compare strings but use strcmp instead. The Java method compareTo is the exact analog to strcmp. You can use

C程序员从来不用==比较字符串,而是用strcmp代替。Java compareTo方法完全模仿strcmp。你可以用

 

if (greeting.compareTo("Hello") == 0) . . .

 

but it seems clearer to use equals instead.

但是用equals来代替看起来要清楚一些。
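A short sketch of compareTo follows. The class name CompareDemo and the helper sign are illustrative. Only the sign of the result is meaningful, so the helper reduces it to -1, 0, or 1. One caveat worth knowing: the comparison is based on char values, so every uppercase ASCII letter sorts before every lowercase one.

```java
public class CompareDemo {
    // Reduces the result of compareTo to -1, 0, or 1.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(greeting.compareTo("Hello") == 0); // true, same as equals here
        System.out.println(sign("Apple", "Banana"));          // -1: Apple comes first
        System.out.println(sign("Banana", "Apple"));          // 1
        System.out.println(sign("Zebra", "apple"));           // -1: 'Z' has a smaller value than 'a'
    }
}
```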

 

The String class in Java contains more than 50 methods. A surprisingly large number of them are sufficiently useful so that we can imagine using them frequently. The following API note summarizes the ones we found most useful.

JavaString类包含50多个方法。它们当中的相当数量都十分有用,以至于能够设想我们会频繁的使用它们。下面的API注释汇总了我们找到的大部分有用的方法。

NOTE

注释

You will find these API notes throughout the book to help you understand the Java Application Programming Interface (API). Each API note starts with the name of a class such as java.lang.String—the significance of the so-called package name java.lang is explained in Chapter 4. The class name is followed by the names, explanations, and parameter descriptions of one or more methods.

你会发现这些贯穿全书的API注释将帮助你理解Java应用程序接口(API)。每个API注释都以一个类的名称作为开头,就像java.lang. String——java.lang所谓的包名称的意义将在第4章进行说明。类名称后面紧跟着一个或几个方法的名称、说明和参数描述。

We typically do not list all methods of a particular class but instead select those that are most commonly used, and describe them in a concise form. For a full listing, consult the on-line documentation.

我们并没有将特定的类的所有方法全部列出来,取而代之的是选择了有代表性的通常能够用到的部分,简要地做了一个表单来说明它们。要获得完整的清单,可以参考在线文档。

We also list the version number in which a particular class was introduced. If a method has been added later, it has a separate version number.

在介绍特定的类的时候我们也列出了版本号,如果某个方法是后来加入的,它就会单独标注版本号。

 


 

 

 

java.lang.String 1.0

 

·         char charAt(int index)

returns the code unit at the specified location. You probably don't want to call this method unless you are interested in low-level code units.

返回指定位置的代码单位。你也许不会调用这个方法,除非你对低级的代码单位感兴趣。

·         int codePointAt(int index) 5.0

returns the code point that starts or ends at the specified location.

返回从指定位置开始或在指定位置结束的代码点。(译者:即指定位置所在的代码点)。

·         int offsetByCodePoints(int startIndex, int cpCount) 5.0

returns the index of the code point that is cpCount code points away from the code point at startIndex.

返回从编号startIndex的代码点开始的cpCount个代码点的索引。

·         int compareTo(String other)

returns a negative value if the string comes before other in dictionary order, a positive value if the string comes after other in dictionary order, or 0 if the strings are equal.

如果字符串的字典顺序排在other之前则返回负值,排在other之后则返回正值。如果相等则返回0

·         boolean endsWith(String suffix)

returns true if the string ends with suffix.

如果字符串以suffix结束,则返回TRue

·         boolean equals(Object other)

returns true if the string equals other.

如果字符串和other相等则返回true

·         boolean equalsIgnoreCase(String other)

returns true if the string equals other, except for upper/lowercase distinction.

如果字符串除了大小写之外和other相等,则返回true

·         int indexOf(String str)

·         int indexOf(String str, int fromIndex)

·         int indexOf(int cp)

·         int indexOf(int cp, int fromIndex)

return the start of the first substring equal to the string str or the code point cp, starting at index 0 or at fromIndex, or -1 if str does not occur in this string.

返回从索引0位置或fromIndex位置开始的第一个与字符串str或代码点cp相同的子字符串的起始点,或者这个字符串中没有出现过str则返回-1

·         int lastIndexOf(String str)

·         int lastIndexOf(String str, int fromIndex)

·         int lastIndexOf(int cp)

·         int lastIndexOf(int cp, int fromIndex)

return the start of the last substring equal to the string str or the code point cp, searching backward from the end of the string or from fromIndex, or -1 if there is no such occurrence.

返回从字符串结尾或fromIndex位置开始的最后一个与字符串str或代码点cp相同的子字符串的起始点。

·         int length()

returns the length of the string.

返回字符串的长度。

·         int codePointCount(int startIndex, int endIndex) 5.0

returns the number of code points between startIndex and endIndex - 1. Unpaired surrogates are counted as code points.

返回从位置到位置之间的代码点的个数。不成对的代用品当作代码点来计数。

·         String replace(CharSequence oldString, CharSequence newString)

returns a new string that is obtained by replacing all substrings matching oldString in the string with the string newString. You can supply String or StringBuilder objects for the CharSequence parameters.

返回一个通过将字符串中所有与oldString匹配的子串用字符串newString替换所得到的新字符串。你可以给StringStringBuilder对象提供CharSequence类型的参数。

·         boolean startsWith(String prefix)

returns true if the string begins with prefix.

如果字符串以prefix开头,则返回true

·         String substring(int beginIndex)

·         String substring(int beginIndex, int endIndex)

return a new string consisting of all code units from beginIndex until the end of the string or until endIndex - 1.

返回一个由从位置beginIndex开始,一直到字符串结束或者是到位置endIndex - 1的所有代码单位组成的新字符串。

·         String toLowerCase()

returns a new string containing all characters in the original string, with uppercase characters converted to lower case.

返回一个由原字符串的所有字符组成,并将大写字符转换成小写字符后的新字符串。

·         String toUpperCase()

returns a new string containing all characters in the original string, with lowercase characters converted to upper case.

返回一个由原字符串的所有字符组成,并将小写字符转换成大写字符后的新字符串。

·         String trim()

returns a new string by eliminating all leading and trailing whitespace (characters with a value of U+0020 or below) in the original string.

返回一个去除原字符串中所有的前导和尾随空格的新字符串。
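To show a few of the methods listed above working together, here is a brief sketch. The class name StringApiDemo and the helpers fileName and tidy are hypothetical; the path string is just sample data.

```java
public class StringApiDemo {
    // Returns the part of path after the last '/', or the whole path if there is none.
    static String fileName(String path) {
        // lastIndexOf returns -1 when '/' is absent, so the substring then starts at 0.
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Strips surrounding whitespace and converts to lowercase.
    static String tidy(String raw) {
        return raw.trim().toLowerCase();
    }

    public static void main(String[] args) {
        String path = "docs/api/index.html";
        System.out.println(path.indexOf("/"));        // 4
        System.out.println(path.lastIndexOf('/'));    // 8
        System.out.println(path.startsWith("docs"));  // true
        System.out.println(path.endsWith(".html"));   // true
        System.out.println(fileName(path));           // index.html
        System.out.println(path.replace("/", "."));   // docs.api.index.html
        System.out.println(tidy("  Hello  "));        // hello
    }
}
```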

Reading the On-Line API Documentation

阅读在线API文档

As you just saw, the String class has lots of methods. Furthermore, there are thousands of classes in the standard libraries, with many more methods. It is plainly impossible to remember all useful classes and methods. Therefore, it is essential that you become familiar with the on-line API documentation that lets you look up all classes and methods in the standard library. The API documentation is part of the JDK. It is in HTML format. Point your web browser to the docs/api/index.html file in your JDK installation. You will see a screen like that in Figure 3-2.

就像你刚刚看到的,String类有大量的方法。而且,标准库中有数千的类以及更多的方法。很明显,我们不可能把所有有用的类和方法都记住。因此,经常浏览在线API文档就非常必要,你会在上面找到标准库中所有的类和方法。API文档是JDK的一部分。它是HTML格式的。将你的Web浏览器指向你的JDK安装目录的docs/api/index.html子目录。你会看到如图3-2所示的界面。

Figure 3-2. The three panes of the API documentation

3-2. API文档的三个窗格

 

The screen is organized into three frames. A small frame on the top left shows all available packages. Below it, a larger frame lists all classes. Click on any class name, and the API documentation for the class is displayed in the large frame to the right (see Figure 3-3). For example, to get more information on the methods of the String class, scroll the second frame until you see the String link, then click on it.

这个屏幕有三个框架组成。左上角的想框架显示所有可用的包。在它的下面,一个较大的框架列出了所有的类。在任意的类名称上单击,这个类的API文档就会显示在右侧最大的框架中(参见图3-3)。例如,要得到String类的方法得更多信息,滚动第二个框架直到你看见String的链接,然后单击这个链接。

Figure 3-3. Class description for the String class

3-3. String类的类说明

 

Then scroll the frame on the right until you reach a summary of all methods, sorted in alphabetical order (see Figure 3-4). Click on any method name for a detailed description of that method (see Figure 3-5). For example, if you click on the compareToIgnoreCase link, you get the description of the compareToIgnoreCase method.

接着滚动右侧的框架直到你到达所有方法的摘要,它们按字母顺序排列(参见图3-4)。单击随便哪个方法的名字,都可以得到那个方法的详细说明(参见图3-5)。例如,如果点击compareToIgnoreCase链接,你就会得到compareToIgnoreCase方法的说明。

Figure 3-4. Method summary of the String class

3-4. String类的方法汇总

 

Figure 3-5. Detailed description of a String method

3-5. String的方法的详细说明

 

TIP

提示

Bookmark the docs/api/index.html page in your browser right now.

马上将docs/api/index.html页加入到你的浏览器收藏夹中。
