String utilities
Joiner
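Joining a sequence of strings with a separator is a common operation, but it becomes awkward when the sequence contains nulls. The fluent style of Joiner makes it simple.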
Joiner joiner = Joiner.on("; ").skipNulls();
return joiner.join("Harry", null, "Ron", "Hermione");
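This returns the string "Harry; Ron; Hermione". Alternatively, instead of using skipNulls, you may specify a string to use in place of null with useForNull(String).
You can also use Joiner on objects, which are converted using their toString() and then joined.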
Joiner.on(",").join(Arrays.asList(1, 5, 7)); // returns "1,5,7"
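Warning: Joiner instances are always immutable. The Joiner configuration methods always return a new Joiner, which you must use to get the desired semantics. This makes any Joiner thread-safe and usable as a static final constant.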
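The warning above can be illustrated with a small sketch. The class name JoinerImmutability and its method names are hypothetical and exist only for this example; the Joiner calls themselves (on, useForNull, skipNulls, join) are the ones described in this section.

```java
import com.google.common.base.Joiner;
import java.util.Arrays;
import java.util.List;

public class JoinerImmutability {
    // Immutable and thread-safe, so it can be shared as a constant.
    private static final Joiner COMMA_JOINER = Joiner.on(", ").useForNull("?");

    static String joinNames(List<String> names) {
        return COMMA_JOINER.join(names);
    }

    static String joinSkippingNulls(List<String> names) {
        Joiner joiner = Joiner.on(", ");
        // Common mistake: this call returns a new Joiner that is thrown away,
        // so `joiner` itself still fails on null elements.
        joiner.skipNulls();
        // Correct: keep and use the Joiner returned by the configuration method.
        Joiner configured = joiner.skipNulls();
        return configured.join(names);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Harry", null, "Ron");
        System.out.println(joinNames(names));         // Harry, ?, Ron
        System.out.println(joinSkippingNulls(names)); // Harry, Ron
    }
}
```

Had the method called join on the original `joiner` rather than on `configured`, the null element would have caused a NullPointerException.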
Splitter
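The built-in Java utilities for splitting strings have some quirky behaviors. For example, String.split silently discards trailing separators, and StringTokenizer respects exactly five whitespace characters and nothing else.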
Quiz: What does ",a,,b,".split(",") return?
- "", "a", "", "b", ""
- null, "a", null, "b", null
- "a", null, "b"
- "a", "b"
- None of the above
The correct answer is the fifth option, none of the above: "", "a", "", "b". Only trailing empty strings are skipped.
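The quiz answer can be checked with the JDK alone. The sketch below uses only java.lang.String; the class name SplitQuiz and its helper methods are hypothetical. It also shows the two-argument overload with a negative limit, which keeps trailing empty strings.

```java
import java.util.Arrays;

public class SplitQuiz {
    // Default behavior: trailing empty strings are removed from the result.
    static String[] splitDefault(String input) {
        return input.split(",");
    }

    // A negative limit tells String.split to keep trailing empty strings.
    static String[] splitKeepTrailing(String input) {
        return input.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault(",a,,b,")));      // [, a, , b]
        System.out.println(Arrays.toString(splitKeepTrailing(",a,,b,"))); // [, a, , b, ]
    }
}
```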
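Splitter gives you complete control over all this confusing behavior through a reassuringly straightforward fluent pattern.
Splitter.on(',').trimResults().omitEmptyStrings().split("foo,bar,, qux");
This returns an Iterable<String> containing "foo", "bar", "qux". A Splitter may be set to split on any Pattern, char, String, or CharMatcher.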
Base Factories
Method | Description | Example |
---|---|---|
Splitter.on(char) | Split on occurrences of a specific, individual character. | Splitter.on(';') |
Splitter.on(CharMatcher) | Split on occurrences of any character in some category. | Splitter.on(CharMatcher.breakingWhitespace()) or Splitter.on(CharMatcher.anyOf(";,.")) |
Splitter.on(String) | Split on a literal String. | Splitter.on(", ") |
Splitter.on(Pattern) or Splitter.onPattern(String) | Split on a regular expression. | Splitter.onPattern("\r?\n") |
Splitter.fixedLength(int) | Splits strings into substrings of the specified fixed length. The last piece can be smaller than length, but will never be empty. | Splitter.fixedLength(3) |
Modifiers
Method | Description | Example |
---|---|---|
omitEmptyStrings() | Automatically omits empty strings from the result. | Splitter.on(',').omitEmptyStrings().split("a,,c,d") returns "a", "c", "d" |
trimResults() | Trims whitespace from the results; equivalent to trimResults(CharMatcher.whitespace()). | Splitter.on(',').trimResults().split("a, b, c, d") returns "a", "b", "c", "d" |
trimResults(CharMatcher) | Trims characters matching the specified CharMatcher from results. | Splitter.on(',').trimResults(CharMatcher.is('_')).split("_a ,_b_ ,c__") returns "a ", "b_ ", "c" |
limit(int) | Stops splitting after the specified number of strings have been returned. | Splitter.on(',').limit(3).split("a,b,c,d") returns "a", "b", "c,d" |
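If you want a List rather than an Iterable, use splitToList() instead of split().
Warning: Splitter instances are always immutable. The Splitter configuration methods always return a new Splitter, which you must use to get the desired semantics. This makes any Splitter thread-safe and usable as a static final constant.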
Map Splitters
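You can also use a Splitter to deserialize a map by specifying a second delimiter with withKeyValueSeparator(). The resulting MapSplitter splits the input into entries using the Splitter's delimiter, then splits each entry into a key and a value using the given key-value separator, and returns a Map<String, String>.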
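As a minimal sketch of the two-level split described above: the class name MapSplitterExample, the parseSettings helper, and the sample input are assumptions made for illustration. Note that trimResults() here applies to the entries, not to the individual keys and values.

```java
import com.google.common.base.Splitter;
import java.util.Map;

public class MapSplitterExample {
    // First level: split entries on ';' and trim them.
    // Second level: split each entry into key and value on '='.
    private static final Splitter.MapSplitter SETTINGS_SPLITTER =
            Splitter.on(';').trimResults().withKeyValueSeparator("=");

    static Map<String, String> parseSettings(String input) {
        // Throws IllegalArgumentException if an entry has no '=' or a key repeats.
        return SETTINGS_SPLITTER.split(input);
    }

    public static void main(String[] args) {
        Map<String, String> settings = parseSettings("host=localhost; port=8080");
        System.out.println(settings.get("host")); // localhost
        System.out.println(settings.get("port")); // 8080
    }
}
```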
CharMatcher
In the past, our StringUtil class grew unchecked and accumulated many methods like these:
allAscii
collapse
collapseControlChars
collapseWhitespace
lastIndexNotOf
numSharedChars
removeChars
removeCrLf
retainAllChars
strip
stripAndCollapse
stripNonDigits
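They represent a partial cross product of two notions:
- What constitutes a "matching" character?
- What should be done with those "matching" characters?
To simplify this morass, we developed CharMatcher.
Intuitively, you can think of a CharMatcher as representing a particular class of characters, like digits or whitespace. In practice, a CharMatcher is just a boolean predicate on characters (indeed, CharMatcher implements Predicate<Character>), but because it is so common to refer to "all whitespace characters" or "all lowercase letters", Guava provides this specialized syntax and API for characters.
The real utility of a CharMatcher lies in the operations it lets you perform on occurrences of the specified class of characters: trimming, collapsing, removing, retaining, and so on.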
String noControl = CharMatcher.javaIsoControl().removeFrom(string); // remove control characters
String theDigits = CharMatcher.digit().retainFrom(string); // only the digits
String spaced = CharMatcher.whitespace().trimAndCollapseFrom(string, ' '); // trim whitespace at ends, and replace/collapse whitespace into single spaces
String noDigits = CharMatcher.javaDigit().replaceFrom(string, "*"); // star out all digits
String lowerAndDigit = CharMatcher.javaDigit().or(CharMatcher.javaLowerCase()).retainFrom(string); // eliminate all characters that aren't digits or lowercase
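Note: CharMatcher deals only with char values; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String using surrogate pairs, and a CharMatcher treats them as two separate characters.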
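To make the note concrete, the following sketch uses only the JDK (no CharMatcher) to show how a single supplementary code point occupies two char values in a String, which is what any char-based predicate ends up seeing. The class name SurrogatePairs and the choice of U+1F600 are illustrative assumptions.

```java
public class SurrogatePairs {
    // U+1F600 lies above 0xFFFF, so it needs a surrogate pair in UTF-16.
    static final String GRINNING_FACE = new String(Character.toChars(0x1F600));

    // Number of char values, which is what char-based APIs iterate over.
    static int charCount(String s) {
        return s.length();
    }

    // Number of logical Unicode code points.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(GRINNING_FACE));      // 2
        System.out.println(codePointCount(GRINNING_FACE)); // 1
        System.out.println(Character.isHighSurrogate(GRINNING_FACE.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(GRINNING_FACE.charAt(1)));  // true
    }
}
```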
Obtaining CharMatchers
Many needs can be satisfied by the provided CharMatcher factory methods:
any()
none()
whitespace()
breakingWhitespace()
invisible()
digit()
javaLetter()
javaDigit()
javaLetterOrDigit()
javaIsoControl()
javaLowerCase()
javaUpperCase()
ascii()
singleWidth()
Other common ways to obtain a CharMatcher include:
Method | Description |
---|---|
anyOf(CharSequence) | Specify all the characters you wish matched. For example, CharMatcher.anyOf("aeiou") matches lowercase English vowels. |
is(char) | Specify exactly one character to match. |
inRange(char, char) | Specify a range of characters to match, e.g. CharMatcher.inRange('a', 'z') . |
Additionally, CharMatcher has negate(), and(CharMatcher), and or(CharMatcher). These provide simple boolean operations on CharMatcher.
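The boolean operations compose in the obvious way. In the sketch below, the class name CharMatcherCombinators, the constant names, and the helper methods are hypothetical; only anyOf, inRange, is, negate, and, or, and retainFrom come from CharMatcher.

```java
import com.google.common.base.CharMatcher;

public class CharMatcherCombinators {
    private static final CharMatcher VOWEL = CharMatcher.anyOf("aeiou");
    private static final CharMatcher LOWER = CharMatcher.inRange('a', 'z');

    // and + negate: lowercase ASCII letters that are not vowels.
    private static final CharMatcher LOWER_CONSONANT = LOWER.and(VOWEL.negate());

    // or: ASCII digits or a dash.
    private static final CharMatcher DIGIT_OR_DASH =
            CharMatcher.inRange('0', '9').or(CharMatcher.is('-'));

    static String consonantsOnly(String s) {
        return LOWER_CONSONANT.retainFrom(s);
    }

    static String digitsAndDashes(String s) {
        return DIGIT_OR_DASH.retainFrom(s);
    }

    public static void main(String[] args) {
        System.out.println(consonantsOnly("hello world"));     // hllwrld
        System.out.println(digitsAndDashes("(555) 123-4567")); // 555123-4567
    }
}
```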
Using CharMatchers
CharMatcher provides a wide variety of methods to operate on occurrences of the specified characters in any CharSequence. There are more methods provided than we can list here, but some of the most commonly used are:
Method | Description |
---|---|
collapseFrom(CharSequence, char) | Replace each group of consecutive matched characters with the specified character. For example, CharMatcher.whitespace().collapseFrom(string, ' ') collapses each run of whitespace down to a single space. |
matchesAllOf(CharSequence) | Test if this matcher matches all characters in the sequence. For example, CharMatcher.ascii().matchesAllOf(string) tests if all characters in the string are ASCII. |
removeFrom(CharSequence) | Removes matching characters from the sequence. |
retainFrom(CharSequence) | Removes all non-matching characters from the sequence. |
trimFrom(CharSequence) | Removes leading and trailing matching characters. |
replaceFrom(CharSequence, CharSequence) | Replace matching characters with a given sequence. |
(Note: all of these methods return a String, except for matchesAllOf, which returns a boolean.)
Charsets
Don't do this:
try {
bytes = string.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
// how can this possibly happen?
throw new AssertionError(e);
}
Do this instead:
bytes = string.getBytes(Charsets.UTF_8);
Charsets provides constant references to the six standard Charset implementations guaranteed to be supported by all Java platform implementations. Use them instead of referring to charsets by their names.
TODO: an explanation of charsets and when to use them
(Note: If you're using JDK 7 or later, you should use the constants in StandardCharsets.)
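For the JDK alternative mentioned in the note, a minimal sketch looks like the following. The class name StandardCharsetsExample and its helpers are hypothetical; java.nio.charset.StandardCharsets is part of the JDK since version 7.

```java
import java.nio.charset.StandardCharsets;

public class StandardCharsetsExample {
    // No checked UnsupportedEncodingException: the Charset overload cannot fail that way.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // 'é' takes two bytes in UTF-8
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length);                  // 6
        System.out.println(decodeUtf8(bytes).equals(text)); // true
    }
}
```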
CaseFormat
CaseFormat is a handy little class for converting between ASCII case conventions, such as the naming conventions of programming languages. Supported formats include:
Format | Example |
---|---|
LOWER_CAMEL | lowerCamel |
LOWER_HYPHEN | lower-hyphen |
LOWER_UNDERSCORE | lower_underscore |
UPPER_CAMEL | UpperCamel |
UPPER_UNDERSCORE | UPPER_UNDERSCORE |
Using it is relatively straightforward:
CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME"); // returns "constantName"
We find this especially useful, for example, when writing programs that generate other programs.
Strings
A limited number of general-purpose String utilities reside in the Strings class.