Strings in Swift 2

来源:点击打开链接

Swift provides a performant, Unicode-compliant string implementation as part of its Standard Library. In Swift 2, theString type no longer conforms to the CollectionType protocol, where String was previously a collection ofCharacter values, similar to an array. Now, String provides a characters property that exposes a character collection view.

Why the change? Although it may seem natural to model a string as a collection of characters, the String type behaves quite differently from collection types like ArraySet, or Dictionary. This has always been true, but with the addition of protocol extensions to Swift 2 these differences made it necessary to make several fundamental changes.

Different Than the Sum of Its Parts

When you add an element to a collection, you expect that the collection will contain that element. That is, when you append a value to an array, the array then contains that value. The same applies to a dictionary or a set. However, when you append a combining mark character to a string, the contents of the string itself change.

Consider the string cafe, which has four characters: caf, and e:

var letters: [Character] = ["c", "a", "f", "e"]
var string: String = String(letters)

print(letters.count) // 4
print(string) // cafe
print(string.characters.count) // 4

If you append the combining acute accent character U+0301 ´ the string still has four characters, but the last character is now é:

let acuteAccent: Character = "\u{0301}" // ´ COMBINING ACUTE ACCENT' (U+0301)

string.append(acuteAccent)
print(string.characters.count) // 4
print(string.characters.last!) // é

The string’s characters property does not contain the original lowercase e, nor does it contain the combining acute accent ´ that was just appended. Instead, the string now contains a lowercase “e” with acute accent é:

string.characters.contains("e") // false
string.characters.contains("´") // false
string.characters.contains("é") // true

If we were to treat strings like any other collection, this result would be as surprising as adding UIColor.redColor()and UIColor.greenColor() to a set and the set then reporting that it contains UIColor.yellowColor().

Judged by the Contents of Its Characters

Another difference between strings and collections is the way they determine equality.

  • Two arrays are equal only if both have the same count, and each pair of elements at corresponding indices are equal.
  • Two sets are equal only if both have the same count, and each element contained in the first set is also contained in the second.
  • Two dictionaries are equal only if they have the same set of key, value pairs.

However, String determines equality based on being canonically equivalent. Characters are canonically equivalent if they have the same linguistic meaning and appearance, even if they are composed from different Unicode scalars behind the scenes.

Consider the Korean writing system, which consists of 24 letters, or Jamo, representing individual consonants and vowels. When written out these letters are combined into characters for each syllable. For example, the character “가” ([ga]) is composed of the letters “ᄀ” ([g]) and “ᅡ” [a]. In Swift, strings are considered equal regardless of whether they are constructed from decomposed or precomposed character sequences:

let decomposed = "\u{1100}\u{1161}" // ᄀ + ᅡ
let precomposed = "\u{AC00}" // 가

decomposed == precomposed // true

Again, this behavior differs greatly from any of Swift’s collection types. It would be as surprising as an array with values  and  being considered equal to .

Depends on Your Point of View

Strings are not collections. But they do provide views that conform to CollectionType:

If we take the previous example of the word “café”, comprised of the decomposed characters [ c, a, f, e ] and [ ´ ], here's what the various string views would consist of:


  • The characters property segments the text into extended grapheme clusters, which are an approximation of user-perceived characters (in this case: c, a, f, and é). Because a string must iterate through each of its positions within the overall string (each position is called a code point) in order to determine character boundaries, accessing this property is executed in linear O(n) time. When processing strings that contain human-readable text, high-level locale-sensitive Unicode algorithms, such as those used by the localizedStandardCompare(_:) method and thelocalizedLowercaseString property, should be preferred to character-by-character processing.
  • The unicodeScalars property exposes the underlying scalar values stored in the string. If the original string were created with the precomposed character é instead of the decomposed e + ´, this would be reflected by the Unicode scalars view. Use this API when you are performing low-level manipulation of character data.
  • The utf8 and utf16 properties provide code points for the UTF–8 and UTF–16 representations, respectively. These values correspond to the actual bytes written to a file when translated to and from a particular encoding. UTF-8 code units are used by many POSIX string processing APIs, whereas UTF-16 code units are used throughout Cocoa & Cocoa Touch to represent string lengths and offsets.

For more information about working with Strings and Characters in Swift, read The Swift Programming Languageand the Swift Standard Library Reference.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值