R for Data Science Summary: Strings

This chapter covers string handling and regular expressions in R. The main functions are:
str_length() str_c() str_sub() str_sort() str_to_upper() str_view() str_detect() str_count() str_extract() str_match() str_replace() str_split() str_locate()

library(tidyverse)
library(stringr)

In R, a string is written between double quotes (") or single quotes ('):

string1 <- "This is a string"
string2 <- 'If I want to include a "quote" inside a string, I use single quotes'

To include a literal " or ' inside a string, escape it with a backslash (\):

double_quote <- "\"" # or '"'
single_quote <- '\'' # or "'"

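Likewise, to include a literal backslash, double it. Note that the printed representation shows the escapes, while writeLines() shows the raw contents of the string: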

x <- c("\"", "\\")
x
#> [1] "\"" "\\"
writeLines(x)
#> "
#> \

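Other escapes include "\n" (newline) and "\t" (tab); the full list is available via ?'"' or ?"'". Unicode escapes can also be used to write non-English characters: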

x <- "\u00b5"
x
#> [1] "µ"
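As a small check on the escaping rules above, the sketch below counts characters in a few escaped strings. It assumes only stringr is loaded; the variable names are illustrative. Each escape sequence takes several keystrokes to type but is stored as a single character.

```r
library(stringr)

# Each escape sequence is typed with a backslash but stored as one character
escaped <- c("\"", "\\", "a\tb", "\u00b5")
lengths <- str_length(escaped)
lengths
#> [1] 1 1 3 1
```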

Multiple strings are usually stored in a character vector, created with c():

c("one", "two", "three")
#> [1] "one"   "two"   "three"

str_length()

str_length(c("a", "R for data science", NA))
#> [1]  1 18 NA

str_c()

str_c("x", "y")
#> [1] "xy"
str_c("x", "y", "z")
#> [1] "xyz"

Use the sep argument to control how the pieces are separated:

str_c("x", "y", sep = ", ")
#> [1] "x, y"

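Missing values are contagious: any NA in the input gives NA in the output. To print them as the string "NA" instead, use str_replace_na():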

x <- c("abc", NA)
str_c("|-", x, "-|")
#> [1] "|-abc-|" NA
str_c("|-", str_replace_na(x), "-|")
#> [1] "|-abc-|" "|-NA-|"
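str_replace_na() also takes a replacement argument, so the placeholder does not have to be "NA". A minimal sketch, with "?" chosen arbitrarily for illustration:

```r
library(stringr)

x <- c("abc", NA)
# Substitute a custom placeholder for missing values before concatenating
labelled <- str_c("|-", str_replace_na(x, replacement = "?"), "-|")
labelled
#> [1] "|-abc-|" "|-?-|"
```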

str_c() is vectorised and automatically recycles shorter vectors to the length of the longest:

str_c("prefix-", c("a", "b", "c"), "-suffix")
#> [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"

To collapse a character vector into a single string, use the collapse argument:

str_c(c("x", "y", "z"), collapse = ", ")
#> [1] "x, y, z"
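sep and collapse can be combined: sep joins the arguments element by element, and collapse then joins the resulting vector into one string. A short sketch with made-up inputs:

```r
library(stringr)

# Step 1 (sep): "a" + "x" -> "a-x", "b" + "y" -> "b-y"
# Step 2 (collapse): c("a-x", "b-y") -> "a-x, b-y"
combined <- str_c(c("a", "b"), c("x", "y"), sep = "-", collapse = ", ")
combined
#> [1] "a-x, b-y"
```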

str_sub()

x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
#> [1] "App" "Ban" "Pea"
# negative numbers count backwards from end
str_sub(x, -3, -1)
#> [1] "ple" "ana" "ear"

If the string is too short, str_sub() does not fail; it returns as much as it can:

str_sub("a", 1, 5)
#> [1] "a"

You can also use the assignment form of str_sub() to modify part of a string:

str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
#> [1] "apple"  "banana" "pear"
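The assignment form can be wrapped in a small helper. The function below, capitalise_first(), is a hypothetical name introduced here for illustration (it is not part of stringr); it upper-cases only the first character of each string.

```r
library(stringr)

# Hypothetical helper: upper-case the first character, leave the rest as-is
capitalise_first <- function(s) {
  str_sub(s, 1, 1) <- str_to_upper(str_sub(s, 1, 1))
  s
}

capitalised <- capitalise_first(c("apple", "banana", "pear"))
capitalised
#> [1] "Apple"  "Banana" "Pear"
```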

Besides str_to_lower(), there are also str_to_upper() and str_to_title().

str_to_upper(), str_sort()

Rules for changing case differ between languages, so you can pick the rule set by specifying the locale argument:

# Turkish has two i's: with and without a dot, and it
# has a different rule for capitalising them:
str_to_upper(c("i", "ı"))
#> [1] "I" "I"
str_to_upper(c("i", "ı"), locale = "tr")
#> [1] "İ" "I"

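The locale is given as an ISO 639 language code. Sorting is affected by locale as well: base R's order() and sort() use the current locale, whereas str_sort() and str_order() take an explicit locale argument: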

x <- c("apple", "eggplant", "banana")

str_sort(x, locale = "en")  # English
#> [1] "apple"    "banana"   "eggplant"

str_sort(x, locale = "haw") # Hawaiian
#> [1] "apple"    "eggplant" "banana"
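str_order() is the index-returning counterpart of str_sort(): it gives the permutation that would sort the vector, which is useful for reordering other data in parallel. A minimal sketch, using an illustrative vector named words:

```r
library(stringr)

words <- c("apple", "eggplant", "banana")

# Positions of the elements in sorted order
idx <- str_order(words, locale = "en")
idx
#> [1] 1 3 2

# Indexing with the permutation reproduces str_sort()
same <- identical(words[idx], str_sort(words, locale = "en"))
```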

str_view()

x <- c("apple", "banana", "pear")
str_view(x, "an")

. matches any single character (except a newline):

str_view(x, ".a.")
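str_view() renders its result visually, so a console-friendly way to check the same pattern is str_detect() and str_extract(). The sketch below assumes the same three-element vector; "apple" does not match because nothing precedes its only "a".

```r
library(stringr)

x <- c("apple", "banana", "pear")

# ".a." needs one character before and one after an "a"
detected <- str_detect(x, ".a.")
detected
#> [1] FALSE  TRUE  TRUE

# The first match in each string, or NA when there is none
extracted <- str_extract(x, ".a.")
extracted
#> [1] NA    "ban" "ear"
```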

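To match a literal ., the regular expression must escape it as \. However, regular expressions are written as strings, and \ is also the escape character in strings, so the backslash itself must be escaped. The string "\\." therefore produces the regular expression \.: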

# To create the regular expression, we need \\
dot <- "\\."

# But the expression itself only contains one:
writeLines(dot)
#> \.
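To see the difference between an unescaped and an escaped dot, the sketch below tests two made-up strings against both patterns. It also shows stringr's fixed(), which treats the pattern as literal text, as an alternative to escaping.

```r
library(stringr)

s <- c("a.c", "abc")

# Unescaped dot: matches any character, so both strings match
any_char <- str_detect(s, "a.c")
any_char
#> [1] TRUE TRUE

# Escaped dot: matches only a literal "."
literal <- str_detect(s, "a\\.c")
literal
#> [1]  TRUE FALSE

# fixed() compares the pattern as plain text, with no regex interpretation
literal_fixed <- str_detect(s, fixed("a.c"))
```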