R for Data Science Summary: Strings

Author: 我要养只哈士奇

Article link: https://blog.csdn.net/weixin_38423453/article/details/84379520

This chapter covers working with strings and regular expressions in R. The main functions are:
str_length(), str_c(), str_sub(), str_sort(), str_to_upper(), str_view(), str_detect(), str_count(), str_extract(), str_match(), str_replace(), str_split(), str_locate()

library(tidyverse)
library(stringr)

In R, a string is delimited by either double quotes (") or single quotes ('):

string1 <- "This is a string"
string2 <- 'If I want to include a "quote" inside a string, I use single quotes'

To include a literal " or ' inside a string, escape it with a backslash (\):

double_quote <- "\"" # or '"'
single_quote <- '\'' # or "'"

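Likewise, to include a literal backslash, write it twice. Note that the printed representation shows the escapes, while writeLines() shows the raw contents: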

x <- c("\"", "\\")
x
#> [1] "\"" "\\"
writeLines(x)
#> "
#> \
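The gap between what is typed and what is stored can be checked by counting characters. The following is a minimal sketch, assuming stringr is installed; the variable name n_chars is only illustrative:

```r
library(stringr)

# A double quote and a backslash, each written with an escape sequence
x <- c("\"", "\\")

# Each element holds exactly one character, although the source text uses two
n_chars <- str_length(x)
n_chars
#> [1] 1 1

# print() shows the escaped form; writeLines() shows the raw contents
print(x)
writeLines(x)
```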

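Other escapes include "\n" (newline) and "\t" (tab); the full list is on the help page reached with ?'"' or ?"'". Escapes of the form "\u00b5" also let you write non-English characters: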

x <- "\u00b5"
x
#> [1] "µ"

Multiple strings are usually stored in a character vector:

c("one", "two", "three")
#> [1] "one"   "two"   "three"

str_length()

str_length(c("a", "R for data science", NA))
#> [1]  1 18 NA

str_c()

str_c("x", "y")
#> [1] "xy"
str_c("x", "y", "z")
#> [1] "xyz"

Use the sep argument to control the separator:

str_c("x", "y", sep = ", ")
#> [1] "x, y"

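Missing values are contagious: any NA input yields an NA result. To print them as the string "NA" instead, wrap the vector in str_replace_na():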

x <- c("abc", NA)
str_c("|-", x, "-|")
#> [1] "|-abc-|" NA
str_c("|-", str_replace_na(x), "-|")
#> [1] "|-abc-|" "|-NA-|"
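A different placeholder can be substituted as well. A small sketch, assuming stringr is installed; the placeholder text "missing" and the variable name labelled are illustrative choices, not from the original chapter:

```r
library(stringr)

x <- c("abc", NA)

# str_replace_na() turns NA into a string; its replacement argument
# defaults to "NA" but accepts any other text
labelled <- str_c("[", str_replace_na(x, replacement = "missing"), "]")
labelled
#> [1] "[abc]"     "[missing]"
```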

str_c() is vectorised and automatically recycles shorter vectors to the length of the longest:

str_c("prefix-", c("a", "b", "c"), "-suffix")
#> [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"

To collapse a character vector into a single string, use the collapse argument:

str_c(c("x", "y", "z"), collapse = ", ")
#> [1] "x, y, z"
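The two arguments do different jobs and can be combined: sep goes between the arguments element by element, and collapse then joins the resulting vector. A minimal sketch with made-up input vectors, assuming stringr is installed:

```r
library(stringr)

# Illustrative inputs (not from the original chapter)
first <- c("a", "b", "c")
second <- c("1", "2", "3")

# sep is inserted between the arguments, element by element
pairs <- str_c(first, second, sep = "-")
pairs
#> [1] "a-1" "b-2" "c-3"

# collapse then joins the elements of that result into one string
joined <- str_c(first, second, sep = "-", collapse = ", ")
joined
#> [1] "a-1, b-2, c-3"
```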

str_sub()

x <- c("Apple", "Banana", "Pear")
str_sub(x, 1, 3)
#> [1] "App" "Ban" "Pea"
# negative numbers count backwards from end
str_sub(x, -3, -1)
#> [1] "ple" "ana" "ear"

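If the requested range extends past the end of the string, str_sub() does not fail; it simply returns as much as is available: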

str_sub("a", 1, 5)
#> [1] "a"
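The same behaviour applies when only part of a vector is long enough, or when the range lies entirely outside the string. A short sketch with illustrative inputs, assuming stringr is installed:

```r
library(stringr)

# The second string is shorter than the requested range 2..4,
# so only the available part is returned for it
partial <- str_sub(c("Apple", "Hi"), 2, 4)
partial
#> [1] "ppl" "i"

# A range that starts beyond the end gives an empty string, not an error
outside <- str_sub("abc", 5, 6)
outside
#> [1] ""
```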

The assignment form of str_sub() modifies part of a string in place:

str_sub(x, 1, 1) <- str_to_lower(str_sub(x, 1, 1))
x
#> [1] "apple"  "banana" "pear"
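Negative positions work in the assignment form too. As a sketch (the choice to upper-case the last letter is purely illustrative), assuming stringr is installed:

```r
library(stringr)

fruit_names <- c("Apple", "Banana", "Pear")

# Replace the last character of each string with its upper-case version
str_sub(fruit_names, -1, -1) <- str_to_upper(str_sub(fruit_names, -1, -1))
fruit_names
#> [1] "ApplE"  "BananA" "PeaR"
```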

Besides str_to_lower(), there are also str_to_upper() and str_to_title().

str_to_upper(), str_sort()

Case-conversion rules differ between languages, so specify the locale argument when it matters:

# Turkish has two i's: with and without a dot, and it
# has a different rule for capitalising them:
str_to_upper(c("i", "ı"))
#> [1] "I" "I"
str_to_upper(c("i", "ı"), locale = "tr")
#> [1] "İ" "I"

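The locale is given as an ISO 639 language code. Sorting is locale-dependent too: base R's order() and sort() use the current locale, whereas str_sort() and str_order() accept an explicit locale argument: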

x <- c("apple", "eggplant", "banana")

str_sort(x, locale = "en")  # English
#> [1] "apple"    "banana"   "eggplant"

str_sort(x, locale = "haw") # Hawaiian
#> [1] "apple"    "eggplant" "banana"
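str_order() is the index-returning counterpart of str_sort(), which is useful when another vector has to be reordered alongside the strings. A minimal sketch, assuming stringr is installed; the prices vector is hypothetical data added for illustration:

```r
library(stringr)

fruit <- c("apple", "eggplant", "banana")
prices <- c(1.2, 2.5, 0.8)  # hypothetical values paired with fruit

# str_order() returns the permutation that sorts the strings
idx <- str_order(fruit, locale = "en")
idx
#> [1] 1 3 2

# Applying the permutation reproduces str_sort() and keeps prices aligned
sorted_fruit <- fruit[idx]
sorted_prices <- prices[idx]
sorted_fruit
#> [1] "apple"    "banana"   "eggplant"
```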

str_view()

x <- c("apple", "banana", "pear")
str_view(x, "an")

. matches any single character (except a newline):

str_view(x, ".a.")
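Since str_view() only displays matches, the same pattern can be checked programmatically with str_detect() and str_extract(). A small sketch, assuming stringr is installed:

```r
library(stringr)

x <- c("apple", "banana", "pear")

# ".a." needs one character on each side of an "a";
# in "apple" the only "a" is at the start, so there is no match
has_match <- str_detect(x, ".a.")
has_match
#> [1] FALSE  TRUE  TRUE

# str_extract() returns the first match, or NA when there is none
first_match <- str_extract(x, ".a.")
first_match
#> [1] NA    "ban" "ear"
```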

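To match a literal ., escape it in the regular expression as \. However, regular expressions are written as R strings, and \ is also the escape character inside strings, so the backslash itself must be escaped. The regular expression \. is therefore written as the string "\\.":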

# To create the regular expression, we need \\
dot <- "\\."

# But the expression itself only contains one:
writeLines(dot)
#> \.
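The effect of the escape is easiest to see by comparing an escaped and an unescaped pattern on the same input. A minimal sketch, assuming stringr is installed; the test strings are illustrative:

```r
library(stringr)

strings <- c("abc", "a.c", "bef")

# Unescaped: "." matches any character, so both "abc" and "a.c" match
any_char <- str_detect(strings, "a.c")
any_char
#> [1]  TRUE  TRUE FALSE

# Escaped: "a\\.c" is the regular expression a\.c, which needs a literal dot
literal_dot <- str_detect(strings, "a\\.c")
literal_dot
#> [1] FALSE  TRUE FALSE
```

When no regular-expression features are needed at all, wrapping the pattern in fixed(), as in str_detect(strings, fixed(".")), compares it literally and avoids the double escaping.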
