Regular Expressions in R: the stringr Package


zxy_clover


Article link: https://blog.csdn.net/zxy_clover/article/details/78312655


The functions in the stringr package make text processing in R straightforward.

I. Metacharacters

In regular expressions, 12 characters have a special purpose.

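Character	Meaning
[ ]	Matches any single character listed inside the brackets
\	Two roles: (1) escapes a metacharacter; (2) introduces special sequences that stand for a class of characters
^	Matches the start of the string. Placed first inside a character class it negates the class, e.g. [^5] matches any character except '5'
$	Matches the end of the string. Inside a character class it loses its special meaning, e.g. [akm$] matches 'a', 'k', 'm' or '$'
.	Matches any character except a newline
|	Alternation (or)
?	The preceding character or group is matched at most once
*	The preceding character or group is matched zero or more times
+	The preceding character or group is matched one or more times
( )	Defines a group; the pattern inside the parentheses is matched as a single unit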

1.1 Repetition
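
The quantifiers ?, * and + from the metacharacter table can be illustrated with a short sketch. The sample strings are made up for this example and do not come from the original post; the patterns are anchored with ^ and $ so that the whole string has to match.

```r
library(stringr)

# Hypothetical sample strings, chosen only to illustrate the quantifiers
x <- c("ac", "abc", "abbc")

# "?" : the preceding character is matched at most once
opt <- str_detect(x, "^ab?c$")    # TRUE TRUE FALSE

# "*" : the preceding character is matched zero or more times
star <- str_detect(x, "^ab*c$")   # TRUE TRUE TRUE

# "+" : the preceding character is matched one or more times
plus <- str_detect(x, "^ab+c$")   # FALSE TRUE TRUE
```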

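1.2 Escaping

To search for a metacharacter itself, such as "?" or "*", we have to tell the regex engine in advance to drop the character's special meaning.

That is the job of the escape character \: write \? and \*. To match a literal \, use \\.

Note: in R the escape has to be typed as a double backslash, \\, because the backslash is also an escape character inside an R string literal. The regex \? is therefore written "\\?" in R code, and a literal backslash is matched with "\\\\".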
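
A minimal sketch of the escaping rule, using hypothetical sample strings that contain literal metacharacters. Each pattern is written the way it has to be typed in R source code.

```r
library(stringr)

# Hypothetical sample strings; "C:\\temp" is the 7-character string C:\temp
x <- c("why?", "a*b", "C:\\temp", "plain")

# "\\?" in R source is the regex \? : a literal question mark
has_q <- str_detect(x, "\\?")        # TRUE FALSE FALSE FALSE

# "\\*" in R source is the regex \* : a literal asterisk
has_star <- str_detect(x, "\\*")     # FALSE TRUE FALSE FALSE

# A literal backslash needs the regex \\, typed as "\\\\" in R source
has_bs <- str_detect(x, "\\\\")      # FALSE FALSE TRUE FALSE
```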

1.3 Predefined character classes in R
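
The original post lists no classes under this heading. As an illustration only, the sketch below uses three standard POSIX-style classes that stringr's regex engine accepts ([[:digit:]], [[:alpha:]], [[:space:]]); which classes the author meant to show is an assumption, and the sample string is made up.

```r
library(stringr)

# Hypothetical sample string
x <- "apples x4, milk x2"

# [[:digit:]] matches any digit
digits <- str_extract_all(x, "[[:digit:]]")[[1]]   # "4" "2"

# [[:alpha:]] matches any letter: apples (6) + x + milk (4) + x
n_letters <- str_count(x, "[[:alpha:]]")           # 12

# [[:space:]] matches any whitespace character
n_spaces <- str_count(x, "[[:space:]]")            # 3
```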

1.4 Special symbols that stand for character classes
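
The original post gives no list here either. A short sketch, assuming the heading refers to the usual backslash shorthands such as \d (digit), \w (word character) and \s (whitespace); the sample string is hypothetical. As with every escape in R, each backslash is doubled in the source code.

```r
library(stringr)

# Hypothetical sample string
x <- "bag of flour 2kg"

# \d : a digit
digits <- str_extract_all(x, "\\d")[[1]]       # "2"

# \w+ : a run of word characters (letters, digits, underscore)
words <- str_extract_all(x, "\\w+")[[1]]       # "bag" "of" "flour" "2kg"

# \s : a whitespace character
underscored <- str_replace_all(x, "\\s", "_")  # "bag_of_flour_2kg"
```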

II. Main functions

str_extract() extracts the first match of a pattern

str_extract_all() extracts all matches of a pattern

library(stringr)
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
[1] "4" NA  NA  "2"
str_extract_all(shopping_list, "\\b[a-z]+\\b")
[[1]]
[1] "apples"

[[2]]
[1] "bag"   "of"    "flour"

[[3]]
[1] "bag"   "of"    "sugar"

[[4]]
[1] "milk"

str_locate() returns the position of the first match of a pattern

str_locate_all() returns the positions of all matches of a pattern

fruit <- c("apple", "banana", "pear", "pineapple")
str_locate(fruit, "a")
     start end
[1,]     1   1
[2,]     2   2
[3,]     3   3
[4,]     5   5

str_locate_all(fruit, "a")
[[1]]
     start end
[1,]     1   1

[[2]]
     start end
[1,]     2   2
[2,]     4   4
[3,]     6   6

[[3]]
     start end
[1,]     3   3

[[4]]
     start end
[1,]     5   5

str_replace() replaces the first match of a pattern

str_replace_all() replaces all matches of a pattern

fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "_")
[1] "_ne apple"     "tw_ pears"     "thr_e bananas"

str_replace_all(fruits, "([aeiou])", "")
[1] "n ppl"    "tw prs"   "thr bnns"
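
The parentheses in "([aeiou])" form a capture group, which the example above does not actually use. As a small sketch of what the group makes possible, the replacement string can refer back to the captured text with \\1; the angle-bracket markup here is just an illustrative choice.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# Wrap every vowel in angle brackets; \\1 is the text captured by ( )
marked <- str_replace_all(fruits[1], "([aeiou])", "<\\1>")
# "<o>n<e> <a>ppl<e>"

# Double only the first vowel of each string
doubled <- str_replace(fruits, "([aeiou])", "\\1\\1")
# "oone apple" "twoo pears" "threee bananas"
```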

str_split() splits a string by a pattern

str_split_fixed() splits a string by a pattern into a fixed number of pieces

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)
str_split(fruits, " and ")
[[1]]
[1] "apples"  "oranges" "pears"   "bananas"

[[2]]
[1] "pineapples" "mangos"     "guavas"

str_split(fruits, " and ", simplify = TRUE)
     [,1]         [,2]      [,3]     [,4]
[1,] "apples"     "oranges" "pears"  "bananas"
[2,] "pineapples" "mangos"  "guavas" ""

str_split_fixed(fruits, " and ", 2)
     [,1]         [,2]
[1,] "apples"     "oranges and pears and bananas"
[2,] "pineapples" "mangos and guavas"

str_detect() tests whether a string contains a match for a pattern

fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, "a")
[1] TRUE TRUE TRUE TRUE
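
Because every fruit contains an "a", the unanchored pattern above matches all four strings. A brief sketch of how the ^ and $ metacharacters from section I narrow the test, reusing the same vector:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

# "^a" : the string must start with "a"
starts_a <- str_detect(fruit, "^a")       # TRUE FALSE FALSE FALSE

# "a$" : the string must end with "a"
ends_a <- str_detect(fruit, "a$")         # FALSE TRUE FALSE FALSE

# "^[^p]" : the first character is anything except "p"
not_p_first <- str_detect(fruit, "^[^p]") # TRUE TRUE FALSE FALSE
```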

str_count() counts the matches of a pattern in each string

fruit <- c("apple", "banana", "pear", "pineapple")
str_count(fruit, "a")
[1] 1 3 1 1

III. Other important functions

str_sub() extracts the substring at the given positions

hw <- "Hadley Wickham"
str_sub(hw, 1, 6)
[1] "Hadley"

str_dup() repeats a string a given number of times

fruit <- c("apple", "pear", "banana")
str_dup(fruit, 2)
[1] "appleapple"   "pearpear"     "bananabanana"

str_length() returns the number of characters in a string

fruit <- c("apple", "pear", "banana")
str_length(fruit)
[1] 5 4 6

str_pad() pads a string to a given width

str_pad(c("a", "abc", "abcdef"), 10)
[1] "         a" "       abc" "    abcdef"
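
By default str_pad() pads with spaces on the left. A small sketch of its side and pad arguments, with inputs made up for illustration:

```r
library(stringr)

# Pad on the left with zeros, e.g. for fixed-width IDs
zero_padded <- str_pad("42", 5, pad = "0")                  # "00042"

# Pad on the right with a custom character
right_padded <- str_pad("abc", 6, side = "right", pad = "-") # "abc---"

# A string already longer than the width is returned unchanged
unchanged <- str_pad("abcdef", 3)                           # "abcdef"
```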

str_trim() removes padding, i.e. leading and trailing whitespace

str_trim("  String with trailing and leading white space\t")
[1] "String with trailing and leading white space"

str_trim("\n\nString with trailing and leading white space\n\n")
[1] "String with trailing and leading white space"

str_c() concatenates strings

str_c(letters, collapse = ", ")
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
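
str_c() has two separator arguments that are easy to confuse: sep joins the arguments element by element, while collapse joins the resulting vector into a single string. A minimal sketch with made-up inputs:

```r
library(stringr)

# sep: paste corresponding elements together, result keeps length 2
pairs <- str_c(c("x", "y"), 1:2, sep = "_")                  # "x_1" "y_2"

# collapse: additionally join the results into one string
joined <- str_c(c("x", "y"), 1:2, sep = "_", collapse = "+") # "x_1+y_2"
```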
