R Language: Regular Expressions with the stringr Package

The functions in the stringr package cover most everyday text-processing tasks with ease.

I. Metacharacters

In regular expressions, 12 characters have a special meaning.

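Character   Meaning
[ ]         Any one of the characters inside the brackets.
\           Two uses: (1) escapes a metacharacter; (2) introduces special sequences that stand for sets of characters.
^           Matches the start of the string. Placed first inside a character class, it negates the class; for example, [^5] matches any character except '5'.
$           Matches the end of the string. Inside a character class it loses its special meaning; for example, [akm$] matches 'a', 'k', 'm', or '$'.
.           Matches any character except a newline.
|           Alternation (or).
?           The preceding character (or group) is matched at most once.
*           The preceding character (or group) is matched zero or more times.
+           The preceding character (or group) is matched one or more times.
( )         Defines a group; the pattern inside the parentheses is matched as a unit.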
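
A minimal sketch of several of these metacharacters in use, assuming stringr is installed. The vector `x` and the variable names are made up for illustration.

```r
library(stringr)

# Hypothetical sample data
x <- c("apple", "banana", "pie", "cost: $5")

starts_a  <- str_detect(x, "^a")             # ^ anchors the match at the start
ends_a    <- str_detect(x, "a$")             # $ anchors the match at the end
non_lower <- str_extract(x, "[^a-z]")        # [^...] is a negated character class
in_class  <- str_detect(x, "[akm$]")         # $ is a literal inside [ ]
either    <- str_detect(x, "^(apple|pie)$")  # | alternation inside a group

starts_a   # TRUE FALSE FALSE FALSE
ends_a     # FALSE TRUE FALSE FALSE
non_lower  # NA NA NA ":"
in_class   # TRUE TRUE FALSE TRUE
either     # TRUE FALSE TRUE FALSE
```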
  

1.1 Repetition
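
Repetition is expressed with quantifiers placed after a character or group. The sketch below illustrates ?, * and + from the table above. It also assumes the standard brace forms {n}, {n,} and {n,m}, which the table does not list. The test vector is made up.

```r
library(stringr)

# Hypothetical sample data: "c", then 0 to 3 "a", then "t"
x <- c("ct", "cat", "caat", "caaat")

zero_or_one  <- str_detect(x, "^ca?t$")      # ?     at most once
zero_or_more <- str_detect(x, "^ca*t$")      # *     zero or more times
one_or_more  <- str_detect(x, "^ca+t$")      # +     one or more times
exactly_two  <- str_detect(x, "^ca{2}t$")    # {n}   exactly n times
two_or_more  <- str_detect(x, "^ca{2,}t$")   # {n,}  n or more times
one_to_two   <- str_detect(x, "^ca{1,2}t$")  # {n,m} between n and m times

zero_or_one   # TRUE TRUE FALSE FALSE
one_or_more   # FALSE TRUE TRUE TRUE
one_to_two    # FALSE TRUE TRUE FALSE
```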



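1.2 Escaping

To match a metacharacter itself, such as "?" or "*", we have to tell the regex engine to drop that character's special meaning.

This is done with the escape character \, that is, by writing \? and \*. To match a backslash itself, use \\.

Note: inside an R string literal the backslash must itself be escaped, so these patterns are written with a double backslash: "\\?" and "\\*", and "\\\\" for a literal backslash.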
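
A small sketch of the double-backslash rule, using made-up strings. `fixed()` is stringr's helper for matching a pattern as plain text, which sidesteps escaping altogether.

```r
library(stringr)

# Hypothetical sample data; "C:\\temp" is the R literal for C:\temp
x <- c("what?", "a*b", "C:\\temp", "plain")

has_question  <- str_detect(x, "\\?")        # regex \?  : literal question mark
has_star      <- str_detect(x, "\\*")        # regex \*  : literal asterisk
has_backslash <- str_detect(x, "\\\\")       # regex \\  : literal backslash
has_q_fixed   <- str_detect(x, fixed("?"))   # no regex interpretation at all

has_question   # TRUE FALSE FALSE FALSE
has_star       # FALSE TRUE FALSE FALSE
has_backslash  # FALSE FALSE TRUE FALSE
```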


1.3 Predefined character classes in R
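
As an illustration, and assuming this section refers to the POSIX-style classes such as [:digit:], [:alpha:], [:upper:], [:punct:] and [:space:], the sketch below uses them inside a bracket expression. The sample string is made up.

```r
library(stringr)

# Hypothetical sample data
x <- "Order 66, aisle B!"

digits  <- str_extract_all(x, "[[:digit:]]+")[[1]]  # runs of digits
words   <- str_extract_all(x, "[[:alpha:]]+")[[1]]  # runs of letters
uppers  <- str_extract_all(x, "[[:upper:]]")[[1]]   # single upper-case letters
puncts  <- str_extract_all(x, "[[:punct:]]")[[1]]   # punctuation marks
n_space <- str_count(x, "[[:space:]]")              # number of whitespace characters

digits   # "66"
words    # "Order" "aisle" "B"
puncts   # "," "!"
```

Note that the class name goes inside a second pair of brackets: `[[:digit:]]`, not `[:digit:]`.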


1.4 Special symbols that stand for character classes
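
As an illustration, and assuming this section refers to the backslash shorthands \d (digit), \w (word character), \s (whitespace), \D (non-digit) and \b (word boundary), a short sketch follows. Each shorthand is written with a double backslash in R, and the sample string is made up.

```r
library(stringr)

# Hypothetical sample data
x <- "room 101, floor 3"

numbers     <- str_extract_all(x, "\\d+")[[1]]     # runs of digits
tokens      <- str_extract_all(x, "\\w+")[[1]]     # runs of word characters
underscored <- str_replace_all(x, "\\s", "_")      # every whitespace to "_"
initials    <- str_extract_all(x, "\\b\\w")[[1]]   # first character of each word
digits_only <- str_replace_all(x, "\\D", "")       # drop every non-digit

numbers      # "101" "3"
tokens       # "room" "101" "floor" "3"
underscored  # "room_101,_floor_3"
```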


II. Main functions

str_extract()       extracts the first match of the pattern
str_extract_all()   extracts all matches of the pattern
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
[1] "4" NA  NA  "2"
str_extract_all(shopping_list, "\\b[a-z]+\\b")
[[1]]
[1] "apples"

[[2]]
[1] "bag"   "of"    "flour"

[[3]]
[1] "bag"   "of"    "sugar"

[[4]]
[1] "milk"

str_locate()        returns the position of the first match of the pattern
str_locate_all()    returns the positions of all matches of the pattern
fruit <- c("apple", "banana", "pear", "pineapple")
str_locate(fruit, "a")
     start end
[1,]     1   1
[2,]     2   2
[3,]     3   3
[4,]     5   5

 str_locate_all(fruit, "a")
[[1]]
     start end
[1,]     1   1

[[2]]
     start end
[1,]     2   2
[2,]     4   4
[3,]     6   6

[[3]]
     start end
[1,]     3   3

[[4]]
     start end
[1,]     5   5

str_replace()       replaces the first match of the pattern
str_replace_all()   replaces all matches of the pattern
fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "_")
[1] "_ne apple"     "tw_ pears"     "thr_e bananas"

str_replace_all(fruits, "([aeiou])", "")
[1] "n ppl"    "tw prs"   "thr bnns"
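
The parentheses in "([aeiou])" create a capture group, which the replacement string can refer back to as "\\1". A brief sketch, reusing the same `fruits` vector; the variable names are illustrative.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# "\\1" stands for whatever the first group matched, so each match is doubled
first_doubled <- str_replace(fruits, "([aeiou])", "\\1\\1")
all_doubled   <- str_replace_all(fruits, "([aeiou])", "\\1\\1")

first_doubled  # "oone apple" "twoo pears" "threee bananas"
all_doubled    # "oonee aapplee" "twoo peeaars" "threeee baanaanaas"
```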

str_split()         splits a string on the pattern

str_split_fixed()   splits a string on the pattern into a fixed number of pieces

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)
str_split(fruits, " and ")
[[1]]
[1] "apples"  "oranges" "pears"   "bananas"

[[2]]
[1] "pineapples" "mangos"     "guavas"

str_split(fruits, " and ", simplify = TRUE)
     [,1]         [,2]      [,3]     [,4]
[1,] "apples"     "oranges" "pears"  "bananas"
[2,] "pineapples" "mangos"  "guavas" ""

str_split_fixed(fruits, " and ", 2)
     [,1]         [,2]
[1,] "apples"     "oranges and pears and bananas"
[2,] "pineapples" "mangos and guavas"

str_detect()        checks whether each string contains the pattern

fruit <- c("apple", "banana", "pear", "pineapple")
str_detect(fruit, "a")
[1] TRUE TRUE TRUE TRUE

str_count()         returns the number of times the pattern occurs in each string

fruit <- c("apple", "banana", "pear", "pineapple")
str_count(fruit, "a")
[1] 1 3 1 1

III. Other important functions

str_sub()           extracts the substring between the given positions

hw <- "Hadley Wickham"
str_sub(hw, 1, 6)
[1] "Hadley"
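
str_sub() also accepts negative positions, which count from the end of the string, and vectors of start and end positions. A short sketch with the same `hw` string; the variable names are illustrative.

```r
library(stringr)

hw <- "Hadley Wickham"

last_seven <- str_sub(hw, -7)                    # last 7 characters
both_names <- str_sub(hw, c(1, 8), c(6, 14))     # vectorised start and end

last_seven  # "Wickham"
both_names  # "Hadley" "Wickham"
```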

str_dup()           repeats each string the given number of times

fruit <- c("apple", "pear", "banana")
str_dup(fruit, 2)
[1] "appleapple"   "pearpear"     "bananabanana"


str_length()        returns the number of characters in each string

fruit <- c("apple", "pear", "banana")
str_length(fruit)
[1] 5 4 6


str_pad()           pads a string to the given width

str_pad(c("a", "abc", "abcdef"), 10)
[1] "         a" "       abc" "    abcdef"
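
By default str_pad() pads on the left with spaces. The `side` and `pad` arguments change that; a brief sketch with made-up inputs:

```r
library(stringr)

right_padded <- str_pad("abc", 7, side = "right", pad = "-")  # pad on the right
centered     <- str_pad("abc", 7, side = "both", pad = "*")   # pad on both sides
zero_filled  <- str_pad("7", 3, pad = "0")                    # left-pad with zeros

right_padded  # "abc----"
centered      # "**abc**"
zero_filled   # "007"
```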


str_trim()          removes padding, such as whitespace at the start and end of a string

str_trim("  String with trailing and leading white space\t")
[1] "String with trailing and leading white space"

str_trim("\n\nString with trailing and leading white space\n\n")
[1] "String with trailing and leading white space"


str_c()             joins strings together

str_c(letters, collapse = ", ")
[1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"

