R Self-Study Notes: stringr

Article link: https://blog.csdn.net/swiiss/article/details/123147012

I. String basics

1. A few examples of typing special characters:

( double_quote <- "\"" );
## [1] "\""
( single_quote <- '\'' );
## [1] "'"
( x <- "\u00b5" )
## [1] "µ"
## Note the difference between the printed form and the writeLines() output!
( y <- "\\" )
## [1] "\\"
writeLines( y );
## \

As shown, printing a string displays its escaped representation, while writeLines() outputs the actual content of the string.
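The difference between the printed form and the raw content can be checked with a small sketch like the one below. The variable names `y` and `z` are only illustrative.

```r
# A single backslash must be written as "\\" in R source code.
y <- "\\"
print(y)        # shows the escaped form: [1] "\\"
writeLines(y)   # shows the raw content: \
nchar(y)        # 1, the string holds a single character

# Escapes such as \t and \n also count as one character each.
z <- "a\tb\nc"
nchar(z)        # 5
writeLines(z)   # prints a, a tab, b, then c on a new line
```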

2. Counting the length of a string

(1) nchar:

> nchar( c("a", "Hello world", NA) );
[1]  1 11 NA

(2) str_length: from stringr

> str_length( c("a", "Hello world", NA) );
[1]  1 11 NA

3. Concatenating strings/character vectors

You can use paste, or str_c from stringr:

> paste( "a", "b", "c", sep = "" );
[1] "abc"
> str_c( "a", "b", "c" );
[1] "abc"
> paste( c( "a", "b", "c" ), 1, sep = "" );
[1] "a1" "b1" "c1"
> str_c( c( "a", "b", "c" ), 1 );
[1] "a1" "b1" "c1"

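Note that paste() separates its arguments with a space by default (sep = " "), so sep = "" must be given explicitly to get the same result as str_c().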
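A minimal sketch comparing the defaults of paste(), paste0() and str_c(), including how they treat NA. It assumes the stringr package is installed.

```r
library(stringr)

# paste() inserts a space by default; paste0() and str_c() do not.
paste("a", "b", "c")               # "a b c"
paste0("a", "b", "c")              # "abc"
str_c("a", "b", "c")               # "abc"
str_c("a", "b", "c", sep = "-")    # "a-b-c"

# collapse joins the elements of one vector into a single string.
paste(c("a", "b", "c"), collapse = "")   # "abc"
str_c(c("a", "b", "c"), collapse = "")   # "abc"

# Missing values: paste() turns NA into the text "NA",
# while str_c() propagates the missing value.
paste("a", NA)   # "a NA"
str_c("a", NA)   # NA
```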

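II. Regular expression matching

1. Regex matching symbols

(1) Regex metacharacters (figure omitted)

(2) Anchoring the match position:

Note the notion of a word here: within a string, the parts separated by spaces are recognized as words, and \b matches a word boundary (the position between a word character and a non-word character, or the start/end of the string). In an R string literal it is written as "\\b":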

> a <- c('i am a pig' ,'i am a cat','it is 9:00 am','hello world')
> str_subset(a,"am")
[1] "i am a pig"     "i am a cat"     "it is 9:00 am"
> str_subset(a,"am$")
[1] "it is 9:00 am"
> str_subset(a,"am\\b")
[1] "i am a pig"     "i am a cat"     "it is 9:00 am"
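In the vector above every "am" is already a separate word, so "am" and "am\\b" give the same result. The sketch below uses a hypothetical vector `words_demo` with "amber" and "spam" added, to show where the anchors ^, $ and \b actually differ.

```r
library(stringr)

# Hypothetical test vector; "amber" and "spam" contain "am" inside a word.
words_demo <- c("i am a pig", "it is 9:00 am", "amber", "spam")

str_subset(words_demo, "am")        # all four elements contain "am"
str_subset(words_demo, "^am")       # starts with "am": "amber"
str_subset(words_demo, "am$")       # ends with "am": "it is 9:00 am" "spam"
str_subset(words_demo, "am\\b")     # "am" followed by a word boundary
str_subset(words_demo, "\\bam\\b")  # "am" as a whole word only
```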

(3) Quantifiers: limiting the number of repetitions
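As a rough illustration of the standard quantifiers ?, *, +, {n} and {n,m}, the sketch below tests them against a hypothetical vector `x` whose elements differ only in the number of "b" characters.

```r
library(stringr)

# Hypothetical inputs: "a", then zero to three "b", then "c".
x <- c("ac", "abc", "abbc", "abbbc")

str_detect(x, "^ab?c$")      # ?     zero or one:  TRUE  TRUE FALSE FALSE
str_detect(x, "^ab*c$")      # *     zero or more: TRUE  TRUE  TRUE  TRUE
str_detect(x, "^ab+c$")      # +     one or more:  FALSE TRUE  TRUE  TRUE
str_detect(x, "^ab{2}c$")    # {2}   exactly two:  FALSE FALSE TRUE FALSE
str_detect(x, "^ab{2,3}c$")  # {2,3} two to three: FALSE FALSE TRUE  TRUE
```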

(4) Special groups
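Assuming this heading refers to character classes such as \d, \s and [[:alpha:]] (the original figure is not available, so this is a guess), a small sketch of how they behave might look like this. The sample string `s` is made up for illustration.

```r
library(stringr)

s <- "R 4.3 was released in 2023!"

# \d matches a digit (written "\\d" in an R string).
str_extract_all(s, "\\d+")[[1]]          # "4" "3" "2023"

# [[:alpha:]] is the POSIX class for letters.
str_extract_all(s, "[[:alpha:]]+")[[1]]  # "R" "was" "released" "in"

# \s matches whitespace.
str_replace_all(s, "\\s", "_")           # "R_4.3_was_released_in_2023!"

# A custom class in square brackets: the first lowercase vowel.
str_extract(s, "[aeiou]")                # "a"
```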

(5) Escape sequences in R

?Quotes

Run the command above in the R console to look up the documentation.

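2. Related functions

(1) Extracting the matched part

str_extract() extracts the text that matches the pattern and returns the first match in each string as a character vector. str_extract_all() returns every match in each string, as a list of character vectors.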

> "1234abc"%>%str_extract("\\d{2,6}")
[1] "1234"
> "1234abc"%>%str_extract("\\d{3}")
[1] "123"
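The difference between the first match and all matches can be seen in a short sketch like the following, using a made-up vector `s`.

```r
library(stringr)

s <- c("a1b22c333", "no digits")

# str_extract(): first match per string, NA when nothing matches.
first_match <- str_extract(s, "\\d+")
first_match            # "1" NA

# str_extract_all(): a list with one character vector per input string.
all_matches <- str_extract_all(s, "\\d+")
all_matches[[1]]       # "1" "22" "333"
all_matches[[2]]       # character(0)
```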

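str_subset() returns the elements that match the pattern. Note that str_subset() has a negate argument: with negate = TRUE it returns the elements that do not match.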
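A minimal sketch of the negate argument, using a hypothetical vector `fruits_demo`. It assumes stringr 1.4.0 or later, where negate is available.

```r
library(stringr)

fruits_demo <- c("apple", "banana", "cherry", "kiwi")

# Elements that contain "an".
str_subset(fruits_demo, "an")                 # "banana"

# Elements that do NOT contain "an".
str_subset(fruits_demo, "an", negate = TRUE)  # "apple" "cherry" "kiwi"

# Equivalent to indexing with str_detect().
fruits_demo[!str_detect(fruits_demo, "an")]
```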

str_match() works much like str_extract(), but it returns the result as a matrix. From the documentation:

For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. For str_match_all, a list of character matrices.
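To make the matrix layout concrete, here is a small sketch with three capture groups. The vector `dates` and the pattern are only examples.

```r
library(stringr)

dates <- c("2022-02-26", "no date")

# Three capture groups: year, month, day.
m <- str_match(dates, "(\\d{4})-(\\d{2})-(\\d{2})")

dim(m)    # 2 rows (one per input), 4 columns (full match + 3 groups)
m[1, ]    # "2022-02-26" "2022" "02" "26"
m[2, ]    # all NA, because the second string does not match
```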

(2) Checking whether the target string contains the pattern

str_detect()

c("123","abc","hellof5")%>%str_detect("\\d+");
## [1]  TRUE FALSE  TRUE

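(3) Locating the position of a match

str_locate()

Returns an integer matrix with one row per input string: the first column is the start position of the first match and the second column is its end position. str_locate_all() returns a list of such matrices, with one row per match.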
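A short sketch of str_locate() and str_locate_all() on a made-up vector `s`, showing the start/end columns and how they can be fed to str_sub().

```r
library(stringr)

s <- c("abc123def45", "xyz")

# One row per input string; NA when there is no match.
loc <- str_locate(s, "\\d+")
loc                     # row 1: start 4, end 6; row 2: NA NA

# The positions can be used to cut out the matched text.
str_sub(s[1], loc[1, "start"], loc[1, "end"])   # "123"

# One matrix per input string, one row per match.
all_loc <- str_locate_all(s, "\\d+")
all_loc[[1]]            # rows: (4, 6) and (10, 11)
```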

(4) Replacing the matched content

str_replace()

> str_replace(c("123","abc","hellof5"),"\\d+","###");
[1] "###"       "abc"       "hellof###"
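Note that str_replace() only replaces the first match in each string. The sketch below contrasts it with str_replace_all() and also shows a backreference in the replacement. The inputs are made up for illustration.

```r
library(stringr)

s <- "a1b22c333"

# Only the first run of digits is replaced.
str_replace(s, "\\d+", "#")        # "a#b22c333"

# Every run of digits is replaced.
str_replace_all(s, "\\d+", "#")    # "a#b#c#"

# \\1, \\2, \\3 refer to the capture groups of the pattern.
str_replace("2022-02-26", "(\\d{4})-(\\d{2})-(\\d{2})", "\\3/\\2/\\1")
# "26/02/2022"
```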

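(5) Handy small functions

str_to_title() capitalizes the first letter of each word.

str_pad() pads a string to a given width; the side argument says where the padding goes, so "left" pushes the original string to the right end: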

> str_pad("hadley", 30, "left")
[1] "                        hadley"
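A few more calls, as a sketch of how str_to_title() and the side and pad arguments of str_pad() behave. The inputs are arbitrary.

```r
library(stringr)

str_to_title("hello world")        # "Hello World"

# side chooses where the padding is added.
str_pad("hadley", 10, "left")      # "    hadley"
str_pad("hadley", 10, "right")     # "hadley    "
str_pad("hadley", 10, "both")      # "  hadley  "

# pad sets the padding character, e.g. for zero-padded numbers.
str_pad("7", 3, pad = "0")         # "007"

# Strings longer than the width are returned unchanged, not truncated.
str_pad("hadley", 3)               # "hadley"
```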

For more functions, see https://stringr.tidyverse.org