I. String basics
1. A few examples of entering special characters:
( double_quote <- "\"" );
## [1] "\""
( single_quote <- '\'' );
## [1] "'"
( x <- "\u00b5" )
## [1] "µ"
## Note the difference!
( y <- "\\" )
## [1] "\\"
writeLines( y );
## \
As shown above, writeLines() prints the actual contents of a string variable rather than its escaped representation.
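To make the difference concrete, here is a small sketch comparing print() and writeLines() on a few escaped strings. The variable name `s` and the sample strings are made up for illustration.

```r
# Strings containing escape sequences (illustrative values)
s <- c("a\\b", "say \"hi\"", "tab\there")

print(s)       # escaped representation, the way you would type it
writeLines(s)  # the actual characters, one element per line

# nchar() counts the real characters, not the backslashes used to type them
nchar(s)
```

The escapes only exist in the source code; the stored strings are shorter than what print() displays.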
2. Counting the length of a string
(1) nchar:
> nchar( c("a", "Hello world", NA) );
[1] 1 11 NA
(2) str_length: from stringr
> str_length( c("a", "Hello world", NA) );
[1] 1 11 NA
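The two functions agree on character vectors, but they treat factors differently. The sketch below checks both points; the names `words`, `f`, `res_nchar` and `res_str` are illustrative only.

```r
library(stringr)

words <- c("a", "Hello world", NA)   # illustrative input

# Both functions are vectorised and keep NA for a missing string
same_result <- isTRUE(all.equal(nchar(words), str_length(words)))

# Factors: nchar() refuses them, str_length() converts to character first
f <- factor("abc")
res_nchar <- tryCatch(nchar(f), error = function(e) "error")
res_str   <- str_length(f)

same_result
res_nchar
res_str
```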
3. Concatenating strings/characters
Use paste, or str_c from stringr:
> paste( "a", "b", "c", sep = "" );
[1] "abc"
> str_c( "a", "b", "c" );
[1] "abc"
> paste( c( "a", "b", "c" ), 1, sep = "" );
[1] "a1" "b1" "c1"
> str_c( c( "a", "b", "c" ), 1 );
[1] "a1" "b1" "c1"
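Note that paste() separates its arguments with a space by default (sep = " "), so sep = "" has to be set explicitly; str_c() joins with no separator by default.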
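A short sketch of the sep and collapse behaviour, plus the one place where paste() and str_c() visibly disagree (missing values). All variable names are illustrative.

```r
library(stringr)

with_space <- paste("a", "b", "c")                       # default sep = " "
no_space   <- paste0("a", "b", "c")                      # shorthand for sep = ""
collapsed  <- paste(c("a", "b", "c"), collapse = "-")    # one string from a vector
collapsed2 <- str_c(c("a", "b", "c"), collapse = "-")

# Missing values: paste() writes the text "NA", str_c() propagates NA
paste_na <- paste("a", NA)
str_c_na <- str_c("a", NA)

with_space; no_space; collapsed; collapsed2; paste_na; str_c_na
```

sep is placed between the arguments, while collapse joins the elements of the resulting vector into a single string.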
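II. Regular expression matching
1. Regex metacharacters
(1) The regex metacharacters are shown in the figure below:
(2) Anchoring the match position:
Note the concept of a word here: within a string, the parts separated by spaces are recognized as words, and \b matches at a word boundary: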
> a <- c('i am a pig' ,'i am a cat','it is 9:00 am','hello world')
> str_subset(a,"am")
[1] "i am a pig" "i am a cat" "it is 9:00 am"
> str_subset(a,"am$")
[1] "it is 9:00 am"
> str_subset(a,"am\\b")
[1] "i am a pig" "i am a cat" "it is 9:00 am"
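In the example above, "am" and "am\\b" happen to select the same strings. A sketch with a different, made-up input vector (`b`) shows where the anchors actually differ:

```r
library(stringr)

b <- c("i am a pig", "amber", "spam", "9:00 am")  # illustrative input

anywhere  <- str_subset(b, "am")         # "am" anywhere in the string
at_start  <- str_subset(b, "^am")        # string starts with "am"
at_end    <- str_subset(b, "am$")        # string ends with "am"
as_a_word <- str_subset(b, "\\bam\\b")   # "am" as a whole word

anywhere; at_start; at_end; as_a_word
```

Only "\\bam\\b" rejects both "amber" and "spam", because it requires a word boundary on each side of "am".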
(3) Quantifiers (limiting the number of repetitions)
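As a rough sketch, the standard quantifiers ?, +, *, {n}, {n,m} and the lazy form +? can be tried on a toy string. The input "aaaa" and the variable names are illustrative, not taken from the original notes.

```r
library(stringr)

x <- "aaaa"  # illustrative input

zero_or_one  <- str_extract(x, "a?")      # at most one "a"
one_or_more  <- str_extract(x, "a+")      # greedy: as many as possible
zero_or_more <- str_extract(x, "a*")
exactly_two  <- str_extract(x, "a{2}")
two_to_three <- str_extract(x, "a{2,3}")
lazy         <- str_extract(x, "a+?")     # lazy: as few as possible

# "u?" makes the "u" optional, so both spellings match
both_spellings <- str_detect(c("color", "colour"), "colou?r")

zero_or_one; one_or_more; zero_or_more; exactly_two; two_to_three; lazy
both_spellings
```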
(4) Special groups
(5) Escape sequences in R
?Quotes
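Enter the command above to look them up in the help page.
2. Related functions
(1) Extracting the matched part
str_extract() extracts the text that matches the pattern and returns only the first match in each string. str_extract_all() returns every match in each string.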
> "1234abc"%>%str_extract("\\d{2,6}")
[1] "1234"
> "1234abc"%>%str_extract("\\d{3}")
[1] "123"
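The difference between the first match and all matches is easier to see on a string with several matches. A minimal sketch, with a made-up input vector `s`:

```r
library(stringr)

s <- c("tel 123 and 4567", "no digits")  # illustrative input

first_match <- str_extract(s, "\\d+")      # character vector, NA when nothing matches
all_matches <- str_extract_all(s, "\\d+")  # list with one character vector per string

first_match
all_matches
```

str_extract() always returns a vector as long as its input, while str_extract_all() returns a list because each string can have a different number of matches.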
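str_subset() extracts the elements that match the pattern. Note that str_subset() has a negate argument; setting it to TRUE inverts the selection and returns the elements that do not match.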
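A quick sketch of the negate argument, assuming a stringr version that supports it (1.4.0 or later). The vector `v` is the same sample data used elsewhere in these notes.

```r
library(stringr)

v <- c("123", "abc", "hellof5")

with_digit    <- str_subset(v, "\\d")                 # elements containing a digit
without_digit <- str_subset(v, "\\d", negate = TRUE)  # the complement

with_digit
without_digit
```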
str_match() works much like str_extract(), but stores the extracted result in a matrix. The documentation describes the return value as:
For str_match, a character matrix. First column is the complete match, followed by one column for each capture group. For str_match_all, a list of character matrices.
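A small sketch of the matrix layout, using a pattern with two capture groups. The input `times` and the pattern are illustrative.

```r
library(stringr)

times <- c("it is 9:00 am", "meet at 12:30", "no time")  # illustrative input

# Two capture groups: hours and minutes
m <- str_match(times, "(\\d+):(\\d+)")

m        # one row per input string
m[, 1]   # complete match
m[, 2]   # first capture group (hours)
m[, 3]   # second capture group (minutes)
```

A string with no match yields a row of NA values, so the matrix always has as many rows as the input has elements.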
(2) Checking whether the target string contains the pattern
str_detect()
c("123","abc","hellof5")%>%str_detect("\\d+");
## [1]  TRUE FALSE  TRUE
(3) Locating the position of a match
str_locate()
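Returns an integer matrix with one row per input string: the first column is the start position of the first match and the second column is its end position (NA if there is no match). str_locate_all() returns a list of such matrices, with one row per match.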
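A minimal sketch of both variants; the inputs and the names `loc` and `loc_all` are illustrative.

```r
library(stringr)

v <- c("123", "abc", "hellof5")

# First match per string: columns "start" and "end", NA where nothing matches
loc <- str_locate(v, "\\d+")
loc

# All matches in a single string: one row per match
loc_all <- str_locate_all("a1b22", "\\d+")[[1]]
loc_all
```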
(4) Replacing matched content
str_replace()
> str_replace(c("123","abc","hellof5"),"\\d+","###");
[1] "###" "abc" "hellof###"
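str_replace() only touches the first match in each string. The sketch below contrasts it with str_replace_all() and shows a backreference in the replacement; the sample strings are made up.

```r
library(stringr)

s <- "a1b22c333"  # illustrative input

first_only <- str_replace(s, "\\d+", "#")      # only the first run of digits
every_one  <- str_replace_all(s, "\\d+", "#")  # every run of digits

# \\1 and \\2 in the replacement refer to the capture groups
swapped <- str_replace("hello world", "(\\w+) (\\w+)", "\\2 \\1")

first_only; every_one; swapped
```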
(5) Handy helper functions
str_to_title() capitalizes the first letter of each word.
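For example, alongside its siblings str_to_upper() and str_to_lower() (the sample sentence is illustrative):

```r
library(stringr)

txt <- "i am a pig"  # illustrative input

title_case <- str_to_title(txt)
upper_case <- str_to_upper(txt)
lower_case <- str_to_lower("I AM A PIG")

title_case; upper_case; lower_case
```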
str_pad() pads a string to a new width; the side argument controls where the padding is added:
> str_pad("hadley", 30, "left")
[1] "                        hadley"
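A short sketch of the other str_pad() options, using a smaller width so the padding is easy to count. The variable names are illustrative.

```r
library(stringr)

left_pad  <- str_pad("hadley", 10, side = "left")              # padding on the left
right_pad <- str_pad("hadley", 10, side = "right")             # padding on the right
both_pad  <- str_pad("hadley", 10, side = "both", pad = "*")   # custom pad character
too_short <- str_pad("hadley", 3)                              # width below length: unchanged

left_pad; right_pad; both_pad; too_short
```

str_pad() never truncates, so a width smaller than the string length leaves the input as it is.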