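1. String manipulation functions
The character() function creates a character vector of a given length, with every element initialized to the empty string.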
> character(length=5)
[1] "" "" "" "" ""
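A typical use of character() is to preallocate a result vector and fill it in later. The following is a minimal sketch; the variable name labels is made up for illustration.

```r
# Preallocate three empty strings, then fill them in by index.
labels <- character(3)
for (i in seq_along(labels)) {
  labels[i] <- paste("item", i)
}
labels

# character(0) is the empty character vector (length zero).
length(character(0))
```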
grep
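grep(pattern, x) searches the character vector x for the pattern given by pattern.
If x has n elements, that is, n strings, grep(pattern, x) returns a vector of length at most n. Each element of that vector is an index i of x such that x[i] contains a substring matching pattern. Matching is case-sensitive by default, which is why searching for "pole" in the second call below finds nothing.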
> grep("Pole",c("Equator","North Pole","South Pole"))
[1] 2 3
> grep("pole",c("Equator","North Pole","South Pole"))
integer(0)
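Two standard arguments of grep() are often useful here: ignore.case=TRUE makes the match case-insensitive, and value=TRUE returns the matching strings instead of their indices. The related grepl() returns a logical vector. A small sketch; the variable names are illustrative.

```r
places <- c("Equator", "North Pole", "South Pole")

# Case-insensitive search: "pole" now matches "Pole".
idx <- grep("pole", places, ignore.case = TRUE)

# value = TRUE returns the matching elements rather than their indices.
hits <- grep("Pole", places, value = TRUE)

# grepl() gives one TRUE/FALSE per element of the input.
flags <- grepl("Pole", places)
```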
nchar
nchar() counts the number of characters in each element of a character vector.
> y<-c("er","sdf","eir","jk","dim")
> y
[1] "er" "sdf" "eir" "jk" "dim"
> nchar(y)
[1] 2 3 3 2 3
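Because nchar() is vectorized, its result can be used directly to filter or summarize strings. A short sketch using the same vector y (redefined here so the block runs on its own); the other names are illustrative.

```r
y <- c("er", "sdf", "eir", "jk", "dim")

# Keep only the strings that are exactly three characters long.
three <- y[nchar(y) == 3]

# Length of the longest string in the vector.
longest <- max(nchar(y))
```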
paste
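paste() "pastes" several objects (such as vectors) together into strings, separated by sep, which is a single space by default.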
> paste("North","Pole")
[1] "North Pole"
> paste("North","Pole",sep="")
[1] "NorthPole"
> paste("North","Pole",sep=".")
[1] "North.Pole"
> paste("North","and","South","Poles")
[1] "North and South Poles"
> paste(1:5)
[1] "1" "2" "3" "4" "5"
> paste("1","2","3","4","5")
[1] "1 2 3 4 5"
> paste(c("x","y"),1:6,sep="")
[1] "x1" "y2" "x3" "y4" "x5" "y6"
> paste("result.",1:4,sep="")
[1] "result.1" "result.2" "result.3" "result.4"
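paste() also has a collapse argument, which joins the elements of the result into a single string, and paste0() is shorthand for pasting with no separator. Both are part of base R; the variable names below are only illustrative.

```r
# collapse joins all elements of the result into one string.
joined <- paste(c("a", "b", "c"), collapse = "+")

# sep is applied between arguments first, then collapse between elements.
both <- paste("x", 1:3, sep = "", collapse = ", ")

# paste0() pastes with no separator.
files <- paste0("result.", 1:4)
```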
sprintf
sprintf() formats strings and values according to a format string and returns the result as a string instead of printing it.
> i <- 8
> s <- sprintf("the square of %d is %d",i,i^2)
> s
[1] "the square of 8 is 64"
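Other common format specifiers include %s for strings, %f for floating-point numbers with a chosen precision, and zero-padded widths such as %03d. sprintf() is also vectorized over its arguments. A brief sketch with made-up values:

```r
# Fixed number of decimal places.
a <- sprintf("%.2f", pi)

# Zero-padded integer of width 3.
b <- sprintf("%03d", 7L)

# Vectorized: one output string per element of the arguments.
d <- sprintf("%s scored %d", c("Ann", "Bob"), c(90L, 85L))
```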
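substr() and substring()
substr() and substring() are the most commonly used functions for extracting substrings. For a single string and a single pair of positions they return the same result.
- substr(): the start and stop arguments are both required; omitting either one is an error.
- substring(): only first is required. If last is not given it defaults to 1000000L, which in practice is large enough to reach the end of any string.
substr(x, start, stop)
substring(text, first, last = 1000000L)
substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value
substr() returns a result of the same length as x, so for a single string it extracts exactly one substring, even if start or stop is a vector.
substring() returns a result as long as the longest of its arguments, so for a single string it can extract several substrings in one call.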
> substr("abcdef",2,4)
[1] "bcd"
> substring("abcdef",2,4)
[1] "bcd"
Here is where they differ:
> substr("abcdef",1:6,1:6)
[1] "a"
> substring("abcdef",1:6,1:6)
> # start-stop pairs: 1-1 2-2 3-3 4-4 5-5 6-6
[1] "a" "b" "c" "d" "e" "f"
> substr("abcdef",1:6,6)
[1] "abcdef"
> substring("abcdef",1:6,6)
> # start-stop pairs: 1-6 2-6 3-6 4-6 5-6 6-6
[1] "abcdef" "bcdef" "cdef" "def" "ef" "f"
In the following cases the two functions give the same result:
> substr(rep("abcdef",4),1:4,4:5)
[1] "abcd" "bcde" "cd" "de"
> substring(rep("abcdef",4),1:4,4:5)
[1] "abcd" "bcde" "cd" "de"
> x<-c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
> substr(x, 2, 5)
[1] "sfef" "wert" "uiop" "" "tuff"
> x<-c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
> substring(x, 2, 5)
[1] "sfef" "wert" "uiop" "" "tuff"
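When several strings are given, each one is processed once, in order, with start and stop recycled as needed. In this case substr() and substring() are equivalent.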
> x<-c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
> substr(x, 1:3, 5:7)
[1] "asfef" "werty" "iop[" "b" "tuff."
> substring(x, 1:3, 5:7)
[1] "asfef" "werty" "iop[" "b" "tuff."
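Two points from the signatures of these functions can be shown in a short sketch: substring() can omit last, and the replacement forms overwrite part of a string in place without changing its length. The variable names are illustrative.

```r
# With only first given, substring() runs to the end of the string.
tail_part <- substring("abcdef", 3)

# Replacement form: overwrite characters 2 to 4.
s <- "abcdef"
substr(s, 2, 4) <- "XYZ"

# A shorter replacement only overwrites as many characters as it has.
s2 <- "abcdef"
substr(s2, 2, 4) <- "Z"
```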
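strsplit
strsplit() splits each element of a character vector into substrings at the matches of the split pattern. It returns a list with one element per input string. The split pattern is treated as a regular expression unless fixed=TRUE is given.
unlist() flattens a list into a vector.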
> strsplit("aa bb cc dd ee ff"," ")
[[1]]
[1] "aa" "bb" "cc" "dd" "ee" "ff"
> unlist(strsplit("a.b.c.e.f",".",fixed=TRUE))
[1] "a" "b" "c" "e" "f"
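Two details of strsplit() are worth seeing in code: the result has one list element per input string, and the split pattern can be a regular expression. A minimal sketch with made-up inputs:

```r
# One list element per input string.
parts <- strsplit(c("aa bb", "cc dd ee"), " ")
counts <- sapply(parts, length)

# "[.]" is a regular expression that matches a literal period,
# an alternative to passing fixed = TRUE.
letters_only <- strsplit("a.b.c", "[.]")[[1]]

# A regular expression can match a run of separators at once.
words <- strsplit("aa   bb cc", " +")[[1]]
```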
regexpr
regexpr(pattern, text) searches the string text for pattern and returns the starting character position of the first substring that matches pattern.
> regexpr("uat","Equator")
[1] 3
attr(,"match.length")
[1] 3
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
gregexpr(pattern, text) works like regexpr(), except that it finds the starting positions of all substrings that match pattern.
> gregexpr("iss","Mississippi")
[[1]]
[1] 2 5
attr(,"match.length")
[1] 3 3
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
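The positions and the match.length attribute can be turned back into the matched text with regmatches(), and a position of -1 signals that there was no match. A small sketch using base R only; the variable names are illustrative.

```r
# Extract the text of the first match.
m <- regexpr("uat", "Equator")
first_match <- regmatches("Equator", m)

# No match: the reported position is -1.
none <- regexpr("xyz", "Equator")
no_match_pos <- as.integer(none)

# All match positions as a plain integer vector, and the matched text.
g <- gregexpr("iss", "Mississippi")
starts <- as.integer(g[[1]])
all_matches <- regmatches("Mississippi", g)[[1]]
```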
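Regular expressions
A regular expression is a compact notation for describing a set of strings, similar in spirit to a wildcard. For example, the expression "[au]" matches any string that contains the letter a or u. It can be used like this: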
> grep("[au]",c("Equator","North Pole","South Pole"))
[1] 1 3
A period (.) matches any single character. To match a literal period, escape it as "\\.".
> grep("o.e",c("Equator","North Pole","South Pole"))
[1] 2 3
> grep("N..t",c("Equator","North Pole","South Pole"))
[1] 2
> grep(".",c("abc","de","f.g"))
[1] 1 2 3
> grep("\\.",c("abc","de","f.g"))
[1] 3
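A few more building blocks follow the same idea: ^ and $ anchor a match to the start and end of the string, and fixed=TRUE turns off regular-expression interpretation altogether. A brief sketch reusing the same strings; the variable names are illustrative.

```r
places <- c("Equator", "North Pole", "South Pole")

# ^ anchors the pattern to the start of the string.
starts_with_s <- grep("^S", places)

# $ anchors the pattern to the end of the string.
ends_with_e <- grep("e$", places)

# fixed = TRUE treats the pattern as literal text, so "." is just a period.
has_dot <- grep(".", c("abc", "de", "f.g"), fixed = TRUE)
```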
That's all!