R Language: Chapter 11, String Manipulation

1. String manipulation functions

The character() function creates a character vector of a given length, with every element initialized to the empty string.

> character(length=5)
[1] ""  ""  ""  ""  ""
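
As a small sketch of how such a pre-allocated vector might be used, the following fills in one element by index. The variable name `labels` is only illustrative and does not come from the original text.

```r
# Pre-allocate a character vector of length 3; every element starts as ""
labels <- character(3)

# Fill in a single element by index; the others stay empty
labels[2] <- "middle"

print(labels)
print(nchar(labels))  # the empty strings have zero characters
```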

grep

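grep(pattern,x) searches the character vector x for the substring pattern.
If x has n elements, that is, n strings, grep(pattern,x) returns a vector of length at most n. Each element of the result is an index i into x, meaning that x[i] contains a substring matching pattern. Matching is case-sensitive by default, which is why the second call below returns an empty vector.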

> grep("Pole",c("Equator","North Pole","South Pole"))
[1] 2 3
> grep("pole",c("Equator","North Pole","South Pole"))
integer(0)
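
The sketch below shows a few variations that may be useful here: the `ignore.case` and `value` arguments of grep(), and the related grepl() function. The variable names (`places`, `idx`, `hits`, `flags`) are illustrative only.

```r
places <- c("Equator", "North Pole", "South Pole")

# Case-insensitive search: "pole" now matches "Pole"
idx <- grep("pole", places, ignore.case = TRUE)
print(idx)

# value = TRUE returns the matching strings instead of their indices
hits <- grep("Pole", places, value = TRUE)
print(hits)

# grepl() returns one TRUE/FALSE per element of the input
flags <- grepl("Pole", places)
print(flags)
```

grepl() is often the more convenient choice when the result is used to subset another vector or a data frame, because its result always has the same length as the input.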

nchar

nchar() counts the number of characters in each element of a character vector.

> y<-c("er","sdf","eir","jk","dim")

> y
[1] "er"  "sdf" "eir" "jk"  "dim"

> nchar(y)
[1] 2 3 3 2 3
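
Because nchar() is vectorized, its result can be used directly to filter a vector by string length. A minimal sketch, reusing the vector y from above (the names `three_letter` and `longest` are illustrative):

```r
y <- c("er", "sdf", "eir", "jk", "dim")

# Keep only the elements that are exactly three characters long
three_letter <- y[nchar(y) == 3]
print(three_letter)

# which.max() returns the index of the first maximum,
# so this picks the first of the longest strings
longest <- y[which.max(nchar(y))]
print(longest)
```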

paste

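paste() converts its arguments to character strings and "pastes" them together element by element, separated by sep (a single space by default).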

> paste("North","Pole")
[1] "North Pole"
> paste("North","Pole",sep="")
[1] "NorthPole"
> paste("North","Pole",sep=".")
[1] "North.Pole"
> paste("North","and","South","Poles")
[1] "North and South Poles"
> paste(1:5)
[1] "1" "2" "3" "4" "5"

> paste("1","2","3","4","5")
[1] "1 2 3 4 5"
> paste(c("x","y"),1:6,sep="")
[1] "x1" "y2" "x3" "y4" "x5" "y6"
> paste("result.",1:4,sep="")
[1] "result.1" "result.2" "result.3" "result.4"
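
Two related features are worth a quick sketch: the `collapse` argument, which joins the elements of the result into a single string, and paste0(), which behaves like paste() with sep="". The variable names below are illustrative.

```r
parts <- c("a", "b", "c")

# collapse joins all elements of the result into one string
joined <- paste(parts, collapse = "-")
print(joined)

# paste0() is shorthand for paste(..., sep = "")
labels <- paste0("result.", 1:4)
print(labels)

# sep is applied within each element, collapse between elements
combined <- paste(c("x", "y"), 1:4, sep = "_", collapse = ",")
print(combined)
```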

sprintf

sprintf() "prints" strings or variables into a string according to a format specification.


> i <- 8
> s <- sprintf("the square of %d is %d",i,i^2)
> s
[1] "the square of 8 is 64"
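
sprintf() accepts other conversion specifications besides %d and is vectorized over its arguments. A short sketch, with illustrative variable names:

```r
name <- "pi"
value <- 3.14159

# %s inserts a string, %.2f a number rounded to two decimal places
s1 <- sprintf("%s is about %.2f", name, value)
print(s1)

# A vector argument produces one formatted string per element;
# %03d pads the integer with leading zeros to width 3
s2 <- sprintf("file_%03d.txt", 1:3)
print(s2)
```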

substr() and substring()

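substr() and substring() are the most commonly used functions for extracting part of a string. They do essentially the same job, but differ in their arguments:

  • substr(): both start and stop must be supplied; omitting either one is an error.
  • substring(): only first is required. If last is omitted it defaults to 1000000L, which in practice means "through the end of the string".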

substr(x, start, stop)
substring(text, first, last = 1000000L)

substr(x, start, stop) <- value

substring(text, first, last = 1000000L) <- value
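
The assignment forms above replace part of a string in place. The following is a minimal sketch of that behavior, together with substring() called with only its first argument; the variable names are illustrative.

```r
# Replace characters 2 through 4 in place
s <- "abcdef"
substr(s, 2, 4) <- "XYZ"
print(s)

# If the replacement is shorter than the target span,
# only as many characters as the replacement supplies are overwritten
s2 <- "abcdef"
substr(s2, 2, 4) <- "Q"
print(s2)

# substring() with only `first`: everything from position 3 to the end
tail_part <- substring("abcdef", 3)
print(tail_part)
```

Note that the replacement never changes the length of the string; it only overwrites existing characters.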

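substr(): the result has the same length as x, so a single input string yields a single substring (only the first elements of start and stop are used).
substring(): the arguments are recycled to the length of the longest one, so a single input string can yield several substrings.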

> substr("abcdef",2,4)
[1] "bcd"
> substring("abcdef",2,4)
[1] "bcd"

Here is where they differ:

> substr("abcdef",1:6,1:6)
[1] "a"
> substring("abcdef",1:6,1:6)
> #1-1 2-2 3-3 4-4 5-5 6-6
[1] "a" "b" "c" "d" "e" "f"
> substr("abcdef",1:6,6)
[1] "abcdef"
> substring("abcdef",1:6,6)
> # 1-6 2-6 3-6 4-6 5-6 6-6
[1] "abcdef" "bcdef"  "cdef"   "def"    "ef"     "f" 

In the following cases the two functions give the same result:

> substr(rep("abcdef",4),1:4,4:5)
[1] "abcd" "bcde" "cd"   "de"
> substring(rep("abcdef",4),1:4,4:5)
[1] "abcd" "bcde" "cd"   "de" 
> x<-c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")

> substr(x, 2, 5)
[1] "sfef" "wert" "uiop" ""     "tuff"
> substring(x, 2, 5)
[1] "sfef" "wert" "uiop" ""     "tuff"

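When x holds several strings, each function extracts one substring per element, recycling start and stop as needed. In this case, provided start and stop are no longer than x, substr() and substring() are equivalent.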

> x<-c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")

> substr(x, 1:3, 5:7)
[1] "asfef" "werty" "iop["  "b"     "tuff."

> substring(x, 1:3, 5:7)
[1] "asfef" "werty" "iop["  "b"     "tuff."

strsplit

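strsplit() splits each element of a character vector into substrings at a given separator. It returns a list with one element per input string.
unlist() flattens a list into a vector.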

> strsplit("aa bb cc dd ee ff"," ")
[[1]]
[1] "aa" "bb" "cc" "dd" "ee" "ff"
> unlist(strsplit("a.b.c.e.f",".",fixed=TRUE))
[1] "a" "b" "c" "e" "f"
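
The fixed=TRUE in the last call matters because strsplit() treats the separator as a regular expression by default, and an unescaped "." matches any character. The sketch below illustrates this, along with splitting on a regular expression and splitting a vector of several strings. Variable names are illustrative.

```r
# Without fixed = TRUE, "." matches every character,
# so every character is consumed as a separator and only empty pieces remain
bad <- unlist(strsplit("a.b.c", "."))
print(bad)

# With fixed = TRUE the period is taken literally
good <- unlist(strsplit("a.b.c", ".", fixed = TRUE))
print(good)

# The separator can also be a genuine regular expression:
# here, one or more digits
pieces <- unlist(strsplit("a1b22c333d", "[0-9]+"))
print(pieces)

# A vector input gives a list with one element per input string
per_string <- strsplit(c("a b", "c d e"), " ")
print(lengths(per_string))
```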

regexpr

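regexpr(pattern,text) searches the string text for pattern and returns the starting character position of the first substring that matches. The match.length attribute gives the length of that match. If nothing matches, the result is -1.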

> regexpr("uat","Equator")
[1] 3
attr(,"match.length")
[1] 3
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

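gregexpr(pattern,text) works like regexpr(), except that it finds the starting positions of all non-overlapping substrings that match pattern. It returns a list with one element per string in text.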

> gregexpr("iss","Mississippi")
[[1]]
[1] 2 5
attr(,"match.length")
[1] 3 3
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
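
The positions returned by regexpr() and gregexpr() can be passed to regmatches() to pull out the matched text itself. This is a minimal sketch with illustrative variable names, not part of the original chapter.

```r
text <- "Mississippi"

# All matches: positions, then the matched substrings
m_all <- gregexpr("iss", text)
starts <- as.integer(m_all[[1]])       # drop the attributes, keep positions
matches <- regmatches(text, m_all)[[1]]
print(starts)
print(matches)

# First match only
m_first <- regexpr("uat", "Equator")
first_match <- regmatches("Equator", m_first)
print(first_match)

# No match: regexpr() reports -1
no_match <- as.integer(regexpr("xyz", text))
print(no_match)
```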

Regular expressions

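A regular expression is a kind of wildcard: a shorthand pattern that describes a whole set of strings. For example, the expression "[au]" matches any string that contains the letter a or the letter u. It can be used like this: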

> grep("[au]",c("Equator","North Pole","South Pole"))
[1] 1 3

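A period (.) matches any single character. To match a literal period, escape it with a backslash, which is written "\\." inside an R string.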

> grep("o.e",c("Equator","North Pole","South Pole"))
[1] 2 3

> grep("N..t",c("Equator","North Pole","South Pole"))
[1] 2
> grep(".",c("abc","de","f.g"))
[1] 1 2 3

> grep("\\.",c("abc","de","f.g"))
[1] 3
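
Escaping with a backslash is one way to match a literal period. Two alternatives that should give the same result are sketched here: putting the period inside a bracket expression, or turning off regular-expression matching with fixed=TRUE. Variable names are illustrative.

```r
words <- c("abc", "de", "f.g")

# Backslash escape (the backslash itself must be doubled in an R string)
escaped <- grep("\\.", words)

# Inside brackets, a period stands for itself
bracket <- grep("[.]", words)

# fixed = TRUE treats the whole pattern as a literal string
literal <- grep(".", words, fixed = TRUE)

print(escaped)
print(bracket)
print(literal)
```

When the pattern is a plain string with no wildcard meaning intended, fixed=TRUE is usually the simplest choice because nothing needs escaping.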

That's all!
