1. Basic functions
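The basic R functions for working with character strings are nchar, tolower, toupper, chartr, and paste. They do not use regular expressions themselves, but they are the usual building blocks around the pattern-matching functions in section 2.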
nchar
gives the number of characters of each element in a character vector.
> temp <- c("Hello", "Kitty", "word")
> nchar(temp)
[1] 5 5 4
tolower, toupper
convert the characters of each element to lower case and upper case, respectively.
> tolower(temp)
[1] "hello" "kitty" "word"
> toupper(temp)
[1] "HELLO" "KITTY" "WORD"
chartr(old, new, x)
translates each character in x that is specified in old to the corresponding character specified in new
> chartr("HKw", "ABs", temp)
[1] "Aello" "Bitty" "sord"
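As a small sketch of how chartr pairs characters by position (the vector x below is a made-up example, not taken from the text above): the i-th character of old is mapped to the i-th character of new, so old must not be longer than new.

```r
# Hypothetical example vector, used only for illustration
x <- c("banana", "cabbage")

# 'a' -> 'o' and 'b' -> 'p'; characters are paired by position in old/new
swapped <- chartr("ab", "op", x)
print(swapped)   # "ponono" "coppoge"

# If old is longer than new, chartr signals an error; here we catch it
# and return NA instead of stopping the script
bad <- tryCatch(chartr("abc", "x", x), error = function(e) NA_character_)
print(bad)       # NA
```

Because the mapping is character by character, chartr cannot replace a substring with a string of a different length; sub and gsub are the tools for that.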
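paste
concatenates vectors element by element after converting them to character. sep is placed between the pieces of each result, collapse (if given) joins all results into a single string, and shorter vectors are recycled to the length of the longest one.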
> paste("A", 1:3, sep = "")
[1] "A1" "A2" "A3"
> paste("A", 1:3, sep = "*")
[1] "A*1" "A*2" "A*3"
> paste(c("A", "B"), 1:7)
[1] "A 1" "B 2" "A 3" "B 4" "A 5" "B 6" "A 7"
> paste("A", 1:5, sep="", collapse = "-")
[1] "A1-A2-A3-A4-A5"
> paste("A", 1:5, sep="", collapse = "*")
[1] "A1*A2*A3*A4*A5"
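The difference between sep and collapse can be sketched as follows. paste0 is the base R shorthand for paste(..., sep = ""); the variable names are only illustrative.

```r
# paste0 is shorthand for paste(..., sep = "")
ids <- paste0("A", 1:3)
print(ids)                 # "A1" "A2" "A3"

# sep joins the arguments within each element (result has length 3) ...
with_sep <- paste("A", 1:3, sep = "-")
print(with_sep)            # "A-1" "A-2" "A-3"

# ... while collapse then joins those elements into one string (length 1)
one_string <- paste("A", 1:3, sep = "", collapse = "-")
print(one_string)          # "A1-A2-A3"
print(length(one_string))  # 1
```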
2. Complex functions
grep
searches for matches and returns their subscripts (positions in the vector).
grepl
searches for matches and returns a logical value for each element of the vector, indicating whether that element matches.
> temp1 <- c("abs", "abd", "bss")
> grep("s$", temp1)
[1] 1 3
> grepl("s$", temp1)
[1] TRUE FALSE TRUE
> temp1[grep("s$", temp1)]
[1] "abs" "bss"
> temp1[grepl("s$", temp1)]
[1] "abs" "bss"
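A few commonly used arguments of grep and grepl, shown as a minimal sketch on the same temp1 vector (the result names are illustrative). value, fixed and ignore.case are standard arguments of these functions.

```r
temp1 <- c("abs", "abd", "bss")

# value = TRUE returns the matching elements instead of their positions
matched <- grep("s$", temp1, value = TRUE)
print(matched)                      # "abs" "bss"

# fixed = TRUE treats the pattern as a literal string, so "s$" now means
# the two characters 's' and '$', which occur nowhere in temp1
literal <- grepl("s$", temp1, fixed = TRUE)
print(literal)                      # FALSE FALSE FALSE

# ignore.case = TRUE makes the match case-insensitive
upper <- grepl("S$", temp1, ignore.case = TRUE)
print(upper)                        # TRUE FALSE TRUE

# When nothing matches, grep returns integer(0), so negative indexing
# with it drops everything; negating grepl is the safer way to exclude
none <- grep("z", temp1)
kept <- temp1[!grepl("z", temp1)]
print(length(none))                 # 0
print(kept)                         # "abs" "abd" "bss"
```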
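regexpr, gregexpr, regexec
search for matches and return the position of the matched characters within each element of the vector, but their return formats differ: regexpr returns a vector giving the first match in each element, gregexpr returns a list giving every match in each element, and regexec returns a list giving the first match together with any parenthesized sub-expressions. A position of -1 means no match.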
> regexpr("s$", temp1)
[1] 3 -1 3
attr(,"match.length")
[1] 1 -1 1
attr(,"useBytes")
[1] TRUE
###############################
> gregexpr("s$", temp1)
[[1]]
[1] 3
attr(,"match.length")
[1] 1
attr(,"useBytes")
[1] TRUE
[[2]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
[[3]]
[1] 3
attr(,"match.length")
[1] 1
attr(,"useBytes")
[1] TRUE
##########################################
> regexec("s$", temp1)
[[1]]
[1] 3
attr(,"match.length")
[1] 1
[[2]]
[1] -1
attr(,"match.length")
[1] -1
[[3]]
[1] 3
attr(,"match.length")
[1] 1
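The positions returned by these functions are usually passed on to regmatches, the base R function that extracts the matched substrings. The sketch below uses the pattern "s" (without the $ anchor) and the grouped pattern "(b)(s)" purely as illustrations, so that the three functions give visibly different results.

```r
temp1 <- c("abs", "abd", "bss")

# regexpr: first match per element; regmatches drops non-matching elements
first_pos <- regexpr("s", temp1)
first <- regmatches(temp1, first_pos)
print(as.integer(first_pos))   # 3 -1 2
print(first)                   # "s" "s"

# gregexpr: every match per element, returned as a list
all_matches <- regmatches(temp1, gregexpr("s", temp1))
print(all_matches[[3]])        # "s" "s"
print(length(all_matches[[2]]))  # 0, because "abd" has no match

# regexec: first match plus the parenthesized groups
groups <- regmatches(temp1, regexec("(b)(s)", temp1))
print(groups[[1]])             # "bs" "b" "s"
```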
sub, gsub
search for matches and replace them. sub replaces only the first match in each element (every element is processed, but only its first match is replaced), whereas gsub replaces all matches in every element.
> temp2 <- c("HelloHello", "Kitty", "Hello", "word")
> sub("Hello", "Hi", temp2)
[1] "HiHello" "Kitty" "Hi" "word"
> gsub("Hello", "Hi", temp2)
[1] "HiHi" "Kitty" "Hi" "word"
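Two further features of sub and gsub, sketched under the same temp2 vector: backreferences (\\1, \\2) in the replacement refer to parenthesized groups in the pattern, and fixed = TRUE turns off regular-expression interpretation. The grouped pattern and the string "a.b" are illustrative choices.

```r
temp2 <- c("HelloHello", "Kitty", "Hello", "word")

# \\1 and \\2 refer to the first and second parenthesized group;
# sub only rewrites the first "Hello" in each element
moved <- sub("(H)(ello)", "\\2\\1", temp2)
print(moved)      # "elloHHello" "Kitty" "elloH" "word"

# In a regular expression "." matches any character ...
any_char <- gsub(".", "-", "a.b")
print(any_char)   # "---"

# ... with fixed = TRUE it matches only a literal dot
dot_only <- gsub(".", "-", "a.b", fixed = TRUE)
print(dot_only)   # "a-b"
```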