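Basic operations
1. str_length(): returns the number of characters in each element
2. str_c(): combines strings
Example:
str_c("x","y",sep=",") gives "x,y"
str_c(c("x","y"),collapse=",") gives "x,y"
3. str_to_upper(), str_to_lower(): convert to upper or lower case
4. str_sub(): extracts a substring
Example: x <- c("apple")
str_sub(x,1,3) gives "app"
str_sub(x,-3,-1) gives "ple" #negative positions count from the end of the string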
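A minimal runnable sketch of the four basic functions above. The vector fruit_names is hypothetical sample data, not something from the original notes.

```r
library(stringr)

fruit_names <- c("apple", "kiwi", NA)   # hypothetical sample data

lengths_out <- str_length(fruit_names)             # one count per element; NA stays NA
joined_args <- str_c("x", "y", sep = ",")          # joins separate arguments
joined_vec  <- str_c(c("x", "y"), collapse = ",")  # collapses one vector into one string
upper_out   <- str_to_upper("apple")
tail_out    <- str_sub("apple", -3, -1)            # negative positions count from the end

print(lengths_out)  # 5 4 NA
print(joined_args)  # "x,y"
print(joined_vec)   # "x,y"
print(upper_out)    # "APPLE"
print(tail_out)     # "ple"
```

Note that sep joins several arguments element by element, while collapse squeezes a single vector into one string.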
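Matching
str_view(): shows the first match
str_view_all(): shows all matches
#In stringr 1.5.0 and later, str_view() highlights every match and str_view_all() is deprecated.
Form: str_view(string, "pattern")
str_subset(): returns a vector containing only the elements that match
#Special characters are matched by escaping them with \. Inside an R string the backslash itself must be doubled: the regex \^ is written "\\^", and a literal backslash (regex \\) is written "\\\\". A double quote only needs the R-string escape "\"".
#In str_view(), add match = TRUE or match = FALSE to show only the elements that match or only those that do not match.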
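A small sketch of escaping, using str_subset() so the result can be printed in a plain console. The vector x is made up for illustration; the negate argument assumes stringr 1.4.0 or later.

```r
library(stringr)

# Hypothetical test strings; "a\\b" is the three-character string a\b
x <- c("a^b", "a.b", "a\\b", "acb")

any_char  <- str_subset(x, "a.b")       # unescaped . matches any character
hat       <- str_subset(x, "\\^")       # literal ^
dot       <- str_subset(x, "a\\.b")     # literal .
backslash <- str_subset(x, "\\\\")      # literal backslash
not_dot   <- str_subset(x, "a\\.b", negate = TRUE)  # elements that do NOT match

print(any_char)   # all four elements
print(hat)        # "a^b"
print(dot)        # "a.b"
print(backslash)  # "a\\b"
print(not_dot)    # "a^b" "a\\b" "acb"
```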
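Common patterns:
1. ^ anchors the start of the string, $ anchors the end. Examples: ^...$ , ^an , ppl$
2. | means "or". Example: gr(a|e)y. Regular expressions have no "and" operator; to require two patterns, combine two str_detect() calls with &.
3. \\d: any digit  \\d{3}: any 3 digits  \\s: any whitespace (space, tab, newline)
4. [abc]: a, b or c  [^abc]: any character except a, b or c
[a-z]: any lowercase letter from a to z
[A-Za-z]: any letter
#Each of these classes matches exactly one character.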
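The patterns in items 1 to 4 can be tried out with str_detect(). This is only a sketch; the vector x is invented sample data.

```r
library(stringr)

x <- c("grey", "gray", "groy", "tel 555", "an apple")  # hypothetical sample data

alt     <- str_detect(x, "gr(a|e)y")  # alternation inside a group
digits  <- str_detect(x, "\\d{3}")    # three digits in a row
start   <- str_detect(x, "^an")       # anchored at the start
ending  <- str_detect(x, "ple$")      # anchored at the end
non_low <- str_detect(x, "[^a-z]")    # contains a character that is not a-z

print(alt)      # TRUE TRUE FALSE FALSE FALSE
print(digits)   # FALSE FALSE FALSE TRUE FALSE
print(start)    # FALSE FALSE FALSE FALSE TRUE
print(ending)   # FALSE FALSE FALSE FALSE TRUE
print(non_low)  # FALSE FALSE FALSE TRUE TRUE
```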
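5. Repetition (applies to the single character or group immediately before it)
?: 0 or 1
+: 1 or more
*: 0 or more
{n}: exactly n
{n,m}: between n and m
{n,}: at least n
{0,m}: at most m #{,m} appears in some references, but as far as I know the ICU engine behind stringr rejects it, so write {0,m}
#By default these quantifiers are greedy: they match as much as possible. Add a ? after the quantifier (example: P{2,4}?) to make it lazy, so it matches as little as possible.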
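To see the difference between greedy and lazy matching, str_extract() shows exactly what was matched. A short sketch with made-up input strings:

```r
library(stringr)

greedy_p <- str_extract("PPPP", "P{2,4}")    # takes as many P as allowed
lazy_p   <- str_extract("PPPP", "P{2,4}?")   # stops at the minimum of 2

greedy_tag <- str_extract("<a><b>", "<.+>")   # runs to the last >
lazy_tag   <- str_extract("<a><b>", "<.+?>")  # stops at the first >

print(greedy_p)    # "PPPP"
print(lazy_p)      # "PP"
print(greedy_tag)  # "<a><b>"
print(lazy_tag)    # "<a>"
```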
6. Backreferences
(.)(.)\\2\\1 matches XYYX, e.g. "abba"
(..)\\1 matches a repeated pair of characters XYXY, e.g. "abab"
#The group does not have to be (.); any pattern can go inside the parentheses.
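A quick check of the two backreference patterns above on a few invented words:

```r
library(stringr)

x <- c("abba", "abab", "banana", "abcd")  # hypothetical sample data

xyyx <- str_detect(x, "(.)(.)\\2\\1")  # two characters followed by their mirror image
xyxy <- str_detect(x, "(..)\\1")       # any pair of characters repeated immediately
pair <- str_extract("banana", "(..)\\1")

print(xyyx)  # TRUE FALSE FALSE FALSE
print(xyxy)  # FALSE TRUE TRUE FALSE
print(pair)  # "anan"
```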
Detecting
str_detect(): returns a logical vector showing whether the regular expression matches each string
str_count(): counts the number of matches in each string
# words[str_detect(words," ")] returns the matching elements, the same as str_subset(words," ")
##A condition can also be built from several simpler str_detect() calls instead of one complex pattern
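A small sketch of str_detect() and str_count() on a made-up vector. Because str_detect() returns TRUE/FALSE, sum() and mean() give the number and share of matching elements.

```r
library(stringr)

x <- c("apple", "banana", "pear")  # hypothetical sample data

has_e     <- str_detect(x, "e")
n_with_e  <- sum(has_e)                      # how many elements contain an e
a_counts  <- str_count(x, "a")               # matches per element
no_overlap <- str_count("abababa", "aba")    # matches never overlap

print(has_e)       # TRUE FALSE TRUE
print(n_with_e)    # 2
print(a_counts)    # 1 3 1
print(no_overlap)  # 2
```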
#Find all words that start with a vowel and end with a consonant.
str_subset(words, "^[aeiou].*[^aeiou]$") %>% head()
#> [1] "about" "accept" "account" "across" "act" "actual"
words[str_detect(words, "^[aeiou]") & str_detect(words, "[^aeiou]$")] %>% head()
#> [1] "about" "accept" "account" "across" "act" "actual"
Get all the variables whose names end with "time" in flights:
end_time <- names(flights) %>%
  str_subset(".*time$")
flights %>%
  select(all_of(end_time)) #all_of() makes it explicit that end_time is an external character vector
Extracting
str_extract(): extracts the first match
str_extract_all(): extracts all the matches
#Add simplify = TRUE to get a character matrix instead of a list
A very handy way to pull capture groups into separate columns:
#Example:
tibble(st = str_subset(sentences, "\\s([^ ]+)\\'([^ ]+)")) %>%
  extract(st, c("before", "after"), "\\s([^ ]+)\\'([^ ]+)", remove = FALSE)
#extract() here is tidyr::extract()
#remove = FALSE keeps the original strings in the st column
#The result is a tibble with one column per group; each ( ) marks a group.
#([^ ]+) captures a run of non-space characters, here the text on either side of the apostrophe.
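A sketch comparing the three return shapes. The vector x and the colour pattern are invented for illustration.

```r
library(stringr)

x <- c("red and blue", "green", "no colour")  # hypothetical sample data
colour <- "red|green|blue"

first_match <- str_extract(x, colour)                       # character vector, NA if no match
all_matches <- str_extract_all(x, colour)                   # list, one element per string
as_matrix   <- str_extract_all(x, colour, simplify = TRUE)  # character matrix, one row per string

print(first_match)  # "red" "green" NA
print(all_matches[[1]])  # "red" "blue"
print(dim(as_matrix))    # 3 2
```

The matrix has as many columns as the largest number of matches in any one string; shorter rows are padded.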
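The same idea on a tiny self-contained data frame, so it runs without the sentences data. This is only a sketch: the data frame df and its contents are made up, and the leading \\s of the pattern above is dropped so that a word at the very start of the string also matches.

```r
library(tidyr)

# Hypothetical two-row input
df <- data.frame(st = c("It's here", "the dog's bone"))

# Each ( ) group becomes its own column; remove = FALSE keeps st
out <- extract(df, st, c("before", "after"), "([^ ]+)'([^ ]+)", remove = FALSE)

print(out$before)  # "It" "dog"
print(out$after)   # "s" "s"
print(names(out))  # "st" "before" "after"
```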
Replacing matches
x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-") #replace the first match in each string
#> [1] "-pple" "p-ar" "b-nana"
str_replace_all(x, "[aeiou]", "-") #replace every match
#> [1] "-ppl-" "p--r" "b-n-n-"
x <- c("1 house", "2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
#> [1] "one house" "two cars" "three people"
#a named vector (pattern = replacement) performs several replacements at once
sentences %>%
  str_replace("([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2")
#swap the second and third words; \\1, \\2 and \\3 refer to the captured groups
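The word swap is easier to follow on a single short string. A sketch with an invented input:

```r
library(stringr)

s <- "one two three four"  # hypothetical input

# Groups 1-3 capture the first three words; the replacement re-emits them as 1, 3, 2
swapped <- str_replace(s, "([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2")

print(swapped)  # "one three two four"
```

Only the matched part is rewritten, so everything after the third word is left as it was.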
Splitting
str_split(sentences, " ") #split each sentence at spaces; returns a list
#add simplify = TRUE to get a character matrix
str_split(sentences, boundary("word"), simplify = TRUE) %>% tibble() #splitting on word boundaries also drops punctuation
Split up a string like “apples, pears, and bananas” into individual components.
str_split("apples, pears, and bananas",", (and )?")
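A sketch showing what the different splitting approaches return. The inputs other than the fruit string are made up.

```r
library(stringr)

# str_split() returns a list, so [[1]] pulls out the pieces for a single string
fruits <- str_split("apples, pears, and bananas", ", (and )?")[[1]]

by_space <- str_split("a b  c", " ")[[1]]                      # a double space leaves an empty piece
by_word  <- str_split("Hello, world! Bye.", boundary("word"))[[1]]  # punctuation is dropped
as_mat   <- str_split(c("a b", "c d e"), " ", simplify = TRUE)      # one row per input string

print(fruits)       # "apples" "pears" "bananas"
print(by_space)     # "a" "b" "" "c"
print(by_word)      # "Hello" "world" "Bye"
print(dim(as_mat))  # 2 3
```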