Regular Expression

向爱我

Article link: https://blog.csdn.net/weixin_51674826/article/details/116298273


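Basic operations

1. str_length(): returns the number of characters in each element (spaces count)
2. str_c(): concatenates strings
Example:
str_c("x","y",sep=",") returns "x,y"
str_c(c("x","y"),collapse=",") returns "x,y"
3. str_to_upper() and str_to_lower(): convert to upper case or lower case
4. str_sub(): extracts a substring by position
Example: x <- c("apple")
str_sub(x,1,3) returns "app"
str_sub(x,-3,-1) returns "ple"  # negative positions count from the end of the string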
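
The four helpers above can be tried together in one short script. This is a minimal sketch, assuming the stringr package is installed; the vector fruit_names is hypothetical sample data, not part of the original notes.

```r
library(stringr)

fruit_names <- c("apple", "pear tree", NA)  # hypothetical sample data

# str_length() counts characters per element; spaces count and NA stays NA
lens <- str_length(fruit_names)

# sep joins separate arguments; collapse merges one vector into a single string
joined_sep <- str_c("x", "y", sep = ",")
joined_collapse <- str_c(c("x", "y"), collapse = ",")

# Case conversion
upper <- str_to_upper("apple")

# Positive positions count from the start, negative ones from the end
head3 <- str_sub("apple", 1, 3)
tail3 <- str_sub("apple", -3, -1)

print(lens)
print(c(joined_sep, joined_collapse, upper, head3, tail3))
```

Note that sep and collapse reach the same "x,y" here by different routes: sep works across arguments, collapse works within one vector.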

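Matching

str_view(): highlights the first match in each string
str_view_all(): highlights every match
(In stringr 1.5.0 and later, str_view() highlights every match and str_view_all() is deprecated.)
General form: str_view(string, "pattern")
str_subset(): returns a vector containing only the elements that match
# Escape special characters with a backslash, e.g. \^, \", \\. Inside an R string the regex backslash must itself be doubled, so \^ is typed "\\^" and \\ is typed "\\\\".
# In str_view(), add match = TRUE or match = FALSE to show only the elements that match or only those that do not.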
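
The escaping rule is easiest to see with str_subset(). The following is a small sketch under a few assumptions: the vector items is made-up sample data, and the negate argument assumes stringr 1.4.0 or later.

```r
library(stringr)

items <- c("a^b", "say \"hi\"", "C:\\temp", "plain")  # hypothetical sample strings

# Regex \^ matches a literal caret; in R source it is typed "\\^"
has_caret <- str_subset(items, "\\^")

# A double quote is not special in a regex; it only needs escaping in the R string
has_quote <- str_subset(items, "\"")

# Regex \\ matches one literal backslash; in R source it is typed "\\\\"
has_backslash <- str_subset(items, "\\\\")

# negate = TRUE keeps the elements that do NOT match
no_caret <- str_subset(items, "\\^", negate = TRUE)

writeLines(items)  # shows the strings as they really are, without R's escapes
print(has_caret)
print(no_caret)
```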

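Common patterns:
1. ^ anchors the start and $ anchors the end. Examples: ^...$, ^an, ppl$
2. | means "or". Example: gr(a|e)y. A regex has no "and" operator; to require two patterns at once, combine two str_detect() calls with &.
3. \\d: any digit  \\d{3}: any three digits  \\s: any whitespace character (space, tab, newline)
4. [abc]: a, b, or c  [^abc]: any character except a, b, or c
  [a-z]: any lowercase letter from a to z
  [A-Za-z]: any ASCII letter
  # Each bracket expression matches exactly one character.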
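5. Repetition (applies only to the single character or group immediately before the quantifier)
?: 0 or 1
+: 1 or more
*: 0 or more
{n}: exactly n
{n,m}: between n and m
{n,}: at least n
{0,m}: at most m (write the lower bound explicitly; the shorthand {,m} is not part of the ICU regex syntax that stringr uses)
# Quantifiers are greedy by default and match as much as possible. Append ? (for example P{2,4}?) to make them lazy, so they match as little as possible.
6. Backreferences
(.)(.)\\2\\1  matches XYYX, e.g. "abba"
(..)\\1        matches X1X2X1X2, e.g. "abab"
# The group does not have to be (.); any sub-pattern can go inside the parentheses.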
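
The rules in this list can be checked one by one. The sketch below assumes stringr is installed; every input string is invented for illustration.

```r
library(stringr)

# Anchors: ^...$ matches strings of exactly three characters
three_chars <- str_detect(c("cat", "cart"), "^...$")            # TRUE FALSE

# Alternation inside a group
grey_gray <- str_detect(c("gray", "grey", "griy"), "gr(a|e)y")  # TRUE TRUE FALSE

# \\d{3} takes the first run of three digits
first_digits <- str_extract("tel 555-1234", "\\d{3}")           # "555"

# \\s matches any whitespace, including a tab
has_space <- str_detect("a\tb", "\\s")                          # TRUE

# Greedy versus lazy repetition
greedy <- str_extract("PPPPP", "P{2,4}")                        # "PPPP"
lazy <- str_extract("PPPPP", "P{2,4}?")                         # "PP"

# Backreferences: \\1 and \\2 repeat what the groups captured
xyyx <- str_detect(c("abba", "abab"), "(.)(.)\\2\\1")           # TRUE FALSE
xyxy <- str_detect(c("abba", "abab"), "(..)\\1")                # FALSE TRUE

print(list(greedy = greedy, lazy = lazy, xyyx = xyyx, xyxy = xyxy))
```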

Detecting

str_detect(): returns a logical vector showing whether the regular expression matches each string
str_count(): counts the number of matches in each string
# words[str_detect(words, "pattern")] returns the matching elements, where "pattern" stands for any regex

## Splitting one complex match into several simpler steps
#Find all words that start with a vowel and end with a consonant.
str_subset(words, "^[aeiou].*[^aeiou]$") %>% head()
#> [1] "about"   "accept"  "account" "across"  "act"     "actual"
words[str_detect(words, "^[aeiou]") & str_detect(words, "[^aeiou]$")] %>% head()
#> [1] "about"   "accept"  "account" "across"  "act"     "actual"
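
The same two approaches can be compared on a vector small enough to check by hand. This is a sketch with hypothetical data (sample_words), assuming stringr is installed; it also shows str_count() on the same input.

```r
library(stringr)

sample_words <- c("about", "apple", "tree", "act")  # hypothetical sample data

# Two simple tests combined with the logical operator &
starts_vowel <- str_detect(sample_words, "^[aeiou]")
ends_consonant <- str_detect(sample_words, "[^aeiou]$")
both <- sample_words[starts_vowel & ends_consonant]

# One regex that does the same job
single <- str_subset(sample_words, "^[aeiou].*[^aeiou]$")

# str_count() reports how many vowels each word contains
vowel_counts <- str_count(sample_words, "[aeiou]")

print(both)          # "about" "act"
print(vowel_counts)  # 3 2 2 1
```

Because str_detect() returns TRUE/FALSE values, sum(starts_vowel) gives the number of matching words and mean(starts_vowel) gives their proportion.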

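Get all the variables in flights whose names end with "time":

library(tidyverse)
library(nycflights13)  # provides the flights data set

end_time <- names(flights) %>%
  str_subset("time$")
flights %>%
  select(all_of(end_time))  # all_of() makes the use of an external character vector explicit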

Extracting

str_extract(): extracts the first match in each string
str_extract_all(): extracts all matches and returns a list
# Add simplify = TRUE to get a character matrix instead of a list
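
The difference between the three return shapes shows up clearly on a tiny input. The sketch below assumes stringr is installed; notes and colour_pattern are hypothetical names and data.

```r
library(stringr)

notes <- c("red and blue", "green", "no match here")  # hypothetical sample data
colour_pattern <- "red|green|blue"

# First match per string; NA where nothing matches
first_hit <- str_extract(notes, colour_pattern)

# Every match per string, as a list with one character vector per input
all_hits <- str_extract_all(notes, colour_pattern)

# simplify = TRUE pads short rows with "" to form a rectangular matrix
hit_matrix <- str_extract_all(notes, colour_pattern, simplify = TRUE)

print(first_hit)
print(all_hits)
print(hit_matrix)
```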

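A handy way to pull matched groups into separate columns:

# Example (extract() here is tidyr::extract(); sentences ships with stringr):
tibble(st = str_subset(sentences, "\\s([^ ]+)\\'([^ ]+)")) %>%
  extract(st, c("before", "after"), "\\s([^ ]+)\\'([^ ]+)", remove = FALSE)

# remove = FALSE keeps the original column st next to the new ones.
# The result is a tibble with one column per group; each pair of parentheses ( ) marks a group.
# ([^ ]+) captures a run of one or more non-space characters, here the text on either side of the apostrophe.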
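
For a plain matrix of groups, stringr's own str_match() is an alternative to the tidyr approach. It is swapped in here purely for illustration and is not the method used in the notes above; phrases is hypothetical sample data.

```r
library(stringr)

phrases <- c("the dog's bone", "nothing here", "a cat's toy")  # hypothetical sample data
pattern <- "\\s([^ ]+)'([^ ]+)"

# str_match() returns a character matrix:
# column 1 is the full match, the following columns are the groups in order
groups <- str_match(phrases, pattern)

print(groups)
# Strings without a match produce a row of NA values
```

Here column 2 holds the text before the apostrophe and column 3 the text after it.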

Replacing matches

x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-")
#> [1] "-pple"  "p-ar"   "b-nana"
str_replace_all(x, "[aeiou]", "-")
#> [1] "-ppl-"  "p--r"   "b-n-n-"
# str_replace() replaces the first match in each string; str_replace_all() replaces every match

x <- c("1 house", "2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
#> [1] "one house"    "two cars"     "three people"
# a named vector (pattern = replacement) performs several replacements in one call

sentences %>% 
  str_replace("([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2")
  # backreferences in the replacement swap the second and third words of each sentence
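
The word swap is easier to follow on a single string. This is a minimal sketch assuming stringr is installed; line and mixed are hypothetical inputs.

```r
library(stringr)

line <- "The birch canoe slid"  # hypothetical sample string

# The three groups capture the first three words;
# the replacement writes them back as word 1, word 3, word 2
swapped <- str_replace(line, "([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2")

# A named vector applies several pattern = replacement pairs in one call
mixed <- "1 house and 2 cars"
spelled <- str_replace_all(mixed, c("1" = "one", "2" = "two"))

print(swapped)  # "The canoe birch slid"
print(spelled)  # "one house and two cars"
```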

Splitting

str_split(sentences, " ")  # split each string at single spaces; returns a list
# add simplify = TRUE to get a character matrix instead
str_split(sentences, boundary("word"), simplify = TRUE) %>% tibble()  # split on word boundaries, which also drops punctuation

Split a string like "apples, pears, and bananas" into its individual components:

str_split("apples, pears, and bananas", ", (and )?")
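
The three splitting styles can be compared side by side. The sketch below assumes stringr is installed; the input strings other than the "apples" example are invented for illustration.

```r
library(stringr)

# Optional group: ", " or ", and " both act as separators
parts <- str_split("apples, pears, and bananas", ", (and )?")[[1]]

# boundary("word") splits on word boundaries and leaves punctuation out
words_only <- str_split("Hello, world! Bye.", boundary("word"))[[1]]

# simplify = TRUE returns a matrix, padding shorter rows with ""
cells <- str_split(c("a b", "c d e"), " ", simplify = TRUE)

print(parts)       # "apples" "pears" "bananas"
print(words_only)  # "Hello" "world" "Bye"
print(cells)
```

str_split() always returns a list (one element per input string), which is why [[1]] is needed to get at the pieces of a single string.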
