Regular expressions
What is a regular expression?
A regular expression is a sequence of characters and metacharacters that forms a search pattern for matching text.
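^a starts with a
a$ ends with a
. any single character
\\s a whitespace character (space, tab, and so on); the backslash is doubled inside an R string
[0-9]+ a digit from 0 to 9, repeated at least once (+)
([0-9]+) the parentheses capture the matched digits as a group, which a replacement string can reference as \\1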
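The patterns listed above can be tried out with grepl and sub. This is a minimal sketch, assuming a made-up vector x that is not part of the original notes; the strings were chosen only so that each pattern hits at least one element.

```r
# Hypothetical sample strings, chosen to exercise each pattern once.
x <- c("apple", "banana", "a b", "item 42", "cat")

starts_a  <- grepl("^a", x)      # starts with "a"
ends_a    <- grepl("a$", x)      # ends with "a"
any_char  <- grepl("c.t", x)     # "c", any one character, then "t"
has_space <- grepl("\\s", x)     # contains a whitespace character
has_digit <- grepl("[0-9]+", x)  # contains at least one digit

# The parentheses capture the digits; \\1 in the replacement reuses them.
renamed <- sub("item ([0-9]+)", "number \\1", x)

print(starts_a)
print(renamed)
```

Only "item 42" is changed by the sub call, because it is the only element the whole pattern matches; the other elements are returned as they are.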
Repetition
? at most once (zero or one)
+ at least once (one or more)
* any number of times (zero or more)
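A small sketch of how the three quantifiers differ, assuming a hypothetical vector words in which the letter "u" appears zero, one, or two times. Each quantifier applies to the character just before it.

```r
# Hypothetical inputs: "u" appears zero, one, and two times.
words <- c("color", "colour", "colouur")

at_most_once  <- grepl("^colou?r$", words)  # "u" zero or one time
at_least_once <- grepl("^colou+r$", words)  # "u" one or more times
any_number    <- grepl("^colou*r$", words)  # "u" zero or more times

print(at_most_once)
print(at_least_once)
print(any_number)
```

The ^ and $ anchors are there so that the whole string has to fit the pattern; without them, a partial match anywhere in the string would already count.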
Search
grepl returns a logical vector: TRUE where an element matches, FALSE otherwise
grep returns the indices of the elements that match
grepl(pattern, x)
grep(pattern, x)
Replacement
sub replaces only the first match in each element
gsub replaces every match
sub(pattern, replacement, x)
gsub(pattern, replacement, x)
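The difference between sub and gsub is easiest to see on a string with several matches. A minimal sketch, assuming a made-up value phone that is not part of the original notes:

```r
# Hypothetical input containing two hyphens.
phone <- "555-123-4567"

first_only <- sub("-", " ", phone)        # replaces the first hyphen only
all_hits   <- gsub("-", " ", phone)       # replaces every hyphen
masked     <- gsub("[0-9]", "#", phone)   # replaces every digit

print(first_only)  # "555 123-4567"
print(all_hits)    # "555 123 4567"
print(masked)      # "###-###-####"
```

Both functions are vectorised over x, so passing a character vector applies the same replacement to each element independently.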
# Example: find the elements of emails that contain "edu"
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
# "edu" matches anywhere in the string, so "education@world.gov" also counts
grepl("edu", emails)
# [1] TRUE TRUE FALSE TRUE TRUE FALSE
grep("edu", emails)
# [1] 1 2 4 5
# Use the indices to pull out the matching elements
hits <- grep("edu", emails)
emails[hits]
# [1] "john.doe@ivyleague.edu" "education@world.gov"
# [3] "invalid.edu" "quant@bigdatacollege.edu"
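The bare pattern "edu" also matches "education@world.gov" and "invalid.edu", which are not .edu addresses. One possible tightening, sketched here with the anchors and metacharacters from the notes above, requires an @, then any characters, then a literal ".edu" at the end of the string. The variable names strict, strict_idx, and strict_hits are illustrative only.

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# "@"    a literal at sign
# ".*"   any character, any number of times
# "\\."  a literal dot (escaped, because a bare . means any character)
# "edu$" "edu" at the end of the string
strict <- grepl("@.*\\.edu$", emails)
strict_idx <- grep("@.*\\.edu$", emails)

# value = TRUE makes grep return the matching strings instead of indices
strict_hits <- grep("@.*\\.edu$", emails, value = TRUE)

print(strict)
print(strict_idx)
print(strict_hits)
```

This is still a loose check rather than a full email validator; it only rules out the two false positives seen in this particular vector.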
Reference
https://campus.datacamp.com/courses/intermediate-r/chapter-5-utilities?ex=7