Learning R: Cleaning and Transforming Data

Working with Strings

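The grep, grepl, and regexpr functions all find strings that match a pattern. The sub and gsub functions replace the matched text: sub replaces only the first match in each string, while gsub replaces every match.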
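As a rough illustration of how these base R functions differ, the sketch below runs each of them on a small made-up vector. The vector `kings` and the patterns are assumptions chosen for the example; they are not taken from the english_monarchs data.

```r
# Hypothetical toy data, not the english_monarchs data frame
kings <- c("Offa", "Offa and Ecgfrith", "Ceolwulf", "Oswiu")

idx   <- grep("and", kings)     # indices of the elements that match
hits  <- grepl("and", kings)    # logical vector, one value per element
pos   <- regexpr("f", kings)    # position of the first match, -1 if none
first <- sub("f", "F", kings)   # replaces only the first match per string
all_f <- gsub("f", "F", kings)  # replaces every match per string

print(idx)
print(hits)
print(as.integer(pos))
print(first)
print(all_f)
```

grep is convenient for subsetting by index, while grepl is the safer choice when the result is combined with other logical conditions.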

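Load the stringr package. A pattern wrapped in fixed() is matched as a literal string rather than as a regular expression. str_detect returns a logical vector marking which strings contain the pattern, and that vector is then used to select the matching rows.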

> library(stringr)
> multiple <- str_detect(english_monarchs$domain,fixed(","))
> english_monarchs[multiple,c("name","domain")]
                          name                    domain
17                        Offa       East Anglia, Mercia
18                        Offa East Anglia, Kent, Mercia
19           Offa and Ecgfrith East Anglia, Kent, Mercia
20                    Ecgfrith East Anglia, Kent, Mercia
22                     Cœnwulf East Anglia, Kent, Mercia
23        Cœnwulf and Cynehelm East Anglia, Kent, Mercia
24                     Cœnwulf East Anglia, Kent, Mercia
25                    Ceolwulf East Anglia, Kent, Mercia
26                   Beornwulf       East Anglia, Mercia
82      Ecgbehrt and Æthelwulf              Kent, Wessex
83      Ecgbehrt and Æthelwulf      Kent, Mercia, Wessex
84      Ecgbehrt and Æthelwulf              Kent, Wessex
85    Æthelwulf and Æeelstan I              Kent, Wessex
86                   Æthelwulf              Kent, Wessex
87 Æthelwulf and Æeelberht III              Kent, Wessex
88               Æeelberht III              Kent, Wessex
89                  Æthelred I              Kent, Wessex
95                       Oswiu       Mercia, Northumbria

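A regular expression can match any of several alternatives. The pattern below matches either a comma or the text "and", which finds the rows whose name lists more than one ruler.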

> ruler <- str_detect(english_monarchs$name,",|and")
> english_monarchs[ruler & !is.na(ruler),c("name","domain")]

To split the name column into individual rulers, use the str_split function.

> indival <- str_split(english_monarchs$name,",|and")
> head(indival[sapply(indival,length)>1])
[[1]]
[1] "Sigeberht " " Ecgric"   

[[2]]
[1] "Hun"      " Beonna " " Alberht"

[[3]]
[1] "Offa "     " Ecgfrith"

[[4]]
[1] "Cœnwulf " " Cynehelm"

[[5]]
[1] "Sighere " " Sebbi"  

[[6]]
[1] "Sigeheard " " Swaefred" 
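The pieces in the output above keep their leading and trailing spaces because the pattern ",|and" does not include the surrounding whitespace, and a bare "and" would also split a name that merely contains those letters. One possible refinement, sketched here on a made-up vector `rulers` (an assumption, not the real data), is to put the spaces into the pattern itself.

```r
library(stringr)

# Hypothetical sample with the same shape as the name column
rulers <- c("Sigeberht and Ecgric", "Hun, Beonna and Alberht", "Offa")

# Split on ", " or " and " so the separators take their spaces with them
parts <- str_split(rulers, ", | and ")

# Keep only the entries that actually contained more than one ruler
multi <- parts[sapply(parts, length) > 1]
print(multi)
```

An alternative is to keep the original pattern and clean each piece afterwards with str_trim.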

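str_count counts how many times a pattern occurs in each string. Here it counts the occurrences of "th" in every name; the pattern must be quoted.

> str_count(english_monarchs$name,"th")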
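To make the behaviour of str_count concrete, here is a minimal sketch on a made-up vector (`names_demo` is an assumption for illustration). It also shows that a missing value stays missing instead of being counted as zero.

```r
library(stringr)

# Hypothetical names; the last element is missing on purpose
names_demo <- c("Ecgfrith", "Offa", "Aethelfrith", NA)

# One count per element: the number of non-overlapping matches of "th"
counts <- str_count(names_demo, "th")
print(counts)
```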

The str_replace function replaces the first match of a pattern in each string; str_replace_all replaces every match.
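The difference between replacing one match and replacing all of them is easiest to see side by side. The following sketch uses a made-up vector `domains` (an assumption for illustration) and swaps commas for semicolons.

```r
library(stringr)

# Hypothetical domain strings
domains <- c("Kent, Mercia, Wessex", "Mercia")

first_only <- str_replace(domains, ",", ";")      # first comma in each string
every_one  <- str_replace_all(domains, ",", ";")  # every comma in each string

print(first_only)
print(every_one)
```

Strings without a match, such as "Mercia" here, are returned unchanged by both functions.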

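ignore.case makes the match case-insensitive, so "and", "And", and "AND" are all treated as the same pattern; it does not skip over a character or string. In current versions of stringr the ignore.case() wrapper has been replaced by regex(pattern, ignore_case = TRUE).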
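A small sketch of case-insensitive matching follows. It assumes a stringr version that provides the regex() modifier with an ignore_case argument; the vector `x` is made up for the example.

```r
library(stringr)

# Hypothetical names that differ only in the case of "and"
x <- c("Offa AND Ecgfrith", "Offa and Ecgfrith", "Offa")

# Default matching is case-sensitive
strict <- str_detect(x, "and")

# regex(..., ignore_case = TRUE) treats upper and lower case as equal
relaxed <- str_detect(x, regex("and", ignore_case = TRUE))

print(strict)
print(relaxed)
```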


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值