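Learning objectives
The material for these notes comes from the R tutorials shared by Lizongzhang on Bilibili.
Today's topic is how to use the functions in dplyr.
Contents
The main points are as follows.
Categories of dplyr functions
dplyr provides a different set of functions depending on what is being manipulated: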
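- Columns:
– select() picks columns
– rename() renames columns
– mutate() changes the values of existing columns and creates new ones
– relocate() changes the column order
- Rows:
– filter() keeps the rows that satisfy a condition
– slice() picks rows by position
– arrange() reorders rows
- Groups of rows:
– summarise() collapses each group into a single row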
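The walkthrough that follows does not use rename(), relocate(), slice() or summarise(), so here is a minimal sketch of them. The `toy` table and its values are invented for illustration and are not part of the starwars data.

```r
library(dplyr)

# Hypothetical toy data, invented for this sketch
toy <- tibble(
  name   = c("a", "b", "c", "d"),
  group  = c("x", "x", "y", "y"),
  height = c(150, 170, 160, 180)
)

# rename(): the syntax is new_name = old_name
renamed <- toy %>% rename(height_cm = height)

# relocate(): with no extra arguments the column moves to the front
relocated <- toy %>% relocate(group)

# slice(): pick rows by position
first_two <- toy %>% slice(1:2)

# summarise() on grouped rows: one output row per group
by_group <- toy %>%
  group_by(group) %>%
  summarise(mean_height = mean(height))
```

Note that summarise() is usually paired with group_by(); without it, the whole table is treated as a single group.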
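The rest of these notes uses R for data analysis.
The data is the starwars dataset, on which we carry out a series of concrete operations.
The first preprocessing step is to extract and select the variables we need.
# R functions
# dataset: starwars
library(dplyr)
data() # lists the available datasets; starwars ships with dplyr
# select()
# the following helpers set conditions for choosing variables:
# contains()
# starts_with()
# ends_with()
# last_col()
# list all variable names in the data frame
names(starwars)
# select the variables we need
# in RStudio, Ctrl+Shift+M inserts %>%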
data1<-starwars %>% select(name,mass,
hair_color,
height,skin_color)
data1
data2<-starwars %>% select(name:mass)
data2
data3<-starwars %>% select(1:5)
data3
data4<-starwars %>%
select(1:4,6:8)
data4
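# using contains()
data5<-starwars %>%
select(name,sex,contains("gender"))
# keeps every column whose name contains "gender"
data5
# rename a variable while selecting (new_name = old_name)
data6<- starwars %>%
select(name,height,weight=mass)
data6
The output for data5 and data6 is as follows: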
data5
# A tibble: 87 × 3
name sex gender
<chr> <chr> <chr>
1 Luke Skywalker male masculine
2 C-3PO none masculine
3 R2-D2 none masculine
4 Darth Vader male masculine
5 Leia Organa female feminine
6 Owen Lars male masculine
7 Beru Whitesun lars female feminine
8 R5-D4 none masculine
9 Biggs Darklighter male masculine
10 Obi-Wan Kenobi male masculine
# … with 77 more rows
# ℹ Use `print(n = ...)` to see more rows
data6
# A tibble: 87 × 3
name height weight
<chr> <int> <dbl>
1 Luke Skywalker 172 77
2 C-3PO 167 75
3 R2-D2 96 32
4 Darth Vader 202 136
5 Leia Organa 150 49
6 Owen Lars 178 120
7 Beru Whitesun lars 165 75
8 R5-D4 97 32
9 Biggs Darklighter 183 84
10 Obi-Wan Kenobi 182 77
# … with 77 more rows
# ℹ Use `print(n = ...)` to see more rows
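The helpers starts_with(), ends_with() and last_col() are listed above but not demonstrated. The following is a small sketch of how they behave. The one-row `toy` table is made up for illustration, and only its column names imitate starwars.

```r
library(dplyr)

# Hypothetical one-row table whose column names imitate starwars
toy <- tibble(
  name = "Luke", hair_color = "blond", skin_color = "fair",
  eye_color = "blue", height = 172L, homeworld = "Tatooine"
)

# ends_with(): name plus every column whose name ends in "color"
ends_color <- toy %>% select(name, ends_with("color"))

# starts_with(): every column whose name starts with "h"
starts_h <- toy %>% select(starts_with("h"))

# last_col(): the last column of the table
last_one <- toy %>% select(name, last_col())

names(ends_color)
names(starts_h)
names(last_one)
```

These helpers match on column names only, never on the values stored in the columns.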
Using conditional statements
# if_else()
# conditionally replace the values of a variable
library(tidyr)
data7<-data1 %>%
drop_na(mass) %>%
mutate(weight_size=if_else(mass>100,
"large",
"small"))
data7
data7 returns the following:
A tibble: 59 × 6
name mass hair_…¹ height skin_…² weigh…³
<chr> <dbl> <chr> <int> <chr> <chr>
1 Luke Skywalk… 77 blond 172 fair small
2 C-3PO 75 NA 167 gold small
3 R2-D2 32 NA 96 white,… small
4 Darth Vader 136 none 202 white large
5 Leia Organa 49 brown 150 light small
6 Owen Lars 120 brown,… 178 light large
7 Beru Whitesu… 75 brown 165 light small
8 R5-D4 32 NA 97 white,… small
9 Biggs Darkli… 84 black 183 light small
10 Obi-Wan Keno… 77 auburn… 182 fair small
# … with 49 more rows, and abbreviated variable
# names ¹hair_color, ²skin_color, ³weight_size
# ℹ Use `print(n = ...)` to see more rows
While the new column is being built, if_else() is evaluated for every row: if mass>100 the row is labelled "large", otherwise it is labelled "small".
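One detail worth seeing on a plain vector is what if_else() does with missing values. The `mass` vector below is made up for this sketch. The `missing` argument is part of dplyr's if_else() and supplies a value for NA conditions.

```r
library(dplyr)

# Hypothetical masses, including one missing value
mass <- c(77, 136, NA)

# When the condition is NA, the result is NA as well
plain <- if_else(mass > 100, "large", "small")

# `missing` supplies a replacement for those NA cases
filled <- if_else(mass > 100, "large", "small", missing = "unknown")

plain
filled
```

This is why data7 calls drop_na(mass) first: without it, rows with a missing mass would get NA as their label.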
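# the new column is spelled weigth_size here, as the output below shows
data7a<-data1 %>%
mutate(weigth_size=if_else(mass>100,"large",
"small"),
BMI=mass/(height/100)^2) # mass in kg, height in cm, so BMI is in kg/m^2
data7a
data7a returns the following (drop_na() is not called this time, so all 87 rows are kept):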
A tibble: 87 × 7
name mass hair_…¹ height skin_…² weigt…³ BMI
<chr> <dbl> <chr> <int> <chr> <chr> <dbl>
1 Luke S… 77 blond 172 fair small 26.0
2 C-3PO 75 NA 167 gold small 26.9
3 R2-D2 32 NA 96 white,… small 34.7
4 Darth … 136 none 202 white large 33.3
5 Leia O… 49 brown 150 light small 21.8
6 Owen L… 120 brown,… 178 light large 37.9
7 Beru W… 75 brown 165 light small 27.5
8 R5-D4 32 NA 97 white,… small 34.0
9 Biggs … 84 black 183 light small 25.1
10 Obi-Wa… 77 auburn… 182 fair small 23.2
# … with 77 more rows, and abbreviated variable
# names ¹hair_color, ²skin_color, ³weigth_size
# ℹ Use `print(n = ...)` to see more rows
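# recode()
# recode_factor() creates a factor whose levels follow the order of the replacements
# the code below still uses if_else(); because weigth_size is spelled differently
# from the existing weight_size, a seventh column is added instead of overwriting it
data8<-data7 %>%
mutate(weigth_size=if_else(mass>100,
"large",
"small"))
data8
data8 returns the following: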
A tibble: 59 × 7
name mass hair_…¹ height skin_…² weigh…³ weigt…⁴
<chr> <dbl> <chr> <int> <chr> <chr> <chr>
1 Luke… 77 blond 172 fair small small
2 C-3PO 75 NA 167 gold small small
3 R2-D2 32 NA 96 white,… small small
4 Dart… 136 none 202 white large large
5 Leia… 49 brown 150 light small small
6 Owen… 120 brown,… 178 light large large
7 Beru… 75 brown 165 light small small
8 R5-D4 32 NA 97 white,… small small
9 Bigg… 84 black 183 light small small
10 Obi-… 77 auburn… 182 fair small small
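The comments above mention recode() and recode_factor(), but neither is called in the code. Here is a minimal sketch of both on a made-up `size` vector. As far as I know, recent dplyr releases mark these two functions as superseded in favour of case_match(), so treat this as an illustration rather than a recommendation.

```r
library(dplyr)

# Hypothetical labels, like the ones if_else() produced above
size <- c("small", "large", "small")

# recode(): replace values, old = new; the result stays character
short <- recode(size, small = "S", large = "L")

# recode_factor(): same replacement, but the result is a factor whose
# levels follow the order in which the replacements are written
f <- recode_factor(size, small = "S", large = "L")

# The factor is only ordered when .ordered = TRUE is set
o <- recode_factor(size, small = "S", large = "L", .ordered = TRUE)

levels(f)
is.ordered(f)
is.ordered(o)
```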
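Filtering cases with filter()
Before running this part, check that library(dplyr) has already been run.
# filter(): select cases
# library(dplyr) # run this first if it has not been run above
unique(starwars$species)
# extract the cases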
data_9<-starwars %>%
select(name,height,mass,sex,species)%>%
filter(species=="Droid") %>%
arrange(height)
data_9
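data_9 keeps the columns name, height, mass, sex and species, retains the rows whose species is "Droid", and sorts them by height in ascending order (rows with a missing height go last).
The result for data_9 is: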
# A tibble: 6 × 5
name height mass sex species
<chr> <int> <dbl> <chr> <chr>
1 R2-D2 96 32 none Droid
2 R4-P17 96 NA none Droid
3 R5-D4 97 32 none Droid
4 C-3PO 167 75 none Droid
5 IG-88 200 140 none Droid
6 BB8 NA NA none Droid
data_10<-starwars %>%
select(name,height,mass,sex,species)%>%
filter(species=="Human") %>%
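arrange(-height) # descending order
# another way to sort in descending order: arrange(desc(height))
data_10
This works much like data_9, except that it keeps the rows whose species is "Human" and sorts them in descending order.
The output is as follows: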
A tibble: 35 × 5
name height mass sex species
<chr> <int> <dbl> <chr> <chr>
1 Darth Vader 202 136 male Human
2 Qui-Gon Jinn 193 89 male Human
3 Dooku 193 80 male Human
4 Bail Prestor Organa 191 NA male Human
5 Anakin Skywalker 188 84 male Human
6 Mace Windu 188 84 male Human
7 Raymus Antilles 188 79 male Human
8 Gregar Typho 185 85 male Human
9 Biggs Darklighter 183 84 male Human
10 Boba Fett 183 78.2 male Human
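The two ways of sorting in descending order give the same result on a numeric column, but only desc() also works on a character column. Here is a small sketch on a made-up `toy` table, whose values are for illustration only.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  name   = c("Leia", "Luke", "Han"),
  height = c(150, 172, 180)
)

# On a numeric column, the minus sign and desc() give the same order
by_minus <- toy %>% arrange(-height)
by_desc  <- toy %>% arrange(desc(height))

# desc() also handles character columns; -name would raise an error
by_name <- toy %>% arrange(desc(name))

by_minus$name
by_desc$name
by_name$name
```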
# %in% tests whether a value belongs to a set
data_11<-starwars %>%
select(name,height,mass,sex,species)%>%
filter((species%in%c("Human","Droid"))
& height<250)%>%
arrange(-height)
data_11
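Here %in% keeps the rows whose species is either "Human" or "Droid", and the extra condition height<250 has to hold as well.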
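To see what %in% returns on its own, and how it differs from == when values are missing, here is a short sketch. The `species` vector and the `toy` table are made up for illustration.

```r
library(dplyr)

# Hypothetical species labels, including one missing value
species <- c("Human", "Droid", "Wookiee", NA)

# %in% returns TRUE or FALSE for every element, never NA
in_set <- species %in% c("Human", "Droid")

# == returns NA when the element is missing
is_human <- species == "Human"

# Inside filter(), conditions separated by commas are combined with &
toy <- tibble(species = species, height = c(172, 96, 228, 180))
kept <- toy %>% filter(species %in% c("Human", "Droid"), height < 250)

in_set
is_human
kept
```

filter() drops rows where the condition is NA, so both operators exclude the row with the missing species here.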
# filter(!is.na(...)) drops the rows with missing values
data_12<-starwars %>%
select(name,height,mass,sex,species)%>%
filter(!is.na(height))%>%
arrange(-height)
data_12
is.na(starwars$height)
is.na(x) returns a logical vector of the same length as x: each element is TRUE if the corresponding element of x is NA and FALSE otherwise.
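The behaviour of is.na() is easiest to see on a short vector. The `height` values below are invented for this sketch.

```r
library(dplyr)

# Hypothetical heights with two missing values
height <- c(172, NA, 96, NA)

# One logical value per element
missing_flags <- is.na(height)

# TRUE counts as 1, so sum() gives the number of missing values
n_missing <- sum(missing_flags)

# which() gives the positions of the missing values
where_missing <- which(missing_flags)

# filter(!is.na(...)) keeps only the rows where the value is present
toy <- tibble(id = 1:4, height = height)
complete <- toy %>% filter(!is.na(height))

missing_flags
n_missing
where_missing
complete
```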
data_12 returns the following:
data_12
# A tibble: 81 × 5
name height mass sex species
<chr> <int> <dbl> <chr> <chr>
1 Yarael Poof 264 NA male Quermian
2 Tarfful 234 136 male Wookiee
3 Lama Su 229 88 male Kaminoan
4 Chewbacca 228 112 male Wookiee
5 Roos Tarpals 224 82 male Gungan
6 Grievous 216 159 male Kaleesh
7 Taun We 213 NA female Kaminoan
8 Rugor Nass 206 NA male Gungan
9 Tion Medon 206 80 male Pau'an
10 Darth Vader 202 136 male Human
# … with 71 more rows
# ℹ Use `print(n = ...)` to see more rows
The result of is.na(starwars$height) is as follows:
[9] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[41] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[57] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE TRUE TRUE TRUE TRUE TRUE FALSE
Summary
This part covered some simple operations on a dataset. How to apply them in practice depends on your own data and what you need from it.