Using dplyr Functions in R

无你想你

Article link: https://blog.csdn.net/qq_62904883/article/details/126208556

Learning goals

This material follows the R tutorials shared by Lizongzhang on Bilibili.
Today's main topic is how to use the functions in dplyr.

Learning content

The main content is below.

Categories of dplyr functions

dplyr provides different functions for different data-management tasks:

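Columns:
– select() picks columns
– rename() renames columns
– mutate() changes the values of columns and creates new columns
– relocate() changes the column order
Rows:
– filter() keeps the rows that meet a condition
– slice() picks rows by position
– arrange() sorts rows
Groups of rows:
– summarise() collapses each group into a single row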
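
The rest of this post demonstrates select(), mutate(), filter() and arrange(), but not the other functions in the list. The following is a minimal sketch of rename(), relocate(), slice() and summarise() on the same starwars data; the object names (renamed, reordered, first_three, by_sex) are only illustrative and do not come from the original tutorial.

```r
library(dplyr)

# rename(): new_name = old_name; every other column is kept
renamed <- starwars %>% rename(weight = mass)

# relocate(): move columns; by default they go to the front
reordered <- starwars %>% relocate(species, sex)

# slice(): pick rows by position
first_three <- starwars %>% slice(1:3)

# summarise(): collapse each group into one row
by_sex <- starwars %>%
  group_by(sex) %>%
  summarise(n = n(), mean_height = mean(height, na.rm = TRUE))

names(renamed)
names(reordered)
first_three
by_sex
```

Unlike select(name, height, weight = mass), which keeps only the listed columns, rename() keeps all of them.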

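Below we use R to analyse data.
The data is the starwars dataset, on which we carry out a series of concrete operations.
The first preprocessing step is to extract and select the variables we need.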

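# dplyr functions
# dataset: starwars
library(dplyr)
data()  # lists the available datasets; starwars ships with dplyr
# select()

# The following helpers set conditions for choosing variables
# contains()
# starts_with()
# ends_with()
# last_col()
# list all variable names in the data frame
names(starwars)
# select the variables we need
# Ctrl+Shift+M inserts %>% in RStudio
data1 <- starwars %>% select(name, mass,
                             hair_color,
                             height, skin_color)
data1

# a range of columns by name
data2 <- starwars %>% select(name:mass)
data2

# a range of columns by position
data3 <- starwars %>% select(1:5)
data3

# several ranges at once
data4 <- starwars %>%
  select(1:4, 6:8)
data4
# using contains()
data5 <- starwars %>%
  select(name, sex, contains("gender"))
# extracts every column whose name contains "gender"
data5
# rename a variable while selecting (new_name = old_name)
data6 <- starwars %>%
  select(name, height, weight = mass)
data6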

The output of data5 and data6 is shown below:

data5
# A tibble: 87 × 3
   name               sex    gender   
   <chr>              <chr>  <chr>    
 1 Luke Skywalker     male   masculine
 2 C-3PO              none   masculine
 3 R2-D2              none   masculine
 4 Darth Vader        male   masculine
 5 Leia Organa        female feminine 
 6 Owen Lars          male   masculine
 7 Beru Whitesun lars female feminine 
 8 R5-D4              none   masculine
 9 Biggs Darklighter  male   masculine
10 Obi-Wan Kenobi     male   masculine
# … with 77 more rows
# ℹ Use `print(n = ...)` to see more rows

data6
# A tibble: 87 × 3
   name               height weight
   <chr>               <int>  <dbl>
 1 Luke Skywalker        172     77
 2 C-3PO                 167     75
 3 R2-D2                  96     32
 4 Darth Vader           202    136
 5 Leia Organa           150     49
 6 Owen Lars             178    120
 7 Beru Whitesun lars    165     75
 8 R5-D4                  97     32
 9 Biggs Darklighter     183     84
10 Obi-Wan Kenobi        182     77
# … with 77 more rows
# ℹ Use `print(n = ...)` to see more rows
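
The comments above list starts_with(), ends_with() and last_col(), but only contains() is actually run. A small sketch of the other helpers follows; the object names are made up for illustration, and the column names assume the starwars dataset as shipped with dplyr.

```r
library(dplyr)

# columns whose names end with "color"
color_cols <- starwars %>% select(name, ends_with("color"))

# columns whose names start with "s"
s_cols <- starwars %>% select(starts_with("s"))

# last_col() refers to the last column of the data frame
name_and_last <- starwars %>% select(name, last_col())

names(color_cols)
names(s_cols)
names(name_and_last)
```

These helpers match on column names only, so they never look at the values stored in the columns.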

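Using conditional statements

# if_else()
# conditionally replace the values of a variable
library(tidyr)
data7 <- data1 %>%
  drop_na(mass) %>%  # drop the rows where mass is NA
  mutate(weight_size = if_else(mass > 100,
                               "large",
                               "small"))
data7

data7 returns the following:

# A tibble: 59 × 6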
   name           mass hair_…¹ height skin_…² weigh…³
   <chr>         <dbl> <chr>    <int> <chr>   <chr>  
 1 Luke Skywalk…    77 blond      172 fair    small  
 2 C-3PO            75 NA         167 gold    small  
 3 R2-D2            32 NA          96 white,… small  
 4 Darth Vader     136 none       202 white   large  
 5 Leia Organa      49 brown      150 light   small  
 6 Owen Lars       120 brown,…    178 light   large  
 7 Beru Whitesu…    75 brown      165 light   small  
 8 R5-D4            32 NA          97 white,… small  
 9 Biggs Darkli…    84 black      183 light   small  
10 Obi-Wan Keno…    77 auburn…    182 fair    small  
# … with 49 more rows, and abbreviated variable
#   names ¹hair_color, ²skin_color, ³weight_size
# ℹ Use `print(n = ...)` to see more rows

Inside mutate(), if_else() labels each row: if mass > 100 the row is marked "large", otherwise it is marked "small".
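
data7 calls drop_na(mass) before if_else() because a missing mass cannot be compared with 100. The sketch below, on a made-up three-row tibble (toy is not part of the tutorial), shows what if_else() does with NA and how its missing argument can supply a label instead.

```r
library(dplyr)

# hypothetical toy data: the last mass is missing
toy <- tibble(name = c("a", "b", "c"), mass = c(136, 77, NA))

labelled <- toy %>%
  mutate(
    # NA > 100 is NA, so the label is NA as well
    size_default = if_else(mass > 100, "large", "small"),
    # missing = ... is used wherever the condition is NA
    size_filled  = if_else(mass > 100, "large", "small", missing = "unknown")
  )
labelled
```

Dropping the rows first, as data7 does, changes the number of rows; filling with missing keeps every row.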

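# the new column is spelled weigth_size here, which is the name shown in the output
data7a <- data1 %>%
  mutate(weigth_size = if_else(mass > 100, "large",
                               "small"),
         BMI = mass / (height / 100)^2)  # mass in kg, height converted from cm to m
data7a

data7a returns the following:

# A tibble: 87 × 7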
   name     mass hair_…¹ height skin_…² weigt…³   BMI
   <chr>   <dbl> <chr>    <int> <chr>   <chr>   <dbl>
 1 Luke S…    77 blond      172 fair    small    26.0
 2 C-3PO      75 NA         167 gold    small    26.9
 3 R2-D2      32 NA          96 white,… small    34.7
 4 Darth …   136 none       202 white   large    33.3
 5 Leia O…    49 brown      150 light   small    21.8
 6 Owen L…   120 brown,…    178 light   large    37.9
 7 Beru W…    75 brown      165 light   small    27.5
 8 R5-D4      32 NA          97 white,… small    34.0
 9 Biggs …    84 black      183 light   small    25.1
10 Obi-Wa…    77 auburn…    182 fair    small    23.2
# … with 77 more rows, and abbreviated variable
#   names ¹hair_color, ²skin_color, ³weigth_size
# ℹ Use `print(n = ...)` to see more rows

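# recode()
# recode_factor() recodes values and returns a factor
# weigth_size differs from the existing weight_size, so mutate() adds a seventh column
data8 <- data7 %>%
  mutate(weigth_size = if_else(mass > 100,
                               "large",
                               "small"))
data8

data8 returns the following:

# A tibble: 59 × 7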
   name   mass hair_…¹ height skin_…² weigh…³ weigt…⁴
   <chr> <dbl> <chr>    <int> <chr>   <chr>   <chr>  
 1 Luke…    77 blond      172 fair    small   small  
 2 C-3PO    75 NA         167 gold    small   small  
 3 R2-D2    32 NA          96 white,… small   small  
 4 Dart…   136 none       202 white   large   large  
 5 Leia…    49 brown      150 light   small   small  
 6 Owen…   120 brown,…    178 light   large   large  
 7 Beru…    75 brown      165 light   small   small  
 8 R5-D4    32 NA          97 white,… small   small  
 9 Bigg…    84 black      183 light   small   small  
10 Obi-…    77 auburn…    182 fair    small   small
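
The comments mention recode() and recode_factor(), but the code above uses if_else() again. As a hedged sketch of what those two functions do, here they are on a made-up character vector (sizes and the "S"/"L" codes are illustrative only). In recent dplyr releases recode() is marked as superseded in favour of case_match(), so treat this as an illustration of the functions named in the post rather than a recommendation.

```r
library(dplyr)

# hypothetical labels, like the weight_size column
sizes <- c("small", "large", "small")

# recode(): old_value = new_value, returns a character vector
recoded <- recode(sizes, small = "S", large = "L")

# recode_factor(): same mapping, but returns a factor whose
# levels follow the order of the replacements
f <- recode_factor(sizes, small = "S", large = "L")

# .ordered = TRUE makes it an ordered factor
o <- recode_factor(sizes, small = "S", large = "L", .ordered = TRUE)

recoded
levels(f)
is.ordered(o)
```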

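Using filter() to select cases

Before this step, make sure that library(dplyr) has already been run.

# filter(): select cases (rows)
# check that library(dplyr) was run earlier in the script
unique(starwars$species)

# extract the cases
data_9 <- starwars %>%
  select(name, height, mass, sex, species) %>%
  filter(species == "Droid") %>%
  arrange(height)
data_9

data_9 keeps the columns name, height, mass, sex and species, keeps only the rows whose species is "Droid", and sorts them by height in ascending order.
The result of data_9: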

# A tibble: 6 × 5
  name   height  mass sex   species
  <chr>   <int> <dbl> <chr> <chr>  
1 R2-D2      96    32 none  Droid  
2 R4-P17     96    NA none  Droid  
3 R5-D4      97    32 none  Droid  
4 C-3PO     167    75 none  Droid  
5 IG-88     200   140 none  Droid  
6 BB8        NA    NA none  Droid

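data_10 <- starwars %>%
  select(name, height, mass, sex, species) %>%
  filter(species == "Human") %>%
  arrange(-height)  # descending order
# another way to sort in descending order: arrange(desc(height))
data_10

data_10 works almost the same way as data_9, except that the rows are sorted in descending order.
It returns:

# A tibble: 35 × 5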
   name                height  mass sex   species
   <chr>                <int> <dbl> <chr> <chr>  
 1 Darth Vader            202 136   male  Human  
 2 Qui-Gon Jinn           193  89   male  Human  
 3 Dooku                  193  80   male  Human  
 4 Bail Prestor Organa    191  NA   male  Human  
 5 Anakin Skywalker       188  84   male  Human  
 6 Mace Windu             188  84   male  Human  
 7 Raymus Antilles        188  79   male  Human  
 8 Gregar Typho           185  85   male  Human  
 9 Biggs Darklighter      183  84   male  Human  
10 Boba Fett              183  78.2 male  Human
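
The two ways of sorting in descending order give the same result for numeric columns, but only desc() also works for text columns, since a minus sign cannot be applied to a character vector. A small sketch on a made-up tibble (toy and the result names are illustrative):

```r
library(dplyr)

# hypothetical toy data
toy <- tibble(name = c("b", "c", "a"), height = c(180, 170, 190))

by_height_minus <- toy %>% arrange(-height)       # numeric only
by_height_desc  <- toy %>% arrange(desc(height))  # same order
by_name_desc    <- toy %>% arrange(desc(name))    # works for character too

by_height_minus
by_height_desc
by_name_desc
```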

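# %in% means "is an element of"
data_11 <- starwars %>%
  select(name, height, mass, sex, species) %>%
  filter((species %in% c("Human", "Droid"))
         & height < 250) %>%
  arrange(-height)
data_11

Here %in% keeps the rows whose species is either "Human" or "Droid", and the second condition keeps only heights below 250.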
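
To see what %in% returns on its own, here is a base R sketch on a made-up vector (species_toy is illustrative). It also shows one difference from chaining == with |: %in% never returns NA.

```r
# hypothetical vector with one missing value
species_toy <- c("Human", "Droid", "Wookiee", NA)

# one logical value per element; NA is simply "not in the set"
in_result <- species_toy %in% c("Human", "Droid")

# the equivalent comparison with == propagates the NA
eq_result <- species_toy == "Human" | species_toy == "Droid"

in_result
eq_result
```

Inside filter() both forms keep the same rows, because filter() drops rows where the condition is either FALSE or NA.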

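# filter(!is.na(x)) keeps the rows where x is not missing
data_12 <- starwars %>%
  select(name, height, mass, sex, species) %>%
  filter(!is.na(height)) %>%
  arrange(-height)
data_12
is.na(starwars$height)

is.na(x) returns a logical vector of the same length as x; each element is TRUE if the corresponding element of x is NA and FALSE otherwise.
data_12 returns the following: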

 data_12
# A tibble: 81 × 5
   name         height  mass sex    species 
   <chr>         <int> <dbl> <chr>  <chr>   
 1 Yarael Poof     264    NA male   Quermian
 2 Tarfful         234   136 male   Wookiee 
 3 Lama Su         229    88 male   Kaminoan
 4 Chewbacca       228   112 male   Wookiee 
 5 Roos Tarpals    224    82 male   Gungan  
 6 Grievous        216   159 male   Kaleesh 
 7 Taun We         213    NA female Kaminoan
 8 Rugor Nass      206    NA male   Gungan  
 9 Tion Medon      206    80 male   Pau'an  
10 Darth Vader     202   136 male   Human   
# … with 71 more rows
# ℹ Use `print(n = ...)` to see more rows

The output of is.na(starwars$height) is:

 [9] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[41] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[57] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
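
Reading a long logical vector by eye is error-prone. Because TRUE counts as 1, the same is.na() result can be summarised with sum() and which(). The sketch below uses a made-up tibble (toy and the result names are illustrative), so the numbers do not depend on the dataset version.

```r
library(dplyr)

# hypothetical toy data with two missing heights
toy <- tibble(name = c("a", "b", "c", "d"), height = c(172, NA, 96, NA))

n_missing   <- sum(is.na(toy$height))    # how many values are missing
missing_pos <- which(is.na(toy$height))  # which positions are missing

# rows kept by the same filter used for data_12
kept <- toy %>% filter(!is.na(height))

n_missing
missing_pos
kept
```

The number of rows kept plus the number of missing values always equals the original row count, which is a quick check on the filter.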