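Bilibili course video link:
https://www.bilibili.com/video/BV19x411X7C6?p=1
Tencent Classroom (the latest version, but paid; I bought it for 99 yuan 😢😢. The content is fine, though the topics are loosely organized and somewhat padded):
https://ke.qq.com/course/3707827#term_id=103855009
The earlier notes in this series follow the Bilibili videos; the later plotting notes follow the paid course.
I reordered the notes where that made the logic flow better, dropped some unimportant material, and added my own understanding.
Series index (continuously updated): https://blog.csdn.net/weixin_42214698/category_11393896.html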
Data transformation with the tidyr package
tidy (adj.): neat, orderly
install.packages("tidyr")
library(tidyr)
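1. The gather and spread functions
gather() converts wide data to long data
spread() converts long data to wide data
They play the same roles as melt() and dcast() in the reshape2 package, but gather() and spread() are easier to use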
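As a side note, current tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the corresponding calls, assuming tidyr 1.0.0 or later is installed; the variable names long and wide are only illustrative.

```r
library(tidyr)

# Small example table: first 10 rows and first 3 columns of mtcars,
# with the row names copied into a regular column called name
tdata <- mtcars[1:10, 1:3]
tdata <- data.frame(name = rownames(tdata), tdata)

# Wide -> long: the role gather() plays
long <- pivot_longer(tdata, cols = c(mpg, cyl, disp),
                     names_to = "Key", values_to = "Value")

# Long -> wide: the role spread() plays
wide <- pivot_wider(long, names_from = "Key", values_from = "Value")

dim(long)   # 30 rows, 3 columns
dim(wide)   # 10 rows, 4 columns
```

Unlike gather(), pivot_longer() keeps the values of each original row together, so the row order of the long table differs even though the content is the same.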
gather() example:
> tdata <- mtcars[1:10,1:3]
> tdata
                   mpg cyl  disp
Mazda RX4         21.0   6 160.0
Mazda RX4 Wag     21.0   6 160.0
Datsun 710        22.8   4 108.0
Hornet 4 Drive    21.4   6 258.0
Hornet Sportabout 18.7   8 360.0
Valiant           18.1   6 225.0
Duster 360        14.3   8 360.0
Merc 240D         24.4   4 146.7
Merc 230          22.8   4 140.8
Merc 280          19.2   6 167.6
> tdata <- data.frame(name=rownames(tdata),tdata) # add a name column holding the row names
> tdata
                               name  mpg cyl  disp
Mazda RX4                 Mazda RX4 21.0   6 160.0
Mazda RX4 Wag         Mazda RX4 Wag 21.0   6 160.0
Datsun 710               Datsun 710 22.8   4 108.0
Hornet 4 Drive       Hornet 4 Drive 21.4   6 258.0
Hornet Sportabout Hornet Sportabout 18.7   8 360.0
Valiant                     Valiant 18.1   6 225.0
Duster 360               Duster 360 14.3   8 360.0
Merc 240D                 Merc 240D 24.4   4 146.7
Merc 230                   Merc 230 22.8   4 140.8
Merc 280                   Merc 280 19.2   6 167.6
> # Wide to long, similar to melt(): tdata is the data frame, key names the new column that holds the old column names, value names the new column that holds their values
> gather(tdata,key = "Key",value = "Value",cyl,disp,mpg) # stacked in the order cyl, disp, mpg
                name  Key Value
1          Mazda RX4  cyl   6.0
2      Mazda RX4 Wag  cyl   6.0
3         Datsun 710  cyl   4.0
4     Hornet 4 Drive  cyl   6.0
5  Hornet Sportabout  cyl   8.0
6            Valiant  cyl   6.0
7         Duster 360  cyl   8.0
8          Merc 240D  cyl   4.0
9           Merc 230  cyl   4.0
10          Merc 280  cyl   6.0
11         Mazda RX4 disp 160.0
12     Mazda RX4 Wag disp 160.0
13        Datsun 710 disp 108.0
14    Hornet 4 Drive disp 258.0
15 Hornet Sportabout disp 360.0
16           Valiant disp 225.0
17        Duster 360 disp 360.0
18         Merc 240D disp 146.7
19          Merc 230 disp 140.8
20          Merc 280 disp 167.6
21         Mazda RX4  mpg  21.0
22     Mazda RX4 Wag  mpg  21.0
23        Datsun 710  mpg  22.8
24    Hornet 4 Drive  mpg  21.4
25 Hornet Sportabout  mpg  18.7
26           Valiant  mpg  18.1
27        Duster 360  mpg  14.3
28         Merc 240D  mpg  24.4
29          Merc 230  mpg  22.8
30          Merc 280  mpg  19.2
> gather(tdata,key = "Key",value = "Value",cyl:disp) # every column from cyl through disp
                name  mpg  Key Value
1          Mazda RX4 21.0  cyl   6.0
2      Mazda RX4 Wag 21.0  cyl   6.0
3         Datsun 710 22.8  cyl   4.0
4     Hornet 4 Drive 21.4  cyl   6.0
5  Hornet Sportabout 18.7  cyl   8.0
6            Valiant 18.1  cyl   6.0
7         Duster 360 14.3  cyl   8.0
8          Merc 240D 24.4  cyl   4.0
9           Merc 230 22.8  cyl   4.0
10          Merc 280 19.2  cyl   6.0
11         Mazda RX4 21.0 disp 160.0
12     Mazda RX4 Wag 21.0 disp 160.0
13        Datsun 710 22.8 disp 108.0
14    Hornet 4 Drive 21.4 disp 258.0
15 Hornet Sportabout 18.7 disp 360.0
16           Valiant 18.1 disp 225.0
17        Duster 360 14.3 disp 360.0
18         Merc 240D 24.4 disp 146.7
19          Merc 230 22.8 disp 140.8
20          Merc 280 19.2 disp 167.6
> gather(tdata,key = "Key",value = "Value",mpg,cyl,-disp) # a minus sign excludes that column from gathering
                name  disp Key Value
1          Mazda RX4 160.0 mpg  21.0
2      Mazda RX4 Wag 160.0 mpg  21.0
3         Datsun 710 108.0 mpg  22.8
4     Hornet 4 Drive 258.0 mpg  21.4
5  Hornet Sportabout 360.0 mpg  18.7
6            Valiant 225.0 mpg  18.1
7         Duster 360 360.0 mpg  14.3
8          Merc 240D 146.7 mpg  24.4
9           Merc 230 140.8 mpg  22.8
10          Merc 280 167.6 mpg  19.2
11         Mazda RX4 160.0 cyl   6.0
12     Mazda RX4 Wag 160.0 cyl   6.0
13        Datsun 710 108.0 cyl   4.0
14    Hornet 4 Drive 258.0 cyl   6.0
15 Hornet Sportabout 360.0 cyl   8.0
16           Valiant 225.0 cyl   6.0
17        Duster 360 360.0 cyl   8.0
18         Merc 240D 146.7 cyl   4.0
19          Merc 230 140.8 cyl   4.0
20          Merc 280 167.6 cyl   6.0
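The calls above pick the columns to gather by name, by range, and by exclusion. As a rough sketch of how these selection styles relate, the four calls below are expected to produce the same long table; the variable names by_name, by_range, by_index and by_minus are only illustrative.

```r
library(tidyr)

tdata <- mtcars[1:10, 1:3]
tdata <- data.frame(name = rownames(tdata), tdata)

by_name  <- gather(tdata, key = "Key", value = "Value", mpg, cyl, disp)  # explicit names
by_range <- gather(tdata, key = "Key", value = "Value", mpg:disp)        # range of columns
by_index <- gather(tdata, key = "Key", value = "Value", 2:4)             # column positions
by_minus <- gather(tdata, key = "Key", value = "Value", -name)           # everything except name

# Compare each variant against the explicit-name version
all.equal(by_name, by_range)
all.equal(by_name, by_index)
all.equal(by_name, by_minus)
```

Excluding the identifier column with -name tends to be the most convenient form when a table has many measurement columns.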
spread() example:
> gdata <- gather(tdata,key = "Key",value = "Value",2:4) # 2:4 selects columns 2 to 4 of tdata: mpg, cyl, disp
> gdata
                name  Key Value
1          Mazda RX4  mpg  21.0
2      Mazda RX4 Wag  mpg  21.0
3         Datsun 710  mpg  22.8
4     Hornet 4 Drive  mpg  21.4
5  Hornet Sportabout  mpg  18.7
6            Valiant  mpg  18.1
7         Duster 360  mpg  14.3
8          Merc 240D  mpg  24.4
9           Merc 230  mpg  22.8
10          Merc 280  mpg  19.2
11         Mazda RX4  cyl   6.0
12     Mazda RX4 Wag  cyl   6.0
13        Datsun 710  cyl   4.0
14    Hornet 4 Drive  cyl   6.0
15 Hornet Sportabout  cyl   8.0
16           Valiant  cyl   6.0
17        Duster 360  cyl   8.0
18         Merc 240D  cyl   4.0
19          Merc 230  cyl   4.0
20          Merc 280  cyl   6.0
21         Mazda RX4 disp 160.0
22     Mazda RX4 Wag disp 160.0
23        Datsun 710 disp 108.0
24    Hornet 4 Drive disp 258.0
25 Hornet Sportabout disp 360.0
26           Valiant disp 225.0
27        Duster 360 disp 360.0
28         Merc 240D disp 146.7
29          Merc 230 disp 140.8
30          Merc 280 disp 167.6
> spread(gdata,key = "Key",value = "Value") # the new columns are sorted alphabetically by key; rows are sorted by name
                name cyl  disp  mpg
1         Datsun 710   4 108.0 22.8
2         Duster 360   8 360.0 14.3
3     Hornet 4 Drive   6 258.0 21.4
4  Hornet Sportabout   8 360.0 18.7
5          Mazda RX4   6 160.0 21.0
6      Mazda RX4 Wag   6 160.0 21.0
7           Merc 230   4 140.8 22.8
8          Merc 240D   4 146.7 24.4
9           Merc 280   6 167.6 19.2
10           Valiant   6 225.0 18.1
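Because spread() reorders both rows and columns, its result does not look identical to the original tdata at first glance. The sketch below checks that the round trip gather() then spread() keeps the same values once the order is restored; restored and expected are illustrative names, not part of the course material.

```r
library(tidyr)

tdata <- mtcars[1:10, 1:3]
tdata <- data.frame(name = rownames(tdata), tdata)

gdata <- gather(tdata, key = "Key", value = "Value", 2:4)
wide  <- spread(gdata, key = "Key", value = "Value")

# Put rows and columns back into the order used by tdata
restored <- wide[match(tdata$name, wide$name), names(tdata)]
rownames(restored) <- NULL

# tdata still carries the car names as row names; drop them before comparing
expected <- tdata
rownames(expected) <- NULL

all.equal(restored, expected)
```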
2. The separate and unite functions
separate() splits one column into several columns
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df
     x
1 <NA>
2  a.b
3  a.d
4  b.c
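# Split column x of df into two columns, A and B. By default separate() splits on any run of non-alphanumeric characters, which covers "."
> separate(df,col=x,into = c("A","B"),sep="\\.") # the argument is sep (not seq) and is a regular expression, so a literal dot is escaped; it can be omitted here
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c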
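The sep argument of separate() accepts either a regular expression or a numeric position. A short sketch of the variants, using the same df as above; by_default, by_dot and by_pos are illustrative names.

```r
library(tidyr)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

# Default separator: any run of non-alphanumeric characters
by_default <- separate(df, col = x, into = c("A", "B"))

# Explicit literal dot: it must be escaped, because an unescaped "."
# is a regular expression that matches any single character
by_dot <- separate(df, col = x, into = c("A", "B"), sep = "\\.")

# Numeric sep: split by character position (here after the 1st character),
# so the dot stays at the start of column B
by_pos <- separate(df, col = x, into = c("A", "B"), sep = 1)

by_default
by_dot
by_pos
```

For this data the default and the escaped dot are expected to give the same result; they differ once a value contains other punctuation, such as "a.b-c".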
unite() combines several columns into one column
> df2 <- data.frame(x = c(NA, "a.b-c", "a-d", "b-c"))
> df2
      x
1  <NA>
2 a.b-c
3   a-d
4   b-c
> y <- separate(df2,x,into = c("A","B"),sep = "-")
> y
     A    B
1 <NA> <NA>
2  a.b    c
3    a    d
4    b    c
# Join columns A and B of y with the separator "-" to form a new column AB
> unite(y,col ="AB",A,B,sep="-")
     AB
1 NA-NA
2 a.b-c
3   a-d
4   b-c
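In the output above, the missing values in row 1 were pasted in as the text "NA-NA". A minimal sketch of two unite() options that change this behaviour, assuming tidyr 1.0.0 or later (where the na.rm argument is available); y is rebuilt by hand here, and dropped and kept are illustrative names.

```r
library(tidyr)

# Same content as y above, built directly so the example is self-contained
y <- data.frame(A = c(NA, "a.b", "a", "b"),
                B = c(NA, "c", "d", "c"),
                stringsAsFactors = FALSE)

# Default (na.rm = FALSE): missing values become the literal text "NA"
unite(y, col = "AB", A, B, sep = "-")

# na.rm = TRUE drops missing values before pasting
dropped <- unite(y, col = "AB", A, B, sep = "-", na.rm = TRUE)

# remove = FALSE keeps the input columns A and B next to the new column
kept <- unite(y, col = "AB", A, B, sep = "-", remove = FALSE)

dropped
kept
```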