dplyr is a powerful data-manipulation tool from HW (Hadley Wickham)
Introduction to dplyr
Cheat sheet:
distinct: remove duplicate rows
distinct(select(flights, tailnum))
#> Source: local data frame [4,044 x 1]
#>
#> tailnum
#> (chr)
#> 1 N14228
#> 2 N24211
#> 3 N619AA
#> 4 N804JB
#> .. ...
sample_n; sample_frac: random sampling (a fixed number of rows, or a fixed fraction of rows)
sample_n(flights, 10)
#> Source: local data frame [10 x 16]
#>
#> year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
#> 1 2013 6 12 1428 -7 1733 -17 DL N370NW
#> 2 2013 3 23 600 -7 955 23 UA N510UA
#> 3 2013 3 29 1814 14 1920 3 B6 N238JB
#> 4 2013 2 25 1957 -10 2156 -10 EV N19966
#> .. ... ... ... ... ... ... ... ... ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#> (dbl), distance (dbl), hour (dbl), minute (dbl)
sample_frac(flights, 0.01)
#> Source: local data frame [3,368 x 16]
#>
#> year month day dep_time dep_delay arr_time arr_delay carrier tailnum
#> (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr)
#> 1 2013 5 7 1156 -4 1321 -17 UA N432UA
#> 2 2013 4 18 1543 -5 1755 -25 DL N369NB
#> 3 2013 3 26 1408 -7 1623 -8 DL N344NB
#> 4 2013 1 24 2001 66 2211 71 MQ N526MQ
#> .. ... ... ... ... ... ... ... ... ...
#> Variables not shown: flight (int), origin (chr), dest (chr), air_time
#> (dbl), distance (dbl), hour (dbl), minute (dbl)
top_n
Select the top n rows, ranked by a variable
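A minimal sketch of top_n, assuming dplyr is installed. The `scores` data frame and its columns are made up for illustration and do not come from the flights data above.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
scores <- data.frame(
  name  = c("a", "b", "c", "d"),
  score = c(90, 75, 88, 60)
)

# Keep the 2 rows with the largest score.
# top_n() filters rows; it does not sort the result.
top2 <- top_n(scores, 2, score)
top2
```

Recent dplyr releases mark top_n as superseded in favour of slice_max and slice_min, so newer code may prefer those.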
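lag
Shifts a vector by one position, which makes diff-style calculations possible (the difference between each value and the previous one)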
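A small sketch of the diff-style use of lag, assuming dplyr is installed. The vector `x` is a made-up example. The call is written as `dplyr::lag` to make clear that it is not `stats::lag`.

```r
library(dplyr)

# Hypothetical example vector
x <- c(10, 13, 19, 20)

# dplyr::lag() shifts x one position to the right and pads the front with NA,
# so subtracting it gives the change from the previous element.
step <- x - dplyr::lag(x)
step
```

Unlike base `diff(x)`, the result has the same length as `x` (the first element is NA), which is convenient inside mutate.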
tally
Counts rows; shorthand for summarise(n = n())
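The equivalence between tally and summarise can be sketched as follows, assuming dplyr is installed. The `df` data frame is a made-up example, not the flights data.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(carrier = c("UA", "DL", "UA", "B6", "UA"))
by_carrier <- group_by(df, carrier)

# tally() counts the rows in each group...
counts_tally <- tally(by_carrier)

# ...which matches spelling the count out with summarise()
counts_summarise <- summarise(by_carrier, n = n())

counts_tally
```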