i2ds: tidyverse notes

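This post covers why the tidyverse packages matter for data wrangling in R and how to use them: converting data to tidy format; using dplyr to manipulate data frames (adding columns, subsetting, and sorting); simplifying workflows with the pipe operator; and summarizing data with group_by and summarize. It also introduces the features of tibbles, such as their friendlier printing and support for complex entries, and how functions from the purrr package extend data-processing capabilities.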

Tidy data

#>       country year fertility
#> 1     Germany 1960      2.41
#> 2 South Korea 1960      6.16
#> 3     Germany 1961      2.44
#> 4 South Korea 1961      5.99
#> 5     Germany 1962      2.47
#> 6 South Korea 1962      5.79

This is a tidy dataset because each row represents one observation, and the three variables are country, year, and fertility rate.

#>       country 1960 1961 1962
#> 1     Germany 2.41 2.44 2.47
#> 2 South Korea 6.16 5.99 5.79

The same information is provided, but there are two important differences in the format: 1) each row includes several observations and 2) one of the variables, year, is stored in the header.

For the tidyverse packages to be optimally used, data need to be reshaped into tidy format.
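A minimal sketch of that reshaping, assuming the tidyr package (part of the tidyverse) is installed: it rebuilds the wide fertility table shown above with the same values and uses `pivot_longer()` to move the years out of the header into a `year` column. The object names `wide_data` and `tidy_data` are illustrative.

```r
library(tidyr)

# Wide table from above: each row holds several observations, years sit in the header
wide_data <- data.frame(
  country = c("Germany", "South Korea"),
  `1960` = c(2.41, 6.16),
  `1961` = c(2.44, 5.99),
  `1962` = c(2.47, 5.79),
  check.names = FALSE  # keep "1960" etc. as column names instead of "X1960"
)

# Gather the year columns into one variable so each row is one observation
tidy_data <- pivot_longer(
  wide_data,
  cols = -country,              # every column except country holds a year
  names_to = "year",            # header values become the year variable
  values_to = "fertility",      # cell values become the fertility variable
  names_transform = list(year = as.integer)
)
tidy_data
```

The result contains the same six observations as the tidy table at the top, ordered by country rather than by year.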

Exercises

  1. Examine the built-in dataset co2, which is not tidy: to be tidy, it would have to be wrangled into three columns (year, month, and value) so that each co2 observation has its own row.
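One way to do that wrangling, sketched in base R: co2 is a monthly `ts` object, so the year and month can be recovered from its start, end, and frequency. This assumes the series starts in January and ends in December (true for co2, which covers 1959 to 1997); the names `n_years` and `co2_tidy` are illustrative.

```r
# co2 is a monthly time series: values only, with time stored as ts attributes
n_years <- length(co2) / frequency(co2)

co2_tidy <- data.frame(
  # one year label per month, repeated 12 times per year
  year  = rep(start(co2)[1]:end(co2)[1], each = frequency(co2)),
  # month index 1..12, recycled for every year
  month = rep(seq_len(frequency(co2)), times = n_years),
  # the CO2 concentrations themselves, stripped of the ts class
  value = as.numeric(co2)
)
head(co2_tidy)
```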

  2. Examine the built-in dataset ChickWeight, which is tidy: each observation (a weight) is represented by one row. The chick from which this measurement came is one of the variables.

  3. Examine the built-in dataset BOD, which is tidy: each row is an observation with two values (time and demand).

  4. Which of the following built-in datasets is tidy (you can pick more than one):

a. BJsales
b. EuStockMarkets
c. DNase
d. Formaldehyde
e. Orange
f. UCBAdmissions

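c, d, e. DNase, Formaldehyde, and Orange are data frames in which each row is one observation. BJsales and EuStockMarkets are time series objects whose time index is stored as an attribute rather than as a column, and UCBAdmissions is a three-way contingency table.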
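A quick first check, not a proof of tidiness: tidy data in the tidyverse sense lives in a data frame, so candidates that are not data frames can be ruled out immediately. The names `candidates` and `is_df` are illustrative.

```r
candidates <- list(
  BJsales = BJsales, EuStockMarkets = EuStockMarkets, DNase = DNase,
  Formaldehyde = Formaldehyde, Orange = Orange, UCBAdmissions = UCBAdmissions
)

# TRUE only for the data frames; ts, mts, and table objects return FALSE
is_df <- sapply(candidates, is.data.frame)
is_df
```

Passing this check is necessary but not sufficient; DNase, Formaldehyde, and Orange are tidy because, in addition, each of their rows is one observation.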

Manipulating data frames

The dplyr package provides functions for the most common data frame operations. For instance, to change the data table by adding a new column, we use mutate. To filter the data table to a subset of rows, we use filter. Finally, to subset the data by selecting specific columns, we use select.

Adding a column with mutate

The function mutate takes the data frame as its first argument and the name and values of the new variable as its second argument, using the convention name = values.

library(dplyr)
library(dslabs)
data("murders")
# Add the murder rate per 100,000 people as a new column
murders <- mutate(murders, rate = total / population * 100000)
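The exercise further down refers to a `rank` column that these notes never define. A minimal sketch of how mutate can add several columns in one call, assuming `rank = rank(-rate)` (rank 1 = highest rate) as the definition; it uses a small hypothetical data frame with made-up numbers rather than the real murders data, so it runs without dslabs.

```r
library(dplyr)

# Hypothetical toy data, NOT the real murders dataset: three made-up states
toy <- data.frame(
  state = c("State A", "State B", "State C"),
  population = c(1000000, 5000000, 250000),
  total = c(20, 50, 1)
)

# mutate() evaluates its arguments in order, so `rank` can use `rate` created just before it
toy <- mutate(toy,
              rate = total / population * 100000,  # murders per 100,000 people
              rank = rank(-rate))                  # assumed definition: 1 = highest rate
toy
```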

Subsetting with filter

Suppose we want to keep only the entries for which the murder rate is 0.71 or lower. To do this, we use the filter function, which takes the data table as the first argument and the conditional statement as the second.

filter(murders, rate <= 0.71)

Selecting columns with select

If we want to view just a few columns, we can use the dplyr select function.

new_table <- select(murders, state, region, rate)
filter(new_table, rate <= 0.71)

Note that select works on columns, while filter works on rows.
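To see the difference on a built-in dataset: BOD (from the exercises above) has 6 rows and 2 columns, so select() changes the number of columns while filter() changes the number of rows. The threshold of 15 is arbitrary, chosen only for illustration.

```r
library(dplyr)

# select() keeps columns: all 6 rows, only the demand column
demand_only <- select(BOD, demand)
dim(demand_only)   # 6 1

# filter() keeps rows: only rows with demand above 15, both columns
high_demand <- filter(BOD, demand > 15)
dim(high_demand)   # 4 2
```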

Exercises

  • Suppose you want to live in the Northeast or West and want the murder rate to be less than 1. We want to see the data for the states satisfying these conditions. Note that you can use logical operators with filter. Here is an example in which we filter to keep only small states in the Northeast region.
filter(murders, population < 5000000 & region == "Northeast")

Make sure murders has been defined with rate and rank and still has all states. Create a table called my_states that contains rows for states satisfying both the conditions: it is in the Northeast or West and the murder rate is less than 1. Use select to show only the state name, the rate, and the rank.
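One possible solution, assuming the dslabs package is installed and defining rank as `rank(-rate)` (1 = highest murder rate), since these notes do not show how rank was created. It recreates both columns on the full set of states so it runs on its own; `%in%` is used as a compact alternative to `region == "Northeast" | region == "West"`.

```r
library(dplyr)
library(dslabs)
data("murders")

# Recreate the rate and (assumed) rank columns on the full set of states
murders <- mutate(murders,
                  rate = total / population * 100000,
                  rank = rank(-rate))

# Both conditions: region is Northeast or West, AND murder rate below 1
my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)

# Show only the state name, the rate, and the rank
select(my_states, state, rate, rank)
```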

The pipe: %>%

Instead of creating the intermediate object new_table, we can chain the two steps with the pipe: original data → select → filter.
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side. So we can write the other arguments as if the first argument were already defined.
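A small check of that rule, using the built-in BOD dataset rather than murders so it runs without dslabs: the piped chain and the nested call give the same result, and in the piped version neither select nor filter receives its data frame explicitly. It assumes dplyr, which re-exports `%>%` from magrittr.

```r
library(dplyr)

# Left side becomes the first argument: 16 %>% sqrt() is sqrt(16)
16 %>% sqrt()              # 4

# Piped: original data -> select -> filter
piped <- BOD %>%
  select(demand) %>%
  filter(demand > 15)

# Same steps written as nested calls, innermost first
nested <- filter(select(BOD, demand), demand > 15)

identical(piped, nested)   # TRUE
```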

Summarizing data

summarize

The summarize function in dplyr provides a way to compute summary statistics with intuitive and readable code.

library(dplyr)
library(dslabs)
data(heights)
s <- heights %>% 
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))
s
#>   average standard_deviation
#> 1    64.9               3.76
us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000)
us_murder_rate
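The same pattern on a built-in dataset, so it runs without dslabs: summarizing the chick weights recorded on day 21 (the last measurement day in ChickWeight) collapses many rows into a one-row data frame whose columns are named by the arguments to summarize. Day 21 and the name `day21` are chosen only for illustration.

```r
library(dplyr)

# Keep the final measurement day, then compute two summary statistics
day21 <- ChickWeight %>%
  filter(Time == 21) %>%
  summarize(average = mean(weight), standard_deviation = sd(weight))
day21
```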