#### Today I learned about a package that reads data very quickly: the readr package.
### File-reading functions
# readr package
read_delim(file, delim, quote = "\"", escape_backslash = FALSE,
escape_double = TRUE, col_names = TRUE, col_types = NULL,
locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
comment = "", trim_ws = FALSE, skip = 0, n_max = Inf,
guess_max = min(1000, n_max), progress = show_progress(),
skip_empty_rows = TRUE)
read_csv(file, col_names = TRUE, col_types = NULL,
locale = default_locale(), na = c("", "NA"), quoted_na = TRUE,
quote = "\"", comment = "", trim_ws = TRUE, skip = 0,
n_max = Inf, guess_max = min(1000, n_max),
progress = show_progress(), skip_empty_rows = TRUE)
# readxl package (read_excel and read_xls are provided by readxl, not readr)
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
n_max = Inf, guess_max = min(1000, n_max),
progress = readxl_progress(), .name_repair = "unique")
read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
n_max = Inf, guess_max = min(1000, n_max),
progress = readxl_progress(), .name_repair = "unique")
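As a rough illustration of the `read_excel` signature above, the sketch below reads a small rectangle of cells from one sheet of a workbook. It assumes the readxl package is installed, and it uses `readxl_example("datasets.xlsx")`, a sample workbook shipped with readxl, rather than any file from these notes. The sheet name and cell range are chosen only for demonstration.

```r
library(readxl)

# Path to a sample workbook bundled with readxl (not a file from this post)
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets so we know which names are valid for `sheet`
sheets <- excel_sheets(xlsx_path)

# Read only cells A1:C5 of the "mtcars" sheet; the first row of the
# range is used as the header because col_names = TRUE by default
cars_part <- read_excel(xlsx_path, sheet = "mtcars", range = "A1:C5")

print(sheets)
print(dim(cars_part))
```

Passing `range` limits what is read, which is handy when a sheet contains notes or totals around the actual table.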
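### Parameters
Parameter | Description |
--- | --- |
file | Character string or vector; the data source. It can be a path to a data file, a connection or URL, or the literal data itself (which must contain at least one newline) |
delim | Character; the single character used to separate fields |
escape_backslash | Logical; whether backslashes are used to escape special characters |
escape_double | Logical; whether quotes are escaped by doubling them. If TRUE, `""""` represents a single `"` |
col_names | Logical or character vector. If TRUE, the first row of the input is used as the column names; if FALSE, column names are generated automatically as X1, X2, X3, ...; if a character vector, its values are used as the column names |
col_types | NULL, a cols() specification, or a string. If NULL, the type of each column is guessed from the first 1000 rows (see guess_max); if a cols() specification, it must contain one column specification per column; if a string, each character stands for one column type (c = character, i = integer, n = number, d = double, ...) |
locale | A locale object created with locale(); controls region-dependent defaults such as encoding, decimal mark, and time zone |
na | Character vector; the strings to interpret as missing values |
quoted_na | Logical; if TRUE, missing values inside quotes are treated as missing values; otherwise they are treated as strings |
quote | Character; the single character used to quote strings |
comment | Character string; identifies comments. Any text after the comment characters is ignored |
trim_ws | Logical; whether leading and trailing whitespace is trimmed from each field |
skip | Integer; the number of lines to skip before reading data |
n_max | Integer; the maximum number of records to read |
guess_max | Integer; the maximum number of records used to guess column types |
progress | Logical; whether to display a progress bar |
path | Character string; path to the Excel file (read_excel and read_xls only) |
sheet | Character string or integer; the sheet name or position (read_excel and read_xls only) |
range | Character string; the cell range to read from (read_excel and read_xls only) |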
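To make a few of the parameters in the table concrete, here is a minimal sketch that combines `delim`, `skip`, `na`, `col_types`, and `trim_ws` in one `read_delim` call. The inline data and its column names are hypothetical, and wrapping the text in `I()` to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data: one metadata line, then a semicolon-separated table.
# I() tells readr (>= 2.0) to treat the string as literal data, not a path.
raw <- I("exported from a made-up system\nid;score;group\n1;9.5;a\n2;-;b\n3;7.0;a\n")

df <- read_delim(
  raw,
  delim = ";",             # fields are separated by semicolons
  skip = 1,                # skip the metadata line before the header
  na = c("", "NA", "-"),   # also treat "-" as a missing value
  col_types = "idc",       # compact spec: integer, double, character
  trim_ws = TRUE           # strip whitespace around each field
)

print(df)
```

Supplying `col_types` explicitly avoids the type guessing controlled by `guess_max` and silences the column specification message.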
### read_csv is the function most commonly used for reading data
#### Read a complete dataset, treating the first row as the header
library(readr)
A <- read_csv("原始数据.csv")
Rows: 16136 Columns: 28
── Column specification ──────────────────────────────────────────────────────────────
Delimiter: ","
chr (12): Year, Gender, Race, edu, recreational.activity, smoke, DM, Hypertension,...
dbl (16): seqn, weight, newweight, sdmvpsu, sdmvstra, Age, PIR, VAI, BMI, Weight, ...
##### The data may have no column names; in that case, do not treat the first row as the header
B <- read_csv("原始数据.csv", col_names = FALSE)
Rows: 16137 Columns: 28
── Column specification ──────────────────────────────────────────────────────────────
Delimiter: ","
chr (28): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
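Note what happened in the output above: the file does have a header row, so with `col_names = FALSE` that row was read as data. The row count rose from 16136 to 16137 and all 28 columns were parsed as chr. The sketch below reproduces the same effect on a tiny, hypothetical inline table and also shows the third option, supplying column names as a character vector. It assumes readr 2.0 or later for `I()` and `show_col_types`.

```r
library(readr)

# Hypothetical literal data with a header row
txt <- I("a,b\n1,2\n3,4\n")

# Default: first row becomes the column names, values are parsed as numbers
with_header <- read_csv(txt, show_col_types = FALSE)

# col_names = FALSE: the header row is read as data, so every column
# now mixes text and numbers and is parsed as character
no_header <- read_csv(txt, col_names = FALSE, show_col_types = FALSE)

# Data without a header: supply the names yourself
custom <- read_csv(I("1,2\n3,4\n"), col_names = c("first", "second"),
                   show_col_types = FALSE)

print(with_header)
print(no_header)
print(custom)
```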
### File-writing functions
write_delim(x, path, delim = " ", na = "NA", append = FALSE,
col_names = !append, quote_escape = "double")
write_csv(x, path, na = "NA", append = FALSE, col_names = !append,
quote_escape = "double")
write_excel_csv(x, path, na = "NA", append = FALSE,
col_names = !append, delim = ",", quote_escape = "double")
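### Parameters
Parameter | Description |
--- | --- |
x | A data.frame object |
path | Character string; the path of the file to write to |
delim | Character; the delimiter used to separate values (write_delim and write_excel_csv; write_csv always uses a comma) |
na | Character string; the string written in place of missing values |
append | Logical; if FALSE, an existing file is overwritten; otherwise the data is appended to the existing file |
col_names | Logical; whether the column names are written at the top of the file |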
C <- data.frame(x = c(1,2,3,4,5),
y = c(9,8,7,6,NA))
C
  x  y
1 1  9
2 2  8
3 3  7
4 4  6
5 5 NA
## The three examples below all save the data in CSV format
## Write missing values as the symbol '-'
write_csv(C, "C_1.csv", na = "-")
## col_names defaults to TRUE when append = FALSE; to leave the column names out of the file, set col_names = FALSE
write_csv(C, "C_2.csv", col_names = FALSE)
write_delim(C, "C_3.csv", delim = ",")
[Figures: C1, C2, C3]
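To check what the writing functions actually put on disk, one option is a small round trip: write the data frame, inspect the raw lines, and read it back. The sketch below is an illustration under stated assumptions, not part of the original examples. It writes to a temporary file created with `tempfile()` instead of `C_1.csv`, passes the output path by position, and assumes readr 2.0 or later for `show_col_types`.

```r
library(readr)

C <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(9, 8, 7, 6, NA))

# Temporary output file (hypothetical location, removed with the R session)
out <- tempfile(fileext = ".csv")

# First write: header plus five rows, NA written as "-"
write_csv(C, out, na = "-")

# append = TRUE adds the rows again; col_names defaults to !append,
# so no second header line is written
write_csv(C, out, na = "-", append = TRUE)

# Inspect the raw text, then read it back, mapping "-" to NA again
lines <- read_lines(out)
back <- read_csv(out, na = "-", show_col_types = FALSE)

print(lines)
print(back)
```

Reading the file back with the same `na` string as was used for writing keeps the missing value in `y` from turning into the literal text "-".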