read.table in R

牵牛花主人

Original link: http://127.0.0.1:28183/library/utils/html/read.table.html

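This article describes what R's read.table function does and how to use it, covering key arguments such as the file path, variable names, and the field separator, to help readers read data files in various formats efficiently.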

Function description:

Reads a file in table format and creates a data frame from it, 
with cases corresponding to lines and variables to fields 
in the file

In short: reads a file in table form and creates a data frame from it.

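Usage (the signature as documented for R versions before 4.0.0; from R 4.0.0 the default is stringsAsFactors = FALSE):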

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

Arguments:

file
the name of the file which the data are to be read from. 
Each row of the table appears as one line of the file. 
If it does not contain an absolute path, 
the file name is relative to the current working directory, getwd(). 
Tilde-expansion is performed where supported. 
This can be a compressed file (see file).

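The name of the file to read. Each row of the table appears as one line of the file. If no absolute path is given, the file name is interpreted relative to the current working directory (use getwd() to see it). Compressed files can also be read.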

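Notes:
1. Write the path with forward slashes (/) or doubled backslashes (\\).
2. The path must be quoted, since it is a character string.
3. The path must include the file extension (.*).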
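
The three notes above can be tried out with a small sketch. The file contents and the commented-out Windows paths are made up for illustration; only tempfile(), writeLines() and read.table() from base R are used.

```r
# Write a small whitespace-separated table to a temporary file (illustrative data).
path <- tempfile(fileext = ".txt")
writeLines(c("x y", "1 2", "3 4"), path)

# The path is a quoted string and includes the file extension.
df <- read.table(path, header = TRUE)
print(df)

# On Windows, backslashes inside an R string must be doubled, or replaced by "/":
# read.table("C:\\data\\scores.txt")   # hypothetical path
# read.table("C:/data/scores.txt")     # hypothetical path

unlink(path)  # remove the temporary file
```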

Alternatively, file can be a readable text-mode connection 
(which will be opened for reading if necessary, 
and if so closed (and hence destroyed) at the end of the function call). 
(If stdin() is used, the prompts for lines may be somewhat confusing. 
Terminate input with a blank line or an EOF signal, Ctrl-D on Unix and Ctrl-Z on Windows. 
Any pushback on stdin() will be cleared before return.)

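In short: file can also be a readable text-mode connection. If read.table has to open the connection itself, it also closes it when the call returns.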
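
A minimal sketch of reading from a connection, assuming a few made-up lines of data. textConnection() returns a connection that is already open, so read.table does not close it and the code closes it explicitly. The text argument from the usage section is shown next to it as a shortcut that avoids managing the connection by hand.

```r
lines <- c("id score", "1 90", "2 85")  # illustrative data

# Read through an explicit text-mode connection.
con <- textConnection(lines)
df_con <- read.table(con, header = TRUE)
close(con)  # the connection was already open, so we close it ourselves

# Same data through the text argument.
df_txt <- read.table(text = lines, header = TRUE)

print(df_con)
```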

file can also be a complete URL.

That is, a full URL can be passed directly as file.

header	
a logical value indicating whether the file contains the names of the variables as its first line. 
If missing, the value is determined from the file format: 
header is set to TRUE if and only if the first row contains one fewer field 
than the number of columns.

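Header: a logical value (TRUE/FALSE) indicating whether the first line of the file holds the variable names. read.table defaults to FALSE and read.csv defaults to TRUE. If header is omitted, read.table infers it from the file: it is set to TRUE if and only if the first line has one fewer field than the number of columns (for example, 4 columns of data but only 3 fields in the first line).
With header = FALSE, the first line is read as data, and R assigns the default column names V1, V2, V3, …
With header = TRUE, the first line is used as the variable names.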
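
The three cases can be compared side by side in a short sketch. The data are made up; the last call omits header to show the inference rule, where the first line has one fewer field than the data lines.

```r
txt <- c("name age", "Ann 30", "Bob 25")  # illustrative data

# header = FALSE: the first line is data, columns are named V1, V2.
no_header <- read.table(text = txt, header = FALSE)

# header = TRUE: the first line supplies the variable names.
with_header <- read.table(text = txt, header = TRUE)

# header omitted: the first line has 1 field, the data lines have 2,
# so header is inferred as TRUE and the first column becomes the row names.
inferred <- read.table(text = c("age", "Ann 30", "Bob 25"))

print(names(no_header))
print(names(with_header))
print(rownames(inferred))
```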

sep	
the field separator character. 
Values on each line of the file are separated by this character. 
If sep = "" (the default for read.table) the separator is ‘white space’, 
that is one or more spaces, tabs, newlines or carriage returns.

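Separator: the character that separates the values on each line of the file. By default (sep = "") read.table splits fields on any run of white space, so space- or tab-separated files need no further settings.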
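
A small sketch of the difference, using made-up data: the default treats any mix of spaces and tabs as one separator, while a comma-separated file needs sep = ",".

```r
# Default sep = "": runs of spaces and tabs all count as one separator.
ws <- read.table(text = c("a  b\tc", "1 2   3"), header = TRUE)

# Comma-separated input needs an explicit separator.
csv <- read.table(text = c("a,b,c", "1,2,3"), header = TRUE, sep = ",")

print(ws)
print(csv)
```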

col.names	
a vector of optional names for the variables. 
The default is to use "V" followed by the column number.

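Column names
Names for the column variables. The default is "V" followed by the column number: V1, V2, …
With the default header = FALSE, the columns are labelled V1, V2, …
Passing column names, e.g. col.names = c("a", "b", "c"), names the columns a, b, c.

col.names can also be combined with row.names = 1, so that the first column supplies the row names while the names a, b, c are given through col.names.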
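
A minimal sketch of default versus explicit column names, on made-up data without a header line.

```r
txt <- c("1 2 3", "4 5 6")  # illustrative data, no header line

# Default names: "V" followed by the column number.
default_names <- read.table(text = txt)

# Explicit names through col.names.
custom_names <- read.table(text = txt, col.names = c("a", "b", "c"))

print(names(default_names))
print(names(custom_names))
```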

row.names	
a vector of row names. 
This can be a vector giving the actual row names, 
or a single number giving the column of the table 
which contains the row names, 
or character string giving the name of the table column 
containing the row names.

If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered.

Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be ‘automatic’

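Row names: given as a vector of names, or as the number or name of the column that holds them. By default the rows are numbered 1, 2, 3, …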
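
The forms of row.names described above can be sketched as follows. The data and the labels "first" and "second" are made up for illustration.

```r
txt <- c("id x y", "r1 1 2", "r2 3 4")  # illustrative data

# row.names missing: rows are numbered automatically.
numbered <- read.table(text = txt, header = TRUE)

# A single number: the column that holds the row names.
by_index <- read.table(text = txt, header = TRUE, row.names = 1)

# A single string: the name of the column that holds the row names.
by_name <- read.table(text = txt, header = TRUE, row.names = "id")

# A vector with one entry per row: the actual row names (the id column is kept).
by_vector <- read.table(text = txt, header = TRUE,
                        row.names = c("first", "second"))

print(by_index)
print(by_vector)
```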

as.is	
the default behavior of read.table is to convert character variables 
(which are not converted to logical, numeric or complex) to factors. 
The variable as.is controls the conversion of columns not otherwise specified by colClasses. 
Its value is either a vector of logicals (values are recycled if necessary), 
or a vector of numeric or character indices 
which specify which columns should not be converted to factors.

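In R versions before 4.0.0, read.table by default converts character columns to factors; logical, numeric and complex columns are left as they are. (Since R 4.0.0 stringsAsFactors defaults to FALSE, so character columns stay character unless asked otherwise.) as.is specifies which columns are not converted to factors. Its value is either a logical vector (recycled if necessary) or a numeric or character vector of indices naming the columns to leave unconverted.
For a logical as.is, columns where it is TRUE are kept as character and columns where it is FALSE are converted to factors.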
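
A short sketch of a per-column as.is, on made-up data. The logical vector has one entry per column; the second call uses a numeric index instead, which marks only the first column as "not a factor".

```r
txt <- c("name city n", "Ann Oslo 1", "Bob Rome 2")  # illustrative data

# One logical per column: keep name as character, turn city into a factor.
by_logical <- read.table(text = txt, header = TRUE,
                         as.is = c(TRUE, FALSE, TRUE))

# Numeric index: only column 1 is protected from factor conversion.
by_index <- read.table(text = txt, header = TRUE, as.is = 1)

print(sapply(by_logical, class))
print(sapply(by_index, class))
```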

na.strings	
a character vector of strings which are to be interpreted as NA values. 
Blank fields are also considered to be missing values in logical, integer, numeric and complex fields. 
Note that the test happens after white space is stripped from the input, 
so na.strings values may need their own white space stripped in advance.

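Missing values
A character vector of strings to be interpreted as missing values (NA). Blank fields are also treated as missing in logical, integer, numeric and complex columns; in character columns a blank field is read as an empty string.
By default only the string "NA" and such blank fields become NA.
na.strings can list any further strings that should be read as missing, such as a sentinel code.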
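
A minimal sketch of custom missing-value markers. The sentinel values "-999" and "missing" are assumptions chosen for the example, not values from the documentation.

```r
txt <- c("id,score,grade",
         "1,90,A",
         "2,,B",              # blank numeric field
         "3,-999,missing")    # hypothetical sentinel values

# Default na.strings = "NA": only the blank numeric field becomes NA.
default_na <- read.table(text = txt, header = TRUE, sep = ",")

# Also treat "-999" and "missing" as NA.
custom_na <- read.table(text = txt, header = TRUE, sep = ",",
                        na.strings = c("NA", "-999", "missing"))

print(default_na)
print(custom_na)
```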

nrows	
integer: the maximum number of rows to read in. 
Negative and other invalid values are ignored.

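Number of rows:
The maximum number of rows to read. Negative and other invalid values are ignored, so the default nrows = -1 reads every row.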
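
A small sketch on made-up data. Note that the header line does not count towards nrows.

```r
txt <- c("x y", "1 2", "3 4", "5 6", "7 8")  # illustrative data

# Read at most two data rows (the header line is not counted).
first_two <- read.table(text = txt, header = TRUE, nrows = 2)

# A negative value is ignored, so every row is read.
all_rows <- read.table(text = txt, header = TRUE, nrows = -1)

print(first_two)
print(nrow(all_rows))
```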

as.is
Note: to suppress all conversions including those of numeric columns, 
set colClasses = "character".
Note that as.is is specified per column (not per variable) 
and so includes the column of row names (if any) 
and any columns to be skipped.

As noted above, before R 4.0.0 character variables were converted to factors by default (logical, numeric and complex variables are not converted).

skip	
integer: the number of lines of the data file to skip 
before beginning to read data.

Skip
An integer: the number of lines at the start of the file to skip before reading begins.

For example, with skip = 5 reading starts at line 6.
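
A minimal sketch with two made-up preamble lines in front of the table. They do not start with the comment character, so they have to be skipped explicitly.

```r
txt <- c("Exported by a hypothetical tool",  # preamble line 1
         "Units: arbitrary",                 # preamble line 2
         "x y",
         "1 2",
         "3 4")

# Skip the two preamble lines; the header is then read from line 3.
df <- read.table(text = txt, header = TRUE, skip = 2)
print(df)
```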

stringsAsFactors	
logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.

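A logical value: should character columns be converted to factors? The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onwards. as.is and colClasses override it and allow finer control.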
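
Because the default depends on the R version, setting the argument explicitly makes the result predictable. A short sketch on made-up data:

```r
txt <- c("name n", "Ann 1", "Bob 2")  # illustrative data

# Keep character columns as character.
as_chr <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Convert character columns to factors.
as_fct <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

print(class(as_chr$name))
print(class(as_fct$name))
```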

check.names	
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.

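Check variable names.
A logical value. If TRUE, the variable names of the data frame are checked to make sure they are syntactically valid; where necessary they are adjusted (by make.names) and made unique. The default is TRUE.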
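
A sketch of what the check changes, using a made-up header with a space, a name that starts with a digit, and a duplicate.

```r
txt <- c("first name,2020,x,x", "Ann,1,2,3")  # illustrative data

# check.names = TRUE (the default): names are passed through make.names.
checked <- read.table(text = txt, header = TRUE, sep = ",",
                      check.names = TRUE)

# check.names = FALSE: names are kept exactly as they appear in the file.
unchecked <- read.table(text = txt, header = TRUE, sep = ",",
                        check.names = FALSE)

print(names(checked))
print(names(unchecked))
```

Unchecked names are harder to use with `$`, so they usually need backticks or `[[ ]]` indexing.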

fill	
logical. If TRUE then in case the rows have unequal length, 
blank fields are implicitly added.

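Fill
A logical value. If TRUE, rows with fewer fields than the others are padded with blank fields at the end, which are then read as NA in numeric or logical columns and as empty strings in character columns. If FALSE, rows of unequal length raise an error.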
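
A minimal sketch of a short row, on made-up data. The strict call is wrapped in tryCatch so that the error can be inspected instead of stopping the script.

```r
txt <- c("a,b,c", "1,2,3", "4,5")  # the last row is one field short

# fill = TRUE: the missing field is added and read as NA.
padded <- read.table(text = txt, header = TRUE, sep = ",", fill = TRUE)

# fill = FALSE (the read.table default): the short row is an error.
strict <- tryCatch(
  read.table(text = txt, header = TRUE, sep = ",", fill = FALSE),
  error = function(e) e
)

print(padded)
print(inherits(strict, "error"))
```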

read.csv and read.csv2 are identical to read.table except for the defaults. 
They are intended for reading ‘comma separated value’ files (‘.csv’) or 
(read.csv2) the variant used in countries that use a comma as decimal point 
and a semicolon as field separator.
Similarly, read.delim and read.delim2 are for reading delimited files, 
defaulting to the TAB character for the delimiter. Notice that header = TRUE 
and fill = TRUE in these variants, and that the comment character is disabled.

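In short: read.csv and read.csv2 are read.table with different defaults. read.csv reads comma-separated files (.csv); read.csv2 reads the variant that uses a comma as the decimal point and a semicolon as the field separator. Likewise, read.delim and read.delim2 read delimited files with TAB as the default delimiter. All of these variants set header = TRUE and fill = TRUE and disable the comment character.
Note: the default for header in read.csv differs from that in read.table.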
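
The relationship can be checked with a short sketch on made-up data: calling read.table with the read.csv defaults spelled out should give the same data frame, and read.csv2 should read the semicolon and decimal-comma variant to the same values.

```r
csv_txt  <- c("name,score", "Ann,9.5", "Bob,8")   # illustrative data
csv2_txt <- c("name;score", "Ann;9,5", "Bob;8")   # same data, csv2 style

via_csv <- read.csv(text = csv_txt)

# read.table with the read.csv defaults written out.
via_table <- read.table(text = csv_txt, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

via_csv2 <- read.csv2(text = csv2_txt)

print(via_csv)
print(isTRUE(all.equal(via_csv, via_table)))
```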

Reference: the read.table help page.