R: Working with Data Files

小小练习生

Article link: https://blog.csdn.net/qq_42149144/article/details/106740024

Working with data files

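R can read data from the keyboard, text files, Excel, Access, popular statistical packages, special-format files, a range of relational databases, and web pages.
Keyboard input: the edit() function opens a text editor (the data frame editor) in which data can be typed by hand. fix() can be used as well.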

The data editor

edit()

edit() does not keep your changes unless its result is assigned to an object.

 # First create an empty data set with named columns, much like creating a new table in MySQL
 > mydata<-data.frame(id=numeric(0),age=numeric(0),gender=character(0),weight=numeric(0),height=numeric(0))
 > mydata
 [1] id     age    gender weight height
 <0 rows> (or 0-length row.names)
 
 # Start edit()
 # Enter one row of data, then close the editor
 > edit(mydata) # the entered data is printed when the editor closes
   id age gender weight height
 1  1  28   male     60   1.78
 
 # Check
 > mydata
 [1] id     age    gender weight height
 <0 rows> (or 0-length row.names)    # not saved
 
 # Assign the result of edit() to a new variable
 > new_data <- edit(mydata)  # screenshot of the input omitted
 > new_data
   id age gender weight height
 1  1  30   male   60.2   1.76

fix()

fix() keeps your changes even when its result is not assigned, because it edits the object in place.

> fix(mydata) # screenshot of the data entry omitted
> mydata
  id age gender weight height
1  1  27   male     59   1.67
2  2  22 female     50   1.66
3  3  24   male     67   1.77

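Saving your work in R

Saving named objects

> save(..., list = character(), file = 'file_url') # file_url is the absolute path of the file to save to

The objects to save can be specified in one of two ways:
- type the names of the objects, separated by commas
- pass a character vector of object names through the list argument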

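> save(bf,bf.lm,bf.beta,file="xxx.Rdata")
> save(list=ls(pattern="^bf"),file="xxxx.RData") # ls() returns the names of all objects starting with bf

> mydata <- c("bf","bf.lm","bf.beta")
> save(list=mydata,file="xxxxx.RData") # list= is required; save(mydata, ...) would store only the character vector itself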
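
As a minimal sketch of the full round trip, the example below saves two objects to a temporary .RData file and restores them with load(). The object names reuse bf and bf.beta from the example above, but their values are made up, and the temporary path stands in for a file path of your own.

```r
# Hypothetical objects; the values are made up for illustration
bf      <- data.frame(x = 1:3, y = c(2.1, 3.9, 6.2))
bf.beta <- c(intercept = 0.1, slope = 2.0)

# Save the two objects by name to a temporary file
path <- tempfile(fileext = ".RData")
save(list = c("bf", "bf.beta"), file = path)

# Restore into a fresh environment so nothing in the workspace is overwritten
restored <- new.env()
loaded_names <- load(path, envir = restored)  # load() returns the restored names

print(loaded_names)
print(identical(restored$bf, bf))
unlink(path)  # tidy up
```

Loading into a separate environment is a design choice, not a requirement: a plain load(path) restores the objects directly into the current workspace and silently replaces any objects with the same names.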

Saving the whole workspace

# Method 1
> save(list=ls(all.names=TRUE),file="file_url") # file_url is the absolute path of the file to save to

# Method 2
> save.image(file="filename")

Saving data to disk as text files

cat(... , file = "", sep = " ", fill = FALSE, labels = NULL,
    append = FALSE)
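
A minimal sketch of how cat() writes text to a file, assuming a temporary file as the target: the first call creates the file, and the second adds a line to it with append = TRUE. The sample rows are made up.

```r
path <- tempfile(fileext = ".txt")

# sep = "" because each string already ends with its own newline
cat("id age\n", "1 28\n", file = path, sep = "")

# append = TRUE adds to the file instead of overwriting it
cat("2 22\n", file = path, append = TRUE)

lines <- readLines(path)
print(lines)
unlink(path)  # tidy up
```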

Writing vector objects to disk

write

write(x, file = "data",
      ncolumns = if(is.character(x)) 1 else 5,
      append = FALSE, sep = " ")
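
The signature above can be tried out with a short sketch: write() lays a numeric vector out in rows of ncolumns values, and scan() reads it back. The temporary file is an assumption made so the example cleans up after itself.

```r
path <- tempfile(fileext = ".txt")
x <- 1:10

# Five values per line, separated by spaces
write(x, file = path, ncolumns = 5)

lines <- readLines(path, warn = FALSE)  # the file as text, one element per line
back  <- scan(path, quiet = TRUE)       # the values as a numeric vector

print(lines)
print(back)
unlink(path)  # tidy up
```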

Writing matrices and data frames to disk

write.table()

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")

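write.csv()

Takes the same arguments as write.table(), except that sep is fixed to "," and attempts to change append, col.names, sep, dec or qmethod are ignored with a warning.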
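
As a rough illustration, the sketch below writes a small data frame with write.csv() and reads it back with read.csv(). The data frame and the temporary file are made up for the example.

```r
# Hypothetical sample data
people <- data.frame(
  id     = 1:3,
  age    = c(27, 22, 24),
  gender = c("male", "female", "male"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# row.names = FALSE stops the row numbers from being written as an extra column
write.csv(people, file = path, row.names = FALSE)

back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
unlink(path)  # tidy up
```

Leaving row.names at its default of TRUE would add an unnamed first column of row numbers to the file, which then shows up as a column called X when the file is read back.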

Writing list objects to disk

Written with dput():

dput(x, file = "",control = c("keepNA", "keepInteger","niceNames","showAttributes"))

Read back with dget():
```
dget(file,keep.source = FALSE)
```
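
A minimal sketch of the dput()/dget() pair, using a made-up nested list and a temporary file: dput() writes the R code that recreates the object, and dget() evaluates that code to rebuild it.

```r
# Hypothetical list with mixed and nested content
settings <- list(name = "run1", sizes = c(10L, 20L), nested = list(flag = TRUE))

path <- tempfile(fileext = ".R")
dput(settings, file = path)   # writes a text representation of the list

restored <- dget(path)        # parses and evaluates that text
print(identical(restored, settings))
unlink(path)  # tidy up
```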

Entering data

scan()

When using c(), typing all the commas that separate the values can get tedious. scan() does the same job without the commas:
```
> our.data<-scan()
1: 1 2 3 4 5 6 7
8:  # press Enter on an empty line to finish
Read 7 items
> our.data
[1] 1 2 3 4 5 6 7
```

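Entering text as data

When reading into a vector, scan() requires all the values to have the same storage type.

Reading text data into a vector with scan():

scan(file = "filename", skip = number_of_lines, what = type_constructor())

Argument	Description
file	the file to read from; if omitted, input is taken from the keyboard
skip	number of lines to skip before reading starts
what	the type of the values to read; double() is the usual choice for numeric data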

> day<-scan(what="character")   # set the what argument to read text
1: Mon Tue Wed
4: Thu Fri
6: 
Read 5 items
> day
  [1] "Mon" "Tue" "Wed" "Thu" "Fri"

cat("TITLE extra line", "2 3 5 7", "11 13 17", file = "ex.data", sep = "\n")
pp <- scan("ex.data", skip = 1, quiet = TRUE)
scan("ex.data", skip = 1)
scan("ex.data", skip = 1, nlines = 1) # only 1 line after the skipped one
scan("ex.data", what = list("","","")) # flush is F -> read "7"
scan("ex.data", what = list("","",""), flush = TRUE)
unlink("ex.data") # tidy up

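Getting data via the clipboard

With scan() you can bring in data from other programs, such as a spreadsheet, through the clipboard:
- type the scan() call, setting what to match the type of the spreadsheet data, and press Enter
- switch to the spreadsheet that holds the data
- select the cells you need and copy them to the clipboard
- return to R, paste, and press Enter on an empty line to finish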

Reading data

Importing data from delimited text files

read.table()

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,  # default since R 4.0.0; older versions used default.stringsAsFactors()
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
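# Note: when sep is omitted, the separator is white space: one or more spaces, tabs, newlines or carriage returns

read.table() expects its data in plain-text (ASCII) format:
a "flat file" created with Windows Notepad or any other plain-text editor.
It returns a data frame. Each line of the file holds all the data for one observation, in a fixed order, separated by spaces or another delimiter.
The first line of the file may be a header giving the variable names.
- file.choose() lets you pick the file interactively instead of typing its path (on Linux without a graphical front end it prompts for the file name in the console)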
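
To make the description concrete, here is a small sketch that creates a space-delimited flat file with a header line and reads it with read.table(). The file contents and the temporary path are made up for the example.

```r
path <- tempfile(fileext = ".txt")

# One observation per line, fields separated by spaces, header on the first line
writeLines(c("id age gender",
             "1 27 male",
             "2 22 female",
             "3 24 male"), path)

# header = TRUE takes the variable names from the first line
people <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(people)
unlink(path)  # tidy up
```

Setting stringsAsFactors explicitly keeps the gender column as character on both old and new versions of R.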

Getting and setting the working directory

Use getwd() to check which directory the R session is working in, and setwd() to set a new working directory.

> getwd()
[1] "C:/Users/21323/Documents/学习笔记_R/day_3 文件的处理"
> setwd("C:\Users\21323\Documents") # mind the path format: a single backslash starts an escape sequence
Error: '\U' used without hex digits in character string starting ""C:\U"
> setwd("C:\\Users\\21323\\Documents")
> getwd()
[1] "C:/Users/21323/Documents"
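
Besides doubling the backslashes, forward slashes also work in paths on Windows, and file.path() builds paths without any typed separators. The sketch below assumes tempdir() as a stand-in for a directory of your own and restores the original working directory at the end; the data/raw/input.csv path is hypothetical.

```r
old    <- getwd()    # remember where we started
target <- tempdir()  # stand-in for a directory of your own

setwd(target)
now <- getwd()

# file.path() joins components with "/", which R accepts on every platform
csv_path <- file.path("data", "raw", "input.csv")
print(csv_path)

setwd(old)           # restore the original working directory
```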

Reading CSV files

CSV files
```
read.csv(file, header = TRUE/FALSE, sep = ",")
```
- read.csv() imports data from a comma-delimited text file and returns it as a data frame; sep defaults to a comma.
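
A short sketch of read.csv() in use. To stay self-contained it reads from an inline string through the text argument rather than from a file on disk; the column names and values are made up.

```r
# Hypothetical CSV content; the literal NA marks a missing score
csv_text <- "id,name,score
1,Ann,90
2,Bob,NA
3,Cid,75"

scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# NA is recognised as missing by default (na.strings = "NA")
avg <- mean(scores$score, na.rm = TRUE)
print(scores)
print(avg)
```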

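Reading Excel files

```
read.xlsx(file = "Excel file name", sheetIndex, header = TRUE/FALSE, as.data.frame = TRUE/FALSE)
```
(Requires the xlsx and rJava packages, and a working local Java installation.)

The simplest way to read an Excel file is to export it from Excel as a CSV file and then read that into R with read.csv().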

Reading binary files

R has two functions for creating and reading binary files: writeBin() and readBin().

writeBin(object, con)
readBin(con, what, n )
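
The two signatures above can be exercised with a small round trip. This sketch assumes a temporary file and a made-up integer vector; the connection has to be opened in binary mode ("wb" for writing, "rb" for reading).

```r
path   <- tempfile(fileext = ".bin")
values <- c(1L, 5L, 42L)

con <- file(path, "wb")   # open for binary writing
writeBin(values, con)
close(con)

con <- file(path, "rb")   # open for binary reading
# what gives the type to read, n the maximum number of values
back <- readBin(con, what = integer(), n = length(values))
close(con)

file_size <- file.size(path)  # three integers of 4 bytes each
print(back)
unlink(path)  # tidy up
```

readBin() has to be told the type and the count because a binary file carries no description of its own contents.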

Reading XML files

install.packages("XML")

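The XML package's xmlParse() function reads an XML file into R; the parsed document can then be converted to an R list with xmlToList().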
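
A minimal sketch of parsing XML, assuming the XML package is installed. It parses an inline string (asText = TRUE) instead of a file so that it runs on its own; the people/person structure is made up for illustration.

```r
# Hypothetical XML content
xml_text <- "<people><person><name>John Doe</name><age>30</age></person></people>"

if (requireNamespace("XML", quietly = TRUE)) {
  # asText = TRUE tells xmlParse() that the argument is XML content, not a file name
  doc <- XML::xmlParse(xml_text, asText = TRUE)

  # Convert the parsed document into a nested R list
  people <- XML::xmlToList(doc)
  person_name <- people$person$name
  print(person_name)
}
```

For a file on disk, pass the file name to xmlParse() and leave asText at its default.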

Reading JSON files

install.packages("rjson")

result <- fromJSON(file = "input.json")
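
The call above assumes that input.json already exists. As a self-contained sketch, the example below first writes a small made-up JSON file to a temporary path, reads it with rjson's fromJSON(), and converts the resulting list to a data frame. It only runs the rjson part if the package is installed.

```r
path <- tempfile(fileext = ".json")

# Hypothetical JSON content: two arrays of equal length
writeLines('{"id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]}', path)

if (requireNamespace("rjson", quietly = TRUE)) {
  result <- rjson::fromJSON(file = path)  # a named list of vectors
  people <- as.data.frame(result, stringsAsFactors = FALSE)
  print(people)
}
unlink(path)  # tidy up
```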

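Reading web data

The following packages are needed to work with URLs and links to files. If they are not available in your R environment, install them with: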

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")

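We will use getHTMLLinks() to collect the URLs of the files and then download.file() to save them to the local system. Because the same code is applied to many files in turn, we wrap it in a function; l_ply() then calls this function once for each file name.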

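# Load the packages that provide getHTMLLinks(), str_detect()/str_c() and l_ply().
library(XML)
library(stringr)
library(plyr)

# Read the URL.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"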

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files. 
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download the files by passing the URL and filename list.
downloadcsv <- function (mainurl,filename) {
   filedetails <- str_c(mainurl,filename)
   download.file(filedetails,filename)
}

# Now apply the l_ply function and save the files into the current R working directory.
l_ply(filenames,downloadcsv,mainurl = "http://www.geos.ed.ac.uk/~weather/jcmb_ws/")

After running the code above, the following files can be found in the current R working directory.

"JCMB_2015.csv" "JCMB_2015_Apr.csv" "JCMB_2015_Feb.csv" "JCMB_2015_Jan.csv"
   "JCMB_2015_Mar.csv"

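The filtering and URL-building steps can also be tried offline. The sketch below swaps in base R functions (grepl() and paste0() in place of str_detect() and str_c()) and uses a made-up vector of links as a stand-in for what getHTMLLinks() would return, so nothing is downloaded.

```r
mainurl <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"

# Hypothetical stand-in for the result of getHTMLLinks(url)
links <- c("JCMB_2014.csv", "JCMB_2015.csv", "JCMB_2015_Jan.csv", "index.html")

# Keep only the links that contain "JCMB_2015"
filenames <- links[grepl("JCMB_2015", links, fixed = TRUE)]

# Build the full URL of each file; these would be passed to download.file()
file_urls <- paste0(mainurl, filenames)
print(file_urls)
```
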
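Connecting to MySQL and reading from a database

The contributed package RMySQL provides a native connection between R and a MySQL database. It does not ship with R; install it with the following command.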

install.packages("RMySQL")

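Connecting R to MySQL

Once the RMySQL package is installed, we create a connection object in R to connect to the database. It needs the user name, password, database name and host name.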

library("RMySQL")
# Create a connection object to the MySQL database.
# We connect to a sample database named "testdb".
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '123456', dbname = 'testdb',
   host = 'localhost')

# List the tables available in this database.
dbListTables(mysqlconnection)

Running the code above produces the following result (all the tables in the current database):

 [1] "articles"       "contacts"       "demos"          "divisions"     
 [5] "items"          "luxuryitems"    "order"          "persons"       
 [9] "posts"          "revenues"       "special_isnull" "t"             
[13] "tbl"            "tmp"            "v1"             "vparts"

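Querying a table

Database tables in MySQL can be queried with dbSendQuery(). The query is executed in MySQL, the result set is retrieved with R's fetch() function, and the result is stored in R as a data frame.
Suppose the table to query is persons, created and populated as follows: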

/*
Navicat MySQL Data Transfer

Source Server         : localhost-57
Source Server Version : 50709
Source Host           : localhost:3306
Source Database       : testdb

Target Server Type    : MYSQL
Target Server Version : 50709
File Encoding         : 65001

Date: 2017-08-24 00:35:17
*/

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `persons`
-- ----------------------------
DROP TABLE IF EXISTS `persons`;
CREATE TABLE `persons` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `full_name` varchar(255) NOT NULL,
  `date_of_birth` date NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of persons
-- ----------------------------
INSERT INTO `persons` VALUES ('1', 'John Doe', '1990-01-01');
INSERT INTO `persons` VALUES ('2', 'David Taylor', '1989-06-06');
INSERT INTO `persons` VALUES ('3', 'Peter Drucker', '1988-03-02');
INSERT INTO `persons` VALUES ('4', 'Lily Minsu', '1992-05-05');
INSERT INTO `persons` VALUES ('5', 'Mary William', '1995-12-01');

Import the table above into the database, then use the following R code to query it:

library("RMySQL")
# Create a connection object to the MySQL database.
# We connect to a sample database named "testdb".
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '123456', dbname = 'testdb',
   host = 'localhost')
# Query the "persons" table to get all the rows.
result = dbSendQuery(mysqlconnection, "select * from persons")

# Store the result in an R data frame object. n = 5 fetches the first 5 rows.
persons_df = fetch(result, n = 5)
print(persons_df)
# Release the result set once the rows have been fetched.
dbClearResult(result)

Running the example code above gives the following result:

  id     full_name date_of_birth
1  1      John Doe    1990-01-01
2  2  David Taylor    1989-06-06
3  3 Peter Drucker    1988-03-02
4  4    Lily Minsu    1992-05-05
5  5  Mary William    1995-12-01

Querying with a filter clause

Any valid SELECT query can be passed to get results, as the following code shows:

library("RMySQL")
# Create a connection object to the MySQL database.
# We connect to a sample database named "testdb".
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '123456', dbname = 'testdb',
   host = 'localhost')
result = dbSendQuery(mysqlconnection, "select * from persons where date_of_birth = '1990-01-01'")

# Fetch all the records (with n = -1) and store them as a data frame.
persons_df = fetch(result, n = -1)
print(persons_df)

Running the code above produces the following result:

  id full_name date_of_birth
1  1  John Doe    1990-01-01

Updating rows in a table

Rows in a MySQL table can be updated by passing an UPDATE statement to dbSendQuery().

dbSendQuery(mysqlconnection, "update persons set date_of_birth = '1999-01-01' where id=3")

After running the code above, the corresponding record in the persons table has been updated in MySQL.

Inserting data into a table

See the following code:

library("RMySQL")
# Create a connection object to the MySQL database.
# We connect to a sample database named "testdb".
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '123456', dbname = 'testdb',
   host = 'localhost')
dbSendQuery(mysqlconnection,
   "insert into persons(full_name, date_of_birth) values ('Maxsu', '1992-01-01')"
)

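After running the code above, a new row has been inserted into the persons table in MySQL.

Creating a table in MySQL

A table can be created in MySQL with dbWriteTable(), which takes a data frame as input. With overwrite = TRUE it replaces the table if it already exists.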

library("RMySQL")
# Create the connection object to the testdb database where we want to create the table.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '123456', dbname = 'testdb',host = 'localhost')

# Use the R data frame "mtcars" to create the table in MySQL.
# All the rows of mtcars are written into MySQL.
dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)

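After running the code above, a table named mtcars has been created in the MySQL database and filled with data.

Dropping a table in MySQL

A table in the MySQL database can be dropped by passing a DROP TABLE statement to dbSendQuery(), in the same way as a query.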

dbSendQuery(mysqlconnection, 'drop table if exists mtcars')

After running the code above, the mtcars table has been removed from the MySQL database.
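
The whole workflow (connect, create, update, query, disconnect) can be sketched without a MySQL server. The example below deliberately swaps in an in-memory SQLite database through the DBI and RSQLite packages, which is an assumption of this sketch and not part of the setup above; the same DBI functions are also implemented by RMySQL. It uses dbGetQuery() and dbExecute() as shorter alternatives to dbSendQuery() followed by fetch(), and closes the connection at the end with dbDisconnect().

```r
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory SQLite database as a stand-in for the MySQL connection
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Two rows taken from the persons table above
  persons <- data.frame(
    id            = 1:2,
    full_name     = c("John Doe", "David Taylor"),
    date_of_birth = c("1990-01-01", "1989-06-06"),
    stringsAsFactors = FALSE
  )
  DBI::dbWriteTable(con, "persons", persons, overwrite = TRUE)

  # dbExecute() runs a statement that returns no rows and reports the rows affected
  n_updated <- DBI::dbExecute(
    con, "UPDATE persons SET date_of_birth = '1999-01-01' WHERE id = 2")

  # dbGetQuery() sends the query, fetches all rows and clears the result set
  result <- DBI::dbGetQuery(con, "SELECT * FROM persons WHERE id = 2")
  print(result)

  # Always close the connection when finished
  DBI::dbDisconnect(con)
}
```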
