R Beginner Study Notes 1: Five Data Structures (vector, matrix, factor, list, data frame)

阿熊少年

Article link: https://blog.csdn.net/weixin_41649768/article/details/85849759

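0. Introduction

This series collects my notes from learning the basics of R. They are meant to be easy to look things up in, and are admittedly not very rigorous. If you want to get started quickly, or prefer trying code over reading, these notes may be useful.

This is part one, covering the five basic data structures in R: vector, matrix, factor, list, and data frame. (These notes loosely call them data "modes"; strictly speaking, mode() in R reports the storage type of an object, such as "numeric" or "character".)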

1. Working with the basic data structures

This section records how to create, subset, and do basic operations on vectors, matrices, factors, lists, and data frames.

1.1 Vector

1.1.1 Creation

# Create a numeric vector and a character vector
c(1,2,3,4)
c("a","b","c")
# Mixed elements are automatically coerced to a single common type
> c(1,"a",2,3)
[1] "1" "a" "2" "3"
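To make the coercion rule easier to see, here is a small sketch that checks the resulting type with class(). The variable names num_and_lgl and num_and_chr are made up for this illustration.

```r
# Mixing types in c() coerces everything to the most flexible type:
# logical -> numeric -> character
num_and_lgl <- c(1, TRUE, FALSE)   # logicals become 1 and 0
num_and_chr <- c(1, "a", 2, 3)     # numbers become strings

class(num_and_lgl)  # "numeric"
class(num_and_chr)  # "character"
sum(num_and_lgl)    # 2
```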

1.1.2 Naming (labels)

# Using names()
suits <- c("hearts","spades","diamonds","clubs")
remain <- c(11,12,11,13)
names(remain) <- suits
# Naming at creation time
remain <- c(hearts = 11, spades = 12, diamonds = 11, clubs = 13)

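1.1.3 Subsetting

# By position
remain[c(1,4)]
# By exclusion (negative positions)
remain[-c(1,4)]
# By name
remain[c("spades","clubs")]
# By logical vector (recycled to the length of the vector)
remain[c(TRUE,FALSE)]

The other data structures below can all be subset in these same four ways, so the details are not repeated for each one.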
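As a quick check of the four subsetting styles, the sketch below stores each result under a descriptive name (the by_* names are only for illustration) and shows which elements come back.

```r
remain <- c(hearts = 11, spades = 12, diamonds = 11, clubs = 13)

by_position  <- remain[c(1, 4)]              # hearts, clubs
by_exclusion <- remain[-c(1, 4)]             # spades, diamonds
by_name      <- remain[c("spades", "clubs")] # spades, clubs
# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE
by_logical   <- remain[c(TRUE, FALSE)]       # hearts, diamonds

names(by_logical)  # "hearts" "diamonds"
```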

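1.1.4 Operations

Arithmetic, comparison, and logical operators applied to a vector work on each element (element-wise) and return a vector. The same principle carries over to matrices and to data frames whose columns are all numeric; it does not carry over to lists, and arithmetic on factors is not meaningful.
Some other basic functions:

# Mean
mean(remain)
# Sum
sum(remain)
# Standard deviation
sd(remain)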
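A minimal sketch of element-wise behaviour follows. The drawn vector is a hypothetical second vector introduced only for this example.

```r
remain <- c(hearts = 11, spades = 12, diamonds = 11, clubs = 13)
drawn  <- c(1, 2, 0, 3)   # hypothetical number of cards drawn per suit

left    <- remain - drawn   # 10 10 11 10, names kept from remain
doubled <- remain * 2       # the single value 2 is recycled: 22 24 22 26
is_low  <- remain < 12      # TRUE FALSE TRUE FALSE

sum(is_low)   # TRUE counts as 1, so this counts the TRUEs: 2
```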

1.2 Matrix

1.2.1 Creation

# All four give a matrix with the same values
mtrx <- matrix(1:6, nrow=2)
mtrx <- matrix(1:6, ncol=3)
mtrx <- cbind(1:2,3:4,5:6)
mtrx <- rbind(seq(1, 5, by = 2),seq(2, 6, by = 2))
# Elements are filled column by column by default; byrow changes that
mtrx <- matrix(1:6,nrow = 2, byrow = TRUE)
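The difference between the two fill orders is easiest to see side by side. This is a small sketch; by_col and by_row are illustrative names.

```r
by_col <- matrix(1:6, nrow = 2)                 # filled down the columns
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled across the rows

by_col[1, ]  # 1 3 5
by_row[1, ]  # 1 2 3
dim(by_row)  # 2 3

# Transposing the 3 x 2 column-filled matrix gives the row-filled one
all(by_row == t(matrix(1:6, nrow = 3)))  # TRUE
```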

1.2.2 Naming (labels)

# Using rownames() and colnames()
rownames(mtrx) <- c("row1","row2")
colnames(mtrx) <- c("col1","col2","col3")
# Naming at creation time
mtrx<-matrix(1:6,nrow=2,
          dimnames = list(c("row1","row2"),
                          c("col1","col2","col3")))

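1.2.3 Subsetting

# Take the element in row 2, column 2
mtrx[2,2]
# Take the 4th element counting down the columns; same value as the previous line
mtrx[4]
# Take rows 1 and 2 and columns 1 and 3 as a new matrix
mtrx[c(1,2),c(1,3)]
# Names and logical vectors can be used in the same way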

1.2.4 Operations

# Sum each row to get a vector; use colSums() for columns; rowMeans() and colMeans() work the same way
sum_of_rows_vector <- rowSums(mtrx)
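For reference, here is a short sketch of the row and column summaries on the named 2 x 3 matrix from section 1.2.2, rebuilt here so the snippet runs on its own.

```r
mtrx <- matrix(1:6, nrow = 2,
               dimnames = list(c("row1", "row2"),
                               c("col1", "col2", "col3")))

rowSums(mtrx)   # row1 = 9, row2 = 12
colSums(mtrx)   # col1 = 3, col2 = 7, col3 = 11
rowMeans(mtrx)  # row1 = 3, row2 = 4
mtrx * 10       # arithmetic is element-wise and keeps the dimnames
```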

1.3 Factor

A factor holds categorical data, which can be unordered (blood type) or ordered (clothing size).

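1.3.1 Creation

# Prepare a character vector first
blood <- c("B","AB","O","A","O","O","A","B")
# Create a factor; by default the levels are in alphabetical order
blood_factor <- factor(blood)
# Set the order of the levels yourself (use levels =, not levels <-, inside the call)
blood_factor2 <- factor(blood,levels = c("O","A","B","AB"))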

1.3.2 Naming (labels)

# Rename the levels
levels(blood_factor2) <- c("BT_O","BT_A","BT_B","BT_AB")
# Naming at creation time
factor(blood,
       levels = c("O","A","B","AB"),
       labels = c("BT_O","BT_A","BT_B","BT_AB"))
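The effect of the level order can be checked with levels(), table(), and as.integer(). This is only a sketch; default_levels and custom_levels are names chosen for the example.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

default_levels <- factor(blood)
custom_levels  <- factor(blood, levels = c("O", "A", "B", "AB"))

levels(default_levels)     # "A" "AB" "B" "O" (alphabetical)
levels(custom_levels)      # "O" "A" "B" "AB"
table(custom_levels)       # counts follow the level order: O 3, A 2, B 2, AB 1
as.integer(custom_levels)  # underlying integer codes: 3 4 1 2 1 1 2 3
```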

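1.3.3 Ordering

# Make the factor ordered
tshirt <- c("M","L","S","S","L")
tshirt_factor <- factor(tshirt,ordered = TRUE,
                        levels = c("S","M","L"),
                        labels = c("Small","Medium","Large"))
# Compare elements of the ordered factor (Medium is not greater than Large)
> tshirt_factor[1] > tshirt_factor[5]
[1] FALSE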
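Comparisons respect the size order only on the ordered factor, not on the raw character vector. The sketch below contrasts the two using the same tshirt data.

```r
tshirt <- c("M", "L", "S", "S", "L")
tshirt_factor <- factor(tshirt, ordered = TRUE,
                        levels = c("S", "M", "L"),
                        labels = c("Small", "Medium", "Large"))

# Plain strings compare alphabetically, and "M" sorts after "L"
tshirt[1] > tshirt[5]                # TRUE
# The ordered factor compares by level position: Medium is below Large
tshirt_factor[1] > tshirt_factor[5]  # FALSE
max(tshirt_factor)                   # Large
```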

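1.4 List

1.4.1 Creation, naming, and extension

# Two ways to create and name a list
song <- list("Rsome times",190,5,
             list(title = "R you on time",duration = 9))
names(song) <- c("title","duration","track","similar")
song <- list(title = "Rsome times",
             duration = 190,
             track = 5,
             similar = list(title = "R you on time",duration = 9))
# Extending a list; the two calls below give different results.
# c() splices its arguments in one level deep, so in both calls similar
# arrives as two elements, similar.title and similar.duration.
# Here singer is also split, into singer1 and singer2:
c(song,
  similar = list(title = "R you on time",duration = 9),
  singer = c("M","J"))
# Here singer stays a single element holding the vector c("M","J"):
c(song,
  similar = list(title = "R you on time",duration = 9),
  singer = list(c("M","J")))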
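To see why the two extensions differ, it helps to compare names and lengths. This is a reduced sketch with a shorter song list; flat and nested are illustrative names.

```r
song <- list(title = "Rsome times", duration = 190, track = 5)

# c() splices a plain vector in element by element ...
flat <- c(song, singer = c("M", "J"))
# ... while wrapping it in list() keeps it as one element
nested <- c(song, singer = list(c("M", "J")))

names(flat)     # "title" "duration" "track" "singer1" "singer2"
names(nested)   # "title" "duration" "track" "singer"
length(flat)    # 5
length(nested)  # 4
```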

1.4.2 Subsetting

1.4.2.0 Setup

# Using the list song created above; its structure is
> str(song)
List of 4
 $ title   : chr "Rsome times"
 $ duration: num 190
 $ track   : num 5
 $ similar :List of 2
  ..$ title   : chr "R you on time"
  ..$ duration: num 9

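1.4.2.1 Extracting a sub-list

# Take a sub-list (a list that contains one list)
song[4]
# which gives the same result as
list(similar = list(title = "R you on time",duration = 9))
# Take a sub-list made of two elements (a character string and a list)
song[c(1,4)]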

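1.4.2.2 Extracting an element

# Take a single element
song[[4]]
song$similar
# which gives the same result as
list(title = "R you on time",duration = 9)
# Take the first element (a character string) of the 4th element (a list)
song[[c(4,1)]]
song[[4]][[1]]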
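The practical difference between single and double brackets is the type of what comes back. A small sketch, rebuilding song so it runs on its own:

```r
song <- list(title = "Rsome times", duration = 190, track = 5,
             similar = list(title = "R you on time", duration = 9))

class(song[4])                   # "list": single brackets always return a list
class(song[["title"]])           # "character": double brackets return the element itself
song[["similar"]][["duration"]]  # 9
song$similar$duration            # 9, the same lookup written with $
song[[c(4, 2)]]                  # 9, recursive indexing by position
```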

1.5 Data frame (data.frame)

1.5.1 Creation and extension

1.5.1.0 Setup

# Prepare the vectors
name <- c("A","B","C")
age <- c(28,30,21)
child <- c(FALSE,TRUE,TRUE)

1.5.1.1 Creation

# Two ways to create it
df <- data.frame(name,age,child)
names(df) <- c("Name","Age","Child")
df <- data.frame(Name = name,Age = age,Child = child)

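1.5.1.2 Extension

# Three alternative ways to add a column (pick one; running cbind() after
# either of the first two adds a second column that is also named Height)
height <- c(180,175,177)
df$Height <- height
df[["Height"]] <- height
df <- cbind(df, Height = height)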

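1.5.2 Subsetting

A data.frame can be treated both as a matrix and as a list, so some of the subsetting methods of each apply. The examples below use the df created above:

1.5.2.1 Matrix-style

# Take a single element
df[2,3]
# Take a sub-data.frame (the logical index is recycled across the columns)
df[c(1,3),c(FALSE,TRUE)]
# A single row comes back as a one-row data.frame (which is a list underneath)
df[-c(1,3),]
# A single column comes back as a vector (or as a factor for text columns
# when stringsAsFactors = TRUE, which was the default before R 4.0.0)
df[,c(TRUE,FALSE,FALSE,FALSE)]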

1.5.2.2 List-style

# Take a sub-data.frame
df["Age"]
df[2]
# Take a vector (or a factor, if the column is stored as one)
df[["Age"]]
df[[2]]
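The return types of the different subsetting forms can be verified with class(). In this sketch the data frame is rebuilt with stringsAsFactors = FALSE set explicitly so that it behaves the same across R versions; the drop = FALSE line goes slightly beyond the notes above and is included only to show how to keep a data frame when selecting one column.

```r
df <- data.frame(Name = c("A", "B", "C"),
                 Age = c(28, 30, 21),
                 Child = c(FALSE, TRUE, TRUE),
                 stringsAsFactors = FALSE)

class(df["Age"])                  # "data.frame": single brackets keep the data frame
class(df[["Age"]])                # "numeric": double brackets return the column vector
class(df[, "Age"])                # "numeric": matrix-style with one column drops to a vector
class(df[, "Age", drop = FALSE])  # "data.frame": drop = FALSE prevents the drop
class(df[2, ])                    # "data.frame": a single row stays a data frame
```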

1.5.2.3 Other ways to subset

# Using the subset() function
subset(df,subset = Child == TRUE)
subset(df,subset = Child == TRUE & Age > 25)
# Alternatively, use logical subsetting directly
df[df$Child == TRUE,]
df[df$Child == TRUE & df$Age > 25,]
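A short sketch showing that both routes select the same row. The names keep, via_logical, and via_subset are illustrative; since Child is already logical, the == TRUE comparison is left out here.

```r
df <- data.frame(Name = c("A", "B", "C"),
                 Age = c(28, 30, 21),
                 Child = c(FALSE, TRUE, TRUE),
                 stringsAsFactors = FALSE)

keep <- df$Child & df$Age > 25    # FALSE TRUE FALSE
via_logical <- df[keep, ]
via_subset  <- subset(df, Child & Age > 25)

via_logical$Name   # "B"
via_subset$Name    # "B"
nrow(via_subset)   # 1
```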

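1.5.3 Sorting

# Sort the ages in ascending order
sort(df$Age)
# order() returns the row positions that put Age in ascending / descending order
ranks1 <- order(df$Age)
ranks2 <- order(df$Age,decreasing = TRUE)
# Reorder the rows of the data frame
df[ranks1,]
df[ranks2,]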
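Because sort(), order(), and rank() are easy to mix up, here is a small sketch on the same three ages.

```r
age <- c(28, 30, 21)

sort(age)    # 21 28 30: the sorted values
order(age)   # 3 1 2: positions of the smallest, middle, and largest value
rank(age)    # 2 3 1: the rank of each value, in its original position

age[order(age)]                     # 21 28 30, same as sort(age)
age[order(age, decreasing = TRUE)]  # 30 28 21
```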

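1.6 Inspecting and converting structures, and a few extras

# Check the structure with the is.*() functions (is.vector(), is.matrix(), ...); a single number is also a vector
> is.vector(3)
[1] TRUE
# Check the length
length(c(1:6))
# Check the dimensions
> dim(as.matrix(c(1:3)))
[1] 3 1
# Check the structure of any object, e.g. the list song from section 1.4
str(song)
# Convert with the as.*() functions (as.matrix(), as.numeric(), ...)
as.matrix(c(1:5))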
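A few more conversions and checks, as a sketch. The variables v and m are placeholders for this example.

```r
v <- 1:6
m <- as.matrix(v)        # a 6 x 1 matrix
is.matrix(m)             # TRUE
dim(m)                   # 6 1
dim(v)                   # NULL: a plain vector has no dim attribute

as.character(v)          # "1" "2" "3" "4" "5" "6"
as.numeric(c("1", "2"))  # 1 2
class(as.data.frame(matrix(1:6, nrow = 2)))  # "data.frame"
```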

