Functions in R, Part 4: split(), cut(), subset()

zoujiahui_2018

Last modified 2024-06-03 21:09:05


First published 2020-04-28 13:16:42

Article link: https://blog.csdn.net/qq_18055167/article/details/105811445


Contents

The split() function
The subset() function
The cut() function

The split() function

In R, split() divides data into groups defined by a factor or a list of factors. It can split a vector, data frame, or list into subsets, one for each level of the grouping factor, and returns them as a named list.

#R
split(x, f, drop = FALSE, ...)
## Default S3 method:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)

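x: the vector or data frame to be split into groups
f: a factor (or a vector that can be coerced to one), or a list of such factors, in which case their interaction defines the groups; x is split according to f
drop: logical; if TRUE, levels of f that do not occur are dropped
value: a list of vectors, shaped like the list that split() returns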
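The replacement form and unsplit() from the usage above are not exercised in the examples that follow, so here is a minimal sketch of both. The names groups, centered, restored, and y are illustrative only.

```r
x <- 1:10
f <- factor(rep(c("a", "b"), 5))

# Split, transform each group separately, then put the pieces back
# in the original order with unsplit().
groups   <- split(x, f)
centered <- lapply(groups, function(v) v - mean(v))
restored <- unsplit(centered, f)
restored
# [1] -4 -4 -2 -2  0  0  2  2  4  4

# The replacement form writes new values back into y group by group.
# Here each group is reversed in place.
y <- x
split(y, f) <- lapply(split(y, f), rev)
y
# [1]  9 10  7  8  5  6  3  4  1  2
```

unsplit() is convenient when a per-group computation has to end up aligned with the original vector again.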

Example 1:


x=1:10
f=rep(c(1,0),5)
split(x,f)
# $`0`
# [1]  2  4  6  8 10
# 
# $`1`
# [1] 1 3 5 7 9

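f=x%%2  # %% is R's modulo operator; f must be a grouping vector, not a function
split(x,f)
# $`0`
# [1]  2  4  6  8 10
# 
# $`1`
# [1] 1 3 5 7 9

f=c(1,2)  # f is shorter than x, so it is recycled to 1,2,1,2,...
split(x,f)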

# $`1`
# [1] 1 3 5 7 9
# 
# $`2`
# [1]  2  4  6  8 10

Example 2:

d <- data.frame(gender=c("M","M","F","M","F","F"),age=c(47,59,21,32,33,24),income=c(55000,88000,32450,76500,123000,45650), over25=rep(c(1,1,0), times=2))
d
#   gender age income over25
# 1      M  47  55000      1
# 2      M  59  88000      1
# 3      F  21  32450      0
# 4      M  32  76500      1
# 5      F  33 123000      1
# 6      F  24  45650      0
res=split(d$income, list(d$gender,d$over25)) # group income by gender and over25
res
# $F.0
# [1] 32450 45650
# $M.0
# numeric(0)
# $F.1
# [1] 123000
# $M.1
# [1] 55000 88000 76500
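
The empty M.0 group in this result is where the drop argument matters. The sketch below rebuilds the same data (age omitted for brevity) and compares the default with drop = TRUE. It also shows the sep argument and splitting a whole data frame. The names groups, res_all, res_drop, res_sep, and by_gender are illustrative only.

```r
d <- data.frame(gender = c("M", "M", "F", "M", "F", "F"),
                income = c(55000, 88000, 32450, 76500, 123000, 45650),
                over25 = rep(c(1, 1, 0), times = 2))

groups <- list(d$gender, d$over25)

# Default: every combination of levels appears, including the empty M.0.
res_all <- split(d$income, groups)
names(res_all)
# [1] "F.0" "M.0" "F.1" "M.1"

# drop = TRUE removes combinations that never occur in the data.
res_drop <- split(d$income, groups, drop = TRUE)
names(res_drop)
# [1] "F.0" "F.1" "M.1"

# sep changes the separator used to build the group names.
res_sep <- split(d$income, groups, sep = "_")
names(res_sep)

# Splitting the data frame itself returns a list of data frames.
by_gender <- split(d, d$gender)
sapply(by_gender, nrow)
```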

Example 3:

require(stats) 
require(graphics)
n <- 10
nn <- 100
g <- factor(round(n * runif(n * nn)))
x <- rnorm(n * nn) + sqrt(as.numeric(g))
xg <- split(x, g)
# the split result can be plotted directly
boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE)
# apply-family functions can be used on it directly
sapply(xg, length)
# 0   1   2   3   4   5   6   7   8   9  10 
# 50 103  95  98 112  93  93  94 104  99  59 
sapply(xg, mean)
# 0         1         2         3         4         5         6         7 
# 0.8923271 1.4324820 1.7787909 1.9254733 2.1598414 2.4312720 2.6217865 2.7740930 
# 8         9        10 
# 3.1816875 3.1226753 3.3066832


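Python also has a split() method, but it only splits strings. To get an effect similar to grouping with R's split(), one option is Python's filter(), which extracts the elements that satisfy a condition. Note that it returns one group at a time rather than all groups at once.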

x=range(1,11)
res=filter(lambda x:x%2,x)
list(res)
# [1, 3, 5, 7, 9]

The subset() function

subset() selects a subset of data: it extracts the elements of a vector, or the rows (and optionally columns) of a data frame, that satisfy a given condition.

subset(x, subset, select, drop = FALSE)

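x: the data frame or vector to be subsetted.
subset: a logical expression giving the filter condition; elements or rows where it evaluates to NA are not kept.
select: optional; an expression indicating which columns of a data frame to keep.
drop: logical; passed on to the [ indexing operator, controlling whether dimensions are dropped.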
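
As a minimal sketch of these arguments, the data frame below is a made-up one for illustration, and older, women, v, and big are hypothetical names.

```r
d <- data.frame(gender = c("M", "M", "F", "M", "F", "F"),
                age    = c(47, 59, 21, 32, 33, 24),
                income = c(55000, 88000, 32450, 76500, 123000, 45650))

# Rows with age > 30, keeping only the gender and income columns.
# Column names can be used unquoted inside subset().
older <- subset(d, age > 30, select = c(gender, income))
older

# A negative select drops a column instead of keeping it.
women <- subset(d, gender == "F", select = -income)
women

# On a vector, elements where the condition is NA are left out.
v <- c(5, NA, 12, 8)
big <- subset(v, v > 6)
big
# [1] 12  8
```

Compared with plain indexing such as v[v > 6], which would keep an NA in the result, subset() silently drops it.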


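The cut() function

cut() divides the range of a numeric vector into intervals and codes each value by the interval it falls into. The result is a factor: unordered by default, ordered if ordered_result = TRUE.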

cut(x, breaks, labels = NULL, include.lowest = FALSE, 
    right = TRUE, dig.lab = 3, ordered_result = FALSE)

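x: a numeric vector;
breaks: a vector of cut points, or a single number giving how many intervals to cut x into;
labels: labels for the levels; the default is NULL, in which case levels are written as (num1,num2] or [num1,num2);
right: logical, default TRUE, meaning intervals have the form (num1,num2];
include.lowest: logical, default FALSE; when right=TRUE it controls whether a value equal to the smallest break is included, and when right=FALSE whether a value equal to the largest break is included;
dig.lab: when labels=NULL, the number of digits used for the numbers in (num1,num2] or [num1,num2);
ordered_result: logical, default FALSE; whether the result is an ordered factor.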
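
The interplay of right and include.lowest is easiest to see on values that sit exactly on the breaks. A small sketch, with a, b, c2, and d2 as illustrative names:

```r
x <- c(0, 5, 10)
br <- c(0, 5, 10)

# Default: intervals (0,5] and (5,10]; the value 0 falls outside and becomes NA.
a <- cut(x, breaks = br)
as.character(a)
# [1] NA       "(0,5]"  "(5,10]"

# include.lowest = TRUE closes the first interval on the left: [0,5].
b <- cut(x, breaks = br, include.lowest = TRUE)
as.character(b)
# [1] "[0,5]"  "[0,5]"  "(5,10]"

# right = FALSE: intervals [0,5) and [5,10); now 10 falls outside.
c2 <- cut(x, breaks = br, right = FALSE)
as.character(c2)
# [1] "[0,5)"  "[5,10)" NA

# right = FALSE with include.lowest = TRUE closes the last interval: [5,10].
d2 <- cut(x, breaks = br, right = FALSE, include.lowest = TRUE)
as.character(d2)
# [1] "[0,5)"  "[5,10]" "[5,10]"
```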

Example:

A class has 36 students, with scores on a 100-point scale. The grades are defined as:
Excellent: scores >= 90
Good: 90 > scores >= 80
Average: 80 > scores >= 70
Pass: 70 > scores >= 60
Fail: 60 > scores
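
One way this grading scheme could be expressed with cut() is sketched below. The 36 actual scores are not given in the text, so the scores vector is a short hypothetical stand-in chosen to hit the interval boundaries, and the English grade labels are likewise an assumption. Because each grade includes its lower bound, the intervals are left-closed (right = FALSE), and include.lowest = TRUE keeps a score of 100 in the top interval.

```r
# Hypothetical scores, chosen to hit every boundary.
scores <- c(100, 95, 90, 89, 80, 79, 70, 69, 60, 59, 0)

grade <- cut(scores,
             breaks = c(0, 60, 70, 80, 90, 100),
             labels = c("Fail", "Pass", "Average", "Good", "Excellent"),
             right = FALSE,          # intervals are [low, high)
             include.lowest = TRUE,  # last interval becomes [90, 100]
             ordered_result = TRUE)  # Fail < Pass < ... < Excellent

as.character(grade)
table(grade)

# The resulting factor can be used to group the scores with split().
by_grade <- split(scores, grade)
by_grade$Excellent
# [1] 100  95  90
```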
