Environment
R, VS Code
I. Data
mytxt <- "
date age gender
20201101 25 1
20210601 68 2
20200801 32 1
20221001 36 1
20200201 70 2
"
group <- c("old", "young")
x <- c(77, 56, 98, 69, 35, 37, 79, 33, 28, 36, 92, 50)
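II. Problem description
Using the two objects mytxt and group defined above, carry out the operations below. Use the sink function to write the results to a Word document, copy the corresponding program code to the end of that document, print it double-sided on a single A4 sheet, and write "class / student ID / name" at the top of the first page.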
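1. Build a data frame from the object mytxt and display it;
2. Add an id variable to the data frame and display the new data frame;
3. Find the samples whose age is below 65 and display their id values in sample order;
4. Using the object group, display in sample order whether each sample's age falls in the "old" group (older than 65) or the "young" group (younger than 65), for example:
[1] 'young' 'old' 'young' 'young' 'old'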
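5. Convert the gender variable in the data frame to a factor, with "1" mapped to "female" and "2" mapped to "male", and display the new data frame;
6. Display the gender of the young samples in sample order;
7. Convert the date variable in the data frame to date values and display the new data frame;
8. Sort the data frame by date in ascending order and display the new data frame;
9. For x <- c(77, 56, 98, 69, 35, 37, 79, 33, 28, 36, 92, 50), sorted in ascending order, at which position does 36 appear? What is the third-largest value? (Any of sort, order, or rank can be used.) What is the index in x of the third value from the end of the sorted order?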
III. Solution
Task 1:
date <- c("20201101", "20210601", "20200801", "20221001", "20200201")
age <- c(25, 68, 32, 36, 70)
gender <- c(1, 2, 1, 1, 2)
data_1 <- data.frame(date, age, gender)
print(data_1)
The result is:
> source("f:\\PycharmProject\\23.R-Project\\homework_1.r", encoding = "UTF-8")
date age gender
1 20201101 25 1
2 20210601 68 2
3 20200801 32 1
4 20221001 36 1
5 20200201 70 2
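The task asks for the data frame to be built from mytxt itself, whereas the code above retypes the columns by hand. A minimal sketch of the direct route, assuming base R's read.table with its text argument; the name data_1_alt and the colClasses choice (keeping date as character, to match the code above) are illustrative assumptions, not part of the original solution:

```r
mytxt <- "
date age gender
20201101 25 1
20210601 68 2
20200801 32 1
20221001 36 1
20200201 70 2
"

# Parse the string as whitespace-separated text.
# Blank lines are skipped by default, so the first non-blank line is the header.
data_1_alt <- read.table(
  text = mytxt,
  header = TRUE,
  colClasses = c("character", "numeric", "numeric")  # assumption: keep date as text
)
print(data_1_alt)
```

Reading from the string keeps the data in one place, so a change to mytxt does not have to be repeated in three separate vectors.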
Task 2:
id <- c("a", "b", "c", "d", "e")
data_2 <- data.frame(id, data_1)
print(data_2)
The result is:
id date age gender
1 a 20201101 25 1
2 b 20210601 68 2
3 c 20200801 32 1
4 d 20221001 36 1
5 e 20200201 70 2
Task 3:
data_3 <- subset(data_2, age < 65, select = c("id", "date", "age", "gender"))
print(data_3)
The result is:
id date age gender
1 a 20201101 25 1
3 c 20200801 32 1
4 d 20221001 36 1
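Task 3 asks only for the id values, while subset() above returns whole rows. A small sketch that extracts just the ids with logical indexing; the name young_ids is an illustrative assumption, and the data frame is rebuilt here only so the snippet runs on its own:

```r
id <- c("a", "b", "c", "d", "e")
age <- c(25, 68, 32, 36, 70)
data_2 <- data.frame(id, age, stringsAsFactors = FALSE)

# Keep the id of every row whose age is below 65, in the original row order.
young_ids <- data_2$id[data_2$age < 65]
print(young_ids)
```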
Task 4:
data_4 <- data_2
data_4$group[data_2$age < 65] <- "young"
data_4$group[data_2$age > 65] <- "old"
print(data_4)
The result is:
id date age gender group
1 a 20201101 25 1 young
2 b 20210601 68 2 old
3 c 20200801 32 1 young
4 d 20221001 36 1 young
5 e 20200201 70 2 old
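The code above writes the labels as string literals and does not use the group object that the task mentions. One possible way to index into group directly is sketched below; age_group is a hypothetical name, and treating an age of exactly 65 as "old" is an assumption, since the task does not define that case:

```r
group <- c("old", "young")
age <- c(25, 68, 32, 36, 70)

# (age < 65) is FALSE/TRUE, which R coerces to 0/1 in arithmetic.
# Adding 1 turns that into index 1 ("old") or index 2 ("young").
age_group <- group[(age < 65) + 1]
print(age_group)
```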
Task 5:
data_5 <- data_2
data_5$gender[data_2$gender == 1] <- "female"
data_5$gender[data_2$gender == 2] <- "male"
print(data_5)
The result is:
id date age gender
1 a 20201101 25 female
2 b 20210601 68 male
3 c 20200801 32 female
4 d 20221001 36 female
5 e 20200201 70 male
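The assignments above turn gender into a character column, whereas the task asks for a factor. A minimal sketch using base R's factor() with explicit levels and labels; gender_f is an illustrative name:

```r
gender <- c(1, 2, 1, 1, 2)

# Map code 1 to "female" and code 2 to "male", storing the result as a factor.
gender_f <- factor(gender, levels = c(1, 2), labels = c("female", "male"))
print(gender_f)
print(is.factor(gender_f))
```

Assigning the result back with data_5$gender <- gender_f would give a data frame that prints the same as above but stores gender as a factor.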
Task 6:
print(data_5$gender)
The result is:
[1] "female" "male" "female" "female" "male"
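The call above prints the gender of every sample, while the task asks only for the young ones. A sketch that filters by age first, assuming "young" means age below 65 as in Task 4; young_gender is a hypothetical name:

```r
age <- c(25, 68, 32, 36, 70)
gender <- factor(c(1, 2, 1, 1, 2), levels = c(1, 2), labels = c("female", "male"))

# Keep only the gender entries of samples younger than 65, in sample order.
young_gender <- gender[age < 65]
print(young_gender)
```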
Task 7:
data_date <- as.Date(data_2$date, "%Y%m%d")
data_2$date <- data_date
data_7 <- data.frame(data_2)
print(data_7)
The result is:
id date age gender
1 a 2020-11-01 25 1
2 b 2021-06-01 68 2
3 c 2020-08-01 32 1
4 d 2022-10-01 36 1
5 e 2020-02-01 70 2
Task 8:
sort_date <- data_date[order(data_date)]
sort_age <- age[order(data_date)]
sort_gender <- gender[order(data_date)]
sort_id <- id[order(data_date)]
date <- sort_date
age <- sort_age
gender <- sort_gender
id <- sort_id
data_8 <- data.frame(id, date, age, gender)
print(data_8)
The result is:
id date age gender
1 e 2020-02-01 70 2
2 c 2020-08-01 32 1
3 a 2020-11-01 25 1
4 b 2021-06-01 68 2
5 d 2022-10-01 36 1
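Sorting each column separately and rebuilding the data frame works, but the same result can usually be reached by reordering the rows in one step. A sketch under that assumption; data_8_alt is an illustrative name, and the data frame is rebuilt here so the snippet is self-contained:

```r
data_7 <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  date = as.Date(c("20201101", "20210601", "20200801", "20221001", "20200201"),
                 format = "%Y%m%d"),
  age = c(25, 68, 32, 36, 70),
  gender = c(1, 2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts date ascending;
# indexing the rows with it reorders every column at once.
data_8_alt <- data_7[order(data_7$date), ]
rownames(data_8_alt) <- NULL  # renumber rows 1..5 instead of keeping 5, 3, 1, 2, 4
print(data_8_alt)
```

This also avoids overwriting the global vectors date, age, gender, and id, which the step-by-step version does.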
Task 9:
1. Sort x:
print(sort(x))
[1] 28 33 35 36 37 50 56 69 77 79 92 98
36 is in fourth position.
print(rank(x))
print(x[rank(x) == 10])
[1] 9 7 12 8 3 5 10 2 1 4 11 6
[1] 79
The third-largest number is 79.
a <- x[rank(x) == 10]
print(order(x)[sort(x) == a])
The result is:
[1] 7
So 79 is the 7th element of x.
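The three answers can also be computed directly instead of being read off the printed vectors. A short sketch using sort() and which(); the names pos_36, third_largest, and idx_in_x are illustrative assumptions:

```r
x <- c(77, 56, 98, 69, 35, 37, 79, 33, 28, 36, 92, 50)

# Position of 36 in the ascending order.
pos_36 <- which(sort(x) == 36)

# Third-largest value: third element of the descending order.
third_largest <- sort(x, decreasing = TRUE)[3]

# Index of that value in the original vector x.
idx_in_x <- which(x == third_largest)

print(c(pos_36, third_largest, idx_in_x))
```

Because x has no repeated values, each which() call returns a single index; with ties it would return several.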
That completes the solutions. All that remains is to add sink() calls so that the output is written to the Word document.
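A minimal sketch of how sink() could wrap the script. The file name out_file is a hypothetical temporary path; sink() writes plain text, so the assumption here is that the text file is then opened in, or pasted into, Word:

```r
out_file <- tempfile(fileext = ".txt")  # hypothetical output path

sink(out_file)          # start diverting console output to the file
print(sort(c(3, 1, 2))) # anything printed here goes to the file, not the console
sink()                  # stop diverting and restore normal console output

# Read the file back to confirm what was captured.
captured <- readLines(out_file)
print(captured)
```

Every sink(file) call should be paired with a closing sink(), otherwise later output keeps going to the file.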
IV. Source code
Note: the listing below does not include the sink() calls; add them yourself. The editor used is VS Code.
mytxt <- "
date age gender
20201101 25 1
20210601 68 2
20200801 32 1
20221001 36 1
20200201 70 2
"
date <- c("20201101", "20210601", "20200801", "20221001", "20200201")
age <- c(25, 68, 32, 36, 70)
gender <- c(1, 2, 1, 1, 2)
data_1 <- data.frame(date, age, gender)
print(data_1)
id <- c("a", "b", "c", "d", "e")
data_2 <- data.frame(id, data_1)
print(data_2)
data_3 <- subset(data_2, age < 65, select = c("id", "date", "age", "gender"))
print(data_3)
data_4 <- data_2
data_4$group[data_2$age < 65] <- "young"
data_4$group[data_2$age > 65] <- "old"
print(data_4)
data_5 <- data_2
data_5$gender[data_2$gender == 1] <- "female"
data_5$gender[data_2$gender == 2] <- "male"
print(data_5)
print(data_5$gender)
data_date <- as.Date(data_2$date, "%Y%m%d")
data_2$date <- data_date
data_7 <- data.frame(data_2)
print(data_7)
sort_date <- data_date[order(data_date)]
sort_age <- age[order(data_date)]
sort_gender <- gender[order(data_date)]
sort_id <- id[order(data_date)]
date <- sort_date
age <- sort_age
gender <- sort_gender
id <- sort_id
data_8 <- data.frame(id, date, age, gender)
print(data_8)
x <- c(77, 56, 98, 69, 35, 37, 79, 33, 28, 36, 92, 50)
print(sort(x))
print(rank(x))
print(x[rank(x) == 10])
a <- x[rank(x) == 10]
print(order(x)[sort(x) == a])