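This section is updated daily and covers, among other things, fundamentals, advanced topics, data processing, chart presentation, hands-on data analysis, and machine learning algorithms.
I am a master's student in statistics. In 2024 I plan to review and consolidate SQL, Python, R, Stata, MATLAB, and other software; I am currently reviewing statistics and R.
I am considering follow-up videos that walk through this material. Please like, bookmark, and follow, and feel free to join the discussion.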
2.4 Chart Exercises
(1) Drawing basic plots (line charts, bar charts, etc.)
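Exercise 16:
Problem: Draw a line chart showing a product's sales trend over the past year.
Answer:
# Assume the sales vector sales_data and the matching month vector months are given
# (illustrative data; the final value 240 is added so the series covers a full 12 months)
sales_data <- c(100, 120, 130, 150, 170, 160, 180, 200, 210, 220, 230, 240)
months <- seq(from = as.Date("2023-01-01"), by = "month", length.out = length(sales_data))
# Draw the line chart
plot(months, sales_data, type = "l", main = "Product sales trend over the past year", xlab = "Month", ylab = "Sales", col = "blue")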
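Exercise 17:
Problem: Use a bar chart to compare the headcount of different departments.
Answer:
# Assume the department name vector departments and the matching headcount vector staff_counts are given
departments <- c("Dept A", "Dept B", "Dept C", "Dept D")
staff_counts <- c(20, 30, 15, 25)
# Draw the bar chart
barplot(staff_counts, names.arg = departments, main = "Headcount by department", xlab = "Department", ylab = "Headcount", col = "lightgreen")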
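Exercise 18:
Problem: Draw a stacked bar chart showing sales data for different categories.
Answer:
# Assume the sales matrix sales_data is given: one column per category, one row per stacked series
# (the row names "Series 1" and "Series 2" are illustrative placeholders; without row names the legend would be empty)
sales_data <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, byrow = TRUE,
                     dimnames = list(c("Series 1", "Series 2"), c("Category 1", "Category 2", "Category 3")))
# Draw the stacked bar chart; col needs one color per row (stacked segment), not per column
barplot(sales_data, beside = FALSE, legend.text = rownames(sales_data), args.legend = list(x = "topleft"), main = "Stacked bar chart of sales by category", xlab = "Category", ylab = "Sales", col = c("lightblue", "lightgreen"))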
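Exercise 19:
Problem: Analyze a time series and draw the corresponding line chart.
Answer:
(Assume the time series is supplied in some form; randomly generated data is used here as an example.)
# Generate the time series: 12 monthly values starting in January 2023
set.seed(123)
time_series <- ts(rnorm(12), start = c(2023, 1), frequency = 12)
# Draw the line chart
plot(time_series, main = "Line chart of the time series", xlab = "Time", ylab = "Value", type = "l", col = "darkblue")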
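The answer above plots the series but does not really analyze it. As a rough sketch of a first analysis step, the snippet below prints summary statistics and overlays a 3-month centered moving average. The window length of 3 and the name ma3 are illustrative choices, not part of the original exercise.

```r
# Recreate the example series so this snippet runs on its own
set.seed(123)
time_series <- ts(rnorm(12), start = c(2023, 1), frequency = 12)

# Basic descriptive statistics of the observed values
print(summary(as.numeric(time_series)))

# 3-month centered moving average (assumed window); the first and last values are NA
ma3 <- stats::filter(time_series, rep(1 / 3, 3), sides = 2)

# Plot the raw series and overlay the smoothed series
plot(time_series, main = "Time series with 3-month moving average",
     xlab = "Time", ylab = "Value", col = "darkblue")
lines(ma3, col = "red", lwd = 2)
legend("topleft", legend = c("Observed", "Moving average"),
       col = c("darkblue", "red"), lwd = c(1, 2), bty = "n")
```

With only 12 random points the smoothed line is mostly a visual aid; on real monthly data a longer window or decompose() may be more appropriate.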
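Exercise 20:
Problem: Draw a bar chart in R and give each bar a different fill color.
Answer:
# Assume the department name vector departments and the matching headcount vector staff_counts are given
departments <- c("Dept A", "Dept B", "Dept C", "Dept D")
staff_counts <- c(20, 30, 15, 25)
# Set a different fill color for each bar
colors <- c("lightblue", "lightgreen", "pink", "yellow")
# Draw the bar chart with the fill colors
barplot(staff_counts, names.arg = departments, main = "Headcount by department", xlab = "Department", ylab = "Headcount", col = colors)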
(2) Advanced plotting with the ggplot2 package
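Exercise 21:
Problem: Use the ggplot2 package to draw a scatter plot and examine the relationship between two variables.
Answer:
# Load the ggplot2 package
library(ggplot2)
# Assume the data frame df contains two variables, x and y
df <- data.frame(x = rnorm(100), y = rnorm(100))
# Draw the scatter plot
ggplot(df, aes(x = x, y = y)) + geom_point() + ggtitle("Relationship between two variables") + xlab("Variable x") + ylab("Variable y")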
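A scatter plot shows the relationship visually but does not quantify it. As a minimal sketch using base R only, the correlation coefficient and a simple linear fit can be computed alongside the plot. The seed value and the names r, fit, and slope are illustrative assumptions.

```r
# Illustrative data with the same shape as in the exercise
set.seed(42)
df <- data.frame(x = rnorm(100), y = rnorm(100))

# Pearson correlation between x and y
r <- cor(df$x, df$y)

# Simple linear regression of y on x
fit <- lm(y ~ x, data = df)
slope <- coef(fit)[["x"]]

cat("Correlation:", round(r, 3), "\n")
cat("Fitted slope:", round(slope, 3), "\n")

# Base-graphics check: scatter plot with the fitted line
plot(df$x, df$y, pch = 20, xlab = "Variable x", ylab = "Variable y",
     main = "Scatter plot with fitted line")
abline(fit, col = "red", lwd = 2)
```

Because x and y are generated independently here, the correlation should be close to zero. In a ggplot2 chart, adding geom_smooth(method = "lm", se = FALSE) draws the same kind of fitted line.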
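Exercise 22:
Problem: Draw grouped box plots to compare the distribution of data across groups.
Answer:
# Load the ggplot2 package
library(ggplot2)
# Assume the data frame df contains the numeric variable value and the grouping variable group
df <- data.frame(
  value = c(rnorm(20, mean = 0, sd = 1), rnorm(20, mean = 2, sd = 1.5), rnorm(20, mean = 1, sd = 1.2)),
  group = rep(c("A", "B", "C"), each = 20)
)
# Draw the grouped box plots
ggplot(df, aes(x = group, y = value, fill = group)) +
  geom_boxplot() +
  ggtitle("Box plots of the data distribution by group") +
  xlab("Group") +
  ylab("Value") +
  theme_minimal() # use a minimal theme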
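Exercise 23:
Problem: Use the ggplot2 package to draw a stacked area chart showing the share of each category.
Answer:
# Load the ggplot2 package
library(ggplot2)
# Assume the data frame df contains a continuous x variable period, the categorical variable category, and the numeric variable value
# (period is an illustrative name: geom_area() needs a continuous x axis with one value per category at each x)
df <- data.frame(
  period = rep(1:10, times = 3),
  category = rep(c("Cat1", "Cat2", "Cat3"), each = 10),
  value = c(rnorm(10, mean = 20, sd = 5), rnorm(10, mean = 30, sd = 7), rnorm(10, mean = 15, sd = 4))
)
# Draw the stacked area chart (use position = "fill" instead to show proportions)
ggplot(df, aes(x = period, y = value, fill = category)) +
  geom_area(position = "stack") +
  ggtitle("Stacked area chart by category") +
  xlab("Period") +
  ylab("Value") +
  theme_minimal() +
  scale_fill_manual(values = c("lightblue", "lightgreen", "pink")) # custom fill colors, one per category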
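Exercise 24:
Problem: Analyze a multivariate data set and visualize it with the ggplot2 package.
Answer:
(Here we assume the data contains several variables and use a scatter plot matrix to show the relationships between them.)
# Load the ggplot2 and GGally packages
library(ggplot2)
library(GGally)
# Assume the data frame df contains several variables
df <- data.frame(
  Var1 = rnorm(100),
  Var2 = rnorm(100, mean = 2),
  Var3 = rnorm(100, mean = -1, sd = 2),
  Var4 = rnorm(100, mean = 3, sd = 1.5)
)
# Draw the scatter plot matrix with ggpairs()
ggpairs(df)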
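Exercise 25:
Problem: Customize the style of a ggplot2 chart, including colors, line types, and so on.
Answer:
# Load the ggplot2 package
library(ggplot2)
# Assume the data frame df contains the numeric variables x and y
df <- data.frame(x = rnorm(100), y = rnorm(100))
# Customize the style of the ggplot2 chart
ggplot(df, aes(x = x, y = y)) +
  geom_point(color = "purple", size = 3) + # set the point color and size
  geom_line(aes(group = 1), color = "darkblue", linetype = "dashed") + # set the line color and type
  ggtitle("Scatter plot and line chart with custom styling") +
  xlab("X axis") +
  ylab("Y axis") +
  theme_minimal() + # use a minimal theme
  theme(plot.title = element_text(size = 14, face = "bold")) # customize the title style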
(3) Customizing and polishing plots (adding titles, legends, labels, etc.)
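Exercise 26:
Problem: Add a title and axis labels to a line chart to make it more readable.
Answer:
# Assume the data frame df contains the variables x and y
df <- data.frame(x = 1:10, y = rnorm(10))
# Draw the line chart with a title and axis labels
plot(df$x, df$y, type = "l", main = "Line chart example", xlab = "X axis label", ylab = "Y axis label", col = "blue", lwd = 2) # set the line color and width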
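Exercise 27:
Problem: Add a legend to a bar chart explaining what the different colors or patterns mean.
Answer:
# Assume the data frame df contains the variables group and value
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  value = c(rnorm(10, mean = 2), rnorm(10, mean = 3), rnorm(10, mean = 1))
)
# One color per group, reused for both the bars and the legend
group_colors <- c(A = "red", B = "green", C = "blue")
# barplot() needs one height per bar, so summarize each group by its mean
group_means <- tapply(df$value, df$group, mean)
# Draw the bar chart and add the legend
barplot(group_means, col = group_colors[names(group_means)], main = "Grouped bar chart example", xlab = "Group", ylab = "Mean value")
legend("topright", legend = names(group_colors), fill = group_colors, bty = "n") # add the legend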
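Exercise 28:
Problem: Improve the layout of a scatter plot so that it is cleaner and easier to read.
Answer:
# Assume the data frame df contains the variables x and y
df <- data.frame(x = rnorm(50), y = rnorm(50))
# Improve the scatter plot layout
plot(df$x, df$y, main = "Improved scatter plot", xlab = "X axis", ylab = "Y axis", pch = 20, cex = 1.5, col = "darkblue") # set the point type, size, and color
abline(h = mean(df$y), col = "red", lwd = 2) # add a horizontal line at the mean of y
abline(v = mean(df$x), col = "green", lwd = 2) # add a vertical line at the mean of x
box(col = "lightgrey") # redraw the plot border in light grey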
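Exercise 29:
Problem: Add labels to data points on a box plot so that specific values are easier to identify.
Answer:
# Assume the data frame df contains the numeric variable value and the grouping variable group
# (group is added here for illustration so that each box summarizes 20 values instead of a single point)
set.seed(123)
df <- data.frame(value = c(rnorm(20, mean = 0, sd = 1), rnorm(20, mean = 2, sd = 1.5), rnorm(20, mean = 1, sd = 1.2)), group = rep(c("A", "B", "C"), each = 20))
# Draw the box plot; boxplot() invisibly returns its statistics, including any outliers
bp <- boxplot(value ~ group, data = df, xlab = "Group", ylab = "Value", main = "Box plot with labeled points")
# Label the outliers, if there are any; pos = 4 places each label to the right of its point (pos = 3 would place it above)
if (length(bp$out) > 0) text(x = bp$group, y = bp$out, labels = round(bp$out, 2), pos = 4)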
Exercise 30:
Problem: Combine the techniques above to draw a composite figure with several subplots, customized and polished as appropriate.
Answer:
(Here we assume we want a composite figure containing a scatter plot and a box plot.)
# Load the required packages
library(ggplot2)
library(gridExtra)
# Assume the data frames df1 and df2 hold the data for the scatter plot and the box plot respectively
df1 <- data.frame(x = rnorm(50), y = rnorm(50))
df2 <- data.frame(group = rep(c("A", "B", "C"), each = 20), value = c(rnorm(20, mean = 0), rnorm(20, mean = 2), rnorm(20, mean = 1)))
# Draw the scatter plot
p1 <- ggplot(df1, aes(x = x, y = y)) +
  geom_point(color = "purple", size = 3) +
  ggtitle("Scatter plot") +
  theme_minimal()
# Draw the box plot
p2 <- ggplot(df2, aes(x = group, y = value, fill = group)) +
  geom_boxplot() +
  ggtitle("Box plot") +
  theme_minimal() +
  theme(legend.position = "none") # hide the legend, since the groups are already shown on the x axis
# Combine the two plots side by side with grid.arrange()
grid.arrange(p1, p2, ncol = 2)
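For comparison, a similar two-panel layout can be sketched with base graphics only, using par(mfrow = ...) in place of gridExtra. This is an alternative sketch rather than the approach used in the answer above; the seed and the names scatter_data, box_data, old_par, and panel_layout are illustrative.

```r
# Illustrative data with the same shape as in the exercise
set.seed(1)
scatter_data <- data.frame(x = rnorm(50), y = rnorm(50))
box_data <- data.frame(
  group = rep(c("A", "B", "C"), each = 20),
  value = c(rnorm(20, mean = 0), rnorm(20, mean = 2), rnorm(20, mean = 1))
)

# Split the device into 1 row and 2 columns; keep the old settings to restore later
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Left panel: scatter plot
plot(scatter_data$x, scatter_data$y, pch = 20, col = "purple",
     main = "Scatter plot", xlab = "x", ylab = "y")

# Right panel: box plot by group
boxplot(value ~ group, data = box_data,
        col = c("lightblue", "lightgreen", "pink"),
        main = "Box plot", xlab = "Group", ylab = "Value")

# Record the layout that was used, then restore the previous settings
panel_layout <- par("mfrow")
par(old_par)
```

par(mfrow = ...) fills the panels row by row in the order the plots are drawn, so it only applies to base graphics; ggplot2 objects still need grid.arrange() or a similar helper.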