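Question 1: How do you plot the density distribution of a single vector?
(1) Prepare the data (a random sample of 1000 rows from diamonds)
library(ggplot2)  # provides the diamonds dataset
set.seed(202402)
dat = diamonds[sample(nrow(diamonds), 1000), ]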
> head(dat)
# A tibble: 6 × 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.35 Ideal I VS2 59.8 57 630 4.6 4.59 2.75
2 0.59 Ideal D SI1 61.8 56 1816 5.37 5.4 3.33
3 0.39 Ideal D VS1 62 57 1095 4.67 4.71 2.91
4 1.12 Premium G IF 60.9 57 9126 6.79 6.68 4.1
5 0.51 Very Good E VVS2 62.1 55 2056 5.14 5.16 3.2
6 1.03 Premium G VVS2 60.4 59 7729 6.58 6.56 3.97
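(2) Plot the distribution curve of price
Result figure:
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(data = dat, mapping = aes(x = price)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 400,
                 #bins=30,
                 fill = "bisque", color = "white", alpha = 0.7) +
  geom_density() +
  geom_rug() +
  labs(x = 'Price') +
  theme_minimal(base_size = 12)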
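The histogram above is drawn on the density scale so that it shares a y-axis with geom_density(). Here is a minimal base-R sketch of what that scale means. It uses synthetic, right-skewed values as a stand-in for price; the rexp() data and its parameters are assumptions for illustration, not the diamonds sample.

```r
# Synthetic stand-in for a price-like variable (assumption, not the diamonds data)
set.seed(1)
x <- rexp(1000, rate = 1 / 4000)

# Histogram with 400-wide bins, computed without drawing it
h <- hist(x, breaks = seq(0, max(x) + 400, by = 400), plot = FALSE)

# On the density scale, bar height = count / (n * bin width),
# so the bar areas sum to 1
hist_area <- sum(h$density * diff(h$breaks))

# A kernel density estimate also integrates to roughly 1 (trapezoid rule)
d <- density(x)
kde_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

print(c(hist_area = hist_area, kde_area = kde_area))
```

Both areas are close to 1, so the bars and the curve are directly comparable. On the count scale the bars would have a total area of n times the bin width, and the density curve would look flat against them.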
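Question 2: Bin by the values of A, compute the mean of B in each bin, and plot the result
Goal: for diamonds in different carat ranges, find the distribution and the mean of price.
- Split the variable carat into intervals
- Compute the mean price within each carat interval
- Plot
(1) Use the cut function to split the carat column into 10 intervals, then compute the mean price of each interval
Result figure:
Key function: cut(x, breaks = n) divides the range of the continuous vector x into n equal-width bins and returns, as a factor, the interval that each value of x falls into.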
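To make the behaviour of cut() concrete, here is a small sketch on a toy vector. The values in x are made up for illustration. It shows both forms of breaks: a single number, and an explicit vector of boundaries.

```r
# Toy vector (made-up values, for illustration only)
x <- c(0.2, 0.35, 0.5, 0.9, 1.5, 3.0)

# breaks = 3: the range of x is split into 3 equal-width intervals
bins <- cut(x, breaks = 3)
print(levels(bins))      # the interval labels
print(table(bins))       # how many values fall into each interval
print(as.integer(bins))  # bin index of each value

# An explicit break vector gives round, predictable boundaries
bins_manual <- cut(x, breaks = c(0, 1, 2, 3))
print(levels(bins_manual))
```

With a single number, cut() widens the outermost breaks slightly so that the minimum and maximum of x fall inside a bin. That is why the labels in the transcript below start at 0.217 rather than at the smallest carat, 0.22.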
> # Bin by the values of A, compute the mean of B in each bin, and plot ----
> dat$tags = cut(dat$carat, breaks = 10) # split into 10 intervals
> head(dat)
# A tibble: 6 × 11
carat cut color clarity depth table price x y z tags
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct>
1 0.35 Ideal I VS2 59.8 57 630 4.6 4.59 2.75 (0.217,0.499]
2 0.59 Ideal D SI1 61.8 56 1816 5.37 5.4 3.33 (0.499,0.778]
3 0.39 Ideal D VS1 62 57 1095 4.67 4.71 2.91 (0.217,0.499]
4 1.12 Premium G IF 60.9 57 9126 6.79 6.68 4.1 (1.06,1.34]
5 0.51 Very Good E VVS2 62.1 55 2056 5.14 5.16 3.2 (0.499,0.778]
6 1.03 Premium G VVS2 60.4 59 7729 6.58 6.56 3.97 (0.778,1.06]
Compute the mean price within each carat interval
> results=sapply(split(dat$price, dat$tags), function(x){
+ mean(x)
+ })
> levels(dat$tags)
[1] "(0.217,0.499]" "(0.499,0.778]" "(0.778,1.06]" "(1.06,1.34]" "(1.34,1.61]" "(1.61,1.89]" "(1.89,2.17]"
[8] "(2.17,2.45]" "(2.45,2.73]" "(2.73,3.01]"
> head(results)
(0.217,0.499] (0.499,0.778] (0.778,1.06] (1.06,1.34] (1.34,1.61] (1.61,1.89]
793.9373 2105.2756 4961.7216 6500.9048 9756.3146 10490.7308
> #results[levels(dat$tags)]
> plot( as.numeric( results), type="o", pch=19,
+ xlab="bin index", ylab="Price", mgp=c(2,1,0))
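The split() plus sapply() step can also be written with tapply(), and the x-axis can show the interval labels instead of a bare bin index. The sketch below uses synthetic carat and price vectors so that it runs without ggplot2. Those vectors and the linear price model are assumptions for illustration only.

```r
# Synthetic data (assumption): price grows roughly linearly with carat
set.seed(42)
carat <- runif(500, min = 0.2, max = 3)
price <- 4000 * carat + rnorm(500, sd = 500)
tags  <- cut(carat, breaks = 10)

# Two equivalent ways to get the mean price per bin
m_split  <- sapply(split(price, tags), mean)
m_tapply <- tapply(price, tags, mean)

# Same line plot as above, with the bin intervals as axis labels
plot(as.numeric(m_tapply), type = "o", pch = 19, xaxt = "n",
     xlab = "", ylab = "Mean price")
axis(1, at = seq_along(m_tapply), labels = names(m_tapply),
     las = 2, cex.axis = 0.7)
```

The two approaches differ only for empty bins: sapply(split(...), mean) yields NaN there, while tapply() returns NA by default.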
(2) Draw a violin plot of price for each carat bin
Result figure:
Code:
The data is the same as in (1):
> summary(dat)
carat cut color clarity depth table price x
Min. :0.2200 Fair : 35 D:129 SI1 :227 Min. :55.90 Min. :52.00 Min. : 345 Min. :3.900
1st Qu.:0.4000 Good : 89 E:178 VS2 :220 1st Qu.:61.10 1st Qu.:56.00 1st Qu.: 987 1st Qu.:4.720
Median :0.7000 Very Good:206 F:181 SI2 :173 Median :61.90 Median :57.00 Median : 2362 Median :5.680
Mean :0.8033 Premium :254 G:203 VS1 :156 Mean :61.82 Mean :57.42 Mean : 3939 Mean :5.745
3rd Qu.:1.0600 Ideal :416 H:136 VVS2 :104 3rd Qu.:62.60 3rd Qu.:59.00 3rd Qu.: 5443 3rd Qu.:6.560
Max. :3.0100 I:119 VVS1 : 62 Max. :79.00 Max. :73.00 Max. :18432 Max. :9.540
J: 54 (Other): 58
y z tags
Min. :3.880 Min. :2.310 (0.217,0.499]:319
1st Qu.:4.740 1st Qu.:2.920 (0.499,0.778]:254
Median :5.700 Median :3.520 (0.778,1.06] :176
Mean :5.747 Mean :3.552 (1.06,1.34] :105
3rd Qu.:6.560 3rd Qu.:4.050 (1.34,1.61] : 89
Max. :9.380 Max. :5.900 (1.61,1.89] : 26
(Other) : 31
Plotting code:
library(ggplot2)
ggplot(dat, aes(x = tags, y = price, fill = tags)) +
  # 8-digit hex color: the trailing 00 is the alpha channel, so the violin outline is fully transparent
  geom_violin(scale = "width", color = "#00112200") +
  geom_boxplot(width = 0.1, fill = "white", outlier.size = 0.1) +
  geom_jitter(color = "blue", alpha = 0.2, size = 0.5, shape = 19) +
  theme_classic(base_size = 14) +
  #coord_flip() +
  theme(
    axis.text.x = element_text(angle = 30, hjust = 1),
    legend.position = "none"
  ) +
  labs(x = "carat", y = "price") +
  #ylim(0.5, 0.85) +
  # 12 hues are generated; the first 10 are assigned to the 10 bins in order
  scale_fill_manual(values = scales::hue_pal()(12))
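Since the goal is both the distribution and the mean of price per bin, the mean can also be drawn directly on the violin plot. Below is a minimal sketch using stat_summary(), assuming ggplot2 >= 3.3.0 (where the argument is named fun). The red point styling is an arbitrary choice, not part of the original figure.

```r
library(ggplot2)

# Rebuild the binned sample so the block runs on its own
set.seed(202402)
dat <- diamonds[sample(nrow(diamonds), 1000), ]
dat$tags <- cut(dat$carat, breaks = 10)

# Violin per carat bin, with the per-bin mean price as a red point
p <- ggplot(dat, aes(x = tags, y = price)) +
  geom_violin(scale = "width", fill = "grey90") +
  stat_summary(fun = mean, geom = "point", color = "red", size = 2) +
  theme_classic(base_size = 14) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
  labs(x = "carat", y = "price")

print(p)
```

The red points follow the same per-bin means that the line plot in (1) shows, so the two figures can be checked against each other.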