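This blog post is my own study notes, shared here so we can learn from each other!
I started learning the ggplot2 plotting package purely out of perfectionism: I wanted plots that look polished and classy and different from everyone else's. I bet you feel the same, hahaha! Whatever your reason, once you really understand it you will see for yourself what it can do and how great it is. Keep going!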
Introduction
A statistical graphic is a mapping from data to the aesthetic attributes (aes, e.g. colour, shape, size) of geometric objects (geom, e.g. points, lines, bars).
Basic elements
- Data and mapping
- Geometric objects (geom)
- Statistical transformations (stat)
- Scales
- Coordinate system (coord)
- Facets
- Layers
- Theme
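To see how these elements fit together, here is a sketch of one plot built with ggplot() (the full interface that qplot() builds on), with one element per line. It assumes only that ggplot2 is installed; the particular choices (log scales, faceting by cut, theme_bw()) are illustrative and not from the original notes.

```r
library(ggplot2)

# Each line adds one of the basic elements listed above
p <- ggplot(diamonds, aes(x = carat, y = price)) +  # data + aesthetic mapping
  geom_point(alpha = 1/10) +                       # geom: a layer of points
  stat_smooth(method = "lm", formula = y ~ x) +    # stat: a second layer, a fitted line
  scale_x_log10() +                                # scale: log10 x axis
  scale_y_log10() +                                # scale: log10 y axis
  coord_cartesian() +                              # coordinate system (the default one)
  facet_wrap(~ cut) +                              # facets: one panel per cut
  theme_bw()                                       # theme: non-data appearance

p  # printing the object draws the plot
```

Each `+` adds a component to the same plot object, which is why ggplot2 code reads like a list of these elements.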
Note: ggplot2 only creates static graphics; for dynamic (interactive) graphics, see the rggobi package!
I. From easy to hard: starting with qplot()
Dataset: diamonds (prices and other attributes of almost 54,000 diamonds)
1. Basic usage
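library(ggplot2) #load ggplot2; the diamonds dataset ships with it
summary(diamonds) #take a quick look at the dataset first
set.seed(1410) #make the random sample reproducible
dsmall <- diamonds[sample(nrow(diamonds), 100), ] #randomly pick 100 observations
qplot(carat, price, data = diamonds)
qplot(log(carat), log(price), data = diamonds) #qplot() accepts functions of variables as arguments
qplot(carat, x*y*z, data = diamonds) #weight vs. approximate volume (x, y, z are length, width and depth in mm)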
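qplot() is a shortcut for ggplot(), and recent ggplot2 releases (3.4.0 onward, as far as I know) mark qplot() as deprecated, so it helps to know the long form too. A sketch of the three plots above written with ggplot(); the names p1, p2, p3 are just illustrative.

```r
library(ggplot2)

# qplot(carat, price, data = diamonds)
p1 <- ggplot(diamonds, aes(carat, price)) + geom_point()

# qplot(log(carat), log(price), data = diamonds): functions of variables also work inside aes()
p2 <- ggplot(diamonds, aes(log(carat), log(price))) + geom_point()

# qplot(carat, x*y*z, data = diamonds): weight vs. approximate volume
p3 <- ggplot(diamonds, aes(carat, x * y * z)) + geom_point()
```

Printing p1, p2 or p3 draws the corresponding plot.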
2. Colour, size, shape and other aesthetic attributes
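qplot(carat, price, data = dsmall, colour = color) #colour mapped to the diamond's color grade
qplot(carat, price, data = dsmall, shape = cut) #shape mapped to the cut quality
#Transparency: the alpha aesthetic reduces overplotting
qplot(carat, price, data = diamonds, alpha = I(1/10)) #I() sets an aesthetic to a fixed value instead of mapping a variable; 1/10 means about 10 overlapping points give one fully opaque point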
qplot(carat, price, data = diamonds, alpha = I(1/100))
qplot(carat, price, data = diamonds, alpha = I(1/1000))
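The difference between mapping an aesthetic (colour = color) and setting it to a constant (alpha = I(1/10)) is easier to see with ggplot(): mappings go inside aes(), constants go outside it as layer parameters, so no I() is needed. A sketch, with dsmall rebuilt the same way as above so the block runs on its own; p_map and p_set are illustrative names.

```r
library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Mapping: colour varies with the color column and gets a legend
p_map <- ggplot(dsmall, aes(carat, price, colour = color)) + geom_point()

# Setting: a constant outside aes(), the counterpart of qplot(..., alpha = I(1/10))
p_set <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 1/10)
```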
Note: run the code yourself to see the plots; they are not shown here!
(1) Scatter plots
#Add a smoothed curve to a scatter plot
qplot(carat, price, data = dsmall, geom = c("point", "smooth")) #two geoms: points and a smoothed line
qplot(carat, price, data = diamonds, geom = c("point", "smooth"))
#se = FALSE removes the confidence band around the curve
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), se = FALSE)
#The method argument chooses the smoother
#Default: with fewer than 1,000 observations method = "loess"; otherwise method = "gam" (mgcv) with formula = y ~ s(x, bs = "cs")
#span controls how wiggly the loess curve is, from 0 (very wiggly) to 1 (not so wiggly)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 0.2)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), span = 1)
#Load mgcv to fit a generalized additive model (GAM)
library(mgcv)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "gam", formula = y ~ s(x))
qplot(carat, price, data = diamonds, geom = c("point", "smooth"), method = "gam", formula = y ~ s(x, bs = "cs")) #bs = "cs" suits large data
#method = "lm" fits a linear model
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "lm")
#The splines package provides ns() for natural spline fits
library(splines)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "lm", formula = y ~ ns(x, 5)) #the second argument of ns() is the degrees of freedom; larger df gives a more flexible, wigglier curve
#method = "rlm" is like lm but uses a more robust algorithm, so outliers affect the fit less; rlm() is in the MASS package
library(MASS)
qplot(carat, price, data = dsmall, geom = c("point", "smooth"), method = "rlm")
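With ggplot() the smoother settings go straight to geom_smooth(). A sketch of the fits above, assuming the mgcv, splines and MASS packages are available (they ship with standard R installations). Passing the function MASS::rlm instead of the string "rlm" avoids having to attach MASS; the p_* names are illustrative.

```r
library(ggplot2)
library(mgcv)     # gam() and s()
library(splines)  # ns()
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

base <- ggplot(dsmall, aes(carat, price)) + geom_point()

p_loess <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.2)  # wiggly loess
p_nose  <- base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)  # no confidence band
p_gam   <- base + geom_smooth(method = "gam", formula = y ~ s(x))             # GAM via mgcv
p_ns    <- base + geom_smooth(method = "lm", formula = y ~ ns(x, 5))          # natural spline, df = 5
p_rlm   <- base + geom_smooth(method = MASS::rlm, formula = y ~ x)            # robust linear fit
```

Giving method and formula explicitly also documents exactly which model is drawn, instead of relying on the size-dependent default.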
(2) Boxplots and jittered point plots
Goal: show how the price per carat of a diamond changes with its colour
qplot(color, price/carat, data = diamonds, geom = "jitter") #jittered point plot
qplot(color, price/carat, data = diamonds, geom = "boxplot") #boxplot
#Reduce overplotting with alpha
qplot(color, price/carat, data = diamonds, geom = "jitter", alpha = I(1/5))
qplot(color, price/carat, data = diamonds, geom = "jitter", alpha = I(1/50))
qplot(color, price/carat, data = diamonds, geom = "jitter", alpha = I(1/200))
#A few more aesthetics
qplot(color, price/carat, data = diamonds, geom = "boxplot", colour = color)
qplot(color, price/carat, data = diamonds, geom = "boxplot", shape = cut)
qplot(color, price/carat, data = diamonds, geom = "boxplot", colour = color, size = I(1.5)) #wrap the constant value in I()
qplot(color, price/carat, data = diamonds, geom = "boxplot", shape = cut, colour = color)
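The ggplot() versions of the boxplot and jitter plots, plus a quick check of what a box shows: its middle line is the median price per carat for that colour, so it should match a plain tapply() median. A sketch assuming only ggplot2; p_box, p_jit and med are illustrative names.

```r
library(ggplot2)

# qplot(color, price/carat, data = diamonds, geom = "boxplot")
p_box <- ggplot(diamonds, aes(color, price / carat)) + geom_boxplot()

# qplot(color, price/carat, data = diamonds, geom = "jitter", alpha = I(1/50))
p_jit <- ggplot(diamonds, aes(color, price / carat)) + geom_jitter(alpha = 1/50)

# Median price per carat for each colour grade (what the middle line of each box marks)
med <- tapply(diamonds$price / diamonds$carat, diamonds$color, median)
med
```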
(3) Histograms and density plots
#Basics
qplot(carat, data = diamonds, geom = "histogram") #histogram
qplot(carat, data = diamonds, geom = "density") #density plot
#For density plots, adjust controls how smooth the curve is: the larger the value, the smoother
qplot(carat, data = diamonds, geom = "density", adjust = 0.5)
qplot(carat, data = diamonds, geom = "density", adjust = 1)
qplot(carat, data = diamonds, geom = "density", adjust = 2)
#For histograms, binwidth plays the same role: wider bins give a smoother, coarser shape
qplot(carat, data = diamonds, geom = "histogram", binwidth = 1, xlim = c(0,3))
qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1, xlim = c(0,3))
qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.01, xlim = c(0,3))
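#To compare distributions across groups, just add one more aesthetic mapping
qplot(carat, data = diamonds, geom = "histogram", fill = color) #histogram, bars filled by colour grade
qplot(carat, data = diamonds, geom = "density", colour = color) #density plot, one curve per colour grade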
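A sketch of the histogram and density examples with ggplot(). In geom_density(), adjust multiplies the default bandwidth (adjust = 2 doubles it); in geom_histogram(), binwidth is the width of each bar in data units (carats here). With no axis limits set, every diamond falls into some bin, so the bar heights add up to the number of rows. The p_* names are illustrative.

```r
library(ggplot2)

# Histogram with bars 0.1 carat wide
p_hist <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

# Density curve with twice the default bandwidth (smoother)
p_dens <- ggplot(diamonds, aes(carat)) + geom_density(adjust = 2)

# Comparing groups: map one more variable to fill or colour
p_fill <- ggplot(diamonds, aes(carat, fill = color)) + geom_histogram(binwidth = 0.1)
p_col  <- ggplot(diamonds, aes(carat, colour = color)) + geom_density()
```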
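(4) Bar charts
#Basics
qplot(color, data = diamonds, geom = "bar")
#If the data are already summarised, or you want to aggregate them another way (e.g. summing a continuous variable within each group), use the weight aesthetic
qplot(color, data = diamonds, geom = "bar", weight = carat) + scale_y_continuous("carat") #weight = carat: each bar is the total carat weight for that colour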
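To check what weight = carat does, the sketch below compares the bar heights with totals computed by hand, and also shows the already-summarised case drawn with geom_col(), which plots y values as given. The names totals, p_count, p_weight and p_col are illustrative.

```r
library(ggplot2)

# Default bars: number of diamonds of each colour
p_count <- ggplot(diamonds, aes(color)) + geom_bar()

# weight = carat: each bar is the total carat weight of that colour
p_weight <- ggplot(diamonds, aes(color, weight = carat)) + geom_bar() + ylab("carat")

# Already-summarised data: compute the totals first, then draw them as-is with geom_col()
totals <- aggregate(carat ~ color, data = diamonds, FUN = sum)
p_col <- ggplot(totals, aes(color, carat)) + geom_col()
```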
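(5) Line plots and path plots for time series
#Dataset: economics (monthly US economic time series, starting in 1967)
economics #print the data
summary(economics)
qplot(date, unemploy / pop, data = economics, geom = "line") #line plot: unemployed people as a share of the population
qplot(date, uempmed, data = economics, geom = "line") #line plot: median duration of unemployment, in weeks
#How the unemployment rate and the duration of unemployment changed together over time
qplot(unemploy / pop, uempmed, data = economics, geom = c("point", "path")) #path plot: points joined in row (i.e. time) order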
qplot(unemploy / pop, uempmed, data = economics, geom = "path", colour = date)
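The key difference between the two geoms: geom_line() joins points in order of x, while geom_path() joins them in the order of the rows. Since economics is sorted by date, a path traces the time order even when x is not time. A sketch assuming only ggplot2; p_wrong is an illustrative name for the line-based version that loses the time order.

```r
library(ggplot2)

# x is date, so x order and time order coincide and geom_line() is fine
p_line <- ggplot(economics, aes(date, unemploy / pop)) + geom_line()

# geom_path() keeps row (time) order; colour shows where in time each point is
p_path <- ggplot(economics, aes(unemploy / pop, uempmed)) +
  geom_path(aes(colour = date))

# For contrast: geom_line() on the same x/y reorders by x and scrambles the time order
p_wrong <- ggplot(economics, aes(unemploy / pop, uempmed)) + geom_line()
```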
(6) Facets
#Aesthetics such as colour and shape compare groups within one plot; faceting is another approach that draws each group in its own panel
qplot(carat, data = diamonds, geom = "histogram", facets = color ~ ., binwidth = 0.1, xlim = c(0,3)) #y axis shows counts
qplot(carat, ..density.., data = diamonds, geom = "histogram", facets = color ~ ., binwidth = 0.1, xlim = c(0,3)) #y axis shows density
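The ggplot() form of the density histogram: facet_grid(color ~ .) gives one row of panels per colour, and after_stat(density) (ggplot2 3.3.0 and later; older code writes ..density..) plots the density computed by the histogram stat instead of the count. This sketch zooms with coord_cartesian() rather than xlim(): coord_cartesian() only changes the visible range, while xlim() drops the data outside it before the bins are computed. Within each panel, bar area (density times bar width) sums to 1.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(binwidth = 0.1) +
  facet_grid(color ~ .) +           # one panel per colour grade
  coord_cartesian(xlim = c(0, 3))   # zoom in without removing any data
```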
(7) Other options
#xlab, ylab: axis label text
#xlim, ylim: axis ranges
#main: plot title
qplot(carat, price, data = dsmall, xlab = "Weight (carats)", ylab = "Price ($)", main = "Price-weight relationship")
qplot(carat, price/carat, data = dsmall, xlab = "Weight(carats)", ylab = expression(frac(price, carat)), main = "Small diamonds", xlim = c(.2, 1), ylim = c(0, 12000))
qplot(carat, price, data = dsmall, log = "xy", colour = I("red")) #log = "xy" puts both axes on a log scale; I() was introduced earlier
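For reference, a sketch of how these options translate to ggplot(): xlab, ylab and main become labs(x =, y =, title =), log = "xy" becomes log10 position scales, and colour = I("red") becomes a constant colour set outside aes(). dsmall is rebuilt as above; p_lab and p_log are illustrative names.

```r
library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Axis labels and title
p_lab <- ggplot(dsmall, aes(carat, price)) +
  geom_point() +
  labs(x = "Weight (carats)", y = "Price ($)", title = "Price-weight relationship")

# Log10 axes and a constant red colour
p_log <- ggplot(dsmall, aes(carat, price)) +
  geom_point(colour = "red") +
  scale_x_log10() +
  scale_y_log10()
```

Unlike plotting log(carat) directly, the log scales keep the original units on the axis labels.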