ggplot2 Reading Notes 6: Chapter 4, Fundamentals of the Grammar

qy_bioinformatics

Article link: https://blog.csdn.net/qy_microbiota/article/details/79486542

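Rambling: I have finally finished Part I of ggplot2, "Getting Started". Today I start Part II, on the grammar. Chapter 4 (Mastering the Grammar) introduces the basic grammar of ggplot2 and is largely a theoretical summary of the earlier material.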

Building a Scatterplot

We again use the fuel economy dataset (mpg) as the example.

library(ggplot2)
mpg
# A tibble: 234 x 11
   manufacturer model      displ  year   cyl trans      drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
 1 audi         a4          1.80  1999     4 auto(l5)   f        18    29 p     comp…
 2 audi         a4          1.80  1999     4 manual(m5) f        21    29 p     comp…
 3 audi         a4          2.00  2008     4 manual(m6) f        20    31 p     comp…
 4 audi         a4          2.00  2008     4 auto(av)   f        21    30 p     comp…
 5 audi         a4          2.80  1999     6 auto(l5)   f        16    26 p     comp…
 6 audi         a4          2.80  1999     6 manual(m5) f        18    26 p     comp…
 7 audi         a4          3.10  2008     6 auto(av)   f        18    27 p     comp…
 8 audi         a4 quattro  1.80  1999     4 manual(m5) 4        18    26 p     comp…
 9 audi         a4 quattro  1.80  1999     4 auto(l5)   4        16    25 p     comp…
10 audi         a4 quattro  2.00  2008     4 manual(m6) 4        20    28 p     comp…
# ... with 224 more rows

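We want a scatterplot that shows the relationship between engine displacement (displ) and highway miles per gallon (hwy), with the points coloured by number of cylinders (cyl). The code is: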

ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_point()

You can create plots like this easily, but what is going on underneath the
surface? How does ggplot2 draw this plot?

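1. Mapping Aesthetics to Data

What exactly is a scatterplot? You have seen many before, and have probably drawn some by hand. A scatterplot represents each observation as a point, positioned according to the values of two variables. Besides a horizontal and vertical position, each point also has a size, a colour and a shape. These attributes are called aesthetics: the properties that can be perceived on the graphic. Each aesthetic can be mapped to a variable or set to a constant value. In the plot above, displ is mapped to horizontal position, hwy to vertical position and cyl to colour. Size and shape are not mapped to variables and keep their (constant) default values.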
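The difference between mapping an aesthetic to a variable and setting it to a constant can be sketched as follows. This is a minimal illustration, not code from the book; the object names p_mapped and p_set and the colour "blue" are my own choices.

```r
library(ggplot2)

# Mapped: colour is inside aes(), so it varies with a variable (cyl).
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl)))

# Set: colour is outside aes(), so every point gets the same constant value.
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "blue")

# layer_data() returns the data a layer actually draws, after scaling.
mapped_colours <- unique(layer_data(p_mapped)$colour)  # one colour per cyl level
set_colours    <- unique(layer_data(p_set)$colour)     # the single constant
```

Printing p_mapped or p_set draws the plot. Only the mapped version gets a legend, because only there does colour carry information about the data.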

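Once the mapping is set, we can swap the geom function to draw something other than a scatterplot, such as a line plot with geom_line() or a bar chart with geom_bar():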

## Line plot
ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_line() +
  theme(legend.position = "none")

## Bar chart
ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_bar(stat = "identity", position = "identity", fill = NA) +
  theme(legend.position = "none")

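These plots are grammatically valid, but they make no sense.

Points, lines and bars are all different geometric objects (geoms). Plots that use a single geom often have a special name: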

Named plot            Geom     Other features
Scatterplot           point
Bubblechart           point    Size mapped to a variable
Barchart              bar
Box-and-whisker plot  boxplot
Line chart            line

Other plots add geoms as the situation requires and have no established name, for example:

ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm")

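2. Scaling

The values in the dataset mean nothing to the computer by themselves. We need to convert them from data units (e.g. litres, miles per gallon, number of cylinders) into graphical units that the computer can display (e.g. coordinates and colours). This conversion process is called scaling, and it is performed by scales.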
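One way to see the result of scaling is to inspect the data that ggplot2 computes for a layer. The sketch below assumes layer_data() is an acceptable way to peek at this; the names built and colour_key are mine.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_point()

# layer_data() builds the plot and returns the data used to draw layer 1.
built <- layer_data(p, 1)

# After scaling, colour holds colour strings instead of cylinder counts.
# x and y still hold displ and hwy; the coord places them on the page later.
head(built[, c("x", "y", "colour")])

# Each cyl value is paired with exactly one colour.
colour_key <- unique(data.frame(cyl = mpg$cyl, colour = built$colour))
colour_key
```

The colour scale here is discrete, so it only needs a lookup from each cylinder count to a colour, which is what colour_key shows.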

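Three main elements are needed to produce a complete plot:

data: the data
geom: the geometric objects that represent the data
scales and coordinate system: the scales and the coordinate system

Adding Complexity

Add a smooth line and facets to an ordinary scatterplot: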

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~year)

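The code above adds three new components:

Facets
Multiple layers
Statistics (statistical transformations)

The smooth layer geom_smooth() differs from the point layer geom_point() because it does not display the raw data; it displays a statistical transformation of the data. So after the data are mapped to aesthetics, they are passed to a statistical transformation (stat) for processing.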
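The effect of the stat can be checked by comparing the data behind the two layers. In this sketch I set method = "loess" and formula = y ~ x explicitly, which is an assumption on my part to keep the fit predictable and silence the message that a bare geom_smooth() prints; points_data and smooth_data are my own names.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  facet_wrap(~year)

points_data <- layer_data(p, 1)  # raw observations, one row per car
smooth_data <- layer_data(p, 2)  # values computed by the stat, per panel

# The smooth layer carries fitted values (y) and a confidence band
# (ymin, ymax) that do not exist in mpg itself.
names(smooth_data)
```

The point layer keeps one row per observation, while the smooth layer holds a grid of fitted values for each facet panel.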

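Components of the Layered Grammar

Together, the data, mappings, statistical transformation, geom and position adjustment form a layer, and a plot can have several layers.

Layers

A layer consists of the following five parts (see Chapter 5 for details):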

Data
Aesthetic mappings.
A statistical transformation (stat).
A geometric object (geom).
A position adjustment.
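The five parts can be written out explicitly with layer(). The sketch below compares the usual shorthand with the longhand form; p_short and p_long are my own names.

```r
library(ggplot2)

# Shorthand: geom_point() fills in the stat and position with defaults.
p_short <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Longhand: the same layer with all five components spelled out.
p_long <- ggplot(mpg, aes(displ, hwy)) +
  layer(
    data     = NULL,        # NULL means inherit the plot's data (mpg)
    mapping  = NULL,        # NULL means inherit the plot's aes(displ, hwy)
    stat     = "identity",  # use the data as is
    geom     = "point",
    position = "identity"   # no position adjustment
  )

# Both layers end up drawing the same positions.
same_xy <- isTRUE(all.equal(
  layer_data(p_short)[, c("x", "y")],
  layer_data(p_long)[, c("x", "y")]
))
```

In everyday use the geom_*() shortcuts are easier to read; the longhand form is mainly useful for seeing what a layer is made of.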

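Scales

Scales control the mapping from data to aesthetic attributes. Here are a few examples of scales:

(Figure: from left to right, scales for a continuous variable, then shape and colour scales for a discrete variable.)

(See Chapter 6 for details.)

Coordinate System

The coordinate system (coord for short) is the plane onto which the data are mapped. A position is usually given by two coordinates (x, y).

The different types of coordinate system are:

(Figure: from left to right, Cartesian, semi-log and polar coordinates.)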
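A rough sketch of the three variants on the mpg scatterplot follows. Note one assumption: I approximate the semi-log case with scale_y_log10(), which transforms the scale, not the coord, so it is not necessarily how the book's figure was made. The object names are mine.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

p_cartesian <- base + coord_cartesian()  # the default coordinate system
p_polar     <- base + coord_polar()      # x becomes angle, y becomes radius

# Assumption: a log10 y scale on Cartesian coordinates as a semi-log stand-in.
p_semilog   <- base + scale_y_log10()

# Each coord_*() call returns a coord object that is added to the plot.
class(coord_cartesian())
class(coord_polar())
```

Printing each plot object shows the same data and geom drawn in a different coordinate system.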

Facetting

(See Chapter 7 for details.)

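= = = = = = = = a slightly dazed divider = = = = = = = =

Rambling plus: after reading this chapter I feel it is light on practical content, and it seems more like a transitional chapter in the middle of the book (or maybe I just did not get it and do not really understand the author's intent). Oops. As always, the references go at the end.

References: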

Hadley Wickham(2016). ggplot2. Springer International Publishing. doi: 10.1007/978-3-319-24277-4
《R语言应用系列丛书·ggplot2:数据分析与图形艺术》 (the Chinese edition of ggplot2: Elegant Graphics for Data Analysis)