Data Visualization with ggplot2
【BB's note】: These are just my study notes, so please point out any mistakes. They will be updated from time to time.
Creating a ggplot
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = …, size = …, alpha = …, shape = …))
ggplot(): creates a coordinate system that you can add layers to
Parameters:
First argument: the dataset to plot (data)
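A minimal sketch of the idea above, using the mpg data that ships with ggplot2: ggplot() on its own only sets up an empty plot with no layers, and each geom function added with + contributes one layer. The object name `p` is just for illustration.

```r
library(ggplot2)  # also provides the mpg and diamonds datasets

# ggplot() alone: an empty coordinate system, no layers yet
p <- ggplot(data = mpg)
length(p$layers)  # 0

# adding geom_point() puts one layer of points on top of it
p <- p + geom_point(mapping = aes(x = displ, y = hwy))
length(p$layers)  # 1
print(p)
```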
geom_point(): adds a layer of points to your plot
Parameters:
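mapping: how variables in your dataset are mapped to visual properties; it is always paired with aes(), and the x and y arguments of aes() specify which variables are mapped to the x and y axes
Other aesthetics inside aes():
color: changes the color of the points; it can vary with the categories of a variable
size: changes the size of the points
shape: changes the shape of the points; note that ggplot2 will only use six shapes at a time (by default, additional groups go unplotted)
alpha: changes the transparency of the points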
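A small sketch of mapping aesthetics to a variable, assuming the mpg data from ggplot2. mpg$class has 7 distinct values, so mapping it to color gives 7 colors, while mapping it to shape runs past the six-shape limit: ggplot2 warns and the last class ("suv") gets no shape. ggplot_build() is used here only to inspect the computed layer data; the variable names are illustrative.

```r
library(ggplot2)

# color = class: one color per class value
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# shape = class: 7 classes but only 6 default shapes
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))

# ggplot_build() returns the data each layer will draw, with aesthetics already mapped
color_data <- ggplot_build(p_color)$data[[1]]
# suppress the "maximum of 6 discrete values" warning from the shape palette
shape_data <- suppressWarnings(ggplot_build(p_shape)$data[[1]])

length(unique(color_data$colour))  # 7 colors
sum(is.na(shape_data$shape))       # rows of the 7th class, left without a shape
```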
You can also set an aesthetic such as color manually instead of mapping it to a variable; in that case it goes outside of aes():
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = …)
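A sketch contrasting a manually set color with a mapped one, assuming the mpg data; the value "blue" is just an example. Outside aes(), "blue" is applied to every point. Inside aes(), "blue" is treated as a data value, so it gets a legend entry and the default discrete color instead of blue.

```r
library(ggplot2)

# set manually: color goes outside aes()
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# common mistake: the constant inside aes() is mapped like a one-level variable
p_wrong <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

blue_data  <- ggplot_build(p_blue)$data[[1]]
wrong_data <- ggplot_build(p_wrong)$data[[1]]
unique(blue_data$colour)   # "blue"
unique(wrong_data$colour)  # a default palette color, not "blue"
```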
【Graphing Template】:
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
Facets
To facet your plot by a single variable, use facet_wrap().
Parameters:
(1) Formula: begins with ~ followed by a variable name (the variable should be discrete)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = …, size = …, alpha = …, shape = …)) +
facet_wrap(~class, nrow = 2)
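A runnable version of the facet_wrap() example above, with the elided aesthetics dropped so only x and y are mapped. The layout table from ggplot_build() has one row per panel, which makes it easy to check that mpg's 7 classes are laid out in the requested 2 rows. The names `p` and `panels` are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~class, nrow = 2)

# one row per panel, with its grid position in the ROW and COL columns
panels <- ggplot_build(p)$layout$layout
nrow(panels)     # 7 panels, one per class
max(panels$ROW)  # 2 rows, as requested by nrow = 2
print(p)
```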
To facet your plot on the combination of two variables, use facet_grid(). Its first argument is also a formula, with the two variable names separated by ~:
facet_grid(var1 ~ var2)
If you prefer not to facet in the rows or columns dimension, use . instead of a variable name, e.g. facet_grid(. ~ var2)
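A sketch of both facet_grid() forms, assuming the mpg variables drv (3 values: 4, f, r) and cyl (4 values: 4, 5, 6, 8) stand in for var1 and var2. facet_grid() draws one panel for every combination, even empty ones, so drv ~ cyl gives a 3 x 4 grid, while . ~ cyl gives a single row of 4 panels.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# rows by drv, columns by cyl
p_grid <- base + facet_grid(drv ~ cyl)

# "." instead of a row variable: facet by cyl in the columns only
p_cols <- base + facet_grid(. ~ cyl)

grid_panels <- ggplot_build(p_grid)$layout$layout
col_panels  <- ggplot_build(p_cols)$layout$layout
nrow(grid_panels)  # 12 = 3 drv values x 4 cyl values
nrow(col_panels)   # 4 panels in a single row
```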
Geometric Objects
People often describe plots by the type of geom that the plot uses.
Every geom function takes a mapping argument.
【Tips】:
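If you pass mappings directly to ggplot(), they are treated as global mappings that apply to every layer.
If you pass them to a geom function, they are treated as local mappings for that layer only: ggplot2 will use them to extend or overwrite the global mappings for that layer only.
Adding two geom functions draws both layers on the same plot, one on top of the other.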
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = …, size = …, alpha = …, shape = …)) +
geom_point() +
geom_smooth()
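A sketch of global vs local mappings, assuming the mpg data: x and y are given to ggplot(), so both layers inherit them, while color = class is local to geom_point(), so only the points are colored and geom_smooth() draws a single line in its default color.

```r
library(ggplot2)

# global: x and y (shared by all layers); local: color (points layer only)
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()

built <- ggplot_build(p)
length(unique(built$data[[1]]$colour))  # points: 7 colors, one per class
length(unique(built$data[[2]]$colour))  # smooth line: 1 default color
print(p)
```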
Statistical Transformation
Bar Chart
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut)) # or, equivalently: stat_count(mapping = aes(x = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1)) # plot proportions instead of counts
Bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
In short, these plots compute a statistic before drawing: the plots above count how many observations fall into each category (or bin).
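A sketch of the binning step for a histogram, assuming the diamonds data; binwidth = 0.5 is an arbitrary choice for illustration. The computed layer data holds one row per bin with its edges and count, and since every diamond falls into exactly one bin, the counts add up to the number of rows.

```r
library(ggplot2)

# bin carat into intervals of width 0.5 and count the diamonds in each
p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)

bins <- ggplot_build(p)$data[[1]]
head(bins[, c("xmin", "xmax", "count")])
sum(bins$count) == nrow(diamonds)  # TRUE: each diamond is counted once
```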
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut)) # same plot as geom_bar(), since count is geom_bar()'s default stat
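A sketch checking the two claims above on the diamonds data: geom_bar() and stat_count() compute the same counts per cut, and y = after_stat(prop) with group = 1 turns them into proportions. after_stat(prop) is the newer spelling of ..prop.. and assumes ggplot2 3.3.0 or later. group = 1 matters because without it each bar is its own group and every proportion would be 1.

```r
library(ggplot2)

p_geom <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) + stat_count(mapping = aes(x = cut))

# both layers use the count stat, so the computed counts are identical
counts_geom <- ggplot_build(p_geom)$data[[1]]$count
counts_stat <- ggplot_build(p_stat)$data[[1]]$count
identical(counts_geom, counts_stat)
all(counts_geom == as.vector(table(diamonds$cut)))

# proportions instead of counts; group = 1 puts all bars in one group
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
props <- ggplot_build(p_prop)$data[[1]]$prop
sum(props)  # 1, up to floating-point rounding
```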
Position Adjustments
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, color = cut)) # color: colors only the bar outlines
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut)) # fill: fills the bars with color
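A sketch of the color vs fill difference, assuming the diamonds data (cut has 5 levels). With color = cut, each bar gets its own outline color but keeps the single default fill; with fill = cut, the bar interiors get 5 different colors.

```r
library(ggplot2)

p_color <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))  # outlines vary by cut

p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))   # interiors vary by cut

color_data <- ggplot_build(p_color)$data[[1]]
fill_data  <- ggplot_build(p_fill)$data[[1]]
length(unique(color_data$colour))  # 5 outline colors
length(unique(color_data$fill))    # 1: the default fill is unchanged
length(unique(fill_data$fill))     # 5 fill colors
```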
If fill is mapped to a variable other than cut (the x variable), the bars are automatically stacked. 【To be continued, 1/20/2022】
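A sketch of the stacking behavior, assuming the diamonds data with clarity as the "other" variable for fill. Stacking comes from geom_bar()'s default position adjustment, "stack": each clarity segment spans ymin to ymax, so the top of each bar equals the total count for that cut.

```r
library(ggplot2)

# fill mapped to clarity (not the x variable) -> segments are stacked
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))

stacked <- ggplot_build(p)$data[[1]]
# highest ymax per bar (x position 1..5) = total diamonds of that cut
bar_tops <- tapply(stacked$ymax, as.numeric(stacked$x), max)
all(unname(bar_tops) == as.vector(table(diamonds$cut)))  # TRUE
print(p)
```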