2.1 The Lattice Plotting System
The Lattice Plotting System
The lattice plotting system is implemented using the following packages:
- lattice: contains code for producing Trellis graphics, which are independent of the “base” graphics system; includes functions like xyplot, bwplot, levelplot
- grid: implements a different graphing system independent of the “base” system; the lattice package builds on top of grid
  - We seldom call functions from the grid package directly
- The lattice plotting system does not have a "two-phase" aspect with separate plotting and annotation like in base plotting
- All plotting/annotation is done at once with a single function call
Lattice Functions
- xyplot: this is the main function for creating scatterplots
- bwplot: box-and-whiskers plots (“boxplots”)
- histogram: histograms
- stripplot: like a boxplot but with actual points
- dotplot: plot dots on "violin strings"
- splom: scatterplot matrix; like pairs in base plotting system
- levelplot, contourplot: for plotting "image" data
Lattice Functions
Lattice functions generally take a formula for their first argument, usually of the form xyplot(y ~ x | f * g, data)
- We use the formula notation here, hence the ~.
- On the left of the ~ is the y-axis variable, on the right is the x-axis variable
- f and g are conditioning variables; they are optional
  - the * indicates an interaction between two variables
- The second argument is the data frame or list from which the variables in the formula should be looked up
  - If no data frame or list is passed, then the parent frame is used.
- If no other arguments are passed, there are defaults that can be used.
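The formula interface described above can be tried on a built-in dataset. The following is a minimal sketch, assuming the airquality data from the datasets package as a stand-in (it is not part of these notes); the object names aq and p_cond are illustrative only.

```r
library(lattice)
library(datasets)

# Assumption: airquality is used purely as example data.
# Month is converted to a factor so it can act as a conditioning variable.
aq <- transform(airquality, Month = factor(Month))

# y ~ x | f : Ozone on the y-axis, Wind on the x-axis, one panel per Month.
# layout = c(5, 1) arranges the five monthly panels in a single row.
p_cond <- xyplot(Ozone ~ Wind | Month, data = aq, layout = c(5, 1))
print(p_cond)
```

Dropping the `| Month` part gives a single-panel scatterplot, since the conditioning variables are optional.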
Lattice Behavior
Lattice functions behave differently from base graphics functions in one critical way.
- Base graphics functions plot data directly to the graphics device (screen, PDF file, etc.)
- Lattice graphics functions return an object of class trellis
- The print methods for lattice functions actually do the work of plotting the data on the graphics device.
- Lattice functions return "plot objects" that can, in principle, be stored (but it’s usually better to just save the code + data).
- On the command line, trellis objects are auto-printed so that it appears the function is plotting the data
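A short sketch of this behavior, again assuming the built-in airquality data as a stand-in: the call to xyplot only builds an object, and nothing is drawn until that object is printed.

```r
library(lattice)

# Assumption: airquality is example data, not the data used in these notes.
p_obj <- xyplot(Ozone ~ Wind, data = airquality)  # builds the plot object only

class(p_obj)   # the object is of class "trellis"

# Printing the object is what draws it on the graphics device.
# At the console this happens automatically; inside functions, loops,
# or sourced scripts the explicit print() call is needed.
print(p_obj)
```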
Lattice Panel Functions
- Lattice functions have a panel function which controls what happens inside each panel of the plot.
- The lattice package comes with default panel functions, but you can supply your own if you want to customize what happens in each panel
- Panel functions receive the x/y coordinates of the data points in their panel (along with any optional arguments)
Note: if a custom panel function does not call panel.xyplot(x, y, ...) inside its body, the original plot (the data points) is not drawn.
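A custom panel function might look like the sketch below. The data are simulated here purely for illustration (the variables x, y, f are made up), and the median line is just one possible customization.

```r
library(lattice)

# Simulated example data (an assumption, not from the notes)
set.seed(10)
x <- rnorm(100)
f <- rep(0:1, each = 50)
y <- x + f - f * x + rnorm(100, sd = 0.5)
f <- factor(f, labels = c("Group 1", "Group 2"))

p_panel <- xyplot(y ~ x | f, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)               # draw the data points first
  panel.abline(h = median(y), lty = 2)  # then add a dashed line at the panel median
})
print(p_panel)
```

Inside the panel function, x and y hold only the points belonging to that panel, so the median line differs between the two groups.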
Many Panel Lattice Plot: Example from MAACS
- Study: Mouse Allergen and Asthma Cohort Study (MAACS)
- Study subjects: Children with asthma living in Baltimore City, many allergic to mouse allergen
- Design: Observational study, baseline home visit + every 3 months for a year.
- Question: How does indoor airborne mouse allergen vary over time and across subjects?
Ahluwalia et al., Journal of Allergy and Clinical Immunology, 2013
Summary
- Lattice plots are constructed with a single function call to a core lattice function (e.g. xyplot)
- Aspects like margins and spacing are automatically handled and defaults are usually sufficient
- The lattice system is ideal for creating conditioning plots where you examine the same kind of plot under many different conditions
- Panel functions can be specified/customized to modify what is plotted in each of the plot panels
2.2 ggplot2
What is ggplot2?
- An implementation of The Grammar of Graphics by Leland Wilkinson
- Written by Hadley Wickham (while he was a graduate student at Iowa State)
- A “third” graphics system for R (along with base and lattice)
- Available from CRAN via install.packages()
- Web site: http://ggplot2.org (better documentation)
What is ggplot2?
- Grammar of graphics represents an abstraction of graphics ideas/objects
- Think “verb”, “noun”, “adjective” for graphics
- Allows for a “theory” of graphics on which to build new graphics and graphics objects
- “Shorten the distance from mind to page”
Grammar of Graphics
“In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system”
- from ggplot2 book
The Basics: qplot()
- Works much like the plot function in base graphics system
- Looks for data in a data frame, similar to lattice, or in the parent environment
- Plots are made up of aesthetics (size, shape, color) and geoms (points, lines)
The Basics: qplot()
- Factors are important for indicating subsets of the data (if they are to have different properties); they should be labeled
- The qplot() function hides what goes on underneath, which is okay for most operations
- ggplot() is the core function and very flexible for doing things qplot() cannot do
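As a quick illustration of qplot(), the sketch below uses the mpg dataset that ships with ggplot2 (an assumption; the notes do not name a dataset at this point). Recent ggplot2 releases mark qplot() as deprecated, so a deprecation warning may appear, but the call still works.

```r
library(ggplot2)

# Assumption: mpg (bundled with ggplot2) is used as example data.
# x and y come first, as in base plot(); data = names the data frame.
# drv is a character column and is treated as a discrete grouping variable,
# so each drive type gets its own colour and a legend entry.
p_quick <- qplot(displ, hwy, data = mpg, color = drv)
print(p_quick)
```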
Note: the section on concrete applications appears to be broken or missing online, so …
Basic Components of a ggplot2 Plot
- A data frame
- aesthetic mappings: how data are mapped to color, size
- geoms: geometric objects like points, lines, shapes.
- facets: for conditional plots.
- stats: statistical transformations like binning, quantiles, smoothing.
- scales: what scale an aesthetic map uses (example: male = red, female = blue).
- coordinate system
Building Plots with ggplot2
- When building plots in ggplot2 (rather than using qplot) the “artist’s palette” model may be the closest analogy
- Plots are built up in layers
  - Plot the data
  - Overlay a summary
  - Metadata and annotation
Example: BMI, PM2.5, Asthma
- Mouse Allergen and Asthma Cohort Study
- Baltimore children (age 5-17)
- Persistent asthma, exacerbation in past year
- Does BMI (normal vs. overweight) modify the relationship between PM2.5 and asthma symptoms?
Building Up in Layers
  logpm25        bmicat NocturnalSympt logno2_new
1  1.5362 normal weight              1      1.299
2  1.5905 normal weight              0      1.295
3  1.5218 normal weight              0      1.304
4  1.4323 normal weight              0         NA
5  1.2762    overweight              8      1.108
6  0.7139    overweight              0      0.837
data: logpm25, bmicat, NocturnalSympt, logno2_new [554x4]
mapping: x = logpm25, y = NocturnalSympt
faceting: facet_null()
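No Plot Yet!
Printing g at this point does not produce a plot, because no layers have been added:
Error: No layers in plot
Adding a geom layer does produce one:
g <- ggplot(maacs, aes(logpm25, NocturnalSympt))
g + geom_point()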
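The maacs data are not available here, so the following sketch of the layer-by-layer approach uses the mpg dataset bundled with ggplot2 as a stand-in (an assumption). The object names g_base and p_layers are illustrative only.

```r
library(ggplot2)

# Data frame + aesthetic mapping only: no layers, so nothing to draw yet
g_base <- ggplot(mpg, aes(displ, hwy))

p_layers <- g_base +
  geom_point() +                 # layer 1: plot the data
  geom_smooth(method = "lm") +   # layer 2: overlay a summary (linear fit)
  facet_grid(. ~ drv)            # facets: one panel per drive type

print(p_layers)
```

Each `+` adds one component, so the plot can be built up and inspected one step at a time.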
Annotation
- Labels: xlab(), ylab(), labs(), ggtitle()
- Each of the “geom” functions has options to modify
- For things that only make sense globally, use theme()
  - Example: theme(legend.position = "none")
- Two standard appearance themes are included
  - theme_gray(): The default theme (gray background)
  - theme_bw(): More stark/plain
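The annotation functions listed above can be combined in one call chain. This is a minimal sketch, assuming the mpg dataset bundled with ggplot2 as example data; the title and axis labels are made up for illustration.

```r
library(ggplot2)

p_annot <- ggplot(mpg, aes(displ, hwy)) +
  # geom-level options: constant colour, point size, and transparency
  geom_point(color = "steelblue", size = 2, alpha = 0.5) +
  # labels for the title and both axes
  labs(title = "Engine size and highway mileage",
       x = "Displacement (litres)",
       y = "Highway miles per gallon") +
  theme_bw() +                       # switch to the plainer built-in theme
  theme(legend.position = "none")    # a global setting, applied via theme()

print(p_annot)
```

theme_bw() is added before theme() so that the complete theme does not overwrite the individual setting.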
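alpha controls transparency.
In this example the data contain a point at y = 100, which is why the second plot looks so extreme while the first looks as if its top has been cut off. The point here is only to understand the difference between the two; in practice you sometimes do not need the full picture, only the core of the data.
The figure shows that setting ylim automatically drops the outlier, whereas the second approach does not drop it.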
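The difference can be reproduced with simulated data. The notes do not name the second approach; the sketch below assumes it is coord_cartesian(), which zooms the view without discarding data. The data frame testdat and the outlier position are invented for illustration.

```r
library(ggplot2)

# Simulated series with a single outlier at y = 100 (an assumption)
set.seed(1)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50, "y"] <- 100

g_lim <- ggplot(testdat, aes(x = x, y = y))

# ylim() sets the scale limits: values outside them become NA,
# so the outlier is dropped and the line has a gap there
p_drop <- g_lim + geom_line() + ylim(-3, 3)

# coord_cartesian() only zooms the view: the outlier stays in the data,
# so the line runs off the top of the panel at that point
p_zoom <- g_lim + geom_line() + coord_cartesian(ylim = c(-3, 3))

print(p_drop)
print(p_zoom)
```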
More Complex Example
- How does the relationship between PM$_{2.5}$ and nocturnal symptoms vary by BMI and NO$_2$?
- Unlike our previous BMI variable, NO$_2$ is continuous
- We need to make NO$_2$ categorical so we can condition on it in the plotting
- Use the cut() function for this
Making NO$_2$ Tertiles
[1] "(0.378,1.2]" "(1.2,1.42]" "(1.42,2.55]"
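One way to produce tertile labels like these is to cut the variable at its quantiles. The sketch below uses simulated values as a hypothetical stand-in for the NO$_2$ column (the real maacs data are not available here), and it sets include.lowest = TRUE as a deliberate choice so that the minimum value is not left out of the first interval.

```r
# Hypothetical stand-in for the log NO2 measurements
set.seed(42)
logno2 <- rnorm(200, mean = 1.3, sd = 0.3)

# Four cutpoints (0%, 33.3%, 66.7%, 100%) define three intervals
cutpoints <- quantile(logno2, seq(0, 1, length.out = 4), na.rm = TRUE)

# cut() turns the continuous variable into a factor with one level per interval
no2tert <- cut(logno2, cutpoints, include.lowest = TRUE)

levels(no2tert)  # three interval labels
table(no2tert)   # roughly equal counts per tertile
```

The resulting factor can then be used as a faceting variable, in the same way as the BMI category.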
Summary
- ggplot2 is very powerful and flexible if you learn the “grammar” and the various elements that can be tuned/modified
- Many more types of plots can be made; explore and mess around with the package (references mentioned in Part 1 are useful)