[Note: this article is taken from Wikipedia, "Box plot"]
From Wikipedia, the free encyclopedia
Figure 1. Box plot of data from the Michelson–Morley
experiment
In descriptive statistics, a box
plot or boxplot (also known as a box-and-whisker
diagram or plot) is a convenient way of graphically
depicting groups of numerical data through their five-number summaries: the smallest
observation (sample minimum),
lower quartile (Q1), median
(Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot
may also indicate which observations, if any, might be considered
outliers.
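As an illustration of the five-number summary described above, here is a minimal Python sketch. The function name `five_number_summary` and the sample data are made up for this example. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting of `statistics.quantiles` is an assumption, not the only valid definition.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    values = sorted(data)
    # Assumption: the "inclusive" quartile method. Other conventions
    # can give slightly different Q1 and Q3 for the same data.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 12]
summary = five_number_summary(sample)
print(summary)
```

These five values are exactly what the box and, in the simplest convention, the whisker ends of a box plot encode.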
Boxplots display differences between populations without making any
assumptions about the underlying statistical distribution: they are
non-parametric. The
spacings between the different parts of the box help indicate the
degree of dispersion (spread) and skewness in the data, and help
identify outliers. Boxplots can be drawn either horizontally
or vertically.
Alternative forms
Figure 2. Boxplot with whiskers from minimum to maximum
Figure 3. The same boxplot with whiskers extending at most 1.5 IQR from the box
Box-and-whisker plots are uniform in their use of the box: the
bottom and top of the box are always the 25th and 75th percentiles (the lower and upper quartiles, respectively), and the band near the
middle of the box is always the 50th percentile (the median).
The ends of the whiskers, however, can represent several possible
alternative values, among them:
the minimum and maximum of all the data;
the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile;
one standard deviation above and below the mean of the data;
the 2nd percentile and the 98th percentile.
Any data not included between the whiskers should be plotted as
an outlier with a dot, small circle, or star, but occasionally this
is not done.
Some box plots include an additional character to represent the
mean of the data.
On some box plots a crosshatch is placed on each whisker, before
the end of the whisker.
Rarely, box plots can be presented with no whiskers at all.
Because of this variability, it is appropriate to describe the
convention being used for the whiskers and outliers in the caption
for the plot.
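To make the difference between whisker conventions concrete, the following Python sketch computes whisker ends under two of the conventions listed above: the minimum and maximum, and the 1.5 IQR rule. The function names and the sample data are hypothetical, and the quartile method (`method="inclusive"`) is an assumption; plotting software may compute quartiles differently.

```python
import statistics

def minmax_whiskers(data):
    """Whiskers reach the extremes, so no point is drawn as an outlier."""
    return min(data), max(data), []

def iqr_whiskers(data, k=1.5):
    """Whiskers end at the most extreme data points within k * IQR of the quartiles."""
    values = sorted(data)
    # Assumption: "inclusive" quartiles; other definitions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    # The whiskers stop at actual data points, not at the fences themselves.
    return inside[0], inside[-1], outliers

# Hypothetical sample with one unusually large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(minmax_whiskers(sample))
print(iqr_whiskers(sample))
```

With this sample the two conventions disagree about the upper whisker, which is why the caption should state which rule was used.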
The unusual percentiles 2%, 9%, 91%, 98% are sometimes used for
whisker cross-hatches and whisker ends to show the seven-number summary. If the data are
normally distributed, the locations of
the seven marks on the box plot will be approximately equally spaced.
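The spacing claim can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to locate the seven percentiles on a standard normal distribution and prints the gaps between neighbouring marks. The variable names are chosen for this example only.

```python
from statistics import NormalDist

# Percentiles of the seven-number summary: 2%, 9%, 25%, 50%, 75%, 91%, 98%.
PERCENTILES = [0.02, 0.09, 0.25, 0.50, 0.75, 0.91, 0.98]

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Position of each mark, in units of the standard deviation.
marks = [std_normal.inv_cdf(p) for p in PERCENTILES]

# Distance between each pair of neighbouring marks.
gaps = [upper - lower for lower, upper in zip(marks, marks[1:])]

for p, z in zip(PERCENTILES, marks):
    print(f"{p:.0%} percentile at z = {z:+.3f}")
print("gaps:", [round(g, 3) for g in gaps])
```

The six gaps all fall between roughly 0.67 and 0.71 standard deviations, so the marks are close to evenly spaced rather than exactly so.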
Variations
Figure 4. Four box plots, with and without notches and variable
width
Several variations on the traditional box plot have been
described. Two of the most common are variable width box plots and
notched box plots (see figure 4).
Variable width box plots illustrate the size of each group whose
data is being plotted by making the width of the box proportional
to the size of the group. A popular convention is to make the box
width proportional to the square root of the size of the
group.
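The square-root convention can be sketched in a few lines of Python. The helper `box_widths`, its `max_width` parameter, and the group sizes are hypothetical; they only illustrate the scaling, not any particular plotting library.

```python
import math

def box_widths(group_sizes, max_width=0.8):
    """Return box widths proportional to the square root of each group size.

    The largest group gets max_width; the others are scaled relative to it.
    """
    roots = [math.sqrt(n) for n in group_sizes]
    largest = max(roots)
    return [max_width * r / largest for r in roots]

# Hypothetical group sizes: each group is four times larger than the previous one.
widths = box_widths([25, 100, 400])
print(widths)
```

Quadrupling the group size only doubles the box width, which keeps very large groups from dominating the figure.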
Notched box plots apply a "notch" or narrowing of the box around
the median. Notches are useful in offering a rough guide to
significance of difference of medians; if the notches of two boxes
do not overlap, this offers evidence of a statistically significant
difference between the medians.
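The text does not give a formula for the notch. One common convention, documented for R's `boxplot.stats`, extends the notch to the median ± 1.58 × IQR / √n. The Python sketch below assumes that convention; the function names, the `coef` default, and the sample groups are illustrative, and the quartiles here use `method="inclusive"` rather than the hinges R uses.

```python
import math
import statistics

def notch_interval(data, coef=1.58):
    """Return (lower, upper) notch limits: median +/- coef * IQR / sqrt(n).

    Assumption: coef=1.58 follows the convention documented for R's
    boxplot.stats; the source text does not specify a value.
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = coef * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the notch intervals of the two samples overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical groups for illustration.
group_a = list(range(1, 10))    # median 5
group_b = list(range(11, 20))   # median 15
group_c = list(range(3, 12))    # median 7

print(notches_overlap(group_a, group_b))
print(notches_overlap(group_a, group_c))
```

Non-overlapping notches are only a rough guide, as the text says, and do not replace a formal test.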
Visualization
Figure 5. Boxplot and a probability density function
(pdf) of a normal N(0, σ²) population
The boxplot is a quick way of examining one or more sets of data
graphically. Boxplots may seem more primitive than a histogram or kernel density estimate, but they do
have some advantages. They take up less space and are therefore
particularly useful for comparing distributions between several
groups or sets of data (see Figure 1 for an example). They also
involve fewer tuning choices: the number and width of bins can
heavily influence the appearance of a histogram, and the choice of
bandwidth can heavily influence the appearance of a kernel density
estimate.
Because a statistical distribution is often more intuitive to read than
a boxplot, comparing the boxplot against the probability
density function (theoretical histogram) for a normal
N(0, σ²) distribution may be a useful tool for
understanding the boxplot (Figure 5).
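The comparison with a normal distribution can also be worked out numerically. The sketch below assumes a standard normal population (σ = 1) and the 1.5 IQR whisker rule, and computes where the box edges and whisker limits fall and what share of the population lies beyond them. The variable names are chosen for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Box edges: the lower and upper quartiles of the distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Whisker limits under the 1.5 IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Probability mass outside the whisker limits (both tails together).
outside = std_normal.cdf(lower_fence) + (1.0 - std_normal.cdf(upper_fence))

print(f"Q1 = {q1:+.4f} sigma, Q3 = {q3:+.4f} sigma, IQR = {iqr:.4f} sigma")
print(f"whisker limits = {lower_fence:+.4f} sigma and {upper_fence:+.4f} sigma")
print(f"share beyond the whisker limits = {outside:.2%}")
```

Under these assumptions the box spans about ±0.67σ, the whisker limits sit near ±2.7σ, and roughly 0.7% of a normal population would be flagged as outliers.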