python boxplot significance: The concept of the boxplot

[Note: this article is taken from Wikipedia, "Box plot"]

From Wikipedia, the free encyclopedia

Figure 1. Box plot of data from the Michelson–Morley experiment

In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers.
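The five-number summary described above can be computed in a few lines. This is a minimal sketch using only the Python standard library; the sample values are hypothetical, and the choice of `method="inclusive"` is an assumption, since several quartile definitions are in use and they can give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics and
    # treats the sample minimum and maximum as the 0th and 100th percentiles.
    # Other quartile conventions exist; this one is an assumption of the sketch.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
summary = five_number_summary(sample)
print(summary)
```

The box of the plot spans the second and fourth values of the tuple, and the band inside the box sits at the third.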

Boxplots display differences between populations without making any assumptions about the underlying statistical distribution: they are non-parametric. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Boxplots can be drawn either horizontally or vertically.

Alternative forms

Figure 2. Boxplot with whiskers from minimum to maximum

Figure 3. The same boxplot with whiskers extending at most 1.5 IQR from the quartiles

Box and whisker plots are uniform in their use of the box: the bottom and top of the box are always the 25th and 75th percentile (the lower and upper quartiles, respectively), and the band near the middle of the box is always the 50th percentile (the median). But the ends of the whiskers can represent several possible alternative values, among them:

the minimum and maximum of all the data

the lowest datum still within 1.5 IQR (interquartile range, Q3 minus Q1) of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile

one standard deviation above and below the mean of the data

the 2nd percentile and the 98th percentile.

Any data not included between the whiskers should be plotted as an outlier with a dot, small circle, or star, but occasionally this is not done.
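As an illustration of the 1.5 IQR convention and of the outlier rule, the sketch below finds the whisker ends and the points that would be drawn individually. The function name `tukey_whiskers`, the sample values, and the quartile method are assumptions made for this example.

```python
import statistics

def tukey_whiskers(values, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under the k * IQR convention."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme data points that still lie inside the fences,
    # not at the fences themselves.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Everything beyond the fences is plotted as an individual outlier.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers

sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
low, high, outliers = tukey_whiskers(sample)
print(low, high, outliers)
```

With the minimum-to-maximum convention the upper whisker of the same sample would instead reach 30 and no outlier would be shown, which is why the convention needs to be stated alongside the plot.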

Some box plots include an additional character to represent the mean of the data.

On some box plots a crosshatch is placed on each whisker, before the end of the whisker.

Rarely, box plots can be presented with no whiskers at all.

Because of this variability, it is appropriate to describe the convention being used for the whiskers and outliers in the caption for the plot.

The unusual percentiles 2%, 9%, 91%, 98% are sometimes used for whisker cross-hatches and whisker ends to show the seven-number summary. If the data are normally distributed, the locations of the seven marks on the box plot will be equally spaced.
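The equal-spacing claim can be checked numerically. The sketch below evaluates the seven percentiles on a standard normal distribution with the standard library's `NormalDist`; the variable names are illustrative. The gaps come out close to one another rather than exactly identical, because the percentiles are rounded to whole numbers.

```python
from statistics import NormalDist

percentiles = [2, 9, 25, 50, 75, 91, 98]
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Position of each mark, in units of the standard deviation.
marks = [standard_normal.inv_cdf(p / 100) for p in percentiles]

# Distance between consecutive marks.
gaps = [b - a for a, b in zip(marks, marks[1:])]

for p, m in zip(percentiles, marks):
    print(f"{p:>2}th percentile: {m:+.3f}")
print("gaps:", [round(g, 3) for g in gaps])
```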

Variations

Figure 4. Four box plots, with and without notches and variable width

Several variations on the traditional box plot have been described. Two of the most common are variable width box plots and notched box plots (see figure 4).

Variable width box plots illustrate the size of each group whose data is being plotted by making the width of the box proportional to the size of the group. A popular convention is to make the box width proportional to the square root of the size of the group.
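The square-root convention amounts to a one-line scaling. In this sketch the function name `box_widths`, the `max_width` parameter, and the group sizes are all hypothetical; the largest group is given the full width and the others are scaled relative to it.

```python
import math

def box_widths(group_sizes, max_width=0.8):
    """Scale box widths so that each is proportional to sqrt(group size)."""
    roots = [math.sqrt(n) for n in group_sizes]
    largest = max(roots)
    # The largest group gets max_width; the rest shrink in proportion.
    return [max_width * r / largest for r in roots]

widths = box_widths([25, 100, 400])  # hypothetical group sizes
print(widths)
```

If the plot is drawn with matplotlib, a list like this could be passed as the `widths` argument of `Axes.boxplot`. Note that quadrupling the group size only doubles the box width.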

Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to significance of difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians.
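The text above does not say how wide a notch is. A commonly used approximation places the notch at median ± 1.57 × IQR / √n; that formula and its constant are assumptions of the sketch below, as are the function names and the sample data. Non-overlapping notches are only a rough guide and do not replace a formal test.

```python
import math
import statistics

def notch_interval(values, c=1.57):
    """Return the (lower, upper) ends of the notch around the median.

    Assumes the approximation median +/- c * IQR / sqrt(n); the constant c
    is a convention of some implementations, not taken from the text.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = c * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the notch intervals of samples a and b share any point."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

group_a = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
group_b = [x + 10 for x in group_a]      # same shape, shifted by 10

print(notch_interval(group_a))
print(notches_overlap(group_a, group_b))
```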


Visualization

Figure 5. Boxplot and a probability density function (pdf) of a Normal N(0,1σ²) Population

The boxplot is a quick way of examining one or more sets of data graphically. Boxplots may seem more primitive than a histogram or kernel density estimate, but they do have some advantages. They take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data (see Figure 1 for an example). The choice of the number and width of bins can heavily influence the appearance of a histogram, and the choice of bandwidth can heavily influence the appearance of a kernel density estimate.

As looking at a statistical distribution is more intuitive than looking at a boxplot, comparing the boxplot against the probability density function (theoretical histogram) for a normal N(0,1σ²) distribution may be a useful tool for understanding the boxplot (Figure 5).
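The landmarks of that comparison can be reproduced numerically. The sketch below assumes a standard normal distribution and the 1.5 IQR whisker convention, and reports where the quartiles and fences fall in units of σ and how much probability lies in each region; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Edges of the box: the lower and upper quartiles.
q1 = standard_normal.inv_cdf(0.25)
q3 = standard_normal.inv_cdf(0.75)
iqr = q3 - q1

# Fences under the 1.5 IQR convention (an assumption of this sketch).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Probability mass inside the box and beyond the fences.
inside_box = standard_normal.cdf(q3) - standard_normal.cdf(q1)
beyond_fences = standard_normal.cdf(lower_fence) + (1.0 - standard_normal.cdf(upper_fence))

print(f"quartiles: {q1:+.4f}, {q3:+.4f}  (IQR = {iqr:.4f})")
print(f"fences:    {lower_fence:+.4f}, {upper_fence:+.4f}")
print(f"inside box: {inside_box:.2%}, beyond fences: {beyond_fences:.2%}")
```

Under these assumptions the box covers half of the probability mass, and well under one percent of normally distributed data would be flagged as outliers.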


