Data: faithful. Waiting time between eruptions and the duration of each eruption (both in minutes) for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
> summary(faithful)
   eruptions        waiting
 Min.   :1.600   Min.   :43.0
 1st Qu.:2.163   1st Qu.:58.0
 Median :4.000   Median :76.0
 Mean   :3.488   Mean   :70.9
 3rd Qu.:4.454   3rd Qu.:82.0
 Max.   :5.100   Max.   :96.0
Mean, median, quantile
> mean(faithful$eruptions)
[1] 3.487783
> median(faithful$eruptions)
[1] 4
> quantile(faithful$eruptions)
     0%     25%     50%     75%    100%
1.60000 2.16275 4.00000 4.45425 5.10000
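The default call returns the five quartile boundaries. As a small sketch, the `probs` argument of `quantile()` can request other cut points, and the 50% quantile can be compared against `median()`. The variable names `deciles` and `q50` are illustrative only.

```r
x <- faithful$eruptions

# Request the 10th and 90th percentiles instead of the default quartiles.
deciles <- quantile(x, probs = c(0.10, 0.90))

# The 50% quantile should agree with the median (up to floating-point error).
q50 <- unname(quantile(x, probs = 0.5))
all.equal(q50, median(x))
```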
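The sample variance measures how the data values are dispersed around the sample mean. It is defined as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$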
> var(faithful$eruptions)
[1] 1.302728
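As a check on the definition, the sample variance can be computed by hand: sum the squared deviations from the mean and divide by n - 1. This is a minimal sketch; `manual_var` is an illustrative name, not part of any package.

```r
x <- faithful$eruptions
n <- length(x)

# Summed squared deviations from the sample mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Should agree with the built-in var() up to floating-point error.
all.equal(manual_var, var(x))
```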
The standard deviation of an observation variable is the square root of its variance.
> sd(faithful$eruptions)
[1] 1.141371
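The relationship between the two functions can be confirmed directly. Unlike the variance, which is in squared minutes, the standard deviation is in the same units as the data. The name `manual_sd` is illustrative.

```r
x <- faithful$eruptions

# Standard deviation as the square root of the sample variance.
manual_sd <- sqrt(var(x))

# Should agree with the built-in sd() up to floating-point error.
all.equal(manual_sd, sd(x))
```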
The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates the opposite.
The sample covariance is defined in terms of the sample means as:
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
> cov(faithful$eruptions, faithful$waiting)
[1] 13.97781
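The sample covariance can likewise be reproduced by hand: multiply the paired deviations from each mean, sum them, and divide by n - 1. A minimal sketch, with `manual_cov` as an illustrative name:

```r
x <- faithful$eruptions
y <- faithful$waiting
n <- length(x)

# Sum of products of paired deviations from the means, divided by n - 1.
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Should agree with the built-in cov() up to floating-point error.
all.equal(manual_cov, cov(x, y))
```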
The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.
The sample correlation coefficient is defined by the following formula, where s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
> cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112
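The normalization can be checked by dividing the covariance by the product of the two standard deviations. A short sketch, with `manual_cor` as an illustrative name:

```r
x <- faithful$eruptions
y <- faithful$waiting

# Covariance divided by the product of the standard deviations.
manual_cor <- cov(x, y) / (sd(x) * sd(y))

# Should agree with the built-in cor() up to floating-point error.
all.equal(manual_cor, cor(x, y))
```

Because of this normalization the result always lies between -1 and 1, regardless of the units of the two variables.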
The k-th central moment (or moment about the mean) of a data sample is:
$$m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$$
In particular, the second central moment of a population is its variance.
> library(moments)
> moment(faithful$eruptions, order = 3, central = TRUE)
[1] -0.6149059
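Central moments can also be computed in base R without the moments package, as the mean of the k-th powers of the deviations. This sketch assumes the divisor is n (not n - 1), which is why the second central moment differs from `var()` by a factor of (n - 1)/n. The names `m2` and `m3` are illustrative.

```r
x <- faithful$eruptions
n <- length(x)
d <- x - mean(x)      # deviations from the sample mean

m2 <- mean(d^2)       # second central moment (divisor n)
m3 <- mean(d^3)       # third central moment (divisor n)

# The second central moment equals the sample variance rescaled by (n - 1) / n.
all.equal(m2, var(x) * (n - 1) / n)
m3
```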
The skewness of a data population is defined by the following formula, where μ2 and μ3 are the second and third central moments.
$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$$
Intuitively, the skewness is a measure of symmetry. As a rule of thumb, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed.
> skewness(faithful$eruptions)
[1] -0.415841
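The skewness can be reproduced from the central moments in base R: divide the third central moment by the second raised to the power 3/2. This sketch assumes both moments use divisor n; `manual_skew` is an illustrative name.

```r
x <- faithful$eruptions
d <- x - mean(x)      # deviations from the sample mean

m2 <- mean(d^2)       # second central moment
m3 <- mean(d^3)       # third central moment

# Third central moment scaled by the 3/2 power of the second.
manual_skew <- m3 / m2^(3/2)
manual_skew

# A negative value goes with a mean below the median for this sample.
mean(x) < median(x)
```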
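Intuitively, the kurtosis is a measure of the peakedness of the data distribution. The description here is in terms of excess kurtosis, $\mu_4 / \mu_2^2 - 3$, where μ2 and μ4 are the second and fourth central moments. Negative excess kurtosis indicates a flat data distribution, which is said to be platykurtic. Positive excess kurtosis indicates a peaked distribution, which is said to be leptokurtic. The normal distribution has zero excess kurtosis, and is said to be mesokurtic. Note that kurtosis() in the moments package returns $\mu_4 / \mu_2^2$ without subtracting 3, so the normal distribution corresponds to a value of 3. The value 1.4994 below is therefore an excess kurtosis of about -1.5, indicating a platykurtic distribution.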
> kurtosis(faithful$eruptions)
[1] 1.4994
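The same quantity can be computed from the central moments in base R, which makes the difference between the raw ratio and the excess kurtosis explicit. This sketch assumes moments with divisor n; `manual_kurt` and `excess_kurt` are illustrative names.

```r
x <- faithful$eruptions
d <- x - mean(x)      # deviations from the sample mean

m2 <- mean(d^2)       # second central moment
m4 <- mean(d^4)       # fourth central moment

manual_kurt <- m4 / m2^2         # raw ratio; a normal distribution gives 3
excess_kurt <- manual_kurt - 3   # excess kurtosis; a normal distribution gives 0

manual_kurt
excess_kurt
```

The raw ratio falls below 3, so the excess kurtosis is negative and the eruption durations are platykurtic.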