Numerical Measures in R

Latest recommended article published on 2021-11-26 15:40:20

cheryl1116

Data: faithful (a data set built into R). It records the waiting time between eruptions and the duration of each eruption, both in minutes, for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
> summary(faithful)
   eruptions        waiting    
 Min.   :1.600   Min.   :43.0  
 1st Qu.:2.163   1st Qu.:58.0  
 Median :4.000   Median :76.0  
 Mean   :3.488   Mean   :70.9  
 3rd Qu.:4.454   3rd Qu.:82.0  
 Max.   :5.100   Max.   :96.0

Mean, median, quantile

> mean(faithful$eruptions)
[1] 3.487783
> median(faithful$eruptions)
[1] 4
> quantile(faithful$eruptions)
     0%     25%     50%     75%    100% 
1.60000 2.16275 4.00000 4.45425 5.10000

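The sample variance is defined as follows, where \bar{x} is the sample mean and n is the sample size:

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$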

> var(faithful$eruptions)
[1] 1.302728

The standard deviation of an observation variable is the square root of its variance.

> sd(faithful$eruptions)
[1] 1.141371
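
As a quick check, the variance and standard deviation can be recomputed by hand in base R. This is only a sketch; the names s2_manual and s_manual are made up for illustration. It relies on the fact that var() and sd() divide by n - 1 rather than n.

```r
# Old Faithful eruption durations (built-in data set)
x <- faithful$eruptions
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
s2_manual <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation: square root of the variance
s_manual <- sqrt(s2_manual)

# Both comparisons should report TRUE
all.equal(s2_manual, var(x))
all.equal(s_manual, sd(x))
```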

The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates the opposite.

The sample covariance is defined in terms of the sample means \bar{x} and \bar{y} as:

$$ s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$

> cov(faithful$eruptions, faithful$waiting)
[1] 13.97781
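
The same number can be reproduced from the definition. The sketch below uses only base R; s_xy is an illustrative name, and the n - 1 denominator matches what cov() uses by default.

```r
x <- faithful$eruptions
y <- faithful$waiting
n <- length(x)

# Sample covariance: average product of paired deviations, with n - 1 in the denominator
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Should report TRUE
all.equal(s_xy, cov(x, y))
```

The positive value agrees with the pattern in the data: longer eruptions tend to come with longer waiting times.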

The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.

The sample correlation coefficient is defined by the following formula, where s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

> cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112
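
To see the normalization at work, the coefficient can be rebuilt from cov() and sd(). This is a minimal sketch; r_manual is an illustrative name.

```r
x <- faithful$eruptions
y <- faithful$waiting

# Correlation: covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Should report TRUE
all.equal(r_manual, cor(x, y))
```

Because of the division, the result always lies between -1 and 1, whatever units x and y are measured in.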

The k-th central moment (or moment about the mean) of a data sample is:

$$ m_k = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k $$

In particular, the second central moment of a population is its variance.

> library(moments)
> moment(faithful$eruptions, order = 3, central = TRUE)
[1] -0.6149059
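
If the moments package is not installed, a central moment is easy to compute in base R. The helper central_moment below is a hypothetical name, not part of any package; it divides by n, which is the convention moment() follows.

```r
# Hypothetical helper: k-th central moment with a 1/n denominator
central_moment <- function(x, k) mean((x - mean(x))^k)

x <- faithful$eruptions
n <- length(x)

m2 <- central_moment(x, 2)
m3 <- central_moment(x, 3)
m3

# The second central moment uses 1/n, while var() uses 1/(n - 1),
# so the two differ by the factor (n - 1) / n. Should report TRUE.
all.equal(m2, var(x) * (n - 1) / n)
```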

The skewness of a data population is defined by the following formula, where μ₂ and μ₃ are the second and third central moments.

$$ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} $$

Intuitively, the skewness is a measure of asymmetry. As a rule of thumb, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed.

> skewness(faithful$eruptions)
[1] -0.415841
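
The skewness can also be assembled from the central moments in base R. This is a sketch; skew_manual and central_moment are illustrative names, and the 1/n moments are assumed to match what skewness() from the moments package uses.

```r
# Hypothetical helper: k-th central moment with a 1/n denominator
central_moment <- function(x, k) mean((x - mean(x))^k)

x <- faithful$eruptions

# Skewness: third central moment scaled by the second to the power 3/2
skew_manual <- central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
skew_manual

# Rule-of-thumb check for left skew: the mean is below the median
mean(x) < median(x)
```

Here the mean (about 3.49) is below the median (4), in line with the negative skewness.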

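The kurtosis of a univariate population is defined by the following formula, where μ₂ and μ₄ are the second and fourth central moments.

$$ \beta_2 = \frac{\mu_4}{\mu_2^2} $$

Intuitively, the kurtosis is a measure of the peakedness of the data distribution. With this definition, which is the one used by kurtosis() in the moments package, the normal distribution has a kurtosis of 3 and is said to be mesokurtic. A kurtosis below 3 indicates a flat data distribution, which is said to be platykurtic. A kurtosis above 3 indicates a peaked distribution, which is said to be leptokurtic. Some texts and packages report the excess kurtosis instead, which is β₂ − 3; on that scale the normal distribution has zero kurtosis, and negative and positive values mark platykurtic and leptokurtic distributions.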

> kurtosis(faithful$eruptions)
[1] 1.4994
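
The value above can be reproduced in base R and converted to excess kurtosis. This is a sketch; kurt_manual, excess_kurt and central_moment are illustrative names. It assumes the convention of the moments package, whose kurtosis() returns the ratio of the fourth central moment to the squared second central moment without subtracting 3.

```r
# Hypothetical helper: k-th central moment with a 1/n denominator
central_moment <- function(x, k) mean((x - mean(x))^k)

x <- faithful$eruptions

# Kurtosis: fourth central moment over the squared second central moment
kurt_manual <- central_moment(x, 4) / central_moment(x, 2)^2
kurt_manual

# Excess kurtosis: subtract 3 so that the normal distribution scores zero
excess_kurt <- kurt_manual - 3
excess_kurt
```

A kurtosis of about 1.5 is well below the normal distribution's 3 (an excess kurtosis of about -1.5), so the eruption durations are platykurtic.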
