Numerical Measures in R

Data: Faithful -- Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
> summary(faithful)
   eruptions        waiting    
 Min.   :1.600   Min.   :43.0  
 1st Qu.:2.163   1st Qu.:58.0  
 Median :4.000   Median :76.0  
 Mean   :3.488   Mean   :70.9  
 3rd Qu.:4.454   3rd Qu.:82.0  
 Max.   :5.100   Max.   :96.0  

Mean, median, quantile

> mean(faithful$eruptions)
[1] 3.487783
> median(faithful$eruptions)
[1] 4
> quantile(faithful$eruptions)
     0%     25%     50%     75%    100% 
1.60000 2.16275 4.00000 4.45425 5.10000 
 

Sample variance is defined as 

> var(faithful$eruptions)
[1] 1.302728

The standard deviation of an observation variable is the square root of its variance.

> sd(faithful$eruptions)
[1] 1.141371

The covariance of two variances x and y in a data sample how the two are linear related.  A positive covariance would indicates a positive linear relationship between the variables, and a negative covariance would indicate the opposite.

The sample covariance is defined in terms of the sample means as:

> cov(faithful$eruptions, faithful$waiting)
[1] 13.97781

The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.

The sample correlation coefficient is defined by the following formula, where sxand sy are the sample standard deviations, and sxy is the sample covariance.


cor(faithful$eruptions, faithful$waiting)
[1] 0.9008112

The  k th   central moment  (or moment about the mean ) of a data sample is:


In particular, the second central moment of a population is its variance.

> library("moments", lib.loc="~/R/win-library/3.2")
> moment(faithful$eruptions, order = 3, central = TRUE)
[1] -0.6149059

The skewness of a data population is defined by the following formula, where μ2 and μ3 are the second and third central moments.


Intuitively, the skewness is a measure of symmetry. As a rule, negative skewness indicates that the mean of the data values is less than the median, and the data distribution is left-skewed. Positive skewness would indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed.

> skewness(faithful$eruptions)
[1] -0.415841


The  kurtosis  of a univariate population is defined by the following formula, where  μ 2  and  μ 4 are the second and fourth  central moments .


Intuitively, the kurtosis is a measure of the peakedness of the data distribution. Negative kurtosis would indicates a flat data distribution, which is said to be platykurtic. Positive kurtosis would indicates a peaked distribution, which is said to be leptokurtic. Incidentally, the normal distribution has zero kurtosis, and is said to be mesokurtic.

> kurtosis(faithful$eruptions)
[1] 1.4994

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器学习模型机器
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值