What Are Skewness and Kurtosis

Data Science, Machine Learning

In this article, we will go through two important concepts in descriptive statistics: skewness and kurtosis. By the end, you will be able to answer questions such as what skewness and kurtosis are, what right and left skewness mean, how skewness and kurtosis are measured, and how they are useful.

Skewness

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.

Negative Skewness

The data are concentrated on the right side of the figure, as you can see below, so there is a long tail on the left side. Such a distribution is also called left-skewed or left-tailed.

Positive Skewness

The data are concentrated on the left side of the figure, as you can see below, so there is a long tail on the right side. Such a distribution is also called right-skewed or right-tailed.

Figure: negative and positive skewness. Source: Wikipedia

How to interpret skewness

A rule of thumb says:

  • If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
  • If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.
  • If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.

If the data follow a normal distribution, the skewness is zero. The converse does not hold: a skewness near zero only indicates symmetry, not normality. In the real world, no data follow a normal distribution perfectly, so the skewness of real-world data is rarely exactly zero, although it can be close to zero.
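The rule of thumb above can be tried on simulated data. The following is a minimal sketch, not the author's code. It assumes the moment-based definition of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The helper names sample_skewness and describe_skewness are made up for illustration. As far as I know, scipy.stats.skew computes the same quantity by default, while pandas applies a small-sample correction, so library results can differ slightly.

```python
import random


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_skewness(g1):
    """Apply the rule of thumb: 0.5 and 1 are the cut-offs in absolute value."""
    if abs(g1) <= 0.5:
        return "fairly symmetrical"
    if abs(g1) <= 1:
        return "moderately skewed"
    return "highly skewed"


# Simulated data (an assumption for illustration, not from the article)
rng = random.Random(0)
symmetric = [rng.gauss(0, 1) for _ in range(10_000)]
right_skewed = [rng.expovariate(1.0) for _ in range(10_000)]

for name, data in [("normal", symmetric), ("exponential", right_skewed)]:
    g1 = sample_skewness(data)
    print(f"{name}: skewness = {g1:.2f} ({describe_skewness(g1)})")
```

The normal sample should land in the "fairly symmetrical" band, while the exponential sample, which has a long right tail, should come out as highly positively skewed.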

Why study skewness

Consider the example below. Here total_bill is positively skewed and the data points are concentrated on the left side. A model built on these data is likely to make better predictions for lower values of total_bill, where most observations lie, than for higher values.

Figure: distribution of total_bill. Image by Author

Skewness tells us in which direction the outliers lie. From the distribution above, we can see that the outliers are on the right side of the distribution.

How to deal with skewed data

Many statistical tests and some machine learning models rely on normality assumptions. Significant skewness means the data are not normal, which may affect your statistical tests or the predictive power of your model. In such cases, it often helps to transform the data to bring them closer to a normal distribution. Some common techniques for treating skewed data:

  • Log transformation
  • Square root transformation
  • Power transformation
  • Exponential transformation
  • Box-Cox transformation, etc.

In the example below, we look at the tips dataset from the Seaborn library. total_bill has a skewness of 1.12, which means it is highly skewed. The distribution plot also shows that the data are positively skewed. After a log transformation of total_bill, the skewness is reduced to -0.11, which means the data are fairly symmetrical.
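The effect of a log transformation on skewness can be sketched without the original dataset. The snippet below is an illustration under stated assumptions: it uses synthetic log-normal data as a hypothetical stand-in for a right-skewed column such as total_bill, so it will not reproduce the 1.12 and -0.11 figures quoted above. The skewness helper is a hand-written moment-based estimate, not a library function.

```python
import math
import random


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical stand-in for a positively skewed, strictly positive column
rng = random.Random(42)
bills = [rng.lognormvariate(3.0, 0.5) for _ in range(5_000)]

# Two of the transformations listed above; both require positive inputs here
sqrt_bills = [math.sqrt(x) for x in bills]
log_bills = [math.log(x) for x in bills]

print(f"raw:         {skewness(bills):.2f}")
print(f"square root: {skewness(sqrt_bills):.2f}")
print(f"log:         {skewness(log_bills):.2f}")
```

On data like these, the square root reduces the skewness somewhat and the log reduces it further. Note that the log is undefined for zero or negative values, so real columns containing such values need a shift or a different transformation.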

Kurtosis

Kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. It is generally used to gauge how prone a dataset is to outliers (extreme values). Because it reflects extreme values, the values in both tails of the distribution drive the measure.

Types of kurtosis and how to interpret them

  1. Mesokurtic (Kurtosis = 3): This distribution has a kurtosis of 3, that is, an excess kurtosis (kurtosis minus 3) near zero. The distribution of extreme values (outliers) is similar to that of a normal distribution.

  2. Leptokurtic (Kurtosis > 3): This distribution has greater kurtosis than a mesokurtic one. The peak is often higher and sharper, and the tails on both sides are heavy, which indicates large outliers. In the investment world, a leptokurtic return distribution is generally read as a sign of a higher-risk investment.

  3. Platykurtic (Kurtosis < 3): This distribution has lower kurtosis than a mesokurtic one. The peak is often lower and broader, and the tails on both sides are light, which indicates small outliers. In the investment world, a platykurtic return distribution is generally read as a sign of a lower-risk investment.

Figure: mesokurtic, leptokurtic, and platykurtic distributions. Source: tutorialspoint.com

The example below shows how to calculate kurtosis:
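The original code for this step is not available here, so the following is a minimal sketch rather than the author's example. It assumes the moment-based definition, kurtosis = m4 / m2^2, where m2 and m4 are the second and fourth central moments. The names kurtosis and classify and the tolerance of 0.2 are made up for illustration. One point to watch with libraries: to my knowledge, scipy.stats.kurtosis and the pandas kurt method report excess kurtosis (kurtosis minus 3) by default, so their output is compared with 0 rather than 3.

```python
import random


def kurtosis(values, excess=False):
    """Moment-based kurtosis m4 / m2**2; set excess=True to subtract 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    k = m4 / m2 ** 2
    return k - 3 if excess else k


def classify(k, tolerance=0.2):
    """Label a kurtosis value; the tolerance is an arbitrary illustrative choice."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


# Simulated samples (assumptions for illustration, not from the article)
rng = random.Random(1)
n = 20_000
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(-1, 1) for _ in range(n)],
    # The difference of two independent exponentials follows a Laplace distribution
    "laplace": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
}

for name, data in samples.items():
    k = kurtosis(data)
    print(f"{name}: kurtosis = {k:.2f}, excess = {k - 3:.2f} ({classify(k)})")
```

The uniform sample has light tails and should be labeled platykurtic, the Laplace sample has heavy tails and should be labeled leptokurtic, and the normal sample should sit close to 3.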

Thank you for reading this article. You can reach me at https://www.linkedin.com/in/chetanambi/

Translated from: https://medium.com/towards-artificial-intelligence/what-are-skewness-and-kurtosis-3e854a01808c
