分散式计算是什么
Dispersion is basically the spread of the data, the extent to which a distribution is stretched or squeezed. Central tendencies are not enough to fully understand the nature of the entire dataset, hence we need dispersion measures. With both central tendencies and dispersion measures together paint a good picture of the dataset.
分散基本上是数据的分散,即分布被拉伸或压缩的程度。 中心趋势不足以完全理解整个数据集的性质,因此我们需要分散度量。 通过集中趋势和分散度量,可以很好地描绘数据集。
Following are the most common dispersion measures:
以下是最常见的分散措施:
范围: (Range:)
Range is nothing but the difference between maximum value and minimum value. It tells us within what range the whole dataset lies. Range is very easy and straightforward to calculate but at the same time, it is too sensitive to outliers
范围不过是最大值和最小值之间的差异。 它告诉我们整个数据集位于什么范围内。 范围非常容易直接计算,但同时又对异常值过于敏感
R = max(dataset) -min(dataset)
R = max(数据集)-min(数据集)
方差: (Variance:)
Variance is the measure of spread of data, it is the expectation of squared deviation of a random variable from its mean.
方差是数据传播的量度,是对随机变量与其均值平方差的期望。
Now, let us try to understand by intuition, without going into mathematics too much, why we divide with “n-1” instead of “n”.Ideal way to calculate standard deviation from sample dataset would be
现在,让我们尝试通过直觉来理解,而不必过多地研究数学,为什么我们用“ n-1”而不是“ n”进行除法。从样本数据集计算标准差的理想方法是
But we do not know ‘mu’, hence we use ‘x-bar’ instead. ‘x-bar’ is calculated using only sample dataset, so in real world
但是我们不知道'mu',因此我们改用'x-bar'。 “ x-bar”仅使用样本数据集进行计算,因此在现实世界中
hence,
因此,
Hence we use “n-1” in the denominator to calculate the unbiased standard deviation from sample.
因此,我们在分母中使用“ n-1”来计算样本的无偏标准偏差。
标准偏差: (Standard Deviation:)
As the unit of variance is squared the unit of data(as obvious from the formula above), the value of variance is not very intuitive, hence we take its square root which is defined as standard deviation.
由于方差单位是数据单位的平方(从上面的公式显而易见),方差的值不是很直观,因此我们将其平方根定义为标准偏差。
- If the standard deviation is small, the data has little spread (i.e., the majority of points fall very near the mean). 如果标准偏差很小,则数据的传播范围很小(即,大多数点都非常接近均值)。
- If standard deviation = 0, there is no spread. This only happens when all data items are the same value. 如果标准偏差= 0,则不会扩展。 仅当所有数据项具有相同的值时,才会发生这种情况。
- The standard deviation is significantly affected by outliers and skewed distributions. 离群值和偏斜分布会严重影响标准偏差。
翻译自: https://medium.com/analytics-vidhya/what-is-dispersion-and-its-measures-888fedddd40a
分散式计算是什么