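What does this chapter cover?
1. Some simple descriptive statistics: Mean, Median, Mode, Percentiles, Variance, Standard Deviation
2. Using numbers to describe the shape of a distribution: z-score, skewness
3. Finally, the Covariance and Correlation Coefficient between two variables.
All three topics lay the groundwork for later chapters.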
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
3.1 Measures of Location
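Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$, where n is the number of observations in the sample.
Population Mean: $\mu = \frac{\sum x_i}{N}$, where N is the number of observations in the population.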
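A minimal Python sketch of the mean formula. The function name `arithmetic_mean` and the numbers are made up for illustration. The arithmetic is identical for a sample and a population; only the notation changes ($\bar{x}$ with n versus $\mu$ with N), depending on whether the data are a sample or the whole population.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count.

    Use the result as the sample mean (x-bar, divide by n) when `values`
    is a sample, or as the population mean (mu, divide by N) when
    `values` is the whole population; the computation is the same.
    """
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data, used only to illustrate the formula.
data = [46, 54, 42, 46, 32]
print(arithmetic_mean(data))  # 44.0
```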
Median: Arrange the data in ascending order (smallest value to largest value).
(a) For an odd number of observations, the median is the middle value.
(b) For an even number of observations, the median is the average of the two middle values.
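The two median rules can be sketched in Python as follows. The helper name `median` and the example lists are hypothetical; the code sorts first, then applies rule (a) or (b) depending on whether n is odd or even.

```python
def median(values):
    """Median of a non-empty list of numbers, following rules (a) and (b)."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # ascending order, smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # (a) odd n: the single middle value (0-based index n // 2)
        return ordered[mid]
    # (b) even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([46, 54, 42, 46, 32]))      # 46
print(median([27, 25, 27, 30, 20, 26]))  # 26.5
```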
Mode: The mode is the value that occurs with greatest frequency.
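A short sketch using `collections.Counter` from the Python standard library. Because several values can tie for the greatest frequency (bimodal or multimodal data), the hypothetical helper `modes` returns a sorted list rather than a single value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, sorted."""
    counts = Counter(values)  # maps each distinct value to how often it occurs
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([46, 54, 42, 46, 32]))  # [46]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  (a tie, so the data are bimodal)
```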
Percentiles: The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.
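The definition can be turned into a direct check. This is a sketch under assumptions: the function name `satisfies_percentile_definition` and the 12-value data set are hypothetical. Percentages are compared as integer counts scaled by 100 so floating-point rounding cannot affect the result.

```python
def satisfies_percentile_definition(values, p, x):
    """Check whether x meets the definition of the pth percentile of values.

    Requires at least p% of the observations to be <= x and at least
    (100 - p)% of the observations to be >= x. Both conditions are tested
    as 100 * count >= percent * n, which stays in integer arithmetic.
    """
    n = len(values)
    at_or_below = sum(1 for v in values if v <= x)
    at_or_above = sum(1 for v in values if v >= x)
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n


# Hypothetical data with n = 12 observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15, 18]
print(satisfies_percentile_definition(data, 85, 15))  # True
print(satisfies_percentile_definition(data, 85, 13))  # False
print(satisfies_percentile_definition(data, 50, 8))   # True
print(satisfies_percentile_definition(data, 50, 9))   # True
```

More than one value can satisfy the definition (for p = 50 here, 8, 8.5 and 9 all pass), so a fixed computation rule is needed to report a single number.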
How do you compute a percentile?
Step 1. Arrange the data in ascending order (smallest value to largest value).
Step 2. Compute an index $i = \left(\frac{p}{100}\right) n$, where p is the percentile of interest and n is the number of observations.
Step 3. (a) If i is not an integer, round up. The next integer greater than i denotes the position of the pth percentile.
(b) If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
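The three steps can be sketched in Python as below. The function name `percentile` and the data are hypothetical. The index i is computed with `fractions.Fraction` so that the "is i an integer?" test in Step 3 is exact rather than subject to floating-point rounding. This is the hand rule from these notes; library functions such as `numpy.percentile` interpolate by default, so their answers can differ.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """pth percentile by the three-step rule: sort, compute i, then locate.

    Positions are 1-based as in the steps above. p must satisfy 0 < p < 100
    so that positions i and i + 1 always exist.
    """
    if len(values) == 0:
        raise ValueError("the percentile of an empty data set is undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)           # Step 1: ascending order
    n = len(ordered)
    i = Fraction(p) * n / 100          # Step 2: i = (p/100) n, as an exact fraction
    if i.denominator != 1:
        # Step 3(a): i is not an integer, so round up to get the position.
        position = math.ceil(i)
        return ordered[position - 1]   # 1-based position -> 0-based index
    # Step 3(b): i is an integer, so average the values in positions i and i + 1.
    k = int(i)
    return (ordered[k - 1] + ordered[k]) / 2


data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15, 18]  # hypothetical, n = 12
print(percentile(data, 85))  # i = 10.2 -> position 11 -> 15
print(percentile(data, 50))  # i = 6 -> average of positions 6 and 7 -> 8.5
print(percentile(data, 25))  # i = 3 -> average of positions 3 and 4 -> 4.5
```

With p = 50 this rule always reproduces the median: for even n, i = n/2 is an integer and the two middle values are averaged; for odd n, rounding up n/2 gives the middle position (n + 1)/2.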