Notes on Principles of Statistics (Notes for Statistics I)

Latest recommended article published 2020-08-06 16:29:46

初木

Article link: https://blog.csdn.net/jtongxin/article/details/73885168

Notes for Statistics

Sharing my notes from business statistics
by Feiran Jia

Lecture 1 Introduction

Variables

Quantitative
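Categorical
- Ordinal: categories with a natural order (e.g. low / medium / high)
- Nominal: categories without a natural order; any numbers assigned to them are only labels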

Data sets

Cross-section: many units observed at a single point in time
Time series: one unit observed repeatedly over time

Sampling Error or Noise

Sampling error is the purely random difference between a sample statistic (such as the sample mean) and the corresponding value for the population of interest. It arises because the sample is a random subset of the population.
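A short simulation makes this concrete. It is only a sketch: the population below is hypothetical (normal with mean 100 and SD 15, seeded for reproducibility), and each sample mean misses the population mean by a different random amount.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 values (illustration only)
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)

# Sampling error of the mean for five independent random samples of size 50
errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    errors.append(statistics.fmean(sample) - mu)

print([round(e, 2) for e in errors])
```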

Lecture 2 Displaying and Describing Quantitative Data

Histogram

Frequency histogram: bar height = frequency
Relative frequency histogram: bar height = frequency / total count, so the bar heights sum to 1
Density histogram: bar height = relative frequency / bin width, so the bar areas sum to 1
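The three histograms differ only in how the bin counts are scaled. A minimal sketch: `histogram_heights` is a hypothetical helper, the bin edges (width 5) are an arbitrary choice, and the data are the eleven values from the quartile example in these notes.

```python
def histogram_heights(data, edges):
    """Bar heights for bins [edges[k], edges[k+1]); the last bin also includes its right edge."""
    n = len(data)
    freq = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        last = k == len(edges) - 2
        freq.append(sum(lo <= y < hi or (last and y == hi) for y in data))
    rel = [c / n for c in freq]                       # relative frequency: heights sum to 1
    widths = [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]
    density = [r / w for r, w in zip(rel, widths)]    # density: bar areas sum to 1
    return freq, rel, density

data = [102, 104, 105, 107, 108, 109, 110, 112, 115, 118, 118]
freq, rel, density = histogram_heights(data, edges=[100, 105, 110, 115, 120])
print(freq)  # [2, 4, 2, 3]
```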

Central Tendency (where the data cluster)

Mean

Quantity	Population	Sample
Number of observations	N	n
Mean	$\mu = \frac{\sum_{i = 1}^Ny_i}{N}$	$\bar y = \frac{\sum_{i = 1}^ny_i}{n}$

Median

Order the observations from smallest to largest.
Take the middle value; if the number of observations is even, average the two middle values.
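The two steps translate directly into code. A minimal sketch with a hypothetical `median` helper (Python's `statistics.median` follows the same convention for an even count):

```python
def median(values):
    ordered = sorted(values)                  # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                   # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([7, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```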

Mode

The value that occurs most often
Not necessarily unique (unimodal: one mode; bimodal: two modes)
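Non-uniqueness is easy to see with `statistics.multimode` (Python 3.8+), which returns every most-frequent value. The two small lists are arbitrary examples.

```python
from statistics import multimode

unimodal = multimode([1, 2, 2, 3])
bimodal = multimode([1, 1, 2, 3, 3])
print(unimodal)  # [2]
print(bimodal)   # [1, 3]
```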

Spread 数据离散程度

Range is the difference between the largest and the smallest value in the data.

Interquartile Range (IQR)

Sort your data in ascending order.
Split the data at the median into a lower half and an upper half (when the number of observations is odd, leave the median itself out of both halves).
Find the median of the first, “low” group. This is called Q1, or first quartile.
The median of the second, “high” group is the third quartile, Q3.
The interquartile range (IQR) is the difference between Q3 and Q1: IQR = Q3 - Q1.

Position	Value	Quartile
1	102
2	104
3	105	Q1
4	107
5	108
6	109	Q2 (median)
7	110
8	112
9	115	Q3
10	118
11	118
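The table can be reproduced with the median-of-halves procedure described above. A sketch with hypothetical helpers `median` and `quartiles`; other software often uses interpolating percentile definitions, which can give slightly different quartiles on other datasets.

```python
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by median of halves; for odd n the median is left out of both halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)

data = [102, 104, 105, 107, 108, 109, 110, 112, 115, 118, 118]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                       # interquartile range
data_range = max(data) - min(data)  # range: largest minus smallest
print(q1, q2, q3, iqr, data_range)  # 105 109 115 10 16
```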

Percentiles

median: $50^{th}$ percentile
first quartile Q1: $25^{th}$ percentile
third quartile Q3: $75^{th}$ percentile

Variance

Quantity	Population	Sample
Number of observations	N	n
Variance	$\sigma^2 = \frac{\sum_{i=1}^{N}(y_i - \mu)^2}{N}$	$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n-1}$

Total Sum of Squares = TSS = $\sum_{i=1}^{n}(y_i - \bar y)^2$

Degrees of freedom: $\nu = n - 1$, so the sample variance is $s^2 = \text{TSS}/\nu$.

Standard Deviation

population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(y_i - \mu)^2}{N}}$

sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n-1}}$
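The population and sample versions share the same TSS and differ only in the divisor. A sketch on an arbitrary small dataset; `tss` is a hypothetical helper, and the last line cross-checks against the standard library's `statistics.pvariance` and `statistics.stdev`.

```python
import math
import statistics

def tss(values):
    """Total sum of squares: squared deviations from the sample mean."""
    ybar = sum(values) / len(values)
    return sum((y - ybar) ** 2 for y in values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
pop_var = tss(data) / n          # sigma^2: divide by N (data treated as the whole population)
samp_var = tss(data) / (n - 1)   # s^2: divide by degrees of freedom nu = n - 1
pop_sd, samp_sd = math.sqrt(pop_var), math.sqrt(samp_var)

print(tss(data), pop_var, pop_sd)  # 32.0 4.0 2.0
print(math.isclose(pop_var, statistics.pvariance(data)),
      math.isclose(samp_sd, statistics.stdev(data)))  # True True
```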

Comparison/Standardization

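Coefficient of Variation (CV)

CV = Standard deviation / Mean
It measures how much variability there is relative to the mean. A variable with a higher average level usually also has a larger absolute measure of dispersion (and a lower level a smaller one), so dividing by the mean makes spreads comparable across variables. As a rule of thumb in data analysis, a CV above 15% suggests the data may be abnormal and should be considered for removal.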
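A sketch showing why the ratio matters: the two hypothetical datasets below have the same sample SD (2) but very different averages, so their CVs differ by a factor of ten. `coefficient_of_variation` is a hypothetical helper using the sample SD; CV is only meaningful for data with a positive mean.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV = s / mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical data: same absolute spread, different average level
small = [8, 10, 12]
large = [98, 100, 102]
cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)
print(round(cv_small, 3), round(cv_large, 3))  # 0.2 0.02
```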

z-score

$z = \frac{y-\bar y}{s}$
Variable z has a mean of 0 and standard deviation equal to 1
Value of z-score indicates how many standard deviations a value is from the mean
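A quick check of both properties, as a sketch on an arbitrary dataset using the sample SD $s$ as in the formula above:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
ybar = statistics.fmean(data)
s = statistics.stdev(data)              # sample SD, matching z = (y - ybar) / s
z = [(y - ybar) / s for y in data]

print(round(z[-1], 2))                  # 1.87: the value 9 lies about 1.87 SDs above the mean
print(abs(statistics.fmean(z)) < 1e-12, math.isclose(statistics.stdev(z), 1.0))  # True True
```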

Lecture 3&4 Linear Relationship: Association, Correlation and Linear Regression

Covariance

Quantity	Population	Sample
Number of observations	N	n
Covariance	$\sigma_{xy} = \frac{\sum_{i = 1}^N(x_i-\mu_x)(y_i-\mu_y)}{N}$	$s_{xy} = \frac{\sum_{i = 1}^n(x_i-\bar x)(y_i-\bar y)}{n-1}$

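The sign of the covariance shows whether two variables move together: positive when they tend to change in the same direction, negative when they tend to change in opposite directions.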
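A sketch of both versions of the formula; `covariance` is a hypothetical helper and the data are made up so that one pair moves together and the other moves in opposite directions.

```python
def covariance(x, y, sample=True):
    """Sample covariance divides by n - 1; population covariance divides by N."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cross = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return cross / (n - 1) if sample else cross / n

x = [1, 2, 3, 4, 5]
print(covariance(x, [2, 4, 6, 8, 10]))   # 5.0  (same direction: positive)
print(covariance(x, [10, 8, 6, 4, 2]))   # -5.0 (opposite directions: negative)
```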

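Correlation

To measure how similarly two variables move, we need to remove the effect of each variable's scale of variation from the covariance; dividing by the two standard deviations does this.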

Quantity	Population	Sample
Covariance	$\sigma_{xy}$	$s_{xy}$
Standard deviations	$\sigma_x,\sigma_y$	$s_x,s_y$
Correlation	$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$	$r = \frac{s_{xy}}{s_x s_y}$

The correlation coefficient always lies between -1 and 1.
-1: perfect negative linear relationship (values near -1 indicate a strong negative one)
1: perfect positive linear relationship (values near 1 indicate a strong positive one)
0: no linear relationship
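A sketch of the sample formula $r = s_{xy}/(s_x s_y)$ on made-up data: exact straight lines give $\pm 1$, a noisier positive pattern gives a value in between. `correlation` is a hypothetical helper; results are rounded to hide floating-point noise.

```python
import math

def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    return sxy / (sx * sy)

x = [1, 2, 3, 4, 5]
print(round(correlation(x, [2, 4, 6, 8, 10]), 12))   # 1.0  (exact positive line)
print(round(correlation(x, [10, 8, 6, 4, 2]), 12))   # -1.0 (exact negative line)
print(round(correlation(x, [1, 3, 2, 5, 4]), 12))    # 0.8  (positive but not perfect)
```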

The Linear Model

$\hat{y} = b_0 + b_1 x$

$b_0$ : y-intercept

$b_1$ : slope of the line

Residual: $e = y - \hat{y}$, where $y$ is the observed value and $\hat{y}$ the predicted value

OLS (Ordinary Least Squares): Derivation

Minimize the sum of squared residuals: $\min_{b_0, b_1} \sum_{i=1}^n (y_i - \hat{y}_i)^2$, i.e. $\min_{b_0, b_1} \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2$

solution:

$b_1 = r \frac{s_y}{s_x}$

$b_0 = \bar y -b_1\bar x$

So fitting the line only requires $\bar x$, $\bar y$, $s_x$, $s_y$ and $r$.
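These two formulas are enough to fit the line. A sketch with made-up data (`ols_fit` is a hypothetical helper); it also checks that $b_1 = r\,s_y/s_x$ agrees with the sum-of-cross-products form and that the residuals sum to zero, as they always do for an OLS line with an intercept.

```python
import math

def ols_fit(x, y):
    """Least-squares line y_hat = b0 + b1 * x via b1 = r * s_y / s_x and b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    r = sxy / (sx * sy)
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
b0, b1 = ols_fit(x, y)
print(round(b0, 10), round(b1, 10))     # 0.6 0.8

# Same slope from the sum form: sum of cross-products / sum of squares in x
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
b1_sums = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(math.isclose(b1, b1_sums), abs(sum(residuals)) < 1e-9)  # True True
```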

Proof:

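$b_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)/(n-1)}{\sum_{i=1}^n (x_i - \bar x)^2/(n-1)} = \frac{s_{xy}}{s_x^2} = \frac{r s_x s_y}{s_x^2} = r\frac{s_y}{s_x}$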
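The proof starts from the slope written as a ratio of sums. That ratio comes from setting the partial derivatives of the least-squares objective to zero (the normal equations); a sketch of this standard step, assuming the $x_i$ are not all equal so the denominator is positive:

```latex
S(b_0, b_1) = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = -2\sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 = \bar y - b_1 \bar x

\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^n x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; \sum_{i=1}^n x_i\big[(y_i - \bar y) - b_1 (x_i - \bar x)\big] = 0

\text{Since } \sum_{i=1}^n (y_i - \bar y) = \sum_{i=1}^n (x_i - \bar x) = 0,\ x_i \text{ can be replaced by } (x_i - \bar x):
\quad b_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}
```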
