Notes for Statistics I (Principles of Statistics)

Notes for Statistics

Sharing my notes from the business statistics course.
by Feiran Jia

Lecture 1 Introduction

Variables

  • Quantitative
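  • Categorical: takes values from a fixed set of categories
    • Ordinal: the categories have a natural order
    • Nominal: the categories are labels with no order; any numeric codes assigned to them are arbitrary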

Data sets

  • Cross-section
  • Time series

Sampling Error or Noise

Sampling error is a purely random difference between a sample and population of interest that arises because the sample is a random subset of the population.
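A small simulation can make this definition concrete. The sketch below is only an illustration: the population (10,000 normally distributed values), the sample size of 100, and the names `population` and `sampling_errors` are all assumptions, not part of the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values generated once and then treated as fixed.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw several random samples of size 100 and compare each sample mean with mu.
sampling_errors = []
for _ in range(5):
    sample = random.sample(population, 100)
    sampling_errors.append(statistics.fmean(sample) - mu)

for err in sampling_errors:
    print(f"sample mean - population mean = {err:+.3f}")
```

Each printed difference is nonzero and changes from sample to sample even though nothing about the population changed, which is what "purely random difference" refers to.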

Lecture 2 Displaying and Describing Quantitative Data

Histogram

  1. Frequency histogram: bar height = frequency
  2. Relative frequency histogram: bar height = frequency / total number of observations, so the bar heights sum to 1
  3. Density histogram: bar height = fraction / bin width, so the bar areas sum to 1
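The three bar-height definitions can be compared side by side on the same bins. This is a minimal sketch with made-up data and made-up, unequal-width bins; the helper name `histogram_heights` is hypothetical.

```python
def histogram_heights(data, edges):
    """Return (frequency, relative, density) bar heights for the given bin edges.

    Assumes every observation falls inside [edges[0], edges[-1]].
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for y in data:
        for k in range(n_bins):
            # Bins are [left, right), except the last one, which also includes its right edge.
            is_last = k == n_bins - 1
            if edges[k] <= y < edges[k + 1] or (is_last and y == edges[k + 1]):
                counts[k] += 1
                break
    total = len(data)
    relative = [c / total for c in counts]
    density = [r / (edges[k + 1] - edges[k]) for k, r in enumerate(relative)]
    return counts, relative, density


data = [1, 2, 2, 3, 5, 6, 7, 9]   # made-up observations
edges = [0, 2, 4, 10]             # made-up bins of width 2, 2 and 6
frequency, relative, density = histogram_heights(data, edges)

print(frequency)      # counts per bin
print(sum(relative))  # relative heights sum to 1
print(sum(d * (edges[k + 1] - edges[k]) for k, d in enumerate(density)))  # areas sum to 1
```

With unequal bin widths only the density histogram keeps bar area proportional to the share of observations, which is why it is the version to use when bins differ in width.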

Central Tendency (where the data cluster)

Mean

 | Population | Sample
Number of observations | $N$ | $n$
Mean | $\mu = \frac{\sum_{i=1}^{N} y_i}{N}$ | $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

Median

  • Order observations from smallest (in value) to the largest
  • Find the middle - that would be the median of your data

Mode

  • The value that occurs most often
  • Not necessarily unique: a distribution can be unimodal (one mode) or bimodal (two modes)
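The three measures of central tendency can be computed with Python's standard `statistics` module. The data below are made up for illustration and chosen so that the mode is not unique.

```python
import statistics

y = [2, 3, 3, 5, 7, 8, 8, 10]  # made-up observations

mean = statistics.fmean(y)        # sum of the observations divided by their number
median = statistics.median(y)     # middle of the sorted data (average of the two middle values here)
modes = statistics.multimode(y)   # every value that attains the highest frequency

print(mean, median, modes)
```

Here both 3 and 8 occur twice, so `multimode` returns two values and the data are bimodal.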

Spread (how dispersed the data are)

Range is the absolute difference between the smallest and the largest value in the data.

Interquartile Range (IQR)

  • Sort your data in ascending order.
  • Divide your data into two equal groups at the median (when the number of observations is odd, leave the median itself out of both groups).
  • Find the median of the first, “low” group. This is called Q1, or first quartile.
  • The median of the second, “high” group is the third quartile, Q3.
  • The interquartile range (IQR) is the difference between Q3 and Q1.
Index | Value | Quartile
1 | 102 |
2 | 104 |
3 | 105 | Q1
4 | 107 |
5 | 108 |
6 | 109 | Q2 (median)
7 | 110 |
8 | 112 |
9 | 115 | Q3
10 | 118 |
11 | 118 |

Here IQR = Q3 - Q1 = 115 - 105 = 10.
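The median-of-halves procedure can be sketched in a few lines and checked against the 11 values in the table. The function name `quartiles_by_halves` is hypothetical, and other software may use different quartile conventions that give slightly different values.

```python
import statistics


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When the number of observations is odd, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    low = ordered[:half]        # the "low" group
    high = ordered[n - half:]   # the "high" group
    return statistics.median(low), statistics.median(ordered), statistics.median(high)


data = [102, 104, 105, 107, 108, 109, 110, 112, 115, 118, 118]
q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 105 109 115 10
```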

Percentiles

  • median: 50th percentile
  • first quartile Q1: 25th percentile
  • third quartile Q3: 75th percentile

Variance

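 | Population | Sample
Number of observations | $N$ | $n$
Variance | $\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$ | $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$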

Total Sum of Squares = TSS = $\sum_{i=1}^{n} (y_i - \bar{y})^2$

degrees of freedom = ν = n - 1

Standard Deviation

population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$

sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$
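The difference between dividing by the number of observations and dividing by the degrees of freedom is easy to see numerically. The sketch below uses made-up data and treats the same eight values first as a whole population and then as a sample; the variable names are illustrative only.

```python
import math
import statistics

y = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
n = len(y)
y_bar = sum(y) / n

tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
dof = n - 1                               # degrees of freedom

pop_var = tss / n          # variance if y is the whole population (mean = y_bar)
sample_var = tss / dof     # variance if y is a sample
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(tss, pop_var, pop_sd)   # 32.0 4.0 2.0
print(sample_var, sample_sd)  # 32/7 and its square root

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(y))
assert math.isclose(sample_var, statistics.variance(y))
```

The sample versions are always slightly larger than the population versions computed from the same numbers, because the divisor is n - 1 rather than n.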

Comparison/Standardization

Coefficient of Variation (CV)

  • CV = Standard deviation / Mean
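  • Measures how much variability is in the data relative to the mean. Variables with a higher average level tend to have larger dispersion measures, and vice versa, so dividing by the mean makes spreads comparable. One rule of thumb: if the CV exceeds 15%, the data may be abnormal and should be considered for removal.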
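Because the CV is unit-free, it allows comparing the spread of variables measured on different scales. The following sketch uses made-up height and weight data and the sample standard deviation; the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (undefined when the mean is 0)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


heights_cm = [160, 165, 170, 175, 180]  # made-up data
weights_kg = [50, 60, 70, 80, 90]       # made-up data

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"CV height = {cv_height:.1%}, CV weight = {cv_weight:.1%}")
```

In this example the weights vary far more relative to their mean than the heights do, something the raw standard deviations (in cm and kg) cannot show directly.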

z-score

  • $z = \frac{y - \bar{y}}{s}$
  • Variable z has a mean of 0 and standard deviation equal to 1
  • Value of z-score indicates how many standard deviations a value is from the mean
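The two properties listed above (mean 0, standard deviation 1) can be verified numerically. This is a minimal sketch on made-up data using the sample standard deviation; `z_scores` is a hypothetical helper name.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the sample mean, divide by the sample standard deviation."""
    y_bar = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(y - y_bar) / s for y in values]


y = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
z = z_scores(y)

print([round(zi, 2) for zi in z])
print(statistics.fmean(z), statistics.stdev(z))  # approximately 0 and 1
```

A value such as z = 1.5 then reads directly as "1.5 standard deviations above the mean", regardless of the original units.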

Lecture 3&4 Linear Relationship: Association, Correlation and Linear Regression

Covariance

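 | Population | Sample
Number of observations | $N$ | $n$
Covariance | $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$ | $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$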

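Covariance shows whether two variables move in the same direction as they vary: it is positive when they tend to move together and negative when they tend to move in opposite directions.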

Correlation

To measure precisely how similarly two variables vary, we need to remove the effect of the magnitude of their variation from the covariance.

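 | Population | Sample
Covariance | $\sigma_{xy}$ | $s_{xy}$
Standard deviations | $\sigma_x, \sigma_y$ | $s_x, s_y$
How to find | $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ | $r = \frac{s_{xy}}{s_x s_y}$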
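  • The coefficient of correlation is always between -1 and 1.
  • -1: perfect negative linear relationship (values close to -1 indicate a strong negative one).
  • 1: perfect positive linear relationship (values close to 1 indicate a strong positive one).
  • 0: no linear relationship.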
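The point of dividing the covariance by the two standard deviations can be checked numerically: rescaling one variable changes the covariance but not the correlation. The sketch below uses made-up data and the sample (n - 1) formulas; the function names are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


x = [1, 2, 3, 4, 5]                   # made-up data
y = [2, 4, 5, 4, 5]                   # made-up data
y_rescaled = [100 * yi for yi in y]   # same variable in different units

cov_xy = sample_covariance(x, y)
cov_rescaled = sample_covariance(x, y_rescaled)
r_xy = sample_correlation(x, y)
r_rescaled = sample_correlation(x, y_rescaled)

print(cov_xy, cov_rescaled)  # covariance grows by the factor 100
print(r_xy, r_rescaled)      # correlation stays the same
```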

The Linear Model

$\hat{y} = b_0 + b_1 x$

$b_0$: y-intercept

$b_1$: slope of the line

Residual: $e = y - \hat{y}$, where $y$ is the observed value and $\hat{y}$ is the predicted value.

OLS = Ordinary Least Squares (with proof)

Minimize the sum of squared residuals: $\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, or equivalently $\min \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

solution:

$b_1 = r \frac{s_y}{s_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

To compute these we need $\bar{x}$, $\bar{y}$, $s_x$, $s_y$ and $r$.

Proof:

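$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)}{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)} = \frac{s_{xy}}{s_x^2} = \frac{r \, s_x s_y}{s_x^2} = r \frac{s_y}{s_x}$$

The last two steps use $s_{xy} = r \, s_x s_y$, which follows from the definition of the sample correlation $r = \frac{s_{xy}}{s_x s_y}$.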
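The slope and intercept formulas can be checked numerically. The sketch below fits a line to made-up data using the ratio of sums of deviations, then confirms that the slope equals r times s_y over s_x and that the residuals sum to zero. The function name `ols_fit` is hypothetical.

```python
import math


def ols_fit(x, y):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)  # sample variance of x
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]  # made-up data
b0, b1 = ols_fit(x, y)

# Same slope via the correlation form b1 = r * s_y / s_x.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = s_xy / (s_x * s_y)
b1_from_r = r * s_y / s_x

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, b1_from_r, sum(residuals))
```

For these data the fitted line is approximately y_hat = 2.2 + 0.6 x, and the residual sum is zero up to floating-point rounding, as expected for a least-squares line with an intercept.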
