Introduction to Statistics and R

Latest recommended article published on 2024-07-20 09:40:11

手撕机

Original article. Do not reproduce without permission.

Article link: https://blog.csdn.net/guolindonggld/article/details/50820731

These are mainly notes from the Coursera Basic Statistics course.

Week 1: Exploring Data

Descriptive Statistics

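Different Levels of Measurement:
Nominal, Ordinal, Interval, Ratio: the difference between Interval and Ratio is that zero on an interval scale does not mean "none". For example, a temperature of 0 does not mean there is no temperature.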

Central Tendency and Dispersion:
Measures of central tendency: Mode, Median, Mean (commonly called the 3 Ms)
Measures of dispersion: Range, Interquartile Range (IQR), Variance, Standard Deviation

Another measure is the z-score, which specifies whether an observation is common or exceptional: $z=\frac{x-\bar{x}}{s}$ , i.e. (value - mean) / standard deviation.

The corresponding functions in R are:

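Measurement	Function
Mode	N/A (no built-in function; mode() returns the storage mode)
Mean	mean()
Median	median()
Range	range() (returns min and max; diff(range(x)) gives the width)
IQR	IQR()
Variance	var()
Standard Deviation	sd()
Z-Scores	scale(), or (x - mean(x)) / sd(x)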
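
As a rough illustration of the table above, here is a minimal sketch in base R. The vector `x` is made-up data, and the mode is computed with `table()` and `which.max()` as one possible workaround, since base R has no statistical-mode function; if several values tie, this picks only the first.

```r
# Hypothetical sample data; any numeric vector would do.
x <- c(2, 4, 4, 5, 7, 9, 11)

# Central tendency
x_mean   <- mean(x)
x_median <- median(x)
# No built-in statistical mode: take the most frequent value from a frequency table.
counts <- table(x)
x_mode <- as.numeric(names(counts)[which.max(counts)])

# Dispersion
x_range <- diff(range(x))  # range() returns c(min, max), so take the difference
x_iqr   <- IQR(x)
x_var   <- var(x)          # sample variance (divides by n - 1)
x_sd    <- sd(x)           # square root of the sample variance

# z-scores: (value - mean) / standard deviation
z <- (x - x_mean) / x_sd
```

`scale(x)` should give the same z-scores, returned as a one-column matrix.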

Week 2: Correlation and Regression

Frequency table: one variable
Contingency table: two variables
When the two variables are quantitative, we use a scatterplot.

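Correlation: Pearson's r ranges over [-1, 1]. A positive value means positive correlation, a negative value means negative correlation, and the absolute value indicates the strength:
0.8-1.0 very strong correlation
0.6-0.8 strong correlation
0.4-0.6 moderate correlation
0.2-0.4 weak correlation
0.0-0.2 very weak or no correlation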
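Regression: $\hat{y}=a+bx$ , where $b=r\frac{s_{y}}{s_{x}}$ and $a=\bar{y}-b\bar{x}$ . Here $r=\frac{\sum z_{x}z_{y}}{n-1}$ is Pearson's coefficient, $z$ is a z-score, $s$ is a sample standard deviation, and $n$ is the sample size.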

Explained variance: The percentage of the variance in the dependent variable that can be explained using the formula of the regression line. You can measure this with r-squared.

Corresponding R functions:

Name	Function
Frequency Table or Contingency Table	table()
Pearson’s r/Correlation	cor()
Linear Regression	lm()
Scatter Plot	plot()
Regression Line	abline()
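
The functions in this table can be tried on a small example. The sketch below uses made-up paired data `x` and `y`. It computes Pearson's r both with `cor()` and from z-scores, then compares a hand-computed slope and intercept (using the standard identities $b=r\,s_y/s_x$ and $a=\bar{y}-b\bar{x}$) with the coefficients from `lm()`.

```r
# Hypothetical paired observations.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

# Pearson's r, directly and as the sum of z-score products over n - 1
r  <- cor(x, y)
n  <- length(x)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
r_from_z <- sum(zx * zy) / (n - 1)

# Slope and intercept of the least-squares line, computed by hand
b <- r * sd(y) / sd(x)
a <- mean(y) - b * mean(x)

# The same line fitted with lm(); coef() returns (intercept, slope)
fit <- lm(y ~ x)
coefs <- unname(coef(fit))

# Explained variance: for simple linear regression this equals r^2
r_squared <- summary(fit)$r.squared
```

In an interactive session, `plot(x, y)` followed by `abline(fit)` would draw the scatterplot with the regression line on top.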

Week 3: Probability

Experiment
Trial
Outcome
Event
Random Variable
Marginal Probability

Two methods to calculate probability:

Tree Diagram
Contingency Table

The complement of $X$ is $X ^ c$ .
Independent intersecting events are two events that do not influence each other and can occur simultaneously. An example is the outcomes of rolling two dice.
Disjoint exhaustive events are mutually exclusive and together cover every possible outcome, so exactly one of the events happens at a time.

Intersection： $P(A \cap B)$
Union： $P(A \cup B)=P(A)+P(B)-P(A \cap B)$

Joint Probability： $P(AB)$ , i.e. P(A and B)
Conditional Probability： $P(A \mid B)=\frac{P(AB)}{P(B)}$ , i.e. P(A given B), reduced sample space
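
The intersection, union, and conditional-probability formulas can be checked by enumerating a small sample space. The sketch below lists all 36 outcomes of two fair dice; the events `A` (first die shows 6) and `B` (sum is at least 10) are arbitrary choices made for illustration only.

```r
# All 36 equally likely outcomes of rolling two fair dice.
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)

# Illustrative events (hypothetical choices)
A <- rolls$d1 == 6               # first die shows 6
B <- rolls$d1 + rolls$d2 >= 10   # sum is at least 10

# With equally likely outcomes, a probability is the share of outcomes in the event.
p_A       <- mean(A)
p_B       <- mean(B)
p_A_and_B <- mean(A & B)         # intersection / joint probability
p_A_or_B  <- mean(A | B)         # union

# Conditional probability: restrict the sample space to B
p_A_given_B <- p_A_and_B / p_B
```

Here `p_A_or_B` agrees with `p_A + p_B - p_A_and_B`, and `p_A_given_B` is the share of the outcomes in `B` that also lie in `A`.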

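A bag holds 6 red balls and 4 green balls, and two balls are drawn at random:
Without replacement: dependent events
With replacement: independent events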
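
For the bag example, the probability that both drawn balls are red can be worked out for each case. This is a minimal sketch: the helper `simulate_both_red()`, the seed, and the number of repetitions are illustrative assumptions added only as a sanity check on the exact values.

```r
red <- 6
green <- 4
total <- red + green

# With replacement: the second draw is independent of the first.
p_both_red_with <- (red / total) * (red / total)

# Without replacement: the second draw depends on the first,
# because one red ball has already left the bag.
p_both_red_without <- (red / total) * ((red - 1) / (total - 1))

# Optional check by simulation (hypothetical helper).
bag <- rep(c("red", "green"), times = c(red, green))
simulate_both_red <- function(replace, n = 20000) {
  mean(replicate(n, all(sample(bag, 2, replace = replace) == "red")))
}
set.seed(1)
sim_with    <- simulate_both_red(replace = TRUE)
sim_without <- simulate_both_red(replace = FALSE)
```

The exact values are 0.36 with replacement and 1/3 without, and the simulated proportions should land close to them.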

If event A and event B are independent:
$P(AB)=P(A)*P(B)$
$P(A \mid B)=\frac{P(AB)}{P(B)}=\frac{P(A)*P(B)}{P(B)}=P(A)$
(For example, when tossing two coins, the outcome of one coin does not affect the outcome of the other.)

And how to calculate $P(AB)$ when events are dependent?
See this course

Bayes’ Law:
$\because P(AB)=P(A \mid B)*P(B)=P(B \mid A)*P(A)$
$\therefore P(A \mid B)=\frac{P(B \mid A)*P(A)}{P(B)}$
where $P(A)$ is called the prior probability, $P(B \mid A)$ is called the likelihood, and $P(A \mid B)$ is called the posterior probability.
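
As a worked example of Bayes' law, the sketch below reuses the bag of 6 red and 4 green balls, drawing twice without replacement. The choice of events is an assumption made for illustration: A is "the first ball is red" and B is "the second ball is red". $P(B)$ is obtained from the law of total probability, which these notes do not state explicitly.

```r
# Prior: probability that the first ball is red
p_A <- 6 / 10

# Probability that the second ball is red, given the colour of the first
p_B_given_A    <- 5 / 9   # one red ball already removed
p_B_given_notA <- 6 / 9   # one green ball already removed

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B <- p_B_given_A * p_A / p_B
```

Under these assumptions `p_B` comes out to 0.6 and `p_A_given_B` to 5/9: knowing that the second ball is red lowers the probability that the first one was red from 0.6 to about 0.56.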