1. Basic Concepts of Probability Theory
1.1 Probability:
The three axioms of a probability function:
- Non-negativity: $P(A) \ge 0$
- Normalization: $P(\Omega) = 1$
- Countable additivity: for pairwise disjoint events $A_0, A_1, \ldots$, $\displaystyle P\Big(\bigcup_{i=0}^{\infty} A_i\Big) = \sum_{i=0}^{\infty} P(A_i)$
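1.2 Maximum likelihood estimation:
When the form of the data distribution is known but its parameter $\theta$ is not, the value of $\theta$ can be estimated from the drawn samples through the likelihood function:
$$\mathrm{like}(\theta) = f_D(x_1, x_2, \cdots, x_n \mid \theta)$$
Over all admissible values of $\theta$, the value that maximizes this function is called the maximum likelihood estimate of $\theta$.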
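As a minimal sketch of this definition, assume the samples are i.i.d. Bernoulli trials with unknown success probability $\theta$. The model, the data, and the function names below are illustrative assumptions, not part of the notes. The likelihood is then $\theta^k(1-\theta)^{n-k}$ for $k$ successes in $n$ trials, and a simple grid search over $\theta$ should land on the closed-form maximizer $k/n$.

```python
import math

def log_likelihood(theta, samples):
    """Log of like(theta) for i.i.d. Bernoulli samples (values 0 or 1)."""
    k = sum(samples)   # number of successes
    n = len(samples)   # number of trials
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def mle_grid(samples, steps=999):
    """Approximate the MLE by scanning theta over an open grid in (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, samples))

# Hypothetical data: 7 successes in 10 trials.
samples = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
theta_hat = mle_grid(samples)
closed_form = sum(samples) / len(samples)
print(theta_hat, closed_form)
```

Maximizing the log-likelihood instead of the likelihood itself gives the same maximizer and avoids numerical underflow when $n$ is large.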
1.3 Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \\ P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A) \\ P(A_1 A_2 \cdots A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 A_2) \cdots P\Big(A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\Big)$$
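A small check of the product rule, under an assumed scenario that is not in the notes: drawing two cards without replacement from a 52-card deck, with $A_1$ = "first card is an ace" and $A_2$ = "second card is an ace". The sketch compares $P(A_1)P(A_2 \mid A_1)$ with a brute-force count over all ordered pairs.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 4 aces (marked 1) and 48 other cards (marked 0).
deck = [1] * 4 + [0] * 48

# Product rule: P(A1 A2) = P(A1) * P(A2 | A1).
p_a1 = Fraction(4, 52)            # 4 aces among 52 cards
p_a2_given_a1 = Fraction(3, 51)   # 3 aces left among 51 cards
p_chain = p_a1 * p_a2_given_a1

# Brute force: count ordered pairs of distinct positions that are both aces.
pairs = list(permutations(range(len(deck)), 2))
hits = sum(1 for i, j in pairs if deck[i] == 1 and deck[j] == 1)
p_brute = Fraction(hits, len(pairs))

print(p_chain, p_brute)
```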
1.4 Bayes' rule
$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \mid B)P(B)}{P(A)}$$
Law of total probability, where the events $B_i$ form a partition of $\Omega$:
$$P(A) = \sum_{i} P(A \mid B_i)P(B_i)$$
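The two formulas combine naturally: the law of total probability supplies the denominator of Bayes' rule. A minimal sketch, assuming a made-up two-event partition ("spam" and "ham") and made-up probabilities for an event $A$ (a given word appears in the message); all names and numbers are hypothetical.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """P(A) = sum_i P(A | B_i) P(B_i) over a partition {B_i}."""
    return sum(likelihood[b] * prior[b] for b in prior)

def posterior(prior, likelihood):
    """P(B_i | A) = P(A | B_i) P(B_i) / P(A) for every B_i."""
    p_a = total_probability(prior, likelihood)
    return {b: likelihood[b] * prior[b] / p_a for b in prior}

# Hypothetical numbers: P(B_i) and P(A | B_i).
prior = {"spam": Fraction(3, 10), "ham": Fraction(7, 10)}
likelihood = {"spam": Fraction(4, 5), "ham": Fraction(1, 10)}

p_a = total_probability(prior, likelihood)
post = posterior(prior, likelihood)
print(p_a, post["spam"], post["ham"])
```

Exact fractions are used here only so that the posteriors visibly sum to 1; floats would work the same way up to rounding.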
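1.5 Random variables:
A random variable is a function of the outcome of an experiment: it assigns a numerical value to each outcome in the sample space $\Omega$.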
1.6 Binomial distribution:
$$X \sim B(n, p) \\ p_i = P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}, \quad i = 0, 1, \cdots, n$$
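The probability mass function above can be evaluated directly. This is a minimal sketch using `math.comb` from the standard library (Python 3.8 or later); the function name `binomial_pmf` and the choice $n = 4$, $p = 0.5$ are illustrative assumptions.

```python
from math import comb

def binomial_pmf(i, n, p):
    """p_i = C(n, i) * p^i * (1 - p)^(n - i) for X ~ B(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

# Hypothetical example: number of heads in 4 fair coin flips.
n, p = 4, 0.5
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
print(pmf)
print(sum(pmf))  # the probabilities over i = 0..n should sum to 1
```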
1.7 Joint and conditional probability distributions:
With the joint distribution $p_{ij} = P(X_1 = a_i, X_2 = b_j)$, the conditional distribution is
$$P(X_1=a_i \mid X_2=b_j) = \frac{P(X_1=a_i, X_2=b_j)}{P(X_2=b_j)} = \frac{p_{ij}}{P(X_2=b_j)}$$
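To make the indices concrete, here is a small sketch that stores a joint table $p_{ij}$ as a nested list, obtains the marginal $P(X_2=b_j)$ by summing column $j$, and divides to get the conditional distribution. The table values and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint table: joint[i][j] = P(X1 = a_i, X2 = b_j).
joint = [
    [Fraction(1, 10), Fraction(2, 10)],
    [Fraction(3, 10), Fraction(4, 10)],
]

def marginal_x2(joint, j):
    """P(X2 = b_j): sum the joint probabilities over all values of X1."""
    return sum(row[j] for row in joint)

def conditional_x1_given_x2(joint, j):
    """List of P(X1 = a_i | X2 = b_j) = p_ij / P(X2 = b_j) over all i."""
    p_bj = marginal_x2(joint, j)
    return [row[j] / p_bj for row in joint]

cond = conditional_x1_given_x2(joint, 0)
print(marginal_x2(joint, 0), cond)
```

Each conditional distribution sums to 1 over $i$, which is a quick sanity check on any table built this way.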
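1.8 Bayesian decision theory:
For classes $\omega_1, \ldots, \omega_c$ with priors $P(\omega_i)$ and class-conditional densities $p(x \mid \omega_i)$, the posterior probability of class $\omega_i$ given an observation $x$ is
$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)P(\omega_i)}{\displaystyle\sum_{j=1}^{c} p(x \mid \omega_j)P(\omega_j)}$$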
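One common way to use this posterior is to assign $x$ to the class with the largest $P(\omega_i \mid x)$. The sketch below assumes one-dimensional Gaussian class-conditional densities and equal priors; these choices, the parameter values, and the function names are illustrative assumptions rather than anything stated in the notes. List index 0 corresponds to $\omega_1$.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution, used here as p(x | omega_i)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def class_posteriors(x, priors, params):
    """P(omega_i | x) for every class, via the formula above."""
    joint = [gaussian_pdf(x, m, s) * p for p, (m, s) in zip(priors, params)]
    evidence = sum(joint)  # the denominator: sum_j p(x | omega_j) P(omega_j)
    return [v / evidence for v in joint]

def decide(x, priors, params):
    """Index of the class with the largest posterior probability."""
    post = class_posteriors(x, priors, params)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-class problem: (mean, std) per class, equal priors.
priors = [0.5, 0.5]
params = [(0.0, 1.0), (2.0, 1.0)]
print(class_posteriors(0.2, priors, params), decide(0.2, priors, params))
print(class_posteriors(1.8, priors, params), decide(1.8, priors, params))
```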
1.9 Expectation and variance:
For a discrete random variable $X$ with $P(X = x_k) = p_k$:
$$E(X) = \sum_{k=1}^{\infty} x_k p_k \\ \mathrm{Var}(X) = E\big((X - E(X))^2\big)$$
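For a random variable with finitely many values the sums are finite and easy to compute. A minimal sketch, assuming a fair six-sided die as the example distribution (the example and the function names are not from the notes):

```python
from fractions import Fraction

def expectation(values, probs):
    """E(X) = sum_k x_k * p_k."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E((X - E(X))^2)."""
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Hypothetical example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

mean = expectation(values, probs)
var = variance(values, probs)
print(mean, var)
```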
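2. Corpora and Linguistic Knowledge Bases:
Corpus (corpus base)
2.1 Classification of corpora
- Balanced corpora and parallel corpora
- General-purpose corpora and specialized corpora
- Synchronic corpora and diachronic corpora
- Raw corpora and annotated corpora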