Introduction
- Jian Tang
- tangjianpku@gmail.com
- History: 1950–1970 logic rules; 1980–1990 knowledge acquisition; 2010– machine learning
- Deep Learning ⊂ Machine Learning ⊂ Artificial Intelligence
- machine learning
- use statistical techniques, “learn” with data
- extract features automatically, instead of by domain experts
- learn automatically, instead of explicit programming
- Why deep learning now: big data, big computation, big models
- usage
- …
Probability
Bayes’ Theorem
- p(Y|X) = p(X|Y)p(Y) / p(X), where p(X) = ∑_Y p(X|Y)p(Y)
- posterior ∝ likelihood × prior
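Bayes' theorem can be checked numerically. A minimal sketch with hypothetical numbers (a diagnostic test with made-up sensitivity, false-positive rate, and prevalence — none of these figures come from the notes):

```python
# Hypothetical numbers for illustration only:
p_pos_given_sick = 0.90      # likelihood p(X=positive | Y=sick)
p_pos_given_healthy = 0.05   # likelihood p(X=positive | Y=healthy)
p_sick = 0.01                # prior p(Y=sick)

# p(X) = sum over Y of p(X|Y) p(Y)  (the normalizer)
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# posterior p(Y=sick | X=positive) = p(X|Y) p(Y) / p(X)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(round(p_sick_given_pos, 4))  # → 0.1538
```

Even with a strong likelihood, the small prior keeps the posterior low — exactly the "posterior ∝ likelihood × prior" trade-off.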
variables
- E[f] := the average value of f(x) under the distribution p(x)
- E[f] = ∑_x p(x)f(x)
- V[f]: variance; cov[x, y]: covariance
distributions
- binomial distribution
- Bin(m|N, μ) = (N choose m) μ^m (1−μ)^(N−m)
- E[m] = Nμ, var[m] = Nμ(1−μ)
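The closed-form moments E[m] = Nμ and var[m] = Nμ(1−μ) can be sanity-checked against simulated draws; the N, μ, seed, and sample count below are arbitrary choices for the sketch:

```python
import numpy as np

N, mu = 10, 0.3
rng = np.random.default_rng(0)
samples = rng.binomial(N, mu, size=200_000)

# closed-form moments: E[m] = N*mu, var[m] = N*mu*(1-mu)
print(N * mu, N * mu * (1 - mu))   # → 3.0 2.1

# empirical moments from the simulated draws should be close
print(samples.mean(), samples.var())
```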
multinomial variables
- x can take one of k values; e.g. x = (0, 0, 1, 0, 0, 0)^T means x took the third of six values
- μ = (μ1, μ2, ..., μk)^T, where μk is the probability that position k of x is 1
- so the probability of a particular x is p(x|μ) = ∏_{k=1}^K μk^{xk} (i.e. the μk of the active component)
- E[x|μ] = ∑_x p(x|μ) x = (μ1, μ2, ..., μk)^T = μ
maximum likelihood estimation
- μk = mk / N, where mk = ∑_n xnk (the column sum of the observation matrix)
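The MLE μk = mk/N is just the per-category count over the number of draws. A minimal sketch with a hypothetical one-hot observation matrix (the data here is invented for illustration):

```python
import numpy as np

# hypothetical one-hot observations: N = 4 draws over K = 3 categories
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
])

m = X.sum(axis=0)          # column sums = counts per category: [1, 2, 1]
mu_hat = m / X.shape[0]    # MLE: mu_k = m_k / N
print(mu_hat)              # → [0.25 0.5  0.25]
```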
univariate Gaussian (normal) distribution
- multivariate gaussian distribution
- maximum likelihood estimation
- mixture of Gaussians: can model a wide variety of other distributions
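Sampling from a mixture of Gaussians can be done by ancestral sampling: pick a component with probability πk, then draw from that component's Gaussian. A sketch with hypothetical weights, means, and standard deviations:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical 2-component mixture
pi = np.array([0.7, 0.3])          # mixture weights
means = np.array([-2.0, 3.0])      # component means
stds = np.array([0.5, 1.0])        # component std devs

# ancestral sampling: choose a component k, then draw from N(means[k], stds[k]^2)
k = rng.choice(len(pi), size=10_000, p=pi)
x = rng.normal(means[k], stds[k])

# mixture mean = weighted sum of component means: 0.7*(-2) + 0.3*3 = -0.5
print(x.mean())
```

The resulting samples are bimodal — something no single Gaussian can represent, which is why mixtures are so flexible.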
gradient descent
- a way to minimize an objective function J(θ)
- η: learning rate, which determines the size of the steps taken toward a local minimum
- update equation: θ = θ − η ∇_θ J(θ)
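The update rule θ = θ − η ∇_θ J(θ) can be sketched in a few lines. The objective J(θ) = (θ − 3)², the starting point, η, and the iteration count are all hypothetical choices for illustration:

```python
# hypothetical objective: J(theta) = (theta - 3)^2, minimized at theta = 3
def grad_J(theta):
    return 2 * (theta - 3)        # gradient dJ/dtheta

theta = 0.0                       # arbitrary starting point
eta = 0.1                         # learning rate: step size toward the minimum
for _ in range(100):
    theta = theta - eta * grad_J(theta)   # theta = theta - eta * grad J(theta)

print(round(theta, 4))            # → 3.0
```

Each step shrinks the distance to the minimizer by a factor of (1 − 2η) = 0.8, so the iterate converges geometrically; too large an η would instead overshoot and diverge.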