| Algorithm | Model | Strategy | Solver |
|---|---|---|---|
| Linear regression | $f(x)=W^T \cdot x + b$ | Least squares: $L(W,b)=(f(x)-y)^2$ | Gradient descent, Newton's method |
| LASSO regression | $f(x)=W^T \cdot x + b$ | Least squares with an L1 penalty: $L(W,b)=(f(x)-y)^2+\lambda \lVert W\rVert_1$ | Coordinate descent |
| Ridge regression | $f(x)=W^T \cdot x + b$ | Least squares with an L2 penalty: $L(W,b)=(f(x)-y)^2+\frac{1}{2}\lambda \lVert W\rVert^2$ | Gradient descent |
| Logistic regression | $f(x)=\dfrac{1}{1+e^{-(W^T\cdot x + b)}}$ | Cross-entropy loss: $-\ln p(y\mid x)=-\dfrac{1}{m}\sum_{i=1}^m\left(y\ln\hat y+(1-y)\ln(1-\hat y)\right)$, where $\hat y = \dfrac{1}{1+e^{-(W^T \cdot x + b)}}$ | Gradient descent, Newton's method |
| Perceptron | $f(x)=\operatorname{sign}(W^T \cdot x+b)$, where $\operatorname{sign}$ is the sign function | Make each misclassified point as close as possible to the current separating hyperplane: $L(w,b)=-\sum_{x_i \in M} y_i(w\cdot x_i+b)$, where $M$ is the set of misclassified points | Stochastic gradient descent: update the parameters once for every misclassified sample |
| K-nearest neighbors | $y=\arg\max_{c_j} \sum_{x_i\in N_K(x)} I(y_i=c_j)$ (majority vote among the $K$ nearest neighbors $N_K(x)$) | - | - |
| Naive Bayes | $P(Y=c_k\mid X=x)=\dfrac{P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}{\sum_k P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}$, under the conditional-independence assumption | Expected risk minimization, i.e. maximize the posterior: $y=f(x)=\arg\max_{c_k}P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)$ | Maximum likelihood estimation: $P(Y=c_k)=\dfrac{\sum_{i=1}^N I(y_i=c_k)}{N}$, $P(X^{(j)}=a_{jl}\mid Y=c_k)=\dfrac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$ |
| SVM (linearly separable) | $f(x)=\operatorname{sign}(W^T \cdot x+b)$ | Hard-margin maximization: $\min\dfrac{1}{2}\lVert w\rVert^2$ subject to $y_i(W^T\cdot x_i+b)-1\ge 0$ | Lagrangian duality, SMO |
| SVM (approximately linearly separable) | $f(x)=\operatorname{sign}(W^T \cdot x+b)$ | Soft-margin maximization / hinge loss: $\min\dfrac{1}{2}\lVert w\rVert^2+C\sum_{i=1}^N\xi_i$ subject to $y_i(w\cdot x_i+b)\ge 1-\xi_i$ | Lagrangian duality, SMO |
| SVM (nonlinear, kernelized) | $f(x)=\operatorname{sign}(W^T \cdot x+b)$ | Margin maximization, dual problem: $\min_{\alpha}\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i\cdot x_j)-\sum_{i=1}^N\alpha_i$ subject to $\sum_{i=1}^N\alpha_iy_i=0$ and $0\le\alpha_i\le C$ | Lagrangian duality, SMO |
| CRF (probability computation) | $P(y\mid x)=\dfrac{1}{Z(x)}\prod_i M_i(y_{i-1},y_i\mid x)$ | - | Forward algorithm, backward algorithm |
| CRF (parameter estimation) | $L(w)=\log\prod_{x,y}P(y\mid x)^{\hat{P}(x,y)}=\sum_{x,y}\hat{P}(x,y)\log P(y\mid x)$ | Maximum likelihood estimation | Gradient descent, improved iterative scaling, quasi-Newton methods |
| CRF (sequence labeling) | $y^{*}=\arg\max_{y}P(y\mid x)=\arg\max_y\dfrac{\exp(w\cdot F(y,x))}{Z(x)}=\arg\max_y\, w\cdot F(y,x)$ | Dynamic programming | Viterbi algorithm |
| HMM (probability computation) | $P(O\mid\lambda)=\sum_{i=1}^N P(O_1^T,\,i_T=q_i\mid\lambda)=\sum_{i=1}^N \alpha_T(i)$ | - | Forward algorithm, backward algorithm |
| HMM (learning) | $\lambda^{*}=\arg\max_{\lambda}P(O\mid\lambda)$ | Maximum likelihood estimation | EM algorithm (Baum-Welch) |
| HMM (sequence labeling) | $I^{*}=\arg\max_{I}P(I\mid O,\lambda)$ | Dynamic programming | Viterbi algorithm |
| AdaBoost | $G(x)=\operatorname{sign}(f(x))=\operatorname{sign}\left(\sum_{m=1}^M\alpha_mG_m(x)\right)$ | Learn weak classifiers by repeatedly reweighting the training samples | - |
| Boosting (additive model) | $f(x)=\sum_{m=1}^M\beta_m b(x;\gamma_m)$ | - | Forward stagewise algorithm |
| BDT (boosted decision trees) | $f_M(x)=\sum_{m=1}^M T(x;\theta_m)$, with $T(x)=\sum_{j=1}^J c_j I(x\in R_j)$ | Least squares | Fit each new tree to the residuals |
| GBDT | $f_M(x)=\sum_{m=1}^M T(x;\theta_m)$ | Least squares | Fit each new tree to the negative first-order gradient of the loss with respect to $f(x)$ |
| XGBoost | $f(X)=W_{q(X)}$, $\hat{y}_i=\sum_{k=1}^K f_k(X_i)$, $L=\sum_{i=1}^n l(y_i,\hat{y}_i)+\sum_{k=1}^K \Omega(f_k)$ | Least squares | First- and second-order gradients (second-order Taylor expansion of the loss) |
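
As a concrete illustration of the model / strategy / solver split in the first row, here is a minimal NumPy sketch of least-squares linear regression fitted by batch gradient descent. The function name `fit_linear_gd`, the learning rate, and the toy data are my own choices for illustration; the table does not prescribe an implementation.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, epochs=500):
    """Model f(x) = W·x + b, strategy L = (f(x) - y)^2, solver: batch gradient descent."""
    n, d = X.shape
    W, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ W + b - y                # f(x) - y for every sample
        W -= lr * (2 / n) * (X.T @ err)    # dL/dW averaged over the batch
        b -= lr * (2 / n) * err.sum()      # dL/db averaged over the batch
    return W, b

# Toy data generated from y = 3x + 1; GD should recover W ≈ 3, b ≈ 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X[:, 0] + 1
W, b = fit_linear_gd(X, y)
```

The same loop with an added `lr * lambda * W` term in the weight update would give the ridge-regression row.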
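
The perceptron row's solver ("update once for every misclassified sample") maps directly onto code. This is a sketch under the usual convention that $y_i(w\cdot x_i+b)\le 0$ marks a misclassification; the three-point data set is a toy example chosen for illustration.

```python
import numpy as np

def perceptron(X, y, lr=1.0, max_epochs=100):
    """Stochastic gradient descent on L(w,b) = -sum_{x_i in M} y_i (w·x_i + b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the hyperplane)
                w += lr * yi * xi       # gradient step for this single sample
                b += lr * yi
                mistakes += 1
        if mistakes == 0:               # separable data: a full clean pass means convergence
            break
    return w, b

# Toy linearly separable set: two positive points, one negative
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = perceptron(X, y)
```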
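
The naive Bayes row's maximum likelihood solver is just frequency counting, and its strategy is the posterior argmax. A minimal sketch (function names and the small two-feature data set are my own; no smoothing is applied, matching the plain MLE formulas in the table):

```python
from collections import Counter, defaultdict

def nb_fit(samples, labels):
    """MLE: P(Y=c) and P(X^(j)=a | Y=c) estimated by counting."""
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}   # P(Y=c_k)
    cond = defaultdict(lambda: defaultdict(Counter))         # cond[c][j][value]
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            cond[c][j][v] += 1

    def p(j, v, c):                                          # P(X^(j)=v | Y=c)
        return cond[c][j][v] / class_count[c]

    return prior, p

def nb_predict(x, prior, p):
    """y = argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    best, best_score = None, -1.0
    for c, pc in prior.items():
        score = pc
        for j, v in enumerate(x):
            score *= p(j, v, c)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: feature 1 in {1,2,3}, feature 2 in {'S','M','L'}
samples = [(1, 'S'), (1, 'M'), (1, 'M'), (1, 'S'), (1, 'S'),
           (2, 'S'), (2, 'M'), (2, 'M'), (2, 'L'), (2, 'L'),
           (3, 'L'), (3, 'M'), (3, 'M'), (3, 'L'), (3, 'L')]
labels = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
prior, p = nb_fit(samples, labels)
```

For the query $(2, \text{S})$ the class $-1$ posterior score $\frac{6}{15}\cdot\frac{2}{6}\cdot\frac{3}{6}$ beats the class $+1$ score $\frac{9}{15}\cdot\frac{3}{9}\cdot\frac{1}{9}$, so the prediction is $-1$.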
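
The Viterbi dynamic program named in the CRF and HMM sequence-labeling rows can be sketched for the HMM case. Here `A` is the state-transition matrix, `B` the emission matrix, and `pi` the initial distribution; the 3-state, 2-symbol numbers below are a standard toy parameterization, not something fixed by the table.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Dynamic programming for I* = argmax_I P(I | O, lambda)."""
    N, T = A.shape[0], len(obs)
    delta = pi * B[:, obs[0]]                  # delta_1(i) = pi_i * b_i(o_1)
    psi = np.zeros((T, N), dtype=int)          # psi[t][j]: best predecessor of state j at t
    for t in range(1, T):
        trans = delta[:, None] * A             # trans[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta.argmax())]               # best final state
    for t in range(T - 1, 0, -1):              # backtrack through psi
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy parameters: 3 hidden states, 2 observation symbols
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
best_path = viterbi(A, B, pi, [0, 1, 0])
```

The CRF version has the same recursion shape, with the local products replaced by sums of weighted feature scores $w\cdot F(y,x)$ in log space.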
A comparison of several algorithms.

First published 2023-01-26 04:36:47