Comparison of Several Algorithms

| Algorithm | Model | Strategy (loss) | Solver |
| --- | --- | --- | --- |
| Linear regression | $f(x)=W^T \cdot x + b$ | Least squares: $L(W,b)=(f(x)-y)^2$ | Gradient descent, Newton's method |
| LASSO regression | $f(x)=W^T \cdot x + b$ | Least squares with an $L_1$ penalty: $L(W,b)=(f(x)-y)^2+\lambda \Vert W\Vert_1$ | Coordinate descent |
| Ridge regression | $f(x)=W^T \cdot x + b$ | Least squares with an $L_2$ penalty: $L(W,b)=(f(x)-y)^2+\frac{1}{2}\lambda \Vert W\Vert^2$ | Gradient descent |
| Logistic regression | $f(x)=\dfrac{1}{1+e^{-(W^T\cdot x + b)}}$ | Cross-entropy loss: $-\ln p(y\mid x)=-\dfrac{1}{m}\sum_{i=1}^m\big(y\ln\hat y+(1-y)\ln(1-\hat y)\big)$, where $\hat y = \dfrac{1}{1+e^{-(W^T \cdot x + b)}}$ | Gradient descent, Newton's method |
| Perceptron | $f(x)=\mathrm{sign}(W^T \cdot x+b)$, where sign is the sign function | Make the misclassified points as close as possible to the current separating hyperplane: $L(w,b)=-\sum_{x_i \in M} y_i(w\cdot x_i+b)$, where $M$ is the set of misclassified points | Stochastic gradient descent: update the parameters once for each misclassified sample |
| K-nearest neighbors | $y=\arg\max_{c_j} \sum_{x_i\in N_K(x)} I(y_i=c_j)$ | - | - |
| Naive Bayes | $P(Y=c_k\mid X=x)=\dfrac{P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}{\sum_k P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}$, with the conditional-independence assumption | Expected-risk minimization (posterior maximization): $y=f(x)=\arg\max_{c_k}P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)$ | Maximum likelihood estimation: $P(Y=c_k)=\dfrac{\sum_{i=1}^N I(y_i=c_k)}{N}$, $P(X^{(j)}=a_{jl}\mid Y=c_k)=\dfrac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$ |
| SVM (linearly separable) | $f(x)=\mathrm{sign}(W^T \cdot x+b)$ | Margin maximization / hinge loss: $\min \dfrac{1}{2}\Vert w\Vert^2$ subject to $y_i(W^T\cdot x_i+b)-1\ge 0$ | Lagrangian duality, SMO |
| SVM (approximately linearly separable, soft margin) | $f(x)=\mathrm{sign}(W^T \cdot x+b)$ | Margin maximization / hinge loss: $\min \dfrac{1}{2}\Vert w\Vert^2+C\sum_{i=1}^N\xi_i$ subject to $y_i(w\cdot x_i+b)\ge 1-\xi_i$ | Lagrangian duality, SMO |
| SVM (kernel, nonlinear) | $f(x)=\mathrm{sign}(W^T \cdot x+b)$ | Margin maximization / hinge loss, dual form: $\min_{\alpha}\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j y_i y_j K(x_i, x_j)-\sum_{i=1}^N\alpha_i$ subject to $\sum_{i=1}^N\alpha_i y_i=0$ and $0\le\alpha_i\le C$ | Lagrangian duality, SMO |
| CRF (probability computation) | $P(y\mid x)=\dfrac{1}{Z(x)}\prod_i M_i(y_{i-1},y_i\mid x)$ | - | Forward algorithm, backward algorithm |
| CRF (parameter estimation) | $L(w)=\log\prod_{x,y}P(y\mid x)^{\tilde{P}(x,y)}=\sum_{x,y}\tilde{P}(x,y)\log P(y\mid x)$ | Maximum likelihood estimation | Gradient descent, improved iterative scaling, quasi-Newton methods |
| CRF (sequence labeling) | $y^{*}=\arg\max_{y}P(y\mid x)=\arg\max_y\dfrac{\exp(w\cdot F(y,x))}{Z(x)}=\arg\max_y\big(w\cdot F(y,x)\big)$ | - | Dynamic programming: Viterbi algorithm |
| HMM (probability computation) | $P(O\mid\lambda)=P(O_1^T\mid\lambda)=\sum_{i=1}^N P(O_1^T, i_T=q_i\mid\lambda)=\sum_{i=1}^N \alpha_T(i)$ | - | Forward algorithm, backward algorithm |
| HMM (learning) | $\lambda^*=\arg\max_\lambda P(O\mid\lambda)$ | - | EM algorithm (Baum-Welch) |
| HMM (sequence labeling, decoding) | $I^*=\arg\max_I P(I\mid O,\lambda)$ | - | Dynamic programming: Viterbi algorithm |
| AdaBoost | $G(x)=\mathrm{sign}(f(x))=\mathrm{sign}\big(\sum_{m=1}^M\alpha_m G_m(x)\big)$ | Learn by repeatedly re-weighting the training samples | - |
| Boosting (additive model) | $f(x)=\sum_{m=1}^M\beta_m b(x;\gamma_m)$ | - | Forward stagewise algorithm |
| BDT (boosting decision tree) | $f_M(x)=\sum_{m=1}^M T(x;\theta_m)$, where $T(x)=\sum_{j=1}^J c_j I(x\in R_j)$ | Least squares | Fit the residuals |
| GBDT | $f_M(x)=\sum_{m=1}^M T(x;\theta_m)$ | Least squares | Fit the first-order (negative) gradient of the loss with respect to $f(x)$ |
| XGBoost | $f(x)=w_{q(x)}$, $\hat{y}_i=\sum_{k=1}^K f_k(x_i)$ | Regularized objective (least squares loss): $L=\sum_{i=1}^n l(y_i,\hat{y}_i)+\sum_K \Omega(f_k)$ | First-order and second-order gradients (second-order Taylor expansion of the loss) |
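
The sketches below make several of these rows concrete; all of them assume NumPy and use small, purely illustrative toy data. For the first three rows, here is a minimal sketch of batch gradient descent for least squares with an optional ridge ($L_2$) penalty; the function name, learning rate, and data are placeholders, and LASSO's non-smooth $L_1$ penalty is left out because the table solves it with coordinate descent instead.

```python
import numpy as np

def fit_linear_gd(X, y, lam=0.0, lr=0.01, epochs=1000):
    """Gradient descent for least squares, optionally with a ridge (L2) penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        pred = X @ w + b                              # f(x) = W^T x + b
        grad_w = 2 / n * X.T @ (pred - y) + lam * w   # gradient of MSE + (1/2) lam ||W||^2
        grad_b = 2 / n * np.sum(pred - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy data: y = 3*x1 - 2*x2 + 1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + 0.1 * rng.normal(size=200)
print(fit_linear_gd(X, y))            # plain least squares
print(fit_linear_gd(X, y, lam=0.1))   # ridge regression
```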
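
For the logistic regression row, a minimal sketch of gradient descent on the average cross-entropy loss; labels are assumed to be 0/1 and the hyperparameters are arbitrary.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, epochs=2000):
    """Gradient descent on the average cross-entropy loss; y must be 0/1."""
    m, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid(W^T x + b)
        # gradient of -(1/m) * sum(y*ln(y_hat) + (1-y)*ln(1-y_hat))
        grad_w = X.T @ (y_hat - y) / m
        grad_b = np.mean(y_hat - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)
w, b = fit_logistic_gd(X, y)
acc = np.mean(((1 / (1 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(w, b, acc)
```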
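
The perceptron row's stochastic update rule, one step per misclassified point, looks like the following sketch; the three-point toy data set is only for illustration.

```python
import numpy as np

def fit_perceptron(X, y, lr=1.0, max_epochs=100):
    """Each misclassified point (y_i * (w.x_i + b) <= 0) triggers one SGD step."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:       # point belongs to the misclassified set M
                w += lr * yi * xi            # step on the gradient of -y_i (w.x_i + b)
                b += lr * yi
                mistakes += 1
        if mistakes == 0:                    # training data separated: stop
            break
    return w, b

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # toy linearly separable data
y = np.array([1, 1, -1])
print(fit_perceptron(X, y))
```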
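
The K-nearest-neighbors row is a direct majority vote over the neighborhood $N_K(x)$; a small sketch, assuming Euclidean distance.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote over the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                            # N_K(x)
    return Counter(y_train[nearest]).most_common(1)[0][0]      # argmax_c sum I(y_i = c)

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))    # expected class 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))    # expected class 1
```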
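
For the naive Bayes row, a counting-based sketch of the maximum likelihood estimates and the posterior-maximization prediction; it assumes discrete features and omits Laplace smoothing.

```python
import numpy as np
from collections import defaultdict

def fit_naive_bayes(X, y):
    """Maximum likelihood estimates of P(Y=c) and P(X^(j)=a | Y=c) by counting."""
    prior, cond = {}, defaultdict(dict)
    for c in np.unique(y):
        Xc = X[y == c]
        prior[c] = len(Xc) / len(X)                               # P(Y=c) = #{y_i=c}/N
        for j in range(X.shape[1]):
            vals, counts = np.unique(Xc[:, j], return_counts=True)
            cond[c][j] = dict(zip(vals, counts / len(Xc)))        # P(X^(j)=a | Y=c)
    return prior, cond

def predict_naive_bayes(prior, cond, x):
    """argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= cond[c][j].get(v, 0.0)
        if score > best_score:
            best, best_score = c, score
    return best

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]])   # discrete toy features
y = np.array([1, 1, 0, 0, 1, 0])
prior, cond = fit_naive_bayes(X, y)
print(predict_naive_bayes(prior, cond, np.array([1, 0])))
```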
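
The SVM rows are solved with SMO on the dual in the table; as a simpler stand-in for illustration, this sketch minimizes the soft-margin primal objective with sub-gradient descent. It optimizes the same hinge-loss objective but is not SMO, and all names and constants are placeholders.

```python
import numpy as np

def fit_linear_svm_subgrad(X, y, C=1.0, lr=0.001, epochs=2000):
    """Sub-gradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                                   # points with slack xi_i > 0
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
print(fit_linear_svm_subgrad(X, y))
```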
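
Both the CRF and HMM sequence-labeling rows are decoded with the Viterbi dynamic program; here is a sketch for the HMM case, where the transition matrix, emission matrix, and observations are illustrative toy values.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Dynamic programming for I* = argmax_I P(I | O, lambda).
    A: state transition matrix, B: emission matrix, pi: initial distribution,
    obs: sequence of observation indices."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))             # best path probability ending in state i at time t
    psi = np.zeros((T, N), dtype=int)    # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for i in range(N):
            scores = delta[t - 1] * A[:, i]
            psi[t, i] = np.argmax(scores)
            delta[t, i] = scores[psi[t, i]] * B[i, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# toy 3-state HMM with 2 observation symbols (values are illustrative)
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.2, 0.4, 0.4])
print(viterbi(A, B, pi, obs=[0, 1, 0]))   # most probable state path (0-indexed)
```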
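
A sketch of the AdaBoost row using decision stumps as weak learners: after each round the sample weights are re-normalized so misclassified points get more weight. The stump search, round count, and 1-D toy labels are illustrative choices, not part of the table.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w (labels in {-1,+1})."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, M=10):
    """AdaBoost: re-weight samples each round; G(x) = sign(sum_m alpha_m * G_m(x))."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners = []
    for _ in range(M):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # increase weight on misclassified points
        w /= w.sum()
        learners.append((alpha, j, thr, sign))
    return learners

def adaboost_predict(learners, X):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in learners)
    return np.sign(score)

X = np.arange(10).reshape(-1, 1).astype(float)     # 1-D toy data
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
learners = adaboost(X, y, M=5)
print(adaboost_predict(learners, X) == y)          # training predictions vs labels
```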
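
For the BDT/GBDT rows, a sketch of forward stagewise fitting with regression stumps; under squared loss the residual coincides with the (negative) first-order gradient, which is the GBDT view. The function names, shrinkage rate, and sine toy data are assumptions.

```python
import numpy as np

def fit_reg_stump(X, r):
    """Least-squares regression stump: split one feature, predict the mean of r in each region."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left, right = r[X[:, j] <= thr], r[X[:, j] > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            c1, c2 = left.mean(), right.mean()
            sse = ((left - c1) ** 2).sum() + ((right - c2) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, c1, c2)
    return best[1:]

def boosted_trees(X, y, M=20, lr=0.1):
    """Forward stagewise additive model: each stump fits the current residual."""
    f = np.zeros(len(y))
    trees = []
    for _ in range(M):
        r = y - f                                   # residual = negative gradient of squared loss
        j, thr, c1, c2 = fit_reg_stump(X, r)
        f += lr * np.where(X[:, j] <= thr, c1, c2)
        trees.append((j, thr, c1, c2))
    return trees, f

X = np.linspace(0, 10, 50).reshape(-1, 1)
y = np.sin(X[:, 0])
trees, f = boosted_trees(X, y, M=200, lr=0.1)
print(np.mean((y - f) ** 2))                        # training MSE shrinks as M grows
```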
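
For the XGBoost row, a tiny sketch of the second-order statistics: given per-sample first- and second-order gradients $g_i$, $h_i$ of the loss, the optimal weight of a leaf is $w^*=-G/(H+\lambda)$ with $G=\sum_i g_i$, $H=\sum_i h_i$. The helper name, toy numbers, and choice of squared loss are illustrative assumptions.

```python
import numpy as np

def leaf_weight_and_gain(g, h, lam=1.0):
    """Second-order statistics for one leaf: optimal weight w* = -G/(H + lambda)
    and its contribution to the objective, -(1/2) * G^2 / (H + lambda)."""
    G, H = g.sum(), h.sum()
    w_star = -G / (H + lam)
    obj = -0.5 * G ** 2 / (H + lam)
    return w_star, obj

# squared loss l(y, y_hat) = (y - y_hat)^2 at the current prediction y_hat:
# first-order gradient g = 2*(y_hat - y), second-order gradient h = 2
y = np.array([1.0, 1.2, 0.8, 3.0])
y_hat = np.zeros(4)                      # current ensemble prediction
g = 2 * (y_hat - y)
h = 2 * np.ones(4)
print(leaf_weight_and_gain(g, h, lam=1.0))
```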