LR with Spark LIBLINEAR

Logistic Regression

[Figure: class diagram of the Spark LIBLINEAR library]
1、LR

Given a set of training label-instance pairs $\{(x_i, y_i)\}_{i=1}^{l}$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1, 1\}$, $\forall i$,

LR with L2 regularization considers the following optimization problem:

$$\min_w f(w) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} \log\left(1 + \exp(-y_i w^T x_i)\right) \qquad (1)$$
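
To make (1) concrete, here is a minimal plain-Scala sketch (my own illustrative code, dense vectors only; Spark LIBLINEAR itself works on sparse, distributed data):

```scala
object LrObjective {
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (a, b) => a * b }.sum

  // f(w) = 0.5 * w^T w + C * sum_i log(1 + exp(-y_i * w^T x_i)), i.e. (1).
  def f(w: Array[Double], xs: Array[Array[Double]], ys: Array[Double], c: Double): Double = {
    val reg = 0.5 * dot(w, w)
    val loss = xs.zip(ys).map { case (x, y) =>
      math.log1p(math.exp(-y * dot(w, x)))   // log(1 + exp(-y * w^T x))
    }.sum
    reg + c * loss
  }

  def main(args: Array[String]): Unit = {
    val xs = Array(Array(1.0, 2.0), Array(-1.0, 0.5))
    val ys = Array(1.0, -1.0)
    val w  = Array(0.1, -0.2)
    println(f(w, xs, ys, c = 1.0))
  }
}
```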

2、A Trust Region Newton Method (TRON)

TRON obtains the truncated Newton step by approximately solving

$$\min_d \; q_t(d) \quad \text{subject to} \quad \|d\| \le \Delta_t \qquad (2)$$

where $\Delta_t$ is the size of the trust region, and $q_t(d) = \nabla f(w_t)^T d + \frac{1}{2} d^T \nabla^2 f(w_t) d$ is the second-order Taylor approximation of $f(w_t + d) - f(w_t)$.

The conjugate gradient (CG) method is applied to approximately solve (2).
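
Below is a hedged sketch of such a truncated CG solve, not the library's actual code: standard CG is run on $\nabla^2 f(w_t) d = -\nabla f(w_t)$ with a Hessian-vector callback, and the iterate is simply scaled back when it reaches the trust-region boundary (TRON computes the exact boundary intersection, so this is a simplification). For the L2-regularized LR objective the Hessian is positive definite, so the divisions are safe.

```scala
// Approximately solves: min_d grad^T d + 0.5 d^T H d  s.t. ||d|| <= delta.
// `hv` computes the Hessian-vector product H*v, so H is never formed.
object TruncatedCg {
  def solve(grad: Array[Double],
            hv: Array[Double] => Array[Double],
            delta: Double,
            eps: Double = 0.1,
            maxIter: Int = 100): Array[Double] = {
    val n = grad.length
    val d = Array.fill(n)(0.0)
    val r = grad.map(-_)            // residual r = -grad - H d, with d = 0
    val p = r.clone()               // initial search direction
    var rsq = r.map(x => x * x).sum
    val tol = eps * math.sqrt(rsq)  // stop when ||r|| <= eps * ||grad||
    var iter = 0
    while (math.sqrt(rsq) > tol && iter < maxIter) {
      val hp = hv(p)
      val alpha = rsq / p.zip(hp).map { case (a, b) => a * b }.sum
      for (i <- 0 until n) d(i) += alpha * p(i)
      val dNorm = math.sqrt(d.map(x => x * x).sum)
      if (dNorm >= delta) {                   // hit the trust-region boundary
        val scale = delta / dNorm
        for (i <- 0 until n) d(i) *= scale    // truncate and stop
        return d
      }
      for (i <- 0 until n) r(i) -= alpha * hp(i)
      val rsqNew = r.map(x => x * x).sum
      val beta = rsqNew / rsq
      for (i <- 0 until n) p(i) = r(i) + beta * p(i)
      rsq = rsqNew
      iter += 1
    }
    d
  }
}
```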

2.1 Distributed Algorithm

We partition the data matrix $X$ and the labels $Y$ into $p$ disjoint parts:

$$X = [X_1, \ldots, X_p]^T, \quad Y = \mathrm{diag}(y_1, \ldots, y_l) = \mathrm{diag}(Y_1, \ldots, Y_p), \quad \sigma(v) \equiv [1 + \exp(-v_1), \ldots, 1 + \exp(-v_n)]^T$$

We can observe that computing (12)-(14) in the paper (the objective value, gradient, and Hessian-vector product) requires only the data partition $X_k$. Therefore the computation can be done in parallel, with the partitions stored in a distributed fashion. After the map functions are computed, the results are reduced to the machine running the TRON algorithm, which obtains the summation over all partitions.
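
A hedged sketch of this map-reduce pattern for the gradient (all names, including the `Instance` case class, are illustrative rather than Spark LIBLINEAR's API; `mapPartitions` is used here for the reasons given in point 3) of the next section):

```scala
import org.apache.spark.rdd.RDD

// Each partition k computes its local gradient contribution using only X_k;
// a reduce then sums the per-partition results on the driver, which runs TRON.
case class Instance(label: Double, index: Array[Int], value: Array[Double])

object DistributedGradient {
  // Sparse dot product w^T x for one instance.
  def dot(w: Array[Double], inst: Instance): Double = {
    var s = 0.0; var j = 0
    while (j < inst.index.length) { s += w(inst.index(j)) * inst.value(j); j += 1 }
    s
  }

  // grad f(w) = w + C * sum_i (sigma_i - 1) * y_i * x_i,
  // where sigma_i = 1 / (1 + exp(-y_i w^T x_i)).
  def gradient(data: RDD[Instance], w: Array[Double], c: Double): Array[Double] = {
    val partial = data.mapPartitions { iter =>
      val g = new Array[Double](w.length)      // one accumulator per partition
      for (inst <- iter) {
        val sigma = 1.0 / (1.0 + math.exp(-inst.label * dot(w, inst)))
        val coef = c * (sigma - 1.0) * inst.label
        var j = 0
        while (j < inst.index.length) { g(inst.index(j)) += coef * inst.value(j); j += 1 }
      }
      Iterator(g)
    }.reduce { (a, b) => for (i <- a.indices) a(i) += b(i); a }
    for (i <- partial.indices) partial(i) += w(i)   // add the regularization term w
    partial
  }
}
```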

3、Implementation Design

1) Loop Structure: the while loop is chosen over Scala's for loop to implement the software; see the sketch below.
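
A self-contained illustration of the point: the same summation written both ways. Scala's `for (i <- a.indices)` desugars into a closure passed to `Range.foreach`, which the paper reports was the slower option at the time.

```scala
object LoopStyle {
  def sumFor(a: Array[Double]): Double = {
    var s = 0.0
    for (i <- a.indices) s += a(i)   // desugars to a.indices.foreach(i => ...)
    s
  }

  def sumWhile(a: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i); i += 1 }   // the form the paper adopts
    s
  }

  def main(args: Array[String]): Unit = {
    val a = Array.fill(1000000)(1.0)
    println(sumFor(a) == sumWhile(a))   // same result, different hot-path cost
  }
}
```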

2) Data Encapsulation:

Two arrays are used to store the indices and feature values of an instance:

index1 index2 index3 index4 index5 …

value1 value2 value3 value4 value5 …
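
A minimal sketch of this two-array encoding (illustrative names, not the library's exact classes), showing that the sparse form round-trips to the dense vector:

```scala
object TwoArrayEncodingDemo {
  def main(args: Array[String]): Unit = {
    // The feature vector (0, 3.0, 0, 0, -1.5) stored as two parallel arrays.
    val index = Array(1, 4)          // positions of the nonzero features
    val value = Array(3.0, -1.5)     // their values, in the same order
    // Reconstruct the dense vector to show the encoding round-trips.
    val dense = new Array[Double](5)
    for ((i, v) <- index.zip(value)) dense(i) = v
    println(dense.mkString(", "))    // 0.0, 3.0, 0.0, 0.0, -1.5
  }
}
```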

3) Using mapPartitions Rather Than map
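
The benefit: with `map`, every instance emits its own dense length-$n$ vector and the reduce must add $l$ of them; with `mapPartitions`, each partition keeps a single accumulator and emits one vector, so only $p$ vectors cross the reduce. A hedged sketch of the contrast (the per-instance gradient here is a stand-in, not the real formula):

```scala
import org.apache.spark.rdd.RDD

object MapVsMapPartitions {
  def gradOf(x: Double, n: Int): Array[Double] =
    Array.tabulate(n)(i => x * (i + 1))       // stand-in per-instance gradient

  // One dense array materialized per instance; l arrays are reduced.
  def viaMap(data: RDD[Double], n: Int): Array[Double] =
    data.map(x => gradOf(x, n))
        .reduce { (a, b) => for (i <- a.indices) a(i) += b(i); a }

  // One accumulator per partition; only p arrays are reduced.
  def viaMapPartitions(data: RDD[Double], n: Int): Array[Double] =
    data.mapPartitions { iter =>
      val acc = new Array[Double](n)
      for (x <- iter) {
        val g = gradOf(x, n)
        for (i <- g.indices) acc(i) += g(i)
      }
      Iterator(acc)
    }.reduce { (a, b) => for (i <- a.indices) a(i) += b(i); a }
}
```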

4) Not caching $\sigma(Y_k X_k w)$: since $w$ changes at every iteration, this intermediate vector is recomputed when needed rather than cached.

5) Using Broadcast Variables
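
Why this matters: a variable captured in a task closure is shipped with every task, while `sc.broadcast` ships it once per executor. A hedged sketch of how the gradient computation above would use a broadcast $w$ (illustrative names again):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object BroadcastW {
  def run(sc: SparkContext, data: RDD[Array[Double]], w: Array[Double]): Double = {
    val wB = sc.broadcast(w)                  // shipped once per executor
    val obj = data.map { x =>
      x.zip(wB.value).map { case (a, b) => a * b }.sum   // tasks read the shared copy
    }.sum()
    // w changes every TRON iteration, so a fresh broadcast is created
    // each iteration and the old one is released afterwards.
    wB.destroy()
    obj
  }
}
```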
