Application Scenario
Recommender systems: analyze the latent factors behind purchases of a product category and estimate the purchase probability for that category. Select a group A of users who have purchased the product and a group B who have not, and collect the user profiles and behavioral features of the two groups. A user-behavior model and an item-recommendation model are then built to recommend products automatically.
Formulas
For a binary classification problem, we are given a training set {(x_1, y_1), ..., (x_n, y_n)}, where x_i is the p-dimensional feature vector of the i-th user and y_i ∈ {0, 1} indicates whether the i-th user has purchased the product.
The model follows a Bernoulli distribution:
P(y_i \mid x_i) = u(x_i)^{y_i} (1 - u(x_i))^{1 - y_i}
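The Bernoulli form compactly covers both outcomes: with y_i = 1 the probability is u(x_i), and with y_i = 0 it is 1 - u(x_i). A quick sanity check (the value of u below is made up):

```python
def bernoulli_pmf(y, u):
    # P(y_i | x_i) = u^y * (1 - u)^(1 - y)
    return u ** y * (1 - u) ** (1 - y)

u = 0.7  # hypothetical purchase probability u(x_i)
print(bernoulli_pmf(1, u))  # probability of a purchase: u
print(bernoulli_pmf(0, u))  # probability of no purchase: 1 - u
```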
u(x_i) = \frac{1}{1 + e^{-x_i^T \theta}}
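The logistic function maps the linear score x_i^T θ to a probability in (0, 1); this can be sketched in NumPy (the feature values and θ below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # u(x_i) = 1 / (1 + exp(-x_i^T theta))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 3-dimensional user features and parameters.
x_i = np.array([1.0, 0.5, -2.0])
theta = np.array([0.8, -0.4, 0.1])

score = x_i @ theta    # linear score x_i^T theta (here approx. 0.4)
prob = sigmoid(score)  # purchase probability in (0, 1)
print(prob)
```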
where θ is the parameter vector of the model, which includes the bias term for the product.
The parameters are estimated by maximum likelihood:
L = P(y_1, \ldots, y_n \mid x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} P(y_i \mid x_i; \theta) = \prod_{i=1}^{n} u(x_i)^{y_i} (1 - u(x_i))^{1 - y_i}
Taking the negative logarithm yields the negative log-likelihood:
L(\theta) = -\log P(y_1, \ldots, y_n \mid x_1, \ldots, x_n; \theta) = -\sum_{i=1}^{n} \left( y_i \log u(x_i) + (1 - y_i) \log(1 - u(x_i)) \right)
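The negative log-likelihood above can be computed directly; the toy labels and model outputs below are made up:

```python
import numpy as np

def neg_log_likelihood(y, u):
    # L(theta) = -sum_i ( y_i log u(x_i) + (1 - y_i) log(1 - u(x_i)) )
    return -np.sum(y * np.log(u) + (1 - y) * np.log(1 - u))

y = np.array([1.0, 0.0, 1.0])
u = np.array([0.9, 0.2, 0.6])  # hypothetical model outputs u(x_i)
print(neg_log_likelihood(y, u))
```

Confident predictions on correct labels (e.g. u = 0.9 for y = 1) contribute little to the loss, while confident mistakes are penalized heavily.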
The parameters are then obtained numerically by stochastic gradient descent, minimizing L(θ):
\theta = \arg\min_{\theta} \; -\sum_{i=1}^{n} \left( y_i \log u(x_i) + (1 - y_i) \log(1 - u(x_i)) \right)
Differentiating L with respect to θ gives:
\frac{\partial L}{\partial \theta} = \sum_{i=1}^{n} \left( g(x_i^T \theta) - y_i \right) x_i
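This analytic gradient can be verified against a central finite-difference approximation of the loss; the data below is randomly generated for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, x, y):
    u = sigmoid(x @ theta)
    return -np.sum(y * np.log(u) + (1 - y) * np.log(1 - u))

def grad(theta, x, y):
    # dL/dtheta = sum_i (g(x_i^T theta) - y_i) x_i
    return x.T @ (sigmoid(x @ theta) - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (loss(theta + eps * e, x, y) - loss(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(numeric - grad(theta, x, y))))  # should be tiny
```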
g(x) = \frac{1}{1 + e^{-x}}
which yields the per-sample update rule:
\theta^{t+1} = \theta^{t} - \rho \left( g(x_i^T \theta^{t}) - y_i \right) x_i
where 0 < ρ < 1 is the step-size (learning-rate) parameter.
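Applying the update rule one randomly chosen sample at a time gives a minimal stochastic-gradient loop; the separable 1-D toy data below is made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data with a bias column: y = 1 when the feature is positive.
x = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
rho = 0.1  # step size, 0 < rho < 1
rng = np.random.default_rng(42)
for _ in range(2000):
    i = rng.integers(len(y))                               # pick one sample
    theta -= rho * (sigmoid(x[i] @ theta) - y[i]) * x[i]   # SGD update

preds = (sigmoid(x @ theta) >= 0.5).astype(float)
print(preds)
```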
Code Implementation
The loss function is defined with an L2 regularization term to prevent overfitting:
loss = -\frac{1}{m} \left[ \sum_{i=1}^{m} \left( y_i \log(u_\theta(x_i)) + (1 - y_i) \log(1 - u_\theta(x_i)) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
import numpy as np


class LogisticRegression(object):
    def __init__(self, x, y, lr=0.0005, lam=0.1):
        '''
        x: feature matrix of the examples, shape (m, p)
        y: labels of the examples, shape (m,)
        lr: learning rate
        lam: L2 penalty weight on theta (the bias term is not penalized)
        '''
        # Prepend a column of ones so that theta[0] acts as the bias term.
        self.x = np.hstack([np.ones((x.shape[0], 1)), x])
        self.y = y
        self.lr = lr
        self.lam = lam
        self.theta = np.zeros(self.x.shape[1])

    def _sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def loss_function(self):
        m = self.x.shape[0]
        u = self._sigmoid(np.dot(self.x, self.theta))  # u(x_i)
        # Mean cross-entropy plus the L2 penalty (bias excluded).
        cross_entropy = -np.sum(self.y * np.log(u)
                                + (1.0 - self.y) * np.log(1.0 - u)) / m
        penalty = 0.5 * self.lam / m * np.sum(self.theta[1:] ** 2)
        return cross_entropy + penalty

    def _gradient(self, iterations):
        # Batch gradient descent over the full training set;
        # m is the number of samples.
        m = self.x.shape[0]
        for _ in range(iterations):
            u = self._sigmoid(np.dot(self.x, self.theta))
            diff = u - self.y
            grad = np.dot(self.x.T, diff) / m
            grad[1:] += self.lam / m * self.theta[1:]  # do not penalize the bias
            self.theta -= self.lr * grad

    def run(self, iterations):
        self._gradient(iterations)

    def predict(self, x):
        x = np.hstack([np.ones((x.shape[0], 1)), x])
        preds = self._sigmoid(np.dot(x, self.theta))
        return (preds >= 0.5).astype(float)
References
《推荐系统与深度学习》 (Recommender Systems and Deep Learning), Huang Xin et al.