Logistic regression takes the output of linear regression and maps it through a squashing function so the result can be read as a probability, which turns the regression into a classifier.
Code implementation
Sigmoid:
f(x) = \frac{1}{1 + e^{-x}}
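A quick sanity check of the sigmoid's key properties (a minimal sketch; the standalone sigmoid here mirrors the lambda used in the class below):

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))
print(sigmoid(0))    # 0.5 -- the decision boundary
print(sigmoid(6))    # ~0.9975 -- large positive inputs saturate toward 1
print(sigmoid(-6))   # ~0.0025 -- large negative inputs saturate toward 0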
Loss function:
J(\theta) = -l(\theta) = -\sum\limits_{i=1}^{n}\left[y^{(i)}\ln(h_{\theta}(x^{(i)})) + (1-y^{(i)})\ln(1-h_{\theta}(x^{(i)}))\right]
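As a concrete numeric check of this formula (a minimal sketch; the labels and probabilities below are made up for illustration):

import numpy as np

y = np.array([1, 0, 1])        # ground-truth labels y^(i)
p = np.array([0.9, 0.2, 0.6])  # model probabilities h_theta(x^(i))
# binary cross-entropy summed over the three samples, as in the formula above
loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)  # ~0.8393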
Linear function
y = XW + b
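In the implementation below, the bias b is absorbed into W by prepending a column of ones to X, so the hypothesis is simply h_\theta(x) = \mathrm{sigmoid}(XW). A minimal sketch of that trick:

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # 2 samples, 2 features
# prepend a ones column; its weight plays the role of the bias b
X_b = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
print(X_b)  # [[1. 1. 2.], [1. 3. 4.]]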
Gradient update
\theta_j^{t+1} = \theta_j^t - \alpha \cdot \sum\limits_{i=1}^{n}(h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}
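This update is plain gradient descent on J(\theta) with learning rate \alpha: using \sigma'(z) = \sigma(z)(1-\sigma(z)), the derivatives of the two log terms collapse and the gradient reduces to

\frac{\partial J(\theta)}{\partial \theta_j} = \sum\limits_{i=1}^{n}(h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}

Note that the code below averages this gradient over the m samples instead of summing, which only rescales the effective learning rate.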
import numpy as np

class My_LogisticRegression:
    # No regularization by default; the regularization strength Lambda defaults to 1,
    # the learning rate to 0.001, and the number of iterations to 10001.
    def __init__(self, penalty=None, Lambda=1, a=0.001, epochs=10001):
        self.W = None
        self.penalty = penalty
        self.Lambda = Lambda
        self.a = a
        self.epochs = epochs
        # sigmoid
        self.sigmoid = lambda x: 1 / (1 + np.exp(-x))

    # Loss function (mean binary cross-entropy)
    def loss(self, x, y):
        m = x.shape[0]
        # convert scores to probabilities
        p = self.sigmoid(x * self.W)
        return (-1 / m) * np.sum(np.multiply(y, np.log(p)) + np.multiply(1 - y, np.log(1 - p)))

    # Predict class labels
    def predict(self, X):
        # add the bias column
        X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
        y_p = np.mat(X) * self.W
        # probabilities
        p = self.sigmoid(y_p)
        y_p = np.where(p >= 0.5, 1, 0)
        return y_p

    def fit(self, x, y):
        lossList = []
        # number of samples
        m = x.shape[0]
        # add the bias column
        X = np.concatenate((np.ones((m, 1)), x), axis=1)
        # number of features (including the bias)
        n = X.shape[1]
        # initialize W
        self.W = np.mat(np.ones((n, 1)))
        xMat = np.mat(X)
        yMat = np.mat(y.reshape(-1, 1))
        # initialize the loss
        loss = 0
        # loss from the previous iteration
        pre_loss = loss + 1
        # iterate for epochs steps
        for i in range(self.epochs):
            # predicted probabilities
            p = self.sigmoid(xMat * self.W)
            gradient = xMat.T * (p - yMat) / m
            # add the L1/L2 penalty gradient, as with regularized linear regression:
            # the gradient of (Lambda/2)*||W||_2^2 is Lambda*W, and the subgradient
            # of Lambda*||W||_1 is Lambda*sign(W) (not the norm itself, a scalar)
            if self.penalty == 'l2':
                gradient = gradient + self.Lambda * self.W
            elif self.penalty == 'l1':
                gradient = gradient + self.Lambda * np.sign(self.W)
            self.W = self.W - self.a * gradient
            # current loss
            pre_loss = loss
            loss = self.loss(xMat, yMat)
            if i % 50 == 0:
                lossList.append(loss)
            # stop early once the loss has essentially stopped changing
            if np.abs(pre_loss - loss) < 0.002:
                break
        # return the coefficients, the loss history, and the iteration count
        return self.W, lossList, i
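One way to gain confidence in the analytic gradient used in fit is a central finite-difference check against the loss method (a minimal sketch; numerical_gradient is a hypothetical helper written for this check, not part of the class):

import numpy as np

def numerical_gradient(model, xMat, yMat, eps=1e-5):
    # perturb each weight in turn and difference the loss
    grad = np.zeros_like(model.W)
    for j in range(model.W.shape[0]):
        model.W[j] += eps
        loss_plus = model.loss(xMat, yMat)
        model.W[j] -= 2 * eps
        loss_minus = model.loss(xMat, yMat)
        model.W[j] += eps  # restore the original weight
        grad[j] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Compare against the analytic gradient xMat.T * (p - yMat) / m computed in fit;
# with penalty=None the two should agree to several decimal places.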
Load the breast cancer dataset
from sklearn import datasets
data = datasets.load_breast_cancer()
from sklearn.preprocessing import scale # Z-score standardization
np.set_printoptions(suppress=True)
X, y = data['data'], data['target']
# Z-score standardization
X = scale(X)
# print(X)
# display(X.shape,y.shape)
Train/test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)
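Without a fixed seed this split differs on every run, so the accuracies below will vary slightly. Passing random_state (and optionally stratify) to train_test_split makes the experiment reproducible, e.g. (the seed value is an arbitrary illustration):

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)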
Training and evaluation
import warnings
warnings.filterwarnings('ignore')  # silence runtime warnings, e.g. log(0) when p saturates
lgr = My_LogisticRegression(penalty="l1")
W,loss_list,times = lgr.fit(X_train,y_train)
# Prediction and evaluation
from sklearn.metrics import accuracy_score
y_pre = lgr.predict(X_test)
score = accuracy_score(y_test,y_pre)
print("L1正则化-逻辑回归的准确率:",score)
L1正则化-逻辑回归的准确率: 0.9298245614035088
Comparison with sklearn
from sklearn.linear_model import LogisticRegression
lcs = LogisticRegression()
lcs.fit(X_train,y_train)
pre = lcs.predict(X_test)
score = accuracy_score(y_test,pre)
print("sklearn 逻辑回归的准确率:",score)
sklearn 逻辑回归的准确率: 0.9824561403508771
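Accuracy alone can be misleading when the classes are imbalanced; a confusion matrix and per-class precision/recall give a fuller comparison (a minimal sketch using standard sklearn metrics on the predictions above):

from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(y_test, pre))
print(classification_report(y_test, pre))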