对数几率回归

利用sklearn中的load_breast_cancer数据集进行对数几率回归分类


先导入一下数据,再把矩阵调整成按列存储每个样本的形式

cancer = datasets.load_breast_cancer()
X,y = cancer['data'], cancer['target']
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=42)

m = X_train.shape[0]
m_test = X_test.shape[0]
n = X_train.shape[1]
X_train = np.array(X_train)
x = np.concatenate((X_train,np.ones((m,1),dtype=X_train.dtype)), axis=1).T
beta = np.zeros((n+1,1))
y_train.reshape(1,m)

定义sigmoid函数 S ( z ) = 1 1 + e − z S(z)=\frac{1}{1+e^{-z}} S(z)=1+ez1

def Si(z):
    return 1/(1+np.exp(-z))

强制确定牛顿迭代法的迭代深度,再进行迭代。老师的课件上给出的是累加的形式,但是一阶导可以用矩阵的形式表达,二阶导就是一个向量乘自己的转置,可以把 p 1 ( 1 − p 1 ) p_1(1-p_1) p1(1p1)看作是每一项的系数,不妨把 p 1 , i p_{1,i} p1,i升为一个对角矩阵,然后再利用矩阵实现。

Max_Ite = 50
for t in range(Max_Ite):
    p1 = Si(np.dot(beta.T, x))
    grad = np.dot(x,(p1-y_train).T)
    gg2 = np.dot(x, np.dot(np.diag((p1-p1*p1).reshape(m,)),x.T))
    beta = beta - np.linalg.solve(gg2,grad)
#print("beta=", beta.T)

最后可以把 β = < ω , b > \beta=<\omega, b> β=<ω,b>打出来

利用对率回归进行预测,并检测准确性

y_pred = np.empty_like(y_test)
X_test = np.concatenate((X_test,np.ones((m_test,1),dtype=X_test.dtype)), axis=1).T
l = Si(np.dot(beta.T, X_test)).flatten()
for i in range(m_test):
    if l[i]>0.5:
        y_pred[i] = 1
    else:
        y_pred[i] = 0
print('precision: ', accuracy_score(y_test, y_pred))

但是神奇的是运行的中间结果爆掉了,算出来的 β \beta β不收敛。这组数据的大小是

print("shape=", n, m)

shape= 30 398

然后我就把每个数据第26个特征起以后的属性都不要了,也就是:

n = 25
X_train = X_train[:,0:n]
X_test = X_test[:,0:n]

最终得到的结果也还行:

precision:  0.9532163742690059

以下是完整代码:

#!/usr/bin/env python
# coding: utf-8

import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# import data
cancer = datasets.load_breast_cancer()
X,y = cancer['data'], cancer['target']
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=42)

n = 25
X_train = X_train[:,0:n]
X_test = X_test[:,0:n]

m = X_train.shape[0]
m_test = X_test.shape[0]
n = X_train.shape[1]
X_train = np.array(X_train)
x = np.concatenate((X_train,np.ones((m,1),dtype=X_train.dtype)), axis=1).T
beta = np.zeros((n+1,1))
y_train.reshape(1,m)
#print("shape=", n, m)

def Si(z):
    return 1/(1+np.exp(-z))
Max_Ite = 50
for t in range(Max_Ite):
    p1 = Si(np.dot(beta.T, x))
    grad = np.dot(x,(p1-y_train).T)
    gg2 = np.dot(x, np.dot(np.diag((p1-p1*p1).reshape(m,)),x.T))
    beta = beta - np.linalg.solve(gg2,grad)

#print("beta=", beta.T)
y_pred = np.empty_like(y_test)
X_test = np.concatenate((X_test,np.ones((m_test,1),dtype=X_test.dtype)), axis=1).T
l = Si(np.dot(beta.T, X_test)).flatten()
for i in range(m_test):
    if l[i]>0.5:
        y_pred[i] = 1
    else:
        y_pred[i] = 0

print('precision: ', accuracy_score(y_test, y_pred))

输出的 β \beta β和准确度为

beta= [[ -25.7523818     3.17634347  108.15206595  -43.22151497    1.56521521
    50.74155133  -56.88274302  -11.26094169   -2.83976125  -26.87876068
   -74.99938566    9.57940252   40.73596439   20.84486128    3.88865309
   -18.76046644   20.32546863  -20.28042746    4.7779897    24.7781943
  -112.76540417  -23.93186562  -81.68191927  119.62649284   -7.53212998
    -2.20112507]]
precision:  0.9532163742690059

截图如下
代码及运行结果

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值