scikit-learn Machine Learning Notes: Logistic Regression

学习爱好者fz

Category: scikit-learn Machine Learning. Tags: machine learning, python, logistic regression

Article link: https://blog.csdn.net/weixin_45031468/article/details/113823675

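The logistic regression formula
The logistic regression loss function
sklearn logistic regression API
LogisticRegression case study: predicting benign/malignant breast cancer tumors
- Using pandas
- Benign/malignant breast cancer classification workflow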

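The logistic regression formula

Formula:
$\begin{array}{c} h_{\theta}(x)=g\left(\theta^{T} x\right)=\frac{1}{1+e^{-\theta^{T} x}} \\ g(z)=\frac{1}{1+e^{-z}} \end{array}$
Output: a probability in the interval (0, 1); by default 0.5 is used as the decision threshold.
Note: g(z) is the sigmoid function.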

[Figure: graph of the sigmoid function]
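
As a rough numerical illustration of the formula and the 0.5 threshold, here is a minimal sketch. The helper names `sigmoid` and `predict_label`, as well as the values of `theta` and `x`, are assumptions made up for this example and do not come from the original article.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_label(theta, x, threshold=0.5):
    """Return 1 if h_theta(x) = sigmoid(theta^T x) reaches the threshold, else 0."""
    probability = sigmoid(np.dot(theta, x))
    return int(probability >= threshold)


# Hypothetical parameters and sample, chosen only for illustration
theta = np.array([0.5, -1.0])
x = np.array([2.0, 0.5])

z_values = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z_values))        # rises from near 0 to near 1; sigmoid(0) is 0.5
print(predict_label(theta, x))  # theta^T x = 0.5 > 0, so the probability exceeds 0.5
```

Because the sigmoid is monotonic, comparing the probability with 0.5 is equivalent to checking the sign of θᵀx.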

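The logistic regression loss function

The underlying principle is the same as in linear regression, but because this is a classification problem the loss function is different. It has no closed-form solution, so it has to be minimized iteratively, for example by gradient descent.

$\operatorname{cost}\left(h_{\theta}(x), y\right)=\left\{\begin{array}{ll} -\log \left(h_{\theta}(x)\right) & \text { if } \mathrm{y}=1 \\ -\log \left(1-h_{\theta}(x)\right) & \text { if } \mathrm{y}=0 \end{array} \longrightarrow-\log P(Y \mid X)\right.$
The full loss function, summed over all m samples:
$\operatorname{cost}\left(h_{\theta}(x), y\right)=\sum_{i=1}^{m}\left[-y_{i} \log \left(h_{\theta}(x_{i})\right)-\left(1-y_{i}\right) \log \left(1-h_{\theta}(x_{i})\right)\right]$
The smaller the cost, the more closely the predicted probabilities match the true classes.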
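
To make the loss and its minimization concrete, the following is a minimal sketch of the summed cost and a plain batch gradient descent loop. The function names, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration; scikit-learn uses its own solvers rather than this loop.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss_sum(theta, X, y):
    """Summed cost: sum_i [-y_i log(h(x_i)) - (1 - y_i) log(1 - h(x_i))]."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # keeps log() away from exactly 0
    h = np.clip(h, eps, 1 - eps)
    return float(np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h)))


def gradient_descent(X, y, learning_rate=0.1, n_iters=500):
    """Batch gradient descent; the gradient of the summed cost is X^T (h - y)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y)
        theta -= learning_rate * gradient / len(y)  # averaged over the samples
    return theta


# Toy data (made up): the first column is a constant 1 acting as the intercept
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta_start = np.zeros(2)
theta_fit = gradient_descent(X, y)
print("cost at theta = 0: ", log_loss_sum(theta_start, X, y))
print("cost after descent:", log_loss_sum(theta_fit, X, y))
```

At θ = 0 every prediction is 0.5, so the cost starts at m·log 2; the loop should then drive it lower on this toy data.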

sklearn logistic regression API

• sklearn.linear_model.LogisticRegression(penalty='l2', C=1.0)
• Logistic regression classifier
• coef_: the regression coefficients
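
A short usage sketch of the estimator follows. The synthetic data from `make_classification` is an assumption used only so the example is self-contained, and the regularization type is left at its default.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, generated only for illustration
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# C is the inverse of the regularization strength: smaller C means a stronger penalty
clf = LogisticRegression(C=1.0)
clf.fit(X, y)

print(clf.coef_)                 # shape (1, n_features) for a binary problem
print(clf.intercept_)
print(clf.predict_proba(X[:3]))  # each row: [P(class 0), P(class 1)]
print(clf.predict(X[:3]))        # labels obtained from those probabilities
```

`coef_` together with `intercept_` plays the role of θ in the formula, and `predict_proba` returns the sigmoid output for each class.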

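LogisticRegression case study: predicting benign/malignant breast cancer tumors

Download address of the original data: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

Data description
(1) 699 samples with 11 columns in total. The first column is an id used for lookup, the next 9 columns are medical features related to the tumor, and the last column is a numeric value indicating the tumor type (2 = benign, 4 = malignant).
(2) The data contains 16 missing values, marked with "?".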

Using pandas

• pd.read_csv('', names=column_names)
• column_names: specifies the column names, ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
• replace(to_replace='', value=): replaces values in the data
• dropna(): drops the rows that contain missing values and returns the remaining data
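
The three calls above can be tried on a tiny in-memory sample before touching the real file. The three rows and the shortened column list below are made up for illustration; only the "?" convention is taken from the data description.

```python
import io

import numpy as np
import pandas as pd

# A tiny made-up sample in the same comma-separated layout, with "?" marking a missing value
raw = io.StringIO(
    "1000025,5,1,2\n"
    "1002945,5,?,2\n"
    "1015425,3,2,4\n"
)
column_names = ["Sample code number", "Clump Thickness", "Bare Nuclei", "Class"]

data = pd.read_csv(raw, names=column_names)
data = data.replace(to_replace="?", value=np.nan)  # "?" becomes NaN
data = data.dropna()                               # rows containing NaN are dropped
# The "?" forced this column to be read as text, so convert it back to numbers
data["Bare Nuclei"] = pd.to_numeric(data["Bare Nuclei"])

print(data.shape)
print(data.dtypes)
```

An alternative would be `pd.read_csv(..., na_values="?")`, which marks the missing entries while parsing, so the column is read as numeric from the start.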

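Benign/malignant breast cancer classification workflow

1. Fetch the data from the web (tool: pandas)
2. Handle missing values and standardize the features
3. Run the LogisticRegression estimator workflow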

Code example:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np


def logistic_regression():
    '''
    Logistic regression case study
    :return: None
    '''
    # Read the data
    colnames = ['Sample code number','Clump Thickness',
                'Uniformity of Cell Size','Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei','Bland Chromatin','Normal Nucleoli',
                'Mitoses','Class']

    data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data", names=colnames)
    # Missing values: turn "?" into NaN, then drop the rows that contain NaN
    data = data.replace("?", np.nan)
    data = data.dropna()
    # 'Bare Nuclei' was read as text because of the "?" markers; make it numeric
    data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'])
    # Split into training and test sets (9 feature columns, 'Class' as the target)
    X_train, X_test, y_train, y_test = train_test_split(data[colnames[1:10]], data[colnames[10]], test_size=0.25)
    # Standardize: fit the scaler on the training set only, then apply it to both sets
    std = StandardScaler()
    X_train = std.fit_transform(X_train)
    X_test = std.transform(X_test)
    # Instantiate the model
    lg = LogisticRegression()
    # Train
    lg.fit(X_train, y_train)
    # Predict and evaluate
    y_pre = lg.predict(X_test)
    acc = lg.score(X_test, y_test)
    report = classification_report(y_true=y_test, y_pred=y_pre, labels=[2, 4], target_names=['benign', 'malignant'])
    # Print the results
    print("Predictions:", y_pre)
    print("Accuracy:", acc)
    print("Classification report:")
    print(report)


if __name__ == '__main__':
    logistic_regression()

Result of one run (the train/test split is random, so the numbers vary between runs):

Predictions: [2 4 2 4 2 2 2 4 2 2 2 4 2 2 2 2 2 4 2 2 2 4 2 4 2 4 2 2 2 2 2 2 4 2 2 4 2
 4 2 2 4 4 4 2 2 2 2 2 2 4 4 2 4 2 2 4 2 2 4 2 2 2 2 4 4 2 4 2 4 2 4 4 2 4
 4 4 2 2 2 2 2 2 2 4 4 4 4 2 4 4 4 2 2 4 2 2 4 2 4 4 2 2 2 2 2 2 4 2 2 2 2
 2 2 2 4 2 2 4 4 2 4 4 2 2 2 2 4 4 2 4 2 2 4 4 2 4 2 4 4 4 2 2 2 2 2 4 2 2
 2 2 4 2 2 2 2 4 4 2 2 4 2 2 4 4 4 2 4 2 4 2 2]
Accuracy: 0.9649122807017544
Classification report:
              precision    recall  f1-score   support
      benign       0.95      0.99      0.97       103
   malignant       0.98      0.93      0.95        68
    accuracy                           0.96       171
   macro avg       0.97      0.96      0.96       171
weighted avg       0.97      0.96      0.96       171
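
The table printed by `classification_report` gives per-class precision, recall and F1. The raw counts behind such numbers can be inspected with `sklearn.metrics.confusion_matrix`. The sketch below uses a handful of made-up labels rather than the article's actual predictions.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 2 = benign, 4 = malignant
y_true = [2, 2, 2, 4, 4, 2, 4, 4]
y_pred = [2, 2, 4, 4, 4, 2, 2, 4]

# Rows are the true classes, columns the predicted classes, both ordered as in `labels`
cm = confusion_matrix(y_true, y_pred, labels=[2, 4])
print(cm)
```

In a medical setting the bottom-left cell (malignant tumors predicted as benign) is usually the one to watch, which is why recall for the malignant class matters more than overall accuracy.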
