Logistic Regression
Sigmoid function: sigmoid(z) = 1 / (1 + e^(-z)), which maps any real number into the open interval (0, 1).
Logistic regression formula: h_theta(x) = sigmoid(theta^T x) = 1 / (1 + e^(-theta^T x)), interpreted as the probability that the sample belongs to the positive class.
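As a minimal sketch of the two formulas above (plain NumPy; the function names `sigmoid` and `hypothesis` are my own, not from any library):

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = sigmoid(theta^T x): probability of the positive class
    return sigmoid(np.dot(theta, x))

print(sigmoid(0.0))  # 0.5: a score of zero means maximum uncertainty
print(hypothesis(np.array([0.5, -0.25]), np.array([2.0, 4.0])))
```

Note that the hypothesis is just the sigmoid applied to a linear score, which is why logistic regression is often described as linear regression pushed through a squashing function.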
Logistic regression loss function:
The principle is the same as for linear regression, but because this is a classification problem the loss function is different; it has no closed-form solution, so it is minimized iteratively, typically by gradient descent.
Log-likelihood loss for a single sample:
cost(h_theta(x), y) = -log(h_theta(x)) if y = 1; -log(1 - h_theta(x)) if y = 0
Complete loss function over m samples:
J(theta) = -(1/m) * sum_{i=1..m} [ y_i * log(h_theta(x_i)) + (1 - y_i) * log(1 - h_theta(x_i)) ]
Note: the smaller the cost value, the more accurate the predicted classes.
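The loss above and its minimization by gradient descent can be sketched as follows (the toy data, learning rate, and iteration count are illustrative choices of mine, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy separable 1-D data; the column of ones adds a bias term
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
lr = 0.5
before = cost(theta, X, y)   # log(2): all-zero weights predict 0.5 everywhere
for _ in range(200):
    # Gradient of J with respect to theta: (1/m) * X^T (h - y)
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
    theta -= lr * grad
after = cost(theta, X, y)
print(before, after)  # the cost decreases as gradient descent runs
```

Each iteration moves theta against the gradient of J, so the printed cost after training is strictly smaller than the initial log(2).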
sklearn logistic regression API: sklearn.linear_model.LogisticRegression
LogisticRegression summary:
Advantages: well suited to scenarios that need a classification probability; simple and fast.
Disadvantages: inherently a binary classifier, so multi-class problems require extensions such as one-vs-rest or softmax.
Applications: ad click-through prediction, disease diagnosis, financial fraud detection, fake-account detection.
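Since the main advantage listed above is access to a class probability, here is a short sketch of `predict_proba` on synthetic toy data (the data and query points are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 0 clusters near -2, class 1 clusters near +2
X = np.array([[-2.5], [-2.0], [-1.5], [1.5], [2.0], [2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Each row is [P(class 0), P(class 1)] and sums to 1
proba = clf.predict_proba(np.array([[-2.0], [0.0], [2.0]]))
print(proba)
```

`predict` just thresholds these probabilities at 0.5, so when a downstream decision needs a confidence (e.g. a click-through rate), `predict_proba` is the method to use.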
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report


def my_logistic():
    """
    Binary classification with logistic regression: breast cancer prediction.
    :return: None
    """
    # Read the data, specifying the column names
    column = ['Sample code number', 'Clump Thickness',
              'Uniformity of Cell Size', 'Uniformity of Cell Shape',
              'Marginal Adhesion', 'Single Epithelial Cell Size',
              'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
              'Mitoses', 'Class']
    data = pd.read_csv(
        "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data",
        names=column)
    # Replace "?" with np.nan, then drop the rows marked as missing
    data = data.replace(to_replace="?", value=np.nan)
    data = data.dropna()
    # "Bare Nuclei" was parsed as strings because of the "?" markers; convert it back to numeric
    data["Bare Nuclei"] = data["Bare Nuclei"].astype(float)
    print(data)
    # Split the data, separating feature values from the target value
    x = data[column[1:10]]
    y = data[column[10]]
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
    # Standardize the features (fit on the training set only, then reuse on the test set)
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)
    # Fit and predict with logistic regression
    lr = LogisticRegression()
    lr.fit(x_train, y_train)
    print("Learned weight coefficients:", lr.coef_)
    y_predict = lr.predict(x_test)
    print("Precision and recall of the predictions:", "\n",
          classification_report(y_test, y_predict,
                                labels=[2, 4],
                                target_names=["benign", "malignant"]))


if __name__ == '__main__':
    my_logistic()