Logistic regression is a linear model for classification; note that the dependent variable (target) is categorical.
Binary classification
Assume $Y=\{0,1\}$. The predicted probability of the positive class, $P(y_i=1|X_i)$, is:
$$\hat{p}(X_i)=\operatorname{expit}(X_i w + w_0)=\frac{1}{1+\exp(-X_i w - w_0)}$$
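As a quick check of the formula above, the predicted probability can be computed directly with `scipy.special.expit`; the weights and sample below are made-up illustrative values, not fitted ones.

```python
import numpy as np
from scipy.special import expit

# Illustrative (assumed) weights w, intercept w0, and one sample X_i
w = np.array([0.5, -0.25])
w0 = 0.1
X_i = np.array([2.0, 1.0])

# expit(t) = 1 / (1 + exp(-t))
p_hat = expit(X_i @ w + w0)

# identical to the explicit form in the equation
p_manual = 1.0 / (1.0 + np.exp(-(X_i @ w) - w0))
assert np.isclose(p_hat, p_manual)
```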
Its objective function is:
$$\min_w C\sum_{i=1}^n\left(-y_i\log(\hat{p}(X_i))-(1-y_i)\log(1-\hat{p}(X_i))\right)+r(w)$$
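The data-dependent part of this objective is the binary cross-entropy. A minimal numpy sketch evaluating it on made-up data, with $r(w)$ taken to be the $\ell_2$ penalty $\frac{1}{2}w^\top w$ (one of the available choices):

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # made-up features
y = rng.integers(0, 2, size=100)     # made-up 0/1 labels
w = rng.normal(size=3)               # arbitrary (unfitted) weights
w0 = 0.0
C = 1.0

p = expit(X @ w + w0)
# sum_i [ -y_i log p_i - (1 - y_i) log(1 - p_i) ], scaled by C
nll = C * np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p))
# add the l2 choice of r(w)
objective = nll + 0.5 * w @ w
```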
where $r(w)$ is the regularization term. sklearn offers four choices:
For ElasticNet, $\rho$ corresponds to the `l1_ratio` parameter. When $\rho=1$, ElasticNet is equivalent to the $\ell_1$ penalty; when $\rho=0$, it is equivalent to the $\ell_2$ penalty.
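This mapping shows up directly in `LogisticRegression(penalty="elasticnet", l1_ratio=...)`. A small sketch on synthetic data (all values illustrative) comparing the two ends of the `l1_ratio` range:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# made-up classification data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# l1_ratio=1.0 -> pure l1; l1_ratio=0.0 -> pure l2
clf_l1 = LogisticRegression(penalty="elasticnet", solver="saga",
                            l1_ratio=1.0, C=0.1, max_iter=5000)
clf_l2 = LogisticRegression(penalty="elasticnet", solver="saga",
                            l1_ratio=0.0, C=0.1, max_iter=5000)
clf_l1.fit(X, y)
clf_l2.fit(X, y)

# the l1 end of the path tends to zero out more coefficients
print((clf_l1.coef_ == 0).sum(), (clf_l2.coef_ == 0).sum())
```

Note that `penalty="elasticnet"` requires `solver="saga"`.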
Multiclass classification
Assume $Y=\{1,\cdots,K\}$. `predict_proba` gives the predicted probability $P(y_i=k|X_i)$ that sample $i$ belongs to class $k$:
$$\hat{p}_k(X_i)=\frac{\exp(X_i W_k+W_{0,k})}{\sum_{l=0}^{K-1}\exp(X_i W_l+W_{0,l})}$$
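The formula above is the softmax function. A minimal numpy sketch, with illustrative (made-up) decision scores $X_i W_k + W_{0,k}$ for $K=3$ classes:

```python
import numpy as np

def softmax(scores):
    # subtract the max for numerical stability; this does not change the result
    z = scores - scores.max()
    e = np.exp(z)
    return e / e.sum()

# illustrative decision scores for K = 3 classes
scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
# the class probabilities sum to 1, and larger scores get larger probabilities
```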
The objective function is:
$$\min_W -C\sum_{i=1}^n\sum_{k=0}^{K-1}[y_i=k]\log(\hat{p}_k(X_i))+r(W)$$
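The indicator $[y_i=k]$ simply selects the log-probability of each sample's true class, so the double sum reduces to one log term per sample. A sketch with made-up predicted probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 4
# made-up predicted class probabilities; each row sums to 1
P = rng.dirichlet(np.ones(K), size=n)
y = rng.integers(0, K, size=n)   # made-up true labels in {0, ..., K-1}
C = 1.0

# sum_i sum_k [y_i = k] log p_k(X_i)  ==  sum_i log p_{y_i}(X_i)
loss = -C * np.sum(np.log(P[np.arange(n), y]))
```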
The four choices for $r(W)$ here are analogous to the binary case.
Solvers for LogisticRegression in sklearn
1. `solver="liblinear"` uses a coordinate descent (CD) algorithm, but it does not handle multiclass problems well.
2. `solver="sag"` uses Stochastic Average Gradient descent; it is faster on large datasets.
3. `solver="saga"` is a variant of `"sag"` that also supports $\ell_1$ regularization and `penalty="elasticnet"`; it is faster on large samples.
4. `solver="lbfgs"` is an optimization algorithm that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm. It is the default because it is robust across a wide variety of training sets, but it can perform poorly on datasets with one-hot encoded categorical features.
5. `solver="newton-cholesky"` is a very good choice when `n_samples` >> `n_features`, but it only supports $\ell_2$ regularization.
`"lbfgs"`, `"newton-cg"` and `"sag"` only support $\ell_2$ regularization or no regularization; they are faster on high-dimensional data and also perform better on multiclass problems.
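These solver/penalty constraints are enforced when the model is fit. A quick illustration on synthetic data (all values made up):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# lbfgs (the default) supports only l2 or no penalty
LogisticRegression(solver="lbfgs", penalty="l2").fit(X, y)

# requesting l1 with lbfgs is rejected at fit time
raised = False
try:
    LogisticRegression(solver="lbfgs", penalty="l1").fit(X, y)
except ValueError:
    raised = True

# saga, by contrast, supports l1 (and elasticnet)
clf = LogisticRegression(solver="saga", penalty="l1", max_iter=5000).fit(X, y)
```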
A summary is given in the table below:
Hyperparameter selection
`LogisticRegressionCV` is generally used to select the optimal hyperparameters `C` and `l1_ratio`. In general, the `"newton-cg"`, `"sag"`, `"saga"` and `"lbfgs"` solvers are faster for high-dimensional data.
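A minimal sketch of `LogisticRegressionCV` searching over `C` (and over `l1_ratio`, which is only searched when `penalty="elasticnet"`); the data and candidate grids below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = LogisticRegressionCV(
    Cs=[0.01, 0.1, 1.0, 10.0],   # candidate values of C
    l1_ratios=[0.0, 0.5, 1.0],   # searched because penalty="elasticnet"
    penalty="elasticnet",
    solver="saga",               # saga is required for elasticnet
    cv=3,
    max_iter=5000,
)
clf.fit(X, y)
print("best C:", clf.C_, "best l1_ratio:", clf.l1_ratio_)
```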
Example: MNIST digit classification task
We use the SAGA solver, which handles large datasets quickly, with $\ell_1$ regularization for classification.
import time

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_random_state

# Turn down for faster convergence
t0 = time.time()
train_samples = 5000

# Load data from https://www.openml.org/d/554
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

random_state = check_random_state(0)
permutation = random_state.permutation(X.shape[0])
X = X[permutation]
y = y[permutation]
X = X.reshape((X.shape[0], -1))

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_samples, test_size=10000
)

# Standardize the data before fitting
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Turn up tolerance for faster convergence
clf = LogisticRegression(C=50.0 / train_samples, penalty="l1", solver="saga", tol=0.1)
clf.fit(X_train, y_train)
sparsity = np.mean(clf.coef_ == 0) * 100
score = clf.score(X_test, y_test)
# print('Best C % .4f' % clf.C_)
print("Sparsity with L1 penalty: %.2f%%" % sparsity)
print("Test score with L1 penalty: %.4f" % score)

coef = clf.coef_.copy()
plt.figure(figsize=(10, 5))
scale = np.abs(coef).max()
for i in range(10):
    l1_plot = plt.subplot(2, 5, i + 1)
    l1_plot.imshow(
        coef[i].reshape(28, 28),
        interpolation="nearest",
        cmap=plt.cm.RdBu,
        vmin=-scale,
        vmax=scale,
    )
    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l1_plot.set_xlabel("Class %i" % i)
plt.suptitle("Classification vector for...")

run_time = time.time() - t0
print("Example run in %.3f s" % run_time)
The output is: