Machine Learning Algorithms and Python Practice (9) - Elastic Net

Latest recommended article published on 2025-03-18 07:30:00

shun.su

Category: Machine Learning. Tags: python, machine learning algorithms

Article link: https://blog.csdn.net/m0_37167788/article/details/78657523


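　　ElasticNet is a linear regression model trained with both L1-norm and L2-norm regularization of the coefficients. This combination can learn a sparse model in which only a few weights are non-zero, as Lasso does, while still keeping the regularization properties of Ridge. The l1_ratio parameter controls the convex combination (a special kind of linear combination) of the L1 and L2 penalties.
　　Elastic net is especially useful when several features are correlated with one another. Lasso tends to pick one of them more or less at random, whereas elastic net tends to keep both.
　　In practice, one advantage of trading off between Lasso and Ridge is that elastic net inherits some of Ridge's stability under rotation.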
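The behavior on correlated features can be illustrated with a small synthetic experiment. The sketch below is not from the original article: it assumes the most extreme case, two identical feature columns, and the data, `alpha` values, and variable names are all chosen for illustration only. With scikit-learn's default cyclic coordinate descent, Lasso puts essentially all the weight on one column, while ElasticNet splits it evenly.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
n_samples = 200
x = rng.randn(n_samples)

# Two identical columns: the most extreme case of correlated features
# (an assumption made for this illustration).
X = np.column_stack([x, x])
y = 3.0 * x + 0.1 * rng.randn(n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)

# A tight tolerance lets coordinate descent settle on the symmetric solution.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-10, max_iter=10000).fit(X, y)

print("Lasso coefficients:     ", lasso.coef_)
print("ElasticNet coefficients:", enet.coef_)
```

On real data the features are rarely exact duplicates, so the contrast is usually less sharp than in this toy case.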
　　
The elastic net objective is to minimize:

$\min_{w} \; \frac{1}{2 n_{\text{samples}}} \|X w - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2$
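To make the objective concrete, the following sketch evaluates it directly with NumPy and checks that the coefficients found by `ElasticNet` give a lower value than nearby perturbed coefficients. The helper `enet_objective` is a hypothetical name introduced here, and the data is synthetic. The intercept is disabled so that the fitted model matches the formula exactly.

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def enet_objective(X, y, w, alpha, rho):
    """Elastic net objective (hypothetical helper, mirrors the formula above)."""
    n_samples = X.shape[0]
    residual = X @ w - y
    data_fit = residual @ residual / (2.0 * n_samples)
    l1_penalty = alpha * rho * np.abs(w).sum()
    l2_penalty = 0.5 * alpha * (1.0 - rho) * (w @ w)
    return data_fit + l1_penalty + l2_penalty


rng = np.random.RandomState(0)
X = rng.randn(60, 10)
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.randn(60)

alpha, rho = 0.1, 0.7
# fit_intercept=False so the fitted model minimizes exactly the formula above
model = ElasticNet(alpha=alpha, l1_ratio=rho, fit_intercept=False,
                   tol=1e-10, max_iter=10000).fit(X, y)

best = enet_objective(X, y, model.coef_, alpha, rho)
# Objective values at 20 randomly perturbed coefficient vectors
perturbed = [
    enet_objective(X, y, model.coef_ + 0.01 * rng.randn(10), alpha, rho)
    for _ in range(20)
]
print("objective at fitted coef_:", best)
print("smallest perturbed value: ", min(perturbed))
```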

ElasticNetCV can set the following parameters by cross-validation:
alpha ( $\alpha$ ) and l1_ratio ( $\rho$ )
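A minimal sketch of how `ElasticNetCV` might be used for this is shown below. The synthetic data, the candidate `l1_ratio` list, and the 5-fold setting are assumptions made for illustration and are not part of the original article.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.RandomState(0)
n_samples, n_features = 100, 50
coef = np.zeros(n_features)
coef[:5] = rng.randn(5)  # only the first 5 features matter
X = rng.randn(n_samples, n_features)
y = X @ coef + 0.5 * rng.randn(n_samples)

# Candidate l1_ratio values; the alpha grid is built automatically
l1_ratios = [0.1, 0.5, 0.7, 0.9, 1.0]
model = ElasticNetCV(l1_ratio=l1_ratios, cv=5).fit(X, y)

print("Selected alpha:   ", model.alpha_)
print("Selected l1_ratio:", model.l1_ratio_)
```

After fitting, the selected values are available as `alpha_` and `l1_ratio_`, and the model has already been refit on the full data with them.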

The code is as follows:

import numpy as np
from sklearn import linear_model
import warnings

warnings.filterwarnings('ignore')

###############################################################################  
# Generate sample data  
n_samples_train, n_samples_test, n_features = 75, 150, 500
np.random.seed(0)
coef = np.random.randn(n_features)
coef[50:] = 0.0  # only the first 50 features affect the model
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = np.dot(X, coef)

# Split train and test data  
X_train, X_test = X[:n_samples_train], X[n_samples_train:]
y_train, y_test = y[:n_samples_train], y[n_samples_train:]

###############################################################################  
# Compute train and test scores
# (enet.score returns R^2, so higher is better; the *_errors lists hold scores)
alphas = np.logspace(-5, 1, 60)
enet = linear_model.ElasticNet(l1_ratio=0.7)
train_errors = list()
test_errors = list()
for alpha in alphas:
    enet.set_params(alpha=alpha)
    enet.fit(X_train, y_train)
    train_errors.append(enet.score(X_train, y_train))
    test_errors.append(enet.score(X_test, y_test))

i_alpha_optim = np.argmax(test_errors)
alpha_optim = alphas[i_alpha_optim]
print("Optimal regularization parameter : %s" % alpha_optim)

# Estimate the coef_ on full data with optimal regularization parameter  
enet.set_params(alpha=alpha_optim)
coef_ = enet.fit(X, y).coef_

###############################################################################  
# Plot results functions  

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
plt.semilogx(alphas, train_errors, label='Train')
plt.semilogx(alphas, test_errors, label='Test')
plt.vlines(alpha_optim, plt.ylim()[0], np.max(test_errors), color='k',
           linewidth=3, label='Optimum on test')
plt.legend(loc='lower left')
plt.ylim([0, 1.2])
plt.xlabel('Regularization parameter')
plt.ylabel('Performance')

# Show estimated coef_ vs true coef  
plt.subplot(2, 1, 2)
plt.plot(coef, label='True coef')
plt.plot(coef_, label='Estimated coef')
plt.legend()
plt.subplots_adjust(left=0.09, bottom=0.04, right=0.94, top=0.94,
                    wspace=0.26, hspace=0.26)
plt.show()

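The result is a figure with two panels (image not preserved): train and test performance against the regularization parameter, and the estimated coefficients against the true ones.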

The console prints the optimal regularization parameter (output screenshot not preserved).

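Most of the elastic net functions are broadly similar to those of the earlier models, so only the commonly used or special parameters and functions are introduced here: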

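Parameters:
l1_ratio: a value between 0 and 1 that sets the mix between the L1 and L2 penalties. With l1_ratio=1 the penalty is the Lasso (L1) penalty, and with l1_ratio=0 it is a pure L2 penalty. It is an important knob for tuning model performance.
eps: length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
n_alphas: number of alpha values along the regularization path.
alphas: list of alpha values at which to compute the models.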
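The statement that l1_ratio=1 reduces to Lasso can be checked directly. The sketch below uses synthetic data and an arbitrary `alpha` (both assumptions for illustration). It compares the two estimators and then prints how many coefficients remain non-zero for a few `l1_ratio` values.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
X = rng.randn(80, 30)
coef = np.zeros(30)
coef[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]
y = X @ coef + 0.3 * rng.randn(80)

# With l1_ratio=1.0 the L2 term vanishes and the model is a Lasso
lasso = Lasso(alpha=0.1).fit(X, y)
enet_as_lasso = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
same = np.allclose(lasso.coef_, enet_as_lasso.coef_)
print("ElasticNet(l1_ratio=1) matches Lasso:", same)

# Number of non-zero coefficients for a few mixing ratios
for ratio in (0.2, 0.5, 0.8, 1.0):
    fitted = ElasticNet(alpha=0.1, l1_ratio=ratio).fit(X, y)
    print("l1_ratio =", ratio, "non-zero coefficients:", np.count_nonzero(fitted.coef_))
```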

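Return values:
alphas: the alpha values along the path at which the models were computed.
coefs: the model coefficients along the path, with shape (n_features, n_alphas).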
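The parameters and return values listed above appear to correspond to scikit-learn's `enet_path` function. That mapping is an assumption, since the article does not name the function. A short sketch on synthetic data shows the shapes of the returned arrays and the meaning of `eps`:

```python
import numpy as np
from sklearn.linear_model import enet_path

rng = np.random.RandomState(0)
n_samples, n_features = 60, 8
X = rng.randn(n_samples, n_features)
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(n_samples)

# enet_path returns the alpha grid, the coefficients for each alpha,
# and the dual gaps reached at the end of each optimization
alphas, coefs, dual_gaps = enet_path(X, y, l1_ratio=0.7, eps=1e-3, n_alphas=20)

print("alphas shape:", alphas.shape)
print("coefs shape: ", coefs.shape)
print("alpha_min / alpha_max:", alphas.min() / alphas.max())
```

Each column of `coefs` holds the coefficient vector for the alpha at the same position in `alphas`.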

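Functions:
score(X, y, sample_weight):
Returns the coefficient of determination R^2 of the prediction, the standard measure of model performance here. The closer the value is to 1, the better the model; it can be negative for a model that does worse than always predicting the mean of y.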
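As a quick check of what `score` computes, the sketch below compares it with a manual R^2 calculation and with `sklearn.metrics.r2_score`. The data and hyperparameters are synthetic choices made only for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.randn(120, 5)
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.2 * rng.randn(120)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

enet = ElasticNet(alpha=0.05, l1_ratio=0.7).fit(X_train, y_train)
y_pred = enet.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_from_score = enet.score(X_test, y_test)
r2_from_metric = r2_score(y_test, y_pred)
print(r2_manual, r2_from_score, r2_from_metric)
```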