Machine Learning Notes: LINEAR MODELS

Linear Models for Regression

  • For regression, the general prediction formula looks like: y = w[0]*x[0] + w[1]*x[1] + … + w[p]*x[p] + b

  • For a dataset with a single feature, this formula becomes y = w[0]*x[0] + b

  • Linear regression, also known as ordinary least squares (OLS), is the
    simplest and most classic linear method for regression

  • Linear regression finds the parameters w and b that minimize the mean squared error between predictions and the true regression targets

  • Ridge regression: the formula used is the same as for linear regression. However, the coefficients are chosen not only to predict well but also to satisfy an additional constraint: the coefficients are required to be as small as possible (restricting the effect each feature has on predictions, also known as regularization; this is L2 regularization)

  • Lasso regression: also restricts the coefficients to be close to zero, but in a slightly different way, called L1 regularization, which can drive some coefficients to exactly zero, so some features are entirely ignored by the model (perhaps automatic feature selection?). Having some coefficients at zero can make it easier to see which of the remaining features matter. (A small sketch of the three objectives follows this list.)
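
To make the three objectives concrete, here is a minimal NumPy sketch (my own illustration, not scikit-learn code; the data, w, b and alpha values are made up, and scikit-learn's exact scaling of the penalty terms differs slightly):

import numpy as np

# toy data: 5 samples, 2 features (hypothetical values, for illustration only)
X = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.3], [1.2, 1.1], [0.1, 2.2]])
y = np.array([3.0, 2.5, 2.1, 2.8, 2.9])

w = np.array([0.4, 0.9])   # one coefficient per feature
b = 0.5                    # intercept
alpha = 1.0                # regularization strength

y_pred = X @ w + b                           # y = w[0]*x[0] + w[1]*x[1] + b
mse = np.mean((y - y_pred) ** 2)             # what OLS minimizes
ridge_obj = mse + alpha * np.sum(w ** 2)     # OLS loss + L2 penalty (Ridge)
lasso_obj = mse + alpha * np.sum(np.abs(w))  # OLS loss + L1 penalty (Lasso)
print(mse, ridge_obj, lasso_obj)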

In Python, these models are available in scikit-learn as LinearRegression, Ridge and Lasso.

import matplotlib.pyplot as plt
import mglearn
'''import the three linear models: LinearRegression, Ridge and Lasso'''
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
import numpy as np

'''fit and evaluate a regression with the chosen model class'''
def LinearRegressionTest(func, df, alpha=1.0, random_state=42, max_iter=1000):
    x, y = df
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=random_state)
    if func == LinearRegression:
        lr = func().fit(x_train, y_train)  # OLS takes no alpha
    elif func == Lasso:
        lr = func(alpha=alpha, max_iter=max_iter).fit(x_train, y_train)
    else:
        lr = func(alpha=alpha).fit(x_train, y_train)
    print('lr_coef:%s' % (lr.coef_), 'lr_intercept:%s' % (lr.intercept_))
    print('Training score:%s' % (lr.score(x_train, y_train)), 'Test score: %s' % (lr.score(x_test, y_test)))
    if func == Lasso:
        print('Number of features used:%s' % (np.sum(lr.coef_ != 0)))
    return [lr.coef_, lr.intercept_]

First, look at LinearRegression, using 60 samples generated by make_wave:

LinearRegressionTest(LinearRegression,mglearn.datasets.make_wave(n_samples = 60))

The training and test scores are very close, but an R^2 of only about 0.66 suggests the model may still be underfitting.

Next, take the (extended) Boston housing dataset as an example:

LinearRegressionTest(LinearRegression,mglearn.datasets.load_extended_boston(),random_state=0)

This time there is a large gap between training and test performance; the very high training score suggests overfitting.

Next, let's look at Ridge:

LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),random_state=0)

The test score improves while the training score drops, which is exactly what we would expect when overfitting is reduced.

Ridge trades off model simplicity against training performance. This trade-off is quantified by alpha, which defaults to 1.0. Increasing alpha pushes the coefficients closer to 0, at the cost of worse training performance (in other words, the model becomes simpler).

Let's vary alpha and see what happens:

LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),random_state=0)
LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),alpha = 0.1,random_state=0)
LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),alpha = 10,random_state=0)

As expected, the larger alpha is, the worse the training performance.
To make this more intuitive, we can plot the coefficients obtained under different alpha values.

lr=LinearRegressionTest(LinearRegression,mglearn.datasets.load_extended_boston(),random_state=0)
ridge = LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),random_state=0)
ridge01 = LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),alpha = 0.1,random_state=0)
ridge10 =  LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),alpha = 10,random_state=0)



plt.plot(ridge[0],'s',label = 'Ridge alpha = 1')
plt.plot(ridge10[0],'^',label = 'Ridge alpha = 10')
plt.plot(ridge01[0],'v',label = 'Ridge alpha = 0.1')

plt.plot(lr[0],'o',label = 'Linear Regression')

plt.hlines(0,0,len(lr[0]))
plt.xlabel('Coefficient index')
plt.ylabel('Coefficient magnitude')
plt.ylim(-25,25)
plt.legend()
plt.show()
  • The x value indicates which feature a coefficient belongs to (position x corresponds to the (x+1)-th feature), and the y value is the coefficient's magnitude
  • The larger alpha is, the closer the Ridge coefficients are to 0; overall, the Ridge
    coefficients are also smaller than those of LinearRegression


Now repeat the exploration we did for Ridge, this time with Lasso.

lasso = LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),random_state=0)

Lasso performs quite poorly here, and it only uses 4 of the 105 features.
From the Ridge exploration we know that a lower alpha means a more complex model, so here we lower alpha and increase max_iter to see whether performance improves.

lasso = LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),random_state=0)
lasso001 = LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),alpha = 0.01,random_state=0,max_iter=100000)
lasso0001 =  LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),alpha = 0.0001,random_state=0,max_iter=100000)

With a lower alpha (and max_iter fixed at 100000), Lasso's training score improves and more features are used.
Again, to make this more intuitive, we can plot the coefficients under different alpha values.

ridge01= LinearRegressionTest(Ridge,mglearn.datasets.load_extended_boston(),alpha=0.1,random_state=0)
lasso = LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),random_state=0)
lasso001 = LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),alpha = 0.01,random_state=0,max_iter=100000)
lasso0001 =  LinearRegressionTest(Lasso,mglearn.datasets.load_extended_boston(),alpha = 0.0001,random_state=0,max_iter=100000)



plt.plot(lasso[0],'s',label = 'Lasso alpha = 1')
plt.plot(lasso001[0],'^',label = 'Lasso alpha = 0.01')
plt.plot(lasso0001[0],'v',label = 'Lasso alpha = 0.0001')

plt.plot(ridge01[0],'o',label = 'Ridge alpha = 0.1')

plt.hlines(0,0,len(ridge01[0]))
plt.xlabel('Coefficient index')
plt.ylabel('Coefficient magnitude')
plt.ylim(-25,25)
plt.legend()
plt.show()

When alpha = 1, the coefficients are almost all close to 0; as alpha decreases, the model becomes more complex and the coefficients move away from 0.


Linear Models for Classification

  • For classification, the general prediction formula looks like: y = w[0]*x[0] + w[1]*x[1] + … + w[p]*x[p] + b > 0, i.e. the linear function is simply thresholded at zero (see the sketch after this list)
  • For classification, the decision boundary is a line, a plane, or a hyperplane
  • Different linear classification algorithms differ in how they measure how well a particular combination of coefficients and intercept fits the training data, and in whether and what kind of regularization they apply
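
As a rough sketch of that decision rule (my own toy example with made-up numbers, not the textbook's code), the predicted class depends only on the sign of the linear function:

import numpy as np

w = np.array([0.8, -1.2])   # hypothetical coefficients
b = 0.3                     # hypothetical intercept
x = np.array([1.5, 0.4])    # one sample with two features

score = np.dot(w, x) + b    # w[0]*x[0] + w[1]*x[1] + b
y_hat = int(score > 0)      # class 1 if the score is above 0, otherwise class 0
print(score, y_hat)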

In Python, this is mainly done with LogisticRegression and LinearSVC:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.datasets import make_blobs
import numpy as np

First, an example:

x, y = mglearn.datasets.make_forge()
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
for model, ax in zip([LogisticRegression, LinearSVC], axes):
    clf = model().fit(x, y)
    mglearn.plots.plot_2d_classification(clf, x, fill=False, ax=ax,eps = 0.5,alpha = .4)
    mglearn.discrete_scatter(x[:, 0], x[:, 1], y, ax=ax)
    ax.set_xlabel('Feature0')
    ax.set_ylabel('Feature1')
    ax.set_title('{}'.format(clf.__class__.__name__))
axes[0].legend()
plt.show()

The two models produce similar decision boundaries (? these look quite different to me, even though the textbook says they are similar), and both misclassify some points.
Like Ridge, both of these models have a value that quantifies the trade-off, called C: the higher C is, the harder the model tries to fit the training data.
Here is an example:

'''both of these methods use the same L2 regularization as Ridge'''
'''a high value of C corresponds to less regularization'''
mglearn.plots.plot_linear_svc_regularization()
plt.show()

As C increases, the decision boundary shifts toward the points it previously misclassified.

Let's use the breast cancer dataset from before to further explore the effect of C.

'''
The default C=1 already gives good performance, but the training and test scores
are very close, which suggests the model may be underfitting
'''
cancer = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
logreg = LogisticRegression().fit(x_train, y_train)
print('Train score:%s' % (logreg.score(x_train, y_train)), 'Test score:%s' % (logreg.score(x_test, y_test)))


'''increase C: both training and test performance improve'''
logreg_100 = LogisticRegression(C=100).fit(x_train, y_train)
print('Train score:%s' % (logreg_100.score(x_train, y_train)), 'Test score:%s' % (logreg_100.score(x_test, y_test)))


'''decrease C: performance drops'''
logreg_01 = LogisticRegression(C=0.01).fit(x_train, y_train)
print('Train score:%s' % (logreg_01.score(x_train, y_train)), 'Test score:%s' % (logreg_01.score(x_test, y_test)))

Now look at how different values of C affect the coefficients.

plt.plot(logreg.coef_.T, 'o', label="C=1")
plt.plot(logreg_100.coef_.T, '^', label="C=100")
plt.plot(logreg_01.coef_.T, 'v', label="C=0.01")
plt.xticks(range(cancer.data.shape[1]), cancer.feature_names, rotation=90)
xlims = plt.xlim()
plt.hlines(0, xlims[0], xlims[1])
plt.xlim(xlims)
plt.ylim(-5, 5)
plt.xlabel("Feature")
plt.ylabel("Coefficient magnitude")
plt.legend()
plt.show()

The larger C is, the more closely the model fits the training data and the further the coefficients move from 0.

Multiclass Classification
Use make_blobs to generate a random dataset for classification; the basic workflow is much the same as with kNN, the main difference lies in the algorithm itself (a sketch of the one-vs-rest prediction rule follows the code below).

'''multiclassification'''
fig,axes = plt.subplots(1,4,figsize = (128/3,3))
for i,ax in zip(range(3,7),axes):
    x, y = make_blobs(random_state = 42,centers = i)
    l_svc = LinearSVC().fit(x,y)
    # print('Shape of COEF',(l_svc.coef_.shape),'Shape of INTERCEPT:',(l_svc.intercept_.shape))
    # print(l_svc.coef_)
    mglearn.plots.plot_2d_classification(l_svc,x,fill = True,alpha = .4,eps = 0.5,ax = ax)
    mglearn.discrete_scatter(x[:,0],x[:,1],y,ax= ax)
    line = np.linspace(-15,15)
    for coef, intercept,color in zip(l_svc.coef_,l_svc.intercept_,mglearn.cm3.colors):
        ax.plot(line,-(line*coef[0]+intercept)/coef[1],c  = color)
    ax.set_xlabel('Feature0')
    ax.set_ylabel('Feature1')
    ax.legend()
plt.show()
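
To see roughly what LinearSVC is doing with several classes, here is a hedged sketch of the one-vs-rest prediction rule: coef_ holds one row of coefficients per class, and the class whose linear score coef_[k]·x + intercept_[k] is highest wins. This mirrors what predict does; the manual computation below is only for illustration.

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

'''one-vs-rest prediction, done by hand for one sample'''
x, y = make_blobs(random_state=42, centers=3)
l_svc = LinearSVC().fit(x, y)
print(l_svc.coef_.shape, l_svc.intercept_.shape)    # (3, 2) and (3,): one binary classifier per class

sample = x[0]
scores = l_svc.coef_ @ sample + l_svc.intercept_    # one score per class
print(scores.argmax(), l_svc.predict([sample])[0])  # the argmax agrees with predict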


Summary

  • The main parameter of linear models is the regularization parameter:
    alpha in the regression models and C in LinearSVC and
    LogisticRegression
  • Large values of alpha (or small values of C) mean simple models
  • If you expect only a few features to actually matter, use L1 regularization
  • Strengths: fast to train and fast to predict (they work well on large, sparse datasets), and it is easy to understand how predictions are made
  • Weaknesses: they tend not to perform as well in lower-dimensional spaces