Ridge and Lasso Regression

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.linear_model import LinearRegression
rcParams['figure.figsize'] = 12, 10

x = np.array([i*np.pi/180 for i in range(60, 300, 4)])
np.random.seed(0)

y = np.sin(x)+np.random.normal(0,0.15, len(x))

data = pd.DataFrame(np.column_stack([x,y]), columns=['x', 'y'])
#plt.plot(data['x'], data['y'], '.')

for i in range(2,16):
    colname='x_%d'%i
    data[colname]=data['x']**i

def linear_regression(data, power, models_to_plot):
    predictors=['x']
    
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power+1)])
    
    # note: the `normalize` argument requires scikit-learn < 1.2 (it was removed in 1.2)
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors], data['y'])
    y_pred = linreg.predict(data[predictors])
    
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for power: %d'%power)
        
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret
    
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['model_pow_%d'%i for i in range(1,16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

for i in range(1, 16):
    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data,power=i, models_to_plot=models_to_plot)

pd.options.display.float_format='{:,.2g}'.format

print(coef_matrix_simple)



It is clearly evident that the size of the coefficients increases exponentially as model complexity increases. I hope this gives some intuition into why putting a constraint on the magnitude of the coefficients can be a good way to reduce model complexity.


Let's try to understand this even better.


What does a large coefficient signify? It means that we're putting a lot of emphasis on that feature, i.e. that particular feature is a good predictor of the outcome. When a coefficient becomes too large, the algorithm starts modelling intricate relations to estimate the output and ends up overfitting to the particular training data.


Ridge Regression:

Key parameter: alpha. Ridge regression adds an L2 penalty to the least-squares objective, i.e. it minimizes RSS + alpha * (sum of squares of the coefficients). With alpha = 0 we recover simple linear regression; as alpha grows, the coefficients are shrunk more and more strongly towards zero.

I hope this gives some sense of how alpha would impact the magnitude of the coefficients. One thing is for sure: any non-zero value of alpha will give coefficient values smaller than those of simple linear regression.
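
A minimal sketch of that claim, assuming the 'data' DataFrame built above (the features are standardized here purely for numerical stability; the variable names are illustrative, not from the original script):

# Compare the L2 norm of the coefficient vector for plain least squares
# against ridge fits with a small and a large alpha: any non-zero alpha
# shrinks the coefficients, and larger alphas shrink them more.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

predictors = ['x'] + ['x_%d' % i for i in range(2, 16)]
X_scaled = StandardScaler().fit_transform(data[predictors])

ols = LinearRegression().fit(X_scaled, data['y'])
ridge_small = Ridge(alpha=1e-3).fit(X_scaled, data['y'])
ridge_large = Ridge(alpha=10).fit(X_scaled, data['y'])

print('OLS              |coef|: %.3g' % np.linalg.norm(ols.coef_))
print('Ridge alpha=1e-3 |coef|: %.3g' % np.linalg.norm(ridge_small.coef_))
print('Ridge alpha=10   |coef|: %.3g' % np.linalg.norm(ridge_large.coef_))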


Keep in mind that normalizing the inputs is generally a good idea in every type of regression, and it should be used in the case of ridge regression as well.
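
Note that the 'normalize' argument used in these scripts was deprecated in scikit-learn 1.0 and removed in 1.2. On recent versions, the usual replacement is a Pipeline with StandardScaler; a minimal sketch (not numerically identical to normalize=True, which scaled each column by its L2 norm rather than its standard deviation):

# Modern workflow for scikit-learn >= 1.2: standardize the inputs inside a
# Pipeline, then fit ridge on the scaled features.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

predictors = ['x'] + ['x_%d' % i for i in range(2, 16)]
ridge_pipe = make_pipeline(StandardScaler(), Ridge(alpha=1e-3))
ridge_pipe.fit(data[predictors], data['y'])
y_pred = ridge_pipe.predict(data[predictors])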


import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge

rcParams['figure.figsize'] = 12, 10

x = np.array([i*np.pi/180 for i in range(60, 300, 4)])
np.random.seed(0)

y = np.sin(x)+np.random.normal(0,0.15, len(x))

data = pd.DataFrame(np.column_stack([x,y]), columns=['x', 'y'])
#plt.plot(data['x'], data['y'], '.')

for i in range(2,16):
    colname='x_%d'%i
    data[colname]=data['x']**i

def linear_regression(data, power, models_to_plot):
    predictors=['x']
    
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power+1)])
    
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors], data['y'])
    y_pred = linreg.predict(data[predictors])
    
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for power: %d'%power)
        
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret
def ridge_regression(data, predictors, alpha, models_to_plot={}):
    ridgereg = Ridge(alpha=alpha, normalize=True)
    ridgereg.fit(data[predictors], data['y'])
    y_pred = ridgereg.predict(data[predictors])
    
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([ridgereg.intercept_])
    ret.extend(ridgereg.coef_)
    return ret
    
#col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
#ind = ['model_pow_%d'%i for i in range(1,16)]
#coef_matrix_simple = pd.DataFrame(index=ind, columns=col)
#models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

#for i in range(1, 16):
#    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data,power=i, models_to_plot=models_to_plot)
#
#pd.options.display.float_format='{:,.2g}'.format
#
#print(coef_matrix_simple)

predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])
alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_ridge[i] for i in range(0,10)]
coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1e-15:231, 1e-10:232, 1e-4:233, 1e-3:234, 1e-2:235, 5:236}

for i in range(10):
    coef_matrix_ridge.iloc[i,] = ridge_regression(data, predictors, alpha_ridge[i], models_to_plot)



Here we can clearly observe that as the value of alpha increases, the model complexity reduces.


Let's have a look at the values of the coefficients in the above models:
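
The same display code used for the simple linear regression table above can be reused here (a small sketch, assuming the coef_matrix_ridge built by the loop above):

# Show the ridge coefficient table with compact float formatting.
pd.options.display.float_format = '{:,.2g}'.format
print(coef_matrix_ridge)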



This straight away gives us the following inferences:

1. The RSS increases with increase in alpha, as the model complexity reduces.

2. An alpha as small as 1e-15 gives a significant reduction in the magnitude of the coefficients. How? Compare the coefficients in the first row of this table to the last row of the simple linear regression table.

3. High alpha values can lead to significant underfitting. Note the rapid increase in RSS for values of alpha greater than 1.

4. Though the coefficients are very small, they are NOT zero. One way to check this is shown in the sketch below.
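
A small sketch to verify point 4, assuming the coef_matrix_ridge built above: count how many entries are exactly zero in each row.

# Ridge shrinks the coefficients towards zero but never sets them exactly to zero.
print(coef_matrix_ridge.apply(lambda row: sum(row.values == 0), axis=1))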


Lasso Regression:

from sklearn.linear_model import Lasso

def lasso_regression(data, predictors, alpha, models_to_plot={}):
    # note: `normalize` requires scikit-learn < 1.2; max_iter should be an integer
    lassoreg = Lasso(alpha=alpha, normalize=True, max_iter=int(1e5))
    lassoreg.fit(data[predictors],data['y'])
    y_pred = lassoreg.predict(data[predictors])

    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    return ret

Notice the additional parameter defined in the Lasso function: 'max_iter'. This is the maximum number of iterations for which we want the model to run if it doesn't converge before then. It exists for Ridge as well, but setting it higher than the default was required in this case.
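
The plots and coefficient table discussed next can be produced with a driver loop analogous to the ridge one, reusing the 'predictors' list defined above. The alpha grid and subplot layout below are assumptions chosen to span a similar range to the ridge setup:

# Driver loop for lasso, analogous to the ridge loop above.
alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]

col = ['rss', 'intercept'] + ['coef_x_%d' % i for i in range(1, 16)]
ind = ['alpha_%.2g' % alpha_lasso[i] for i in range(0, 10)]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

# Plot a subset of the models in a 2x3 grid, keyed by alpha.
models_to_plot = {1e-10: 231, 1e-5: 232, 1e-4: 233, 1e-3: 234, 1e-2: 235, 1: 236}

for i in range(10):
    coef_matrix_lasso.iloc[i, ] = lasso_regression(data, predictors, alpha_lasso[i], models_to_plot)

pd.options.display.float_format = '{:,.2g}'.format
print(coef_matrix_lasso)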




This again tells us that the model complexity decreases as the value of alpha increases. But notice the straight line at alpha = 1.



Apart from the expected inference of higher RSS for higher alphas, we can see the following:

1. For the same value of alpha, the coefficients of lasso regression are much smaller than those of ridge regression (compare row 1 of the two tables).

2. For the same alpha, lasso has a higher RSS (poorer fit) than ridge regression.

3. Many of the coefficients are zero even for very small values of alpha; the sketch below counts them per model.
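
Point 3 can be checked the same way as for ridge, assuming the coef_matrix_lasso built in the sketch above:

# Unlike ridge, lasso drives many coefficients to exactly zero, which is why
# it also acts as a feature selector.
print(coef_matrix_lasso.apply(lambda row: sum(row.values == 0), axis=1))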
