Ridge and Lasso Regression

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.linear_model import LinearRegression
rcParams['figure.figsize'] = 12, 10

x = np.array([i*np.pi/180 for i in range(60, 300, 4)])
np.random.seed(0)

y = np.sin(x)+np.random.normal(0,0.15, len(x))

data = pd.DataFrame(np.column_stack([x,y]), columns=['x', 'y'])
#plt.plot(data['x'], data['y'], '.')

for i in range(2,16):
    colname='x_%d'%i
    data[colname]=data['x']**i

def linear_regression(data, power, models_to_plot):
    predictors=['x']
    
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power+1)])
    
    # note: the `normalize` argument requires scikit-learn < 1.2 (it was removed in 1.2)
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors], data['y'])
    y_pred = linreg.predict(data[predictors])
    
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for power: %d'%power)
        
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret
    
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['model_pow_%d'%i for i in range(1,16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

for i in range(1, 16):
    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data,power=i, models_to_plot=models_to_plot)

pd.options.display.float_format='{:,.2g}'.format

print(coef_matrix_simple)



It is clearly evident that the size of the coefficients increases exponentially as model complexity increases. I hope this gives some intuition into why putting a constraint on the magnitude of the coefficients can be a good way to reduce model complexity.


Let's try to understand this even better.


What does a large coefficient signify? It means that we're putting a lot of emphasis on that feature, i.e. that particular feature is a good predictor of the outcome. When a coefficient becomes too large, the algorithm starts modelling intricate relations to estimate the output and ends up overfitting to the particular training data.


Ridge Regression:

Key parameter: alpha. Ridge regression adds an L2 penalty to the least-squares objective, i.e. it minimizes RSS + alpha * (sum of squares of the coefficients). With alpha = 0 we recover simple linear regression; as alpha grows, the coefficients are shrunk more and more strongly towards zero.

I hope this gives some sense of how alpha would impact the magnitude of the coefficients. One thing is for sure: any non-zero value of alpha will give coefficient values smaller than those of simple linear regression.
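
A minimal sketch of that claim, assuming the 'data' DataFrame built above (the features are standardized here purely for numerical stability; the variable names are illustrative, not from the original script):

# Compare the L2 norm of the coefficient vector for plain least squares
# against ridge fits with a small and a large alpha: any non-zero alpha
# shrinks the coefficients, and larger alphas shrink them more.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

predictors = ['x'] + ['x_%d' % i for i in range(2, 16)]
X_scaled = StandardScaler().fit_transform(data[predictors])

ols = LinearRegression().fit(X_scaled, data['y'])
ridge_small = Ridge(alpha=1e-3).fit(X_scaled, data['y'])
ridge_large = Ridge(alpha=10).fit(X_scaled, data['y'])

print('OLS              |coef|: %.3g' % np.linalg.norm(ols.coef_))
print('Ridge alpha=1e-3 |coef|: %.3g' % np.linalg.norm(ridge_small.coef_))
print('Ridge alpha=10   |coef|: %.3g' % np.linalg.norm(ridge_large.coef_))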


Keep in mind that normalizing the inputs is generally a good idea in every type of regression, and it should be used in the case of ridge regression as well.
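
Note that the 'normalize' argument used in these scripts was deprecated in scikit-learn 1.0 and removed in 1.2. On recent versions, the usual replacement is a Pipeline with StandardScaler; a minimal sketch (not numerically identical to normalize=True, which scaled each column by its L2 norm rather than its standard deviation):

# Modern workflow for scikit-learn >= 1.2: standardize the inputs inside a
# Pipeline, then fit ridge on the scaled features.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

predictors = ['x'] + ['x_%d' % i for i in range(2, 16)]
ridge_pipe = make_pipeline(StandardScaler(), Ridge(alpha=1e-3))
ridge_pipe.fit(data[predictors], data['y'])
y_pred = ridge_pipe.predict(data[predictors])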


import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge

rcParams['figure.figsize'] = 12, 10

x = np.array([i*np.pi/180 for i in range(60, 300, 4)])
np.random.seed(0)

y = np.sin(x)+np.random.normal(0,0.15, len(x))

data = pd.DataFrame(np.column_stack([x,y]), columns=['x', 'y'])
#plt.plot(data['x'], data['y'], '.')

for i in range(2,16):
    colname='x_%d'%i
    data[colname]=data['x']**i

def linear_regression(data, power, models_to_plot):
    predictors=['x']
    
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power+1)])
    
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors], data['y'])
    y_pred = linreg.predict(data[predictors])
    
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for power: %d'%power)
        
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret
def ridge_regression(data, predictors, alpha, models_to_plot={}):
    ridgereg = Ridge(alpha=alpha, normalize=True)
    ridgereg.fit(data[predictors], data['y'])
    y_pred = ridgereg.predict(data[predictors])
    
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([ridgereg.intercept_])
    ret.extend(ridgereg.coef_)
    return ret
    
#col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
#ind = ['model_pow_%d'%i for i in range(1,16)]
#coef_matrix_simple = pd.DataFrame(index=ind, columns=col)
#models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

#for i in range(1, 16):
#    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data,power=i, models_to_plot=models_to_plot)
#
#pd.options.display.float_format='{:,.2g}'.format
#
#print(coef_matrix_simple)

predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])
alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_ridge[i] for i in range(0,10)]
coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1e-15:231, 1e-10:232, 1e-4:233, 1e-3:234, 1e-2:235, 5:236}

for i in range(10):
    coef_matrix_ridge.iloc[i,] = ridge_regression(data, predictors, alpha_ridge[i], models_to_plot)



Here we can clearly observe that as the value of alpha increases, the model complexity reduces.


Let's have a look at the values of the coefficients in the above models:
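
The same display code used for the simple linear regression table above can be reused here (a small sketch, assuming the coef_matrix_ridge built by the loop above):

# Show the ridge coefficient table with compact float formatting.
pd.options.display.float_format = '{:,.2g}'.format
print(coef_matrix_ridge)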



This straight away gives us the following inferences:

1. The RSS increases with increase in alpha, as the model complexity reduces.

2. An alpha as small as 1e-15 gives a significant reduction in the magnitude of the coefficients. How? Compare the coefficients in the first row of this table to the last row of the simple linear regression table.

3. High alpha values can lead to significant underfitting. Note the rapid increase in RSS for values of alpha greater than 1.

4. Though the coefficients are very small, they are NOT zero. One way to check this is shown in the sketch below.
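
A small sketch to verify point 4, assuming the coef_matrix_ridge built above: count how many entries are exactly zero in each row.

# Ridge shrinks the coefficients towards zero but never sets them exactly to zero.
print(coef_matrix_ridge.apply(lambda row: sum(row.values == 0), axis=1))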


Lasso Regression:

from sklearn.linear_model import Lasso

def lasso_regression(data, predictors, alpha, models_to_plot={}):
    # note: `normalize` requires scikit-learn < 1.2; max_iter should be an integer
    lassoreg = Lasso(alpha=alpha, normalize=True, max_iter=int(1e5))
    lassoreg.fit(data[predictors],data['y'])
    y_pred = lassoreg.predict(data[predictors])

    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    return ret

Notice the additional parameter defined in the Lasso function: 'max_iter'. This is the maximum number of iterations for which we want the model to run if it doesn't converge before then. It exists for Ridge as well, but setting it higher than the default was required in this case.
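
The plots and coefficient table discussed next can be produced with a driver loop analogous to the ridge one, reusing the 'predictors' list defined above. The alpha grid and subplot layout below are assumptions chosen to span a similar range to the ridge setup:

# Driver loop for lasso, analogous to the ridge loop above.
alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]

col = ['rss', 'intercept'] + ['coef_x_%d' % i for i in range(1, 16)]
ind = ['alpha_%.2g' % alpha_lasso[i] for i in range(0, 10)]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

# Plot a subset of the models in a 2x3 grid, keyed by alpha.
models_to_plot = {1e-10: 231, 1e-5: 232, 1e-4: 233, 1e-3: 234, 1e-2: 235, 1: 236}

for i in range(10):
    coef_matrix_lasso.iloc[i, ] = lasso_regression(data, predictors, alpha_lasso[i], models_to_plot)

pd.options.display.float_format = '{:,.2g}'.format
print(coef_matrix_lasso)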




This again tells us that the model complexity decreases as the value of alpha increases. But notice the straight line at alpha = 1.



Apart from the expected inference of higher RSS for higher alphas, we can see the following:

1. For the same value of alpha, the coefficients of lasso regression are much smaller than those of ridge regression (compare row 1 of the two tables).

2. For the same alpha, lasso has a higher RSS (poorer fit) than ridge regression.

3. Many of the coefficients are zero even for very small values of alpha; the sketch below counts them per model.
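
Point 3 can be checked the same way as for ridge, assuming the coef_matrix_lasso built in the sketch above:

# Unlike ridge, lasso drives many coefficients to exactly zero, which is why
# it also acts as a feature selector.
print(coef_matrix_lasso.apply(lambda row: sum(row.values == 0), axis=1))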
