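The dataset abalone.txt records the age of abalone (a shelled aquatic animal). An abalone's age can be estimated from the number of layers in its shell.
The first several columns hold the feature data for each sample, and the last column is the abalone's age. Train a model with each of standard linear regression (ordinary least squares),
ridge regression, and lasso regression, and report the score of all three models (the score is R^2, the coefficient of determination).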
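To make the score concrete, here is a minimal sketch of how R^2 can be computed by hand as 1 - SS_res / SS_tot, which is the quantity that scikit-learn's regressor `score` method reports. The helper name `r2_score_manual` and the toy values are assumptions made for illustration; they are not part of the exercise or of abalone.txt.

```python
def r2_score_manual(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from abalone.txt.
ages = [3.0, 5.0, 7.0, 9.0]
perfect = r2_score_manual(ages, ages)                    # predictions match exactly
baseline = r2_score_manual(ages, [6.0, 6.0, 6.0, 6.0])   # always predict the mean
print(perfect, baseline)
```

A perfect fit scores 1, always predicting the mean scores 0, and a model that does worse than the mean gets a negative score. The helper accepts plain lists as well as one-dimensional NumPy arrays.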
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
# Load the dataset: every column except the last is a feature, the last column is the age.
dataSet = np.genfromtxt('abalone.txt')
x_data = dataSet[:, :-1]
print(x_data)
y_data = dataSet[:, -1]
print(y_data)
print(len(y_data))
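# Standard linear regression (ordinary least squares) on a train/test split.
X_train, X_test, y_train, y_test = train_test_split(x_data, y_data, random_state=0)
standR = linear_model.LinearRegression()
standR.fit(X_train, y_train)
print('Training-set score', standR.score(X_train, y_train))
# Predict a single sample; np.newaxis keeps the input two-dimensional.
print(standR.predict(x_data[1, np.newaxis]))
yHat = standR.predict(x_data)
print(yHat)
print('Test-set score', standR.score(X_test, y_test))
# Correlation between the true ages and the predictions over the full dataset.
print(np.corrcoef(y_data, yHat))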
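# Ridge regression: RidgeCV selects the regularization strength alpha by cross-validation.
alphas_to_test = np.linspace(0.001, 4)
Ridge = linear_model.RidgeCV(alphas=alphas_to_test)
# This model is fitted and scored on the full dataset, not on the train/test split.
Ridge.fit(x_data, y_data)
print('Selected ridge alpha', Ridge.alpha_)
print(Ridge.coef_)
print(Ridge.intercept_)
print('Coefficient of determination', Ridge.score(x_data, y_data))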
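# Lasso regression with a fixed regularization strength, fitted and scored on the full dataset.
lasso = linear_model.Lasso(alpha=0.001)
lasso.fit(x_data, y_data)
print(lasso.coef_)
print(lasso.intercept_)
print('Coefficient of determination', lasso.score(x_data, y_data))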
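The script scores linear regression on a held-out test set, while ridge and lasso are scored on the same data they were fitted on, so the three numbers are not directly comparable. One possible way to put them on equal footing is sketched below: fit all three models on the same training split and score them on the same test split. The helper `compare_models` is a hypothetical name, and the synthetic data is an assumption used only so the sketch runs without abalone.txt.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=0):
    """Fit three linear models on one split and return their test-set R^2 scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )
    models = {
        'linear': linear_model.LinearRegression(),
        'ridge': linear_model.RidgeCV(alphas=np.linspace(0.001, 4)),
        'lasso': linear_model.Lasso(alpha=0.001),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)                   # train on the training split only
        scores[name] = model.score(X_test, y_test)    # R^2 on held-out data
    return scores


# Synthetic stand-in data (an assumption; replace with x_data, y_data from abalone.txt).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo @ np.arange(1.0, 9.0) + rng.normal(scale=0.5, size=200)

scores = compare_models(X_demo, y_demo)
for name, score in scores.items():
    print(name, round(score, 4))
```

With the real data loaded as in the script, the call would be `compare_models(x_data, y_data)`.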