(Daily musing: sleepy and tired, but what needs to be learned still has to be learned.)
Some English terms in this post are hard to translate, so the English names are kept as-is.
Reference book: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, ISBN 0-321-32136-7, 2005.
Learning goals:
1. Using a validation dataset
2. Linear regression with regularization
3. Post-pruning decision trees with a complexity penalty
4. Cross-validation
Toolbox:
- Post-pruning decision trees: cost_complexity_pruning_path ("Post pruning decision trees with cost complexity pruning", scikit-learn 1.4.2 documentation)
- Support vector machine for classification: from sklearn.svm import SVC (sklearn.svm.SVC, scikit-learn 1.4.2 documentation)
- Grid search with cross-validation: from sklearn.model_selection import GridSearchCV (handy, and a favorite; sklearn.model_selection.GridSearchCV, scikit-learn 1.4.2 documentation)
- Classification report: from sklearn.metrics import classification_report (sklearn.metrics.classification_report, scikit-learn 1.4.2 documentation)
2. Ridge regression with a pre-determined penalty
2.1 First, generate a dataset from y = sin(x) + noise to fit
In [1]:
#Importing libraries. The same will be used throughout the article.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(8,5))
#Define input array with angles from 60deg to 300deg converted to radians
x = np.array([i*np.pi/180 for i in range(60,300,4)])
np.random.seed(10) #Setting seed for reproducibility
y = np.sin(x) + np.random.normal(0,0.15,len(x))
data = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])
plt.plot(data['x'],data['y'],".",color='black',markersize=16)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
#create new variables for x^k, k = 2..15
for i in range(2,16): #power of 1 is already there
    colname = 'x^%d'%i #new column will be x^i
    data[colname] = data['x']**i
data.head()
Out[1]:
 | x | y | x^2 | x^3 | x^4 | x^5 | x^6 | x^7 | x^8 | x^9 | x^10 | x^11 | x^12 | x^13 | x^14 | x^15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.047198 | 1.065763 | 1.096623 | 1.148381 | 1.202581 | 1.259340 | 1.318778 | 1.381021 | 1.446202 | 1.514459 | 1.585938 | 1.660790 | 1.739176 | 1.821260 | 1.907219 | 1.997235 |
1 | 1.117011 | 1.006086 | 1.247713 | 1.393709 | 1.556788 | 1.738948 | 1.942424 | 2.169709 | 2.423588 | 2.707173 | 3.023942 | 3.377775 | 3.773011 | 4.214494 | 4.707635 | 5.258479 |
2 | 1.186824 | 0.695374 | 1.408551 | 1.671702 | 1.984016 | 2.354677 | 2.794587 | 3.316683 | 3.936319 | 4.671717 | 5.544505 | 6.580351 | 7.809718 | 9.268760 | 11.000386 | 13.055521 |
3 | 1.256637 | 0.949799 | 1.579137 | 1.984402 | 2.493673 | 3.133642 | 3.937850 | 4.948448 | 6.218404 | 7.814277 | 9.819710 | 12.339811 | 15.506664 | 19.486248 | 24.487142 | 30.771450 |
4 | 1.326450 | 1.063496 | 1.759470 | 2.333850 | 3.095735 | 4.106339 | 5.446854 | 7.224981 | 9.583578 | 12.712139 | 16.862020 | 22.366630 | 29.668222 | 39.353420 | 52.200353 | 69.241170 |
In [2]: This code uses the train_test_split and mean_absolute_error functions from scikit-learn.
The variable power is the highest power (15) to which the feature x is expanded.
The list predictors initially holds only 'x'; extend appends 'x^2' through 'x^15', so predictors covers all the expanded features.
Finally, train_test_split splits the dataset data using predictors as the features and 'y' as the target, producing the training and test sets X_train, X_test, y_train, y_test.
The random_state argument fixes the random seed so the split is reproducible.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
power=15
predictors=['x']
predictors.extend(['x^%d'%i for i in range(2,power+1)]) #extend the original feature x to powers 2, 3, ..., 15
X_train, X_test, y_train, y_test = train_test_split(data[predictors], data['y'], random_state=0)
2.2 Fit it with a complex model (one that may overfit)
For example, polynomial regression:
y = a0 + a1*x + a2*x^2 + a3*x^3 + ... + a15*x^15
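The hand-rolled power columns above can also be produced with scikit-learn's PolynomialFeatures; a minimal sketch of the equivalent expansion (not what the notebook itself uses):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Expand a single feature x into [x, x^2, ..., x^15], mirroring the
# manual column loop above (degree 15, no bias column).
x = np.array([i * np.pi / 180 for i in range(60, 300, 4)])
poly = PolynomialFeatures(degree=15, include_bias=False)
X_poly = poly.fit_transform(x.reshape(-1, 1))
print(X_poly.shape)  # (60, 15): one column per power of x
```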
In [3]:
from sklearn.linear_model import LinearRegression
plt.figure(figsize=(8,5))
power=15
linreg = LinearRegression(normalize=True) #set up the model.
# If the parameter "normalize" is set to True, the regressor X will be
# normalized before regression by subtracting the mean and dividing by the L2 norm.
# Note: this parameter was removed in scikit-learn 1.2; on newer versions,
# wrap the model in a Pipeline with StandardScaler instead.
linreg.fit(X_train,y_train)
y_pred_train = linreg.predict(X_train)
plt.plot(X_train['x'],y_pred_train,'.',markersize=16)
plt.plot(X_train['x'],y_train,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
In [4]: Let's look at the training error and the coefficients of this fit.
print('Training Error= %f' %mean_absolute_error(y_train, y_pred_train))
print('intercept=%f' %linreg.intercept_)
for i in range(len(linreg.coef_)):
    print('a%d=%f' %(i+1,linreg.coef_[i]))
Training Error= 0.078419
intercept=-78062.588903
a1=510579.728538
a2=-1521395.142083
a3=2740029.641778
a4=-3336467.496880
a5=2910717.643664
a6=-1880308.841635
a7=916425.589768
a8=-339980.434840
a9=96076.671899
a10=-20529.028446
a11=3259.769600
a12=-372.653688
a13=28.978615
a14=-1.371763
a15=0.029822
The training error is small, but the coefficients are enormous.
In [5]: What about the validation error?
y_pred = linreg.predict(X_test)
print('Validation Error= %f' %mean_absolute_error(y_test, y_pred))
plt.plot(X_test['x'],y_pred,'.',markersize=16)
plt.plot(X_test['x'],y_test,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
Validation Error= 0.230021
The error is still fairly large; the performance is mediocre.
2.3 Ridge regression's penalty on coefficient size:
The ridge penalty: α * (sum of squared coefficients)
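To make the penalty concrete, here is a tiny sketch that evaluates the ridge objective by hand on made-up numbers (the helper name ridge_objective is illustrative, not from scikit-learn):

```python
import numpy as np

def ridge_objective(X, y, w, alpha):
    """Residual sum of squares plus alpha times the sum of squared coefficients."""
    residual = y - X @ w
    return residual @ residual + alpha * (w @ w)

# Toy data where y = 2x exactly, so w = [2] gives zero residual:
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(ridge_objective(X, y, np.array([2.0]), alpha=0.1))  # 0 + 0.1 * 2^2 = 0.4
```

A larger alpha thus pushes the minimizer toward smaller coefficients even when that increases the residual.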
In [6]:
from sklearn.linear_model import Ridge
plt.figure(figsize=(8,5))
alpha=0.1 #the penalty term is set to 0.1
#Fit the model
ridgereg = Ridge(alpha=alpha,normalize=True) #normalize was removed in scikit-learn 1.2; use a StandardScaler Pipeline on newer versions
ridgereg.fit(X_train,y_train)
y_pred_train = ridgereg.predict(X_train)
#Check if a plot is to be made for the entered alpha
plt.plot(X_train['x'],y_pred_train,'.',markersize=16)
plt.plot(X_train['x'],y_train,'.',markersize=16)
plt.title('Plot for penalty lambda: %.3g'%alpha,fontsize=30)
plt.show()
In [7]: Look at the training error and coefficients of this fit.
print('Training Error= %f' %mean_absolute_error(y_train, y_pred_train))
print('intercept=%f' %ridgereg.intercept_)
for i in range(len(ridgereg.coef_)):
    print('a%d=%f' %(i+1,ridgereg.coef_[i]))
Training Error= 0.132973
intercept=1.377927
a1=-0.207283
a2=-0.032188
a3=-0.005316
a4=-0.000836
a5=-0.000121
a6=-0.000015
a7=-0.000001
a8=-0.000000
a9=0.000000
a10=0.000000
a11=0.000000
a12=0.000000
a13=0.000000
a14=0.000000
a15=0.000000
The training error is larger, but the coefficients have shrunk dramatically: the model has been reduced to a much simpler form.
In [8]: Check the validation error
#The model complexity was controled. What about validation performance?
y_pred = ridgereg.predict(X_test)
print('Validation Error= %f' %mean_absolute_error(y_test, y_pred))
plt.plot(X_test['x'],y_pred,'.',markersize=16)
plt.plot(X_test['x'],y_test,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
Validation Error= 0.205105
In [9]: Loop over alpha values to find the one with the smallest validation error
alpha_test=[i/100 for i in range(0,20)] #search this hyperparameter from 0 to 0.19, with a step of 0.01
for alpha in alpha_test:
    ridgereg = Ridge(alpha=alpha,normalize=True).fit(X_train,y_train)
    y_pred = ridgereg.predict(X_test)
    print(alpha,'Validation Error= %f' %mean_absolute_error(y_test, y_pred))
0.0 Validation Error= 0.230024
0.01 Validation Error= 0.187489
0.02 Validation Error= 0.197223
0.03 Validation Error= 0.201260
0.04 Validation Error= 0.203306
0.05 Validation Error= 0.204372
0.06 Validation Error= 0.204928
0.07 Validation Error= 0.205190
0.08 Validation Error= 0.205269
0.09 Validation Error= 0.205228
0.1 Validation Error= 0.205105
0.11 Validation Error= 0.204926
0.12 Validation Error= 0.204708
0.13 Validation Error= 0.204460
0.14 Validation Error= 0.204193
0.15 Validation Error= 0.203912
0.16 Validation Error= 0.203622
0.17 Validation Error= 0.203325
0.18 Validation Error= 0.203025
0.19 Validation Error= 0.202723
As the printout shows, the validation error is smallest at α = 0.01.
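Instead of eyeballing the printout, the best alpha can be picked with np.argmin over the recorded errors; a self-contained sketch on synthetic data (newer scikit-learn versions removed Ridge's normalize parameter, so a StandardScaler pipeline stands in for it here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for the polynomial features above.
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

alphas = [i / 100 for i in range(1, 20)]  # 0.01 .. 0.19
errors = []
for a in alphas:
    model = make_pipeline(StandardScaler(), Ridge(alpha=a)).fit(X_tr, y_tr)
    errors.append(mean_absolute_error(y_te, model.predict(X_te)))

best = alphas[int(np.argmin(errors))]  # alpha with the smallest validation error
print('best alpha: %.2f' % best)
```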
3. Case study: post-pruning decision trees with a complexity penalty for breast cancer diagnosis
Reference:
Dataset source: https://goo.gl/U2Uwz2
The dataset records features of cell nuclei from patients' breast masses, along with whether breast cancer was diagnosed.
In [10]: Libraries used
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
Breast masses are imaged via fine-needle aspiration (FNA), and measurements of the imaged cell nuclei make up the dataset. For the full description, see the data description: https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-dataset
In [11]: Load the dataset
cancer = load_breast_cancer()
df_feat = pd.DataFrame(cancer['data'],
                       columns=cancer['feature_names'])
# cancer column is our target
df_target = pd.DataFrame(cancer['target'],
                         columns=['Cancer'])
#randomly split the dataset. By default, the test_size=0.25
In [12]: Inspect the dataset
df_feat.head()
Out[12]:
 | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 14.91 | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
5 rows × 30 columns
In [13]: The target column is Cancer
df_target.head()
Out[13]:
 | Cancer |
---|---|
0 | 0 |
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
In [14]: Split the dataset.
X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target), random_state=0)
3.1 Use cost_complexity_pruning_path to generate the penalty alphas
In [15]:
clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train) #seeking alpha
ccp_alphas, impurities = path.ccp_alphas, path.impurities #output the alpha values and tree impurities
fig, ax = plt.subplots()
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
Out[15]:
Text(0.5, 1.0, 'Total Impurity vs effective alpha for training set')
3.2 Use the different penalty alphas to induce different tree structures.
In [16]:
clfs = [] #this list will be used to store a list of Decision Tree Models with different penalty alpha values.
for alpha in ccp_alphas:
    print('the penalty term alpha is %f' %alpha)
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha) # induce a tree using this ccp_alpha value
    clf.fit(X_train, y_train)
    clfs.append(clf)
the penalty term alpha is 0.000000
the penalty term alpha is 0.002266
the penalty term alpha is 0.004647
the penalty term alpha is 0.004660
the penalty term alpha is 0.005634
the penalty term alpha is 0.007042
the penalty term alpha is 0.007842
the penalty term alpha is 0.009114
the penalty term alpha is 0.011444
the penalty term alpha is 0.018988
the penalty term alpha is 0.023142
the penalty term alpha is 0.034225
the penalty term alpha is 0.327298
3.2.1 The first tree, trained with the first alpha (ccp_alphas[0])
In [17]:
from sklearn.tree import plot_tree
print('the first alpha is: ', ccp_alphas[0]) #the first value in the generated list of ccp_alphas
plot_tree(clfs[0], filled=True)
plt.title("Decision tree trained on the first ccp_alpha")
plt.show()
the first alpha is: 0.0
In [18]:
print('the 5th alpha is: ', ccp_alphas[5])
plot_tree(clfs[5], filled=True)
plt.title("Decision tree trained on the 5-th ccp_alpha")
plt.show()
the 5th alpha is: 0.007042253521126761
In [19]:
print('the last alpha is: ', ccp_alphas[-1])
plot_tree(clfs[-1], filled=True)
plt.title("Decision tree trained on the last ccp_alpha")
plt.show()
the last alpha is: 0.3272984419327777
3.3 Model complexity vs. penalty alpha
In [20]:
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1)
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
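A natural next step, which the scikit-learn pruning example also takes, is to score each pruned tree on the held-out set and keep the alpha with the best test accuracy; a self-contained sketch of that selection:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One pruned tree per candidate alpha on the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas]

test_scores = [t.score(X_test, y_test) for t in trees]
best_idx = max(range(len(trees)), key=test_scores.__getitem__)
print('best alpha: %.6f, test accuracy: %.3f'
      % (path.ccp_alphas[best_idx], test_scores[best_idx]))
```

Moderately pruned trees typically beat both the unpruned tree (which overfits) and the heavily pruned one (which underfits).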
3.4 GridSearchCV
In [21]: Libraries used
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
In [22]: Train the model on the training set and get its predictions
# train the model on train set
model = SVC()
model.fit(X_train, y_train)
# print prediction results
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support

           0       0.98      0.85      0.91        53
           1       0.92      0.99      0.95        90

    accuracy                           0.94       143
   macro avg       0.95      0.92      0.93       143
weighted avg       0.94      0.94      0.94       143
In [23]: Use GridSearchCV to find the best hyperparameters
from sklearn.model_selection import GridSearchCV
# defining hyperparameter options
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}
grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)
# fitting the model for grid search
grid.fit(X_train, y_train)
Out[23]:
GridSearchCV(estimator=SVC(), param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']}, verbose=3)
In [24]:
# print best parameter after tuning
print(grid.best_params_)
# print how our model looks after hyper-parameter tuning
print(grid.best_estimator_)
{'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
SVC(C=1, gamma=0.0001)
In [25]: Model results after hyperparameter tuning
grid_predictions = grid.predict(X_test)
# print classification report
print(classification_report(y_test, grid_predictions))
              precision    recall  f1-score   support

           0       0.92      0.92      0.92        53
           1       0.96      0.96      0.96        90

    accuracy                           0.94       143
   macro avg       0.94      0.94      0.94       143
weighted avg       0.94      0.94      0.94       143
4. Model selection with cross-validation
Use stratified cross-validation to evaluate a 3-class classification problem on the iris dataset.
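cross_val_score uses stratified folds by default for classifiers, meaning each fold preserves the class proportions; a quick sketch checking this on toy labels shaped like the iris data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 90 labels, 30 per class, mimicking the iris class balance.
y = np.array([0] * 30 + [1] * 30 + [2] * 30)
X = np.zeros((90, 1))  # the features don't matter for the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))  # every fold holds 6 samples of each class
```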
In [26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
In [27]: Load the iris dataset (read here from a local iris.txt file)
data = pd.read_csv('iris.txt',header=None) #use tool "read_csv" to read the data in "iris.txt" file. the data is stored in "data"
data.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class'] #set the column names of the data table
data.head(10) #
Out[27]:
 | sepal length | sepal width | petal length | petal width | class |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
In [28]: Target variable: class
Y = data['class'] #this is our prediction target
X = data.drop(['class'],axis=1)
In [29]:
from sklearn.model_selection import cross_val_score
clf=KNeighborsClassifier(10)
scores = cross_val_score(clf, X, Y, cv=5)
print('KNN CV Score:', scores)
KNN CV Score: [0.96666667 1. 1. 0.93333333 1. ]
In [30]:
clf=DecisionTreeClassifier(max_depth=10)
scores = cross_val_score(clf, X, Y, cv=5)
print('DT CV Score:', scores)
DT CV Score: [0.96666667 0.96666667 0.9 0.93333333 1. ]
In [31]: Compare a range of classification models:
names = [
"Nearest Neighbors",
"Linear SVM",
"RBF SVM",
"Gaussian Process",
"Decision Tree",
"Random Forest",
"Neural Net",
"AdaBoost",
"Naive Bayes",
"QDA",
]
classifiers = [
KNeighborsClassifier(10),
SVC(kernel="linear", C=0.025),
SVC(gamma=2, C=1),
GaussianProcessClassifier(1.0 * RBF(1.0)),
DecisionTreeClassifier(max_depth=5),
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
MLPClassifier(alpha=1, max_iter=1000),
AdaBoostClassifier(),
GaussianNB(),
QuadraticDiscriminantAnalysis(),
]
In [32]:
for i in range(len(classifiers)):
    clf=classifiers[i] #use the i-th model in the "classifiers" list
    scores = cross_val_score(clf, X, Y, cv=5)
    print('CV Score of '+ names[i], scores)
CV Score of Nearest Neighbors [0.96666667 1. 1. 0.93333333 1. ]
CV Score of Linear SVM [0.93333333 0.96666667 0.9 0.93333333 1. ]
CV Score of RBF SVM [0.96666667 1. 0.9 0.96666667 1. ]
CV Score of Gaussian Process [0.96666667 1. 0.93333333 0.93333333 1. ]
CV Score of Decision Tree [0.96666667 0.96666667 0.9 0.93333333 1. ]
CV Score of Random Forest [0.96666667 0.96666667 0.93333333 0.9 1. ]
CV Score of Neural Net [1. 1. 0.96666667 0.96666667 1. ]
CV Score of AdaBoost [0.96666667 0.93333333 0.9 0.93333333 1. ]
CV Score of Naive Bayes [0.93333333 0.96666667 0.93333333 0.93333333 1. ]
CV Score of QDA [1. 1. 0.96666667 0.93333333 1. ]
In [33]: Generate a classification report on a held-out split
from sklearn.metrics import classification_report # this library directly generates precision, recall, f-measure
from sklearn.model_selection import train_test_split
clf=DecisionTreeClassifier(max_depth=10)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=1)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
print(classification_report(y_test, y_pred))
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        17
Iris-versicolor       0.95      0.95      0.95        19
 Iris-virginica       0.93      0.93      0.93        14

       accuracy                           0.96        50
      macro avg       0.96      0.96      0.96        50
   weighted avg       0.96      0.96      0.96        50
For a multi-class problem, the micro-averaged precision score is defined as the sum of true positives (predictions that are correct for their class) over all classes, divided by the total number of positive predictions. The positive predictions are the sum of all true positives and false positives.
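This definition can be checked against scikit-learn directly; the sketch below computes micro-averaged precision by hand on made-up labels and compares it with precision_score(average='micro'):

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels for a 3-class problem:
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

# Micro precision: total true positives across all classes, divided by the
# total number of positive predictions (each prediction is positive for one class).
tp = int((y_true == y_pred).sum())
micro_precision = tp / len(y_pred)
print(micro_precision)  # 5 correct out of 8 predictions -> 0.625
```

For single-label multi-class problems like this one, micro precision, micro recall, and plain accuracy all coincide.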
Summary:
1. How to use linear regression models
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
2. How to use different evaluation metrics for prediction
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
3. How to change the regularization penalty alpha
ridge_reg = Ridge(alpha=1, solver="cholesky")
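Lasso appears in the import list above but is never run in this post; a minimal sketch (toy data, illustrative alpha) showing how its L1 penalty drives uninformative coefficients to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features are informative; the rest are pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # coefficients on the uninformative features are driven to 0
```

Unlike ridge, which only shrinks coefficients toward zero, lasso performs feature selection by setting some of them to exactly zero.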