(Daily musing: sleepy and tired, but what needs to be learned still has to be learned.)
Some English terms in this post are hard to translate, so the English names are kept as-is.
Reference book: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, ISBN 0-321-32136-7, 2005.
Learning goals:
1. Using a validation dataset
2. Linear regression with regularization
3. Post-pruning decision trees with a complexity penalty
4. Cross-validation
Toolbox:
- Post-pruning decision trees: cost_complexity_pruning_path ("Post pruning decision trees with cost complexity pruning", scikit-learn 1.4.2 documentation)
- Support vector machine for classification: from sklearn.svm import SVC (sklearn.svm.SVC, scikit-learn 1.4.2 documentation)
- Grid search with cross-validation: from sklearn.model_selection import GridSearchCV (handy, and a favorite; sklearn.model_selection.GridSearchCV, scikit-learn 1.4.2 documentation)
- Classification report: from sklearn.metrics import classification_report (sklearn.metrics.classification_report, scikit-learn 1.4.2 documentation)
2. Ridge regression with a pre-determined penalty
2.1 First, generate a dataset from y = sin(x) + noise to fit
In [1]:
#Importing libraries. The same will be used throughout the article.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(8,5))
#Define input array with angles from 60deg to 300deg converted to radians
x = np.array([i*np.pi/180 for i in range(60,300,4)])
np.random.seed(10) #Setting seed for reproducibility
y = np.sin(x) + np.random.normal(0,0.15,len(x))
data = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])
plt.plot(data['x'],data['y'],".",color='black',markersize=16)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
#create new variables for x^k, k = 2..15
for i in range(2,16): #power of 1 is already there
    colname = 'x^%d'%i #new column will be x^i
    data[colname] = data['x']**i
data.head()
Out[1]:
 | x | y | x^2 | x^3 | x^4 | x^5 | x^6 | x^7 | x^8 | x^9 | x^10 | x^11 | x^12 | x^13 | x^14 | x^15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.047198 | 1.065763 | 1.096623 | 1.148381 | 1.202581 | 1.259340 | 1.318778 | 1.381021 | 1.446202 | 1.514459 | 1.585938 | 1.660790 | 1.739176 | 1.821260 | 1.907219 | 1.997235 |
1 | 1.117011 | 1.006086 | 1.247713 | 1.393709 | 1.556788 | 1.738948 | 1.942424 | 2.169709 | 2.423588 | 2.707173 | 3.023942 | 3.377775 | 3.773011 | 4.214494 | 4.707635 | 5.258479 |
2 | 1.186824 | 0.695374 | 1.408551 | 1.671702 | 1.984016 | 2.354677 | 2.794587 | 3.316683 | 3.936319 | 4.671717 | 5.544505 | 6.580351 | 7.809718 | 9.268760 | 11.000386 | 13.055521 |
3 | 1.256637 | 0.949799 | 1.579137 | 1.984402 | 2.493673 | 3.133642 | 3.937850 | 4.948448 | 6.218404 | 7.814277 | 9.819710 | 12.339811 | 15.506664 | 19.486248 | 24.487142 | 30.771450 |
4 | 1.326450 | 1.063496 | 1.759470 | 2.333850 | 3.095735 | 4.106339 | 5.446854 | 7.224981 | 9.583578 | 12.712139 | 16.862020 | 22.366630 | 29.668222 | 39.353420 | 52.200353 | 69.241170 |
In [2]: This code uses the train_test_split and mean_absolute_error functions from scikit-learn.
The variable power is the highest power (15) to which the feature x is expanded.
The list predictors initially holds only 'x'; extend appends 'x^2' through 'x^15', so predictors covers all the expanded features.
Finally, train_test_split splits the dataset data using predictors as the features and 'y' as the target, producing the training and test sets X_train, X_test, y_train, y_test.
The random_state argument fixes the random seed so the split is reproducible.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
power=15
predictors=['x']
predictors.extend(['x^%d'%i for i in range(2,power+1)]) #extend the original feature x to powers 2, 3, ..., 15
X_train, X_test, y_train, y_test = train_test_split(data[predictors], data['y'], random_state=0)
2.2 Fit it with a complex model (one that may overfit)
For example, polynomial regression:
y = a0 + a1*x + a2*x^2 + a3*x^3 + ... + a15*x^15
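The hand-rolled power columns above can also be produced with scikit-learn's PolynomialFeatures; a minimal sketch of the equivalent expansion (not what the notebook itself uses):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Expand a single feature x into [x, x^2, ..., x^15], mirroring the
# manual column loop above (degree 15, no bias column).
x = np.array([i * np.pi / 180 for i in range(60, 300, 4)])
poly = PolynomialFeatures(degree=15, include_bias=False)
X_poly = poly.fit_transform(x.reshape(-1, 1))
print(X_poly.shape)  # (60, 15): one column per power of x
```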
In [3]:
from sklearn.linear_model import LinearRegression
plt.figure(figsize=(8,5))
power=15
linreg = LinearRegression(normalize=True) #set up the model.
# If the parameter "normalize" is set to True, the regressor X will be
# normalized before regression by subtracting the mean and dividing by the L2 norm.
# Note: this parameter was removed in scikit-learn 1.2; on newer versions,
# wrap the model in a Pipeline with StandardScaler instead.
linreg.fit(X_train,y_train)
y_pred_train = linreg.predict(X_train)
plt.plot(X_train['x'],y_pred_train,'.',markersize=16)
plt.plot(X_train['x'],y_train,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
In [4]: Let's look at the training error and the coefficients of this fit.
print('Training Error= %f' %mean_absolute_error(y_train, y_pred_train))
print('intercept=%f' %linreg.intercept_)
for i in range(len(linreg.coef_)):
    print('a%d=%f' %(i+1,linreg.coef_[i]))
Training Error= 0.078419
intercept=-78062.588903
a1=510579.728538
a2=-1521395.142083
a3=2740029.641778
a4=-3336467.496880
a5=2910717.643664
a6=-1880308.841635
a7=916425.589768
a8=-339980.434840
a9=96076.671899
a10=-20529.028446
a11=3259.769600
a12=-372.653688
a13=28.978615
a14=-1.371763
a15=0.029822
The training error is small, but the coefficients are enormous.
In [5]: What about the validation error?
y_pred = linreg.predict(X_test)
print('Validation Error= %f' %mean_absolute_error(y_test, y_pred))
plt.plot(X_test['x'],y_pred,'.',markersize=16)
plt.plot(X_test['x'],y_test,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
Validation Error= 0.230021
The error is still fairly large; the performance is mediocre.
2.3 Ridge regression's penalty on coefficient size:
The ridge penalty: α * (sum of squared coefficients)
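To make the penalty concrete, here is a tiny sketch that evaluates the ridge objective by hand on made-up numbers (the helper name ridge_objective is illustrative, not from scikit-learn):

```python
import numpy as np

def ridge_objective(X, y, w, alpha):
    """Residual sum of squares plus alpha times the sum of squared coefficients."""
    residual = y - X @ w
    return residual @ residual + alpha * (w @ w)

# Toy data where y = 2x exactly, so w = [2] gives zero residual:
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(ridge_objective(X, y, np.array([2.0]), alpha=0.1))  # 0 + 0.1 * 2^2 = 0.4
```

A larger alpha thus pushes the minimizer toward smaller coefficients even when that increases the residual.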
In [6]:
from sklearn.linear_model import Ridge
plt.figure(figsize=(8,5))
alpha=0.1 #the penalty term is set to 0.1
#Fit the model
ridgereg = Ridge(alpha=alpha,normalize=True) #normalize was removed in scikit-learn 1.2; use a StandardScaler Pipeline on newer versions
ridgereg.fit(X_train,y_train)
y_pred_train = ridgereg.predict(X_train)
#Check if a plot is to be made for the entered alpha
plt.plot(X_train['x'],y_pred_train,'.',markersize=16)
plt.plot(X_train['x'],y_train,'.',markersize=16)
plt.title('Plot for penalty lambda: %.3g'%alpha,fontsize=30)
plt.show()
In [7]: Look at the training error and coefficients of this fit.
print('Training Error= %f' %mean_absolute_error(y_train, y_pred_train))
print('intercept=%f' %ridgereg.intercept_)
for i in range(len(ridgereg.coef_)):
    print('a%d=%f' %(i+1,ridgereg.coef_[i]))
Training Error= 0.132973
intercept=1.377927
a1=-0.207283
a2=-0.032188
a3=-0.005316
a4=-0.000836
a5=-0.000121
a6=-0.000015
a7=-0.000001
a8=-0.000000
a9=0.000000
a10=0.000000
a11=0.000000
a12=0.000000
a13=0.000000
a14=0.000000
a15=0.000000
The training error is larger, but the coefficients have shrunk dramatically: the model has been reduced to a much simpler form.
In [8]: Check the validation error
#The model complexity was controled. What about validation performance?
y_pred = ridgereg.predict(X_test)
print('Validation Error= %f' %mean_absolute_error(y_test, y_pred))
plt.plot(X_test['x'],y_pred,'.',markersize=16)
plt.plot(X_test['x'],y_test,'.',markersize=16)
plt.title('Polynomial regression of power: %d'%power,fontsize=30)
plt.legend(['predicted','real data'],fontsize=16)
plt.show()
Validation Error= 0.205105
In [9]: Loop over alpha values to find the one with the smallest validation error
alpha_test=[i/100 for i in range(0,20)] #search this hyperparameter from 0 to 0.19, with a step of 0.01
for alpha in alpha_test:
    ridgereg = Ridge(alpha=alpha,normalize=True).fit(X_train,y_train)
    y_pred = ridgereg.predict(X_test)
    print(alpha,'Validation Error= %f' %mean_absolute_error(y_test, y_pred))
0.0 Validation Error= 0.230024
0.01 Validation Error= 0.187489
0.02 Validation Error= 0.197223
0.03 Validation Error= 0.201260
0.04 Validation Error= 0.203306
0.05 Validation Error= 0.204372
0.06 Validation Error= 0.204928
0.07 Validation Error= 0.205190
0.08 Validation Error= 0.205269
0.09 Validation Error= 0.205228
0.1 Validation Error= 0.205105
0.11 Validation Error= 0.204926
0.12 Validation Error= 0.204708
0.13 Validation Error= 0.204460
0.14 Validation Error= 0.204193
0.15 Validation Error= 0.203912
0.16 Validation Error= 0.203622
0.17 Validation Error= 0.203325
0.18 Validation Error= 0.203025
0.19 Validation Error= 0.202723
As the printout shows, the validation error is smallest at α = 0.01.
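Instead of eyeballing the printout, the best alpha can be picked with np.argmin over the recorded errors; a self-contained sketch on synthetic data (newer scikit-learn versions removed Ridge's normalize parameter, so a StandardScaler pipeline stands in for it here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for the polynomial features above.
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

alphas = [i / 100 for i in range(1, 20)]  # 0.01 .. 0.19
errors = []
for a in alphas:
    model = make_pipeline(StandardScaler(), Ridge(alpha=a)).fit(X_tr, y_tr)
    errors.append(mean_absolute_error(y_te, model.predict(X_te)))

best = alphas[int(np.argmin(errors))]  # alpha with the smallest validation error
print('best alpha: %.2f' % best)
```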
3. Case study: post-pruning decision trees with a complexity penalty for breast cancer diagnosis
Reference:
Dataset source: https://goo.gl/U2Uwz2
The dataset records features of cell nuclei from patients' breast masses, along with whether breast cancer was diagnosed.
In [10]: Libraries used
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
Breast masses are imaged via fine-needle aspiration (FNA), and measurements of the imaged cell nuclei make up the dataset. For the full description, see the data description: https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-dataset
In [11]: Load the dataset
cancer = load_breast_cancer()
df_feat = pd.DataFrame(cancer['data'],
                       columns=cancer['feature_names'])
# cancer column is our target
df_target = pd.DataFrame(cancer['target'],
                         columns=['Cancer'])
#randomly split the dataset. By default, the test_size=0.25
In [12]: Inspect the dataset
df_feat.head()
Out[12]:
 | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 14.91 | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
5 rows × 30 columns
In [13]: The target column is Cancer
df_target.head()
Out[13]:
 | Cancer |
---|---|
0 | 0 |
1 | 0 |
2 | 0 |
3 | 0 |
4 | 0 |
In [14]: Split the dataset.
X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target), random_state=0)
3.1 Use cost_complexity_pruning_path to generate the penalty alphas
In [15]:
clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train) #seeking alpha
ccp_alphas, impurities = path.ccp_alphas, path.impurities #output the alpha values and tree impurities
fig, ax = plt.subplots()
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
Out[15]:
Text(0.5, 1.0, 'Total Impurity vs effective alpha for training set')
3.2 Use the different penalty alphas to induce different tree structures.
In [16]:
clfs = [] #this list will be used to store a list of Decision Tree Models with different penalty alpha values.
for alpha in ccp_alphas:
    print('the penalty term alpha is %f' %alpha)
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha) # induce a tree using this ccp_alpha value
    clf.fit(X_train, y_train)
    clfs.append(clf)
the penalty term alpha is 0.000000
the penalty term alpha is 0.002266
the penalty term alpha is 0.004647
the penalty term alpha is 0.004660
the penalty term alpha is 0.005634
the penalty term alpha is 0.007042
the penalty term alpha is 0.007842
the penalty term alpha is 0.009114
the penalty term alpha is 0.011444
the penalty term alpha is 0.018988
the penalty term alpha is 0.023142
the penalty term alpha is 0.034225
the penalty term alpha is 0.327298
3.2.1 The first tree, trained with the first alpha (ccp_alphas[0])
In [17]:
from sklearn.tree import plot_tree
print('the first alpha is: ', ccp_alphas[0]) #the first value in the generated list of ccp_alphas
plot_tree(clfs[0], filled=True)
plt.title("Decision tree trained on the first ccp_alpha")
plt.show()
the first alpha is: 0.0
In [18]:
print('the 5th alpha is: ', ccp_alphas[5])
plot_tree(clfs[5], filled=True)
plt.title("Decision tree trained on the 5-th ccp_alpha")
plt.show()
the 5th alpha is: 0.007042253521126761
In [19]:
print('the last alpha is: ', ccp_alphas[-1])
plot_tree(clfs[-1], filled=True)
plt.title("Decision tree trained on the last ccp_alpha")
plt.show()
the last alpha is: 0.3272984419327777
3.3 Model complexity vs. penalty alpha
In [20]:
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1)
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
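A natural next step, which the scikit-learn pruning example also takes, is to score each pruned tree on the held-out set and keep the alpha with the best test accuracy; a self-contained sketch of that selection:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One pruned tree per candidate alpha on the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas]

test_scores = [t.score(X_test, y_test) for t in trees]
best_idx = max(range(len(trees)), key=test_scores.__getitem__)
print('best alpha: %.6f, test accuracy: %.3f'
      % (path.ccp_alphas[best_idx], test_scores[best_idx]))
```

Moderately pruned trees typically beat both the unpruned tree (which overfits) and the heavily pruned one (which underfits).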
3.4 GridSearchCV
In [21]: Libraries used
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
In [22]: Train the model on the training set and get its predictions
# train the model on train set
model = SVC()
model.fit(X_train, y_train)
# print prediction results
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support

           0       0.98      0.85      0.91        53
           1       0.92      0.99      0.95        90

    accuracy                           0.94       143
   macro avg       0.95      0.92      0.93       143
weighted avg       0.94      0.94      0.94       143
In [23]: Use GridSearchCV to find the best hyperparameters
from sklearn.model_selection import GridSearchCV
# defining hyperparameter options
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}
grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3)
# fitting the model for grid search
grid.fit(X_train, y_train)
Out[23]:
GridSearchCV(estimator=SVC(), param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']}, verbose=3)
In [24]:
# print best parameter after tuning
print(grid.best_params_)
# print how our model looks after hyper-parameter tuning
print(grid.best_estimator_)
{'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
SVC(C=1, gamma=0.0001)
In [25]: Model results after hyperparameter tuning
grid_predictions = grid.predict(X_test)
# print classification report
print(classification_report(y_test, grid_predictions))
              precision    recall  f1-score   support

           0       0.92      0.92      0.92        53
           1       0.96      0.96      0.96        90

    accuracy                           0.94       143
   macro avg       0.94      0.94      0.94       143
weighted avg       0.94      0.94      0.94       143
4. Model selection with cross-validation
Use stratified cross-validation to evaluate a 3-class classification problem on the iris dataset.
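cross_val_score uses stratified folds by default for classifiers, meaning each fold preserves the class proportions; a quick sketch checking this on toy labels shaped like the iris data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 90 labels, 30 per class, mimicking the iris class balance.
y = np.array([0] * 30 + [1] * 30 + [2] * 30)
X = np.zeros((90, 1))  # the features don't matter for the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))  # every fold holds 6 samples of each class
```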
In [26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
In [27]: Load the iris dataset (read here from a local iris.txt file)
data = pd.read_csv('iris.txt',header=None) #use tool "read_csv" to read the data in "iris.txt" file. the data is stored in "data"
data.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class'] #set the column names of the data table
data.head(10) #
Out[27]:
 | sepal length | sepal width | petal length | petal width | class |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
In [28]: Target variable: class
Y = data['class'] #this is our prediction target
X = data.drop(['class'],axis=1)
In [29]:
from sklearn.model_selection import cross_val_score
clf=KNeighborsClassifier(10)
scores = cross_val_score(clf, X, Y, cv=5)
print('KNN CV Score:', scores)
KNN CV Score: [0.96666667 1. 1. 0.93333333 1. ]
In [30]:
clf=DecisionTreeClassifier(max_depth=10)
scores = cross_val_score(clf, X, Y, cv=5)
print('DT CV Score:', scores)
DT CV Score: [0.96666667 0.96666667 0.9 0.93333333 1. ]
In [31]: Compare a range of classification models:
names = [
"Nearest Neighbors",
"Linear SVM",
"RBF SVM",
"Gaussian Process",
"Decision Tree",
"Random Forest",
"Neural Net",
"AdaBoost",
"Naive Bayes",
"QDA",
]
classifiers = [
KNeighborsClassifier(10),
SVC(kernel="linear", C=0.025),
SVC(gamma=2, C=1),
GaussianProcessClassifier(1.0 * RBF(1.0)),
DecisionTreeClassifier(max_depth=5),
RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
MLPClassifier(alpha=1, max_iter=1000),
AdaBoostClassifier(),
GaussianNB(),
QuadraticDiscriminantAnalysis(),
]
In [32]:
for i in range(len(classifiers)):
    clf=classifiers[i] #use the i-th model in the "classifiers" list
    scores = cross_val_score(clf, X, Y, cv=5)
    print('CV Score of '+ names[i], scores)
CV Score of Nearest Neighbors [0.96666667 1. 1. 0.93333333 1. ]
CV Score of Linear SVM [0.93333333 0.96666667 0.9 0.93333333 1. ]
CV Score of RBF SVM [0.96666667 1. 0.9 0.96666667 1. ]
CV Score of Gaussian Process [0.96666667 1. 0.93333333 0.93333333 1. ]
CV Score of Decision Tree [0.96666667 0.96666667 0.9 0.93333333 1. ]
CV Score of Random Forest [0.96666667 0.96666667 0.93333333 0.9 1. ]
CV Score of Neural Net [1. 1. 0.96666667 0.96666667 1. ]
CV Score of AdaBoost [0.96666667 0.93333333 0.9 0.93333333 1. ]
CV Score of Naive Bayes [0.93333333 0.96666667 0.93333333 0.93333333 1. ]
CV Score of QDA [1. 1. 0.96666667 0.93333333 1. ]
In [33]: Generate a classification report on a held-out split
from sklearn.metrics import classification_report # this library directly generates precision, recall, f-measure
from sklearn.model_selection import train_test_split
clf=DecisionTreeClassifier(max_depth=10)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=1)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
print(classification_report(y_test, y_pred))
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        17
Iris-versicolor       0.95      0.95      0.95        19
 Iris-virginica       0.93      0.93      0.93        14

       accuracy                           0.96        50
      macro avg       0.96      0.96      0.96        50
   weighted avg       0.96      0.96      0.96        50
For a multi-class problem, the micro-averaged precision score is defined as the sum of true positives (predictions that are correct for their class) over all classes, divided by the total number of positive predictions. The positive predictions are the sum of all true positives and false positives.
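This definition can be checked against scikit-learn directly; the sketch below computes micro-averaged precision by hand on made-up labels and compares it with precision_score(average='micro'):

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels for a 3-class problem:
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

# Micro precision: total true positives across all classes, divided by the
# total number of positive predictions (each prediction is positive for one class).
tp = int((y_true == y_pred).sum())
micro_precision = tp / len(y_pred)
print(micro_precision)  # 5 correct out of 8 predictions -> 0.625
```

For single-label multi-class problems like this one, micro precision, micro recall, and plain accuracy all coincide.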
Summary:
1. How to use linear regression models
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
2. How to use different evaluation metrics for prediction
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
3. How to change the regularization penalty alpha
ridge_reg = Ridge(alpha=1, solver="cholesky")
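Lasso appears in the import list above but is never run in this post; a minimal sketch (toy data, illustrative alpha) showing how its L1 penalty drives uninformative coefficients to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features are informative; the rest are pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # coefficients on the uninformative features are driven to 0
```

Unlike ridge, which only shrinks coefficients toward zero, lasso performs feature selection by setting some of them to exactly zero.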