I. calibration
1. Introduction:
This module performs probability calibration.
2. Usage:
(1) Classes:
Probability calibration based on isotonic regression or logistic regression (sigmoid): class sklearn.calibration.CalibratedClassifierCV([base_estimator=None,method='sigmoid',cv=None,n_jobs=None,ensemble=True])
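#Parameters:
base_estimator: the base estimator whose output is calibrated; an estimator instance
method: the calibration method; "sigmoid" or "isotonic"
cv: the cross-validation splitting strategy; an int, a cross-validation generator, an iterable, or "prefit"
n_jobs: the number of jobs to run in parallel; an int
ensemble: how calibration is performed when cv is not "prefit"; a bool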
#If True, the base_estimator is fitted on the training data and calibrated on the testing data for each cv fold. The final estimator is an ensemble of n_cv fitted classifier and calibrator pairs, where n_cv is the number of cross-validation folds. The output is the average of the predicted probabilities of all pairs.
#If False, cv is used to compute unbiased predictions via cross_val_predict, which are then used for calibration. At prediction time, the classifier used is the base_estimator trained on all the data. Note that this approach is also implemented internally in sklearn.svm estimators via the probability=True parameter.
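The following is a minimal usage sketch of CalibratedClassifierCV, not a recommended configuration. The synthetic dataset, the choice of LinearSVC as the base estimator, and cv=3 are all assumptions made for illustration. The estimator is passed positionally because the keyword is base_estimator in the signature above but estimator in newer scikit-learn releases.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary problem (illustrative data, not from the original notes).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC only exposes decision_function; wrapping it adds predict_proba.
# The estimator is passed positionally so the call does not depend on
# whether the keyword is named base_estimator or estimator.
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

# Calibrated class probabilities, one row per test sample.
proba = calibrated.predict_proba(X_test)

# With ensembling enabled, one (classifier, calibrator) pair is kept per fold.
n_pairs = len(calibrated.calibrated_classifiers_)
```

Here proba has one column per class, and with the default ensembling behaviour n_pairs equals the number of folds. Switching method to "isotonic" uses isotonic regression instead, which generally needs more calibration data to avoid overfitting.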
(2) Functions:
Compute the true and predicted probabilities for a calibration curve: [<prob_true>,<prob_pred>=]sklearn.calibration.calibration_curve(<y_true>,<y_prob>[,normalize=False,n_bins=5,strategy='uniform'])
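#Parameters:
y_true: the true labels; a 1×n_samples array-like
y_prob: the predicted probabilities of the positive class; a 1×n_samples array-like
normalize: whether to normalize <y_prob> into the interval [0,1]; a bool
n_bins: the number of bins into which [0,1] is split; an int
strategy: how the bin widths are determined; "uniform" (bins of equal width) or "quantile" (bins containing the same number of samples)
prob_true: returns the fraction of positive-class samples in each bin; a 1×n_bins ndarray, or smaller because empty bins are dropped
prob_pred: returns the mean predicted probability in each bin; a 1×n_bins ndarray, or smaller because empty bins are dropped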
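To make the meaning of prob_true and prob_pred concrete, here is a simplified pure-Python sketch of the uniform-width binning described above. It is an illustrative stand-in, not the library's implementation: the function name calibration_curve_sketch and the sample data are hypothetical, and the handling of values that fall exactly on a bin edge is an assumption.

```python
from bisect import bisect_left


def calibration_curve_sketch(y_true, y_prob, n_bins=5):
    """Simplified stand-in for calibration_curve with strategy='uniform'."""
    # Interior edges of [0, 1]; n_bins=5 gives 0.2, 0.4, 0.6, 0.8.
    inner_edges = [i / n_bins for i in range(1, n_bins)]
    counts = [0] * n_bins
    positives = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for label, prob in zip(y_true, y_prob):
        b = bisect_left(inner_edges, prob)  # index of the bin containing prob
        counts[b] += 1
        positives[b] += label
        prob_sums[b] += prob
    # Empty bins are skipped, so the outputs may have fewer than n_bins entries.
    prob_true = [positives[b] / counts[b] for b in range(n_bins) if counts[b]]
    prob_pred = [prob_sums[b] / counts[b] for b in range(n_bins) if counts[b]]
    return prob_true, prob_pred


# Hypothetical data: no prediction falls in the bin (0.4, 0.6].
y_true = [0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.15, 0.3, 0.35, 0.7, 0.9, 0.95]
prob_true, prob_pred = calibration_curve_sketch(y_true, y_prob, n_bins=5)
# prob_true is [0.0, 0.5, 1.0, 1.0]; prob_pred is roughly [0.125, 0.325, 0.7, 0.925]
```

Because one of the five bins receives no samples, both outputs have four entries, which is what "or smaller" refers to. Plotting prob_pred against prob_true gives the calibration curve; a perfectly calibrated classifier lies on the diagonal.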
II. discriminant_analysis
1. Introduction:
This module performs Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA).
2. Usage:
Linear Discriminant Analysis: class sklearn.discriminant_analysis.LinearDiscriminantAnalysis([solver='svd',shrinkage=