Commonly used packages
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# sklearn.cross_validation and sklearn.grid_search were removed in newer
# scikit-learn releases; these utilities now live in sklearn.model_selection
from sklearn.model_selection import (train_test_split, KFold, cross_val_score,
                                     GridSearchCV, RandomizedSearchCV)
from sklearn import metrics

metrics.accuracy_score(y_test, y_pred)            # accuracy
metrics.mean_absolute_error(true, pred)           # MAE
metrics.mean_squared_error(true, pred)            # MSE
np.sqrt(metrics.mean_squared_error(true, pred))   # RMSE
metrics.confusion_matrix(y_test, y_pred_class)    # confusion matrix
metrics.roc_auc_score(y_test, y_pred_prob)        # area under the ROC curve

# the dict key must match the model's parameter name exactly
param_grid = dict(params_name=k_range)
grid = GridSearchCV(model, param_grid, cv=10, scoring='accuracy')
```
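A minimal pure-Python sketch of what `GridSearchCV` does at its core: try every candidate value, score each one, keep the best. The `cv_score` callable here is a hypothetical stand-in for a model's cross-validated score; in real use you would pass a model and let scikit-learn handle the folds.

```python
def grid_search(param_values, cv_score):
    """Try each candidate parameter value, score it, and keep the best.

    `cv_score` stands in for a model's cross-validated score
    (higher is better, matching scoring='accuracy' conventions).
    """
    best, best_score = None, float('-inf')
    for p in param_values:
        s = cv_score(p)
        if s > best_score:
            best, best_score = p, s
    return best, best_score

# toy score that peaks at p = 2
best_p, best_s = grid_search([1, 2, 3, 4], lambda p: -(p - 2) ** 2)
```

`GridSearchCV` additionally re-fits the best model on the full training set and exposes it as `grid.best_estimator_`.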
ROC curve (classification)
- True Positive (TP): a positive sample the model predicts as positive.
- False Negative (FN): a positive sample the model predicts as negative.
- False Positive (FP): a negative sample the model predicts as positive.
- True Negative (TN): a negative sample the model predicts as negative.
- True Positive Rate (TPR), also called sensitivity: TPR = TP / (TP + FN) (positives predicted as positive / actual positives)
- False Negative Rate (FNR): FNR = FN / (TP + FN) (positives predicted as negative / actual positives)
- False Positive Rate (FPR): FPR = FP / (FP + TN) (negatives predicted as positive / actual negatives)
- True Negative Rate (TNR), also called specificity: TNR = TN / (TN + FP) (negatives predicted as negative / actual negatives)
The target-attribute value we choose to detect is the one called "positive".
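The four counts and four rates above can be computed directly from raw labels; a small self-contained sketch (the function names are my own, not scikit-learn's):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN with respect to the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    tpr = tp / (tp + fn)  # sensitivity / recall
    fnr = fn / (tp + fn)
    fpr = fp / (fp + tn)
    tnr = tn / (tn + fp)  # specificity
    return tpr, fnr, fpr, tnr

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)  # (2, 1, 1, 2)
```

Note that TPR + FNR = 1 and FPR + TNR = 1, since each pair shares a denominator.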
Feature selection
- MAE is the easiest to understand, because it's the average error.
- MSE is more popular than MAE, because MSE "punishes" larger errors.
- RMSE is even more popular than MSE, because RMSE is interpretable in the "y" units.
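A quick pure-Python illustration of these points: two prediction sets with the same MAE, where the one containing a single large error gets a much larger MSE, and RMSE brings the number back into the units of y (the data is made up):

```python
import math

def mae(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def mse(true, pred):
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

def rmse(true, pred):
    return math.sqrt(mse(true, pred))

true  = [0, 0, 0, 0]
even  = [3, 3, 3, 3]   # four moderate errors
spiky = [0, 0, 0, 12]  # one large error, same total

# mae:  even -> 3.0, spiky -> 3.0   (identical)
# mse:  even -> 9.0, spiky -> 36.0  (the spike is "punished")
# rmse: even -> 3.0, spiky -> 6.0   (back in the units of y)
```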
Suppose X has three features [A, B, C]:

```python
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
error = np.sqrt(metrics.mean_squared_error(y_test, y_pred))  # RMSE
```

Now let X have only the two features [A, B] and repeat the computation to get error_1. If error_1 < error, feature C was hurting model accuracy and can be removed.
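The drop-a-feature procedure above can be sketched without scikit-learn by fitting ordinary least squares via the normal equations; all names and the toy data here are my own, and a real workflow would just use `LinearRegression` as in the snippet above.

```python
import math

def lstsq(X, y):
    """Solve ordinary least squares (X^T X) w = X^T y by Gaussian elimination."""
    n, d = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(d)]
         for r in range(d)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(d)]
    for col in range(d):  # forward elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d  # back substitution
    for r in range(d - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, d))) / A[r][r]
    return w

def rmse_on(X_train, y_train, X_test, y_test):
    """Fit on the train split, return RMSE on the test split."""
    w = lstsq(X_train, y_train)
    pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X_test]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y_test)) / len(y_test))

# toy data: y depends on feature A only; feature C is irrelevant
A = [1, 2, 3, 4, 5, 6, 7, 8]
C = [5, 1, 4, 2, 8, 3, 7, 6]
y = [2 * a + 1 for a in A]

full    = [[1, a, c] for a, c in zip(A, C)]  # columns: intercept, A, C
reduced = [[1, a] for a in A]                # columns: intercept, A

err_full    = rmse_on(full[:6], y[:6], full[6:], y[6:])
err_reduced = rmse_on(reduced[:6], y[:6], reduced[6:], y[6:])
# if err_reduced <= err_full, feature C contributes nothing and can be dropped
```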