Common Machine Learning Model Tips (continuously updated...)

Author: xiaotian127 (pinned post)

Article link: https://blog.csdn.net/xiaotian127/article/details/97124717

1. A boilerplate grid-search function (decision tree as the example):

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def check_model(x, y):
    # Decision tree used as the example estimator
    classifier = DecisionTreeClassifier(random_state=1)
    parameters = {
        'max_leaf_nodes': list(range(2, 100)),    # DecisionTreeClassifier hyperparameters to search over
        'min_samples_split': [8, 10, 15]
    }
    # StratifiedKFold is like KFold but stratified: every fold keeps the same
    # class proportions as the original dataset
    folder = StratifiedKFold(n_splits=3, shuffle=True)
    # Exhaustive search over specified parameter values for an estimator.
    grid_search = GridSearchCV(
        estimator=classifier,
        param_grid=parameters,
        cv=folder,
        n_jobs=2,
        verbose=1    # Controls the verbosity: the higher, the more messages.
    )
    grid_search = grid_search.fit(x, y)
    print(grid_search.best_params_)
    return grid_search

# x_train, y_train: training features and labels, assumed to be defined already
model = check_model(x_train, y_train)
model = model.best_estimator_    # keep the best classifier found by the search
# ... continue with prediction, model evaluation, etc.
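
To make the stratified-sampling comment above concrete, here is a minimal pure-Python sketch of how samples could be dealt into folds so that each fold keeps roughly the original class ratio. The helper `stratified_folds` and the toy labels are assumptions made for illustration; this is not how scikit-learn implements `StratifiedKFold`.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=3, seed=1):
    """Assign sample indices to folds so class ratios stay similar.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)                     # shuffle within each class
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)    # deal round-robin into folds
    return folds


labels = [0] * 9 + [1] * 3                       # 75% class 0, 25% class 1
folds = stratified_folds(labels)
for k, fold in enumerate(folds):
    print(k, Counter(labels[i] for i in fold))
```

Each fold here ends up with three samples of class 0 and one of class 1, which is the same 3:1 ratio as the full label list.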

2. Saving the model:

# Save the model
# Method 1: pickle
import os, pickle
if not os.path.isfile('test_model.pkl'):
    with open('test_model.pkl', 'wb') as f:
        pickle.dump(model, f)
else:
    # Read the saved model file back ('rb' = read binary; 'wb' above overwrites, 'ab' would append)
    with open('test_model.pkl', 'rb') as f:
        model = pickle.load(f)

# Method 2: joblib (sklearn.externals.joblib has been removed from newer scikit-learn releases, so import joblib directly)
import joblib
# Save
joblib.dump(model, 'test.pkl')
# Load
estimator = joblib.load('test.pkl')
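
The save-or-load branch of method 1 can be wrapped in a small function. The sketch below is only an illustration: `load_or_save` is a hypothetical helper, and a plain dict stands in for a fitted model so the example runs without scikit-learn. It writes into a temporary directory to avoid touching real files.

```python
import os
import pickle
import tempfile


def load_or_save(model, path):
    """Load the pickled object at `path` if it exists; otherwise save `model` there.

    Hypothetical helper for illustration only.
    """
    if os.path.isfile(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return model


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_model.pkl')
    first = load_or_save({'threshold': 0.5}, path)     # file missing: saved and returned
    second = load_or_save({'threshold': 9.9}, path)    # file exists: loaded from disk

print(first, second)
```

Note that the second call ignores the object passed in and returns what is already on disk, so a stale file silently wins over a freshly trained model. Also, unpickling can execute arbitrary code, so only load pickle files from a trusted source.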

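3. Model evaluation metrics:

For classification models: accuracy_score (accuracy);

For regression models: MSE (mean squared error);

More generally, the ROC curve (and the area under it) can also be used for classifiers;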
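
As a quick reminder of what the first two metrics measure, here is a minimal pure-Python sketch. The function names and the toy data are assumptions made for illustration; in practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.mean_squared_error` would be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])                    # 3 of 4 correct
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])    # (1 + 0 + 4) / 3
print(acc, mse)
```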

① classification_report:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
# The output reports precision, recall, F1-score and support for each class
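
The per-class numbers in that report can be reproduced by hand from the counts of true positives, false positives and false negatives. The following is a minimal sketch for a single positive class; `precision_recall_f1` and the toy labels are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this toy data there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 3/4.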

② roc_curve, roc_auc_score

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# clf is an already fitted binary classifier; score with the predicted
# probability of the positive class rather than with hard labels
y_score = clf.predict_proba(x_test)[:, 1]
logit_roc_auc = roc_auc_score(y_test, y_score)
fpr, tpr, threshold = roc_curve(y_test, y_score)

plt.figure()
plt.plot(fpr, tpr, label='logit regression (area=%0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')    # diagonal = random guessing
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc='lower right')
plt.savefig('1.png')
plt.show()
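
For intuition about what the curve and its area represent, here is a minimal pure-Python sketch that sweeps the decision threshold over continuous scores and integrates with the trapezoidal rule. `roc_points` and `auc` are hypothetical helpers written for illustration; the sketch assumes 0/1 labels with both classes present and is not scikit-learn's implementation.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(y_score, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score)
area = auc(fpr, tpr)
print(fpr, tpr, area)
```

The curve depends on the ranking of the scores, which is why probabilities (or decision scores) carry more information here than hard 0/1 predictions.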

4. Plotting the decision tree

# conda install python-graphviz
# conda install pydotplus
from io import StringIO    # sklearn.externals.six has been removed from newer scikit-learn releases
from sklearn.tree import export_graphviz
from IPython.display import Image
import pydotplus

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,  # clf is an already fitted decision tree model
                filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols, class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('diabetes.png')
Image(graph.create_png())

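5. Handling class imbalance

The following article gives a fairly comprehensive overview, covering ways to handle class imbalance in machine learning, computer vision and NLP: "炼丹笔记一：样本不平衡问题" ("Alchemy Notes 1: The Sample Imbalance Problem").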
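
One widely used option for imbalanced data is to weight each class inversely to its frequency. The sketch below is not taken from the article mentioned above; `balanced_class_weights` is a hypothetical helper that follows the heuristic n_samples / (n_classes * class_count), which is the heuristic behind scikit-learn's `class_weight='balanced'` setting.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 90 + [1] * 10          # 90% majority class, 10% minority class
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives a weight nine times larger than the majority class, so each of its samples contributes correspondingly more to the training loss. A dict of this shape can be passed as `class_weight` to estimators such as `DecisionTreeClassifier`.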
