Tuning hyperparameters with grid search in Python

Reference post: http://blog.csdn.net/abcjennifer/article/details/23884761

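The goal is to answer questions like these:

How many words should the vectorizer keep?

Preprocessing filters out words whose document frequency exceeds max_df; what should max_df be?

Should the TfidfTransformer use tf only, or tf together with idf?

How many iterations should the classifier train for, and how should the learning rate be set?
...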
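Each of these questions corresponds to one key in a parameter grid. The sketch below lists the grid that appears in this post's SGD log and annotates which question each key addresses. It assumes the pipeline steps are named vect, tfidf and clf, as in the pipeline shown in this post. Note that the grid varies alpha (the regularization strength) rather than an explicit learning-rate parameter, and that n_iter was renamed max_iter in later scikit-learn releases.

```python
# Grid values copied from the SGD run's log in this post.
# Key format: "<pipeline step>__<parameter of that step>".
sgd_parameters = {
    "vect__max_features": (None, 5000, 10000),  # how many words to keep
    "vect__max_df": (0.5, 0.75),       # drop words found in more than this share of documents
    "tfidf__use_idf": (True, False),   # tf-idf weighting, or tf only
    "clf__n_iter": (10, 50),           # training passes (max_iter in newer scikit-learn)
    "clf__alpha": (1e-05, 1e-06),      # regularization strength
}

# Split each key into the step it targets and the parameter it sets.
for name, values in sorted(sgd_parameters.items()):
    step, param = name.split("__", 1)
    print(step, param, values)
```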

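This post tunes two classifiers, stochastic gradient descent (SGD) and an SVM with an RBF kernel, on a text-classification task over CNKI journal articles.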

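One thing to watch: the keys in the parameter grid must match the names the pipeline exposes, in the form <step name>__<parameter name>. To list the valid names, run print(sorted(pipeline.get_params().keys())).

from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', svm.SVC()),
])

# Keys are '<step name>__<parameter>'; 'clf' is the SVC step above.
parameters = {
    'clf__C': [0.1, 1, 10],
    'clf__gamma': [1, 0.1, 0.01],
}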
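As a minimal sketch of running the search end to end, the block below fits the same SVC pipeline and grid with GridSearchCV. The toy corpus and labels are hypothetical stand-ins for the CNKI articles, and the import from sklearn.model_selection assumes a reasonably recent scikit-learn; the original run may have used an older module layout.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus with two classes, six documents each.
docs = [
    "neural network training gradient", "deep learning model layers",
    "gradient descent optimizer network", "convolutional network image model",
    "learning rate schedule training", "model weights gradient update",
    "stock market price trading", "bank interest rate inflation",
    "market investors bond yield", "inflation price central bank",
    "trading volume stock index", "bond market interest yield",
]
labels = [0] * 6 + [1] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SVC()),
])
parameters = {"clf__C": [0.1, 1, 10], "clf__gamma": [1, 0.1, 0.01]}

# Check up front that every grid key is a name the pipeline reports.
unknown = set(parameters) - set(pipeline.get_params().keys())
if unknown:
    raise ValueError("unknown pipeline parameters: %s" % sorted(unknown))

# 3-fold cross-validation over all 9 combinations of C and gamma.
grid = GridSearchCV(pipeline, parameters, cv=3)
grid.fit(docs, labels)
print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))
```

A misspelled key such as clf_C (single underscore) is caught by the check before any fitting starts.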
The SGD results are as follows:

*************************
Feature Extraction
*************************
Performing grid search...
('pipeline:', ['vect', 'tfidf', 'clf'])
parameters:
{'clf__n_iter': (10, 50), 'clf__alpha': (1e-05, 1e-06), 'tfidf__use_idf': (True, False), 'vect__max_features': (None, 5000, 10000), 'vect__max_df': (0.5, 0.75)}
Fitting 3 folds for each of 48 candidates, totalling 144 fits
[Parallel(n_jobs=1)]: Done  49 tasks       | elapsed:  1.0min
[Parallel(n_jobs=1)]: Done 144 out of 144 | elapsed:  3.1min finished
done in 188.100s
()
Best score: 0.848
    clf__alpha: 1e-05
    clf__n_iter: 50
    tfidf__use_idf: True
    vect__max_df: 0.5
    vect__max_features: None
The SVM results are as follows:
*************************
Feature Extraction
*************************
['clf', 'clf__C', 'clf__cache_size', 'clf__class_weight', 'clf__coef0', 'clf__decision_function_shape', 'clf__degree', 'clf__gamma', 'clf__kernel', 'clf__max_iter', 'clf__probability', 'clf__random_state', 'clf__shrinking', 'clf__tol', 'clf__verbose', 'steps', 'tfidf', 'tfidf__norm', 'tfidf__smooth_idf', 'tfidf__sublinear_tf', 'tfidf__use_idf', 'vect', 'vect__analyzer', 'vect__binary', 'vect__decode_error', 'vect__dtype', 'vect__encoding', 'vect__input', 'vect__lowercase', 'vect__max_df', 'vect__max_features', 'vect__min_df', 'vect__ngram_range', 'vect__preprocessor', 'vect__stop_words', 'vect__strip_accents', 'vect__token_pattern', 'vect__tokenizer', 'vect__vocabulary']
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed:  9.6min finished
The best parameters are {'clf__gamma': 1, 'clf__C': 10} with a score of 0.85
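The candidate and fit counts in the two logs follow directly from the grids: grid search tries every combination of values, and each combination is fitted once per fold. The sketch below reproduces that arithmetic with the standard library only; count_fits is a hypothetical helper, not part of scikit-learn.

```python
from itertools import product

# Grids as they appear in this post.
sgd_grid = {
    "clf__n_iter": (10, 50),
    "clf__alpha": (1e-05, 1e-06),
    "tfidf__use_idf": (True, False),
    "vect__max_features": (None, 5000, 10000),
    "vect__max_df": (0.5, 0.75),
}
svm_grid = {
    "clf__C": [0.1, 1, 10],
    "clf__gamma": [1, 0.1, 0.01],
}

def count_fits(grid, n_folds):
    """Return (number of candidates, total fits) for an exhaustive grid search."""
    candidates = list(product(*grid.values()))
    return len(candidates), len(candidates) * n_folds

print(count_fits(sgd_grid, n_folds=3))  # (48, 144)
print(count_fits(svm_grid, n_folds=3))  # (9, 27)
```

Because the count is a product of the list lengths, adding one more value to any parameter multiplies the total run time accordingly.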