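Once a model has been built, its parameters still have to be tuned to find the best fit. Even a reasonable model will not produce good results if its parameters are badly set. The common tuning methods are grid search and random search. Grid search scans the entire space, so it is slow; random search is fast, but it can miss important points in the space and so lacks precision. Hyperopt is a tool that tunes parameters with Bayesian optimization, which is comparatively fast and gives good results. Combined with MongoDB, Hyperopt can also run the search in a distributed fashion and quickly find relatively good parameters. At the time of writing, the dev version had to be specified at install time in order to use simulated annealing; brute-force and random search strategies are supported as well.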
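(Bayesian optimization, also called sequential model-based optimization (SMBO), is one of the most effective methods for function optimization. Compared with standard optimization strategies such as conjugate gradient descent, SMBO has these advantages: it exploits smoothness without computing gradients; it handles real-valued, discrete, and conditional variables; and it handles a large number of variables and parallel optimization.)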
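To make the trade-off between the two baseline strategies concrete, here is a minimal sketch using only the standard library. The toy objective and the helper names (objective, grid_search, random_search) are made up for illustration and are not part of Hyperopt.

```python
import itertools
import random


def objective(x, y):
    """Toy loss with its minimum at (0.3, 0.7); a stand-in for a real model score."""
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def grid_search(steps):
    """Evaluate every point of a regular steps x steps grid over [0, 1]^2."""
    axis = [i / (steps - 1) for i in range(steps)]
    return min((objective(x, y), x, y) for x, y in itertools.product(axis, axis))


def random_search(n_evals, seed=0):
    """Evaluate n_evals points drawn uniformly at random from [0, 1]^2."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_evals)]
    return min((objective(x, y), x, y) for x, y in points)


grid_best = grid_search(steps=11)        # 11 * 11 = 121 evaluations
random_best = random_search(n_evals=30)  # 30 evaluations
print('grid  :', grid_best)
print('random:', random_best)
```

The grid needs 121 evaluations and its cost grows exponentially with the number of parameters, while the random search uses 30 evaluations but only lands near the optimum by chance.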
Let's go!!!
1 Installation
pip install hyperopt
Installing hyperopt also installs networkx. If a call fails with TypeError: 'generator' object is not subscriptable (seen with older hyperopt releases), switch networkx to version 1.11:
pip uninstall networkx
pip install networkx==1.11
2 Key concepts
2.1 fmin
from hyperopt import fmin, tpe, hp
best = fmin(
    fn=lambda x: x,
    space=hp.uniform('x', 0, 1),
    algo=tpe.suggest,
    max_evals=100)
print(best)
Output: {'x': 0.0006154621520631152}
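The fmin function first takes the function to minimize, fn, which is given here as lambda x: x. It can be any valid function that returns a value, such as the mean absolute error in a regression.
The next argument specifies the search space. In this example it is the continuous range of numbers between 0 and 1, given by hp.uniform('x', 0, 1). hp.uniform is a built-in hyperopt function that takes three arguments: the name x, and the lower and upper bounds of the range, 0 and 1.
The algo argument specifies the search algorithm. Here tpe stands for tree of Parzen estimators. That topic is beyond the scope of this article, but readers with a mathematical background can study the referenced paper. algo can also be set to hyperopt.rand.suggest for random search, which we do not cover here because it is a well-known search strategy.
Finally, max_evals sets the maximum number of evaluations that fmin will perform. fmin returns a Python dictionary.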
With max_evals=1000 the output becomes {'x': 3.7023587264309516e-06}, which is even closer to 0.
For a better understanding, here is a slightly more complex example.
best = fmin(
    fn=lambda x: (x-1)**2,
    space=hp.uniform('x', -2, 2),
    algo=tpe.suggest,
    max_evals=100)
print(best)
Output: {'x': 1.007633842139922}
2.2 space
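The range of each variable and the probability distribution it is drawn from are described with hp expressions such as hp.uniform, hp.normal, and hp.choice.
Here is an example: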
from hyperopt import hp
import hyperopt.pyll.stochastic
space = {
'x': hp.uniform('x', 0, 1),
'y': hp.normal('y', 0, 1),
'name': hp.choice('name', ['alice', 'bob']),
}
print(hyperopt.pyll.stochastic.sample(space))
Output: {'y': -1.3901709472842074, 'x': 0.4335747017293238, 'name': 'bob'}
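2.3 Capturing information with Trials
A Trials object records, for every evaluation, exactly which parameters were used and what was returned. To use it, fn returns a dict that contains a status in addition to the loss. The Trials object stores its data as a BSON object, which can be used with MongoDB for distributed computation.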
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from matplotlib import pyplot as plt
fspace = {
'x': hp.uniform('x', -5, 5)
}
def f(params):
    x = params['x']
    val = x**2
    return {'loss': val, 'status': STATUS_OK}
trials = Trials()
best = fmin(fn=f, space=fspace, algo=tpe.suggest, max_evals=50, trials=trials)
print('best:', best)
print('trials:')
for trial in trials.trials[:2]:
    print(trial)
For a result with STATUS_OK, the loss is taken into account; a result with STATUS_FAIL is ignored.
The output is:
best: {'x': -0.0025882455372094326}
trials:
{'refresh_time': datetime.datetime(2018, 12, 5, 3, 5, 43, 152000), 'book_time': datetime.datetime(2018, 12, 5, 3, 5, 43, 152000), 'misc': {'tid': 0, 'idxs': {'x': [0]}, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'vals': {'x': [-2.511797855178682]}, 'workdir': None}, 'state': 2, 'tid': 0, 'exp_key': None, 'version': 0, 'result': {'status': 'ok', 'loss': 6.309128465280228}, 'owner': None, 'spec': None}
{'refresh_time': datetime.datetime(2018, 12, 5, 3, 5, 43, 153000), 'book_time': datetime.datetime(2018, 12, 5, 3, 5, 43, 153000), 'misc': {'tid': 1, 'idxs': {'x': [1]}, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'vals': {'x': [3.43836093884876]}, 'workdir': None}, 'state': 2, 'tid': 1, 'exp_key': None, 'version': 0, 'result': {'status': 'ok', 'loss': 11.822325945800927}, 'owner': None, 'spec': None}
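The values in these records can be used to plot a variable against the loss to see how well they match, or to plot a variable against tid to see how the search positions converge (not convergence in the mathematical sense).
A Trials object exposes the following: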
- trials.trials - a list of dictionaries representing everything about the search
- trials.results - a list of dictionaries returned by ‘objective’ during the search
- trials.losses() - a list of losses (float for each ‘ok’ trial)
- trials.statuses() - a list of status strings
We can visualize the trials above: value vs. time, and loss vs. value.
f, ax = plt.subplots(1)
xs = [t['tid'] for t in trials.trials]
ys = [t['misc']['vals']['x'] for t in trials.trials]
ax.set_xlim(xs[0]-10, xs[-1]+10)
ax.scatter(xs, ys, s=20, linewidth=0.01, alpha=0.75)
ax.set_title('$x$ $vs$ $t$ ', fontsize=18)
ax.set_xlabel('$t$', fontsize=16)
ax.set_ylabel('$x$', fontsize=16)
f, ax = plt.subplots(1)
xs = [t['misc']['vals']['x'] for t in trials.trials]
ys = [t['result']['loss'] for t in trials.trials]
ax.scatter(xs, ys, s=20, linewidth=0.01, alpha=0.75)
ax.set_title('$val$ $vs$ $x$ ', fontsize=18)
ax.set_xlabel('$x$', fontsize=16)
ax.set_ylabel('$val$', fontsize=16)
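The list comprehensions in the plotting code rely on the nested layout of each trial record. The sketch below shows that layout on two hand-written records that mirror the trials printed earlier (only the fields used here are kept), so it runs without hyperopt; summarize is a hypothetical helper.

```python
# Two records shaped like the entries of trials.trials printed earlier.
records = [
    {'tid': 0, 'misc': {'vals': {'x': [-2.511797855178682]}},
     'result': {'status': 'ok', 'loss': 6.309128465280228}},
    {'tid': 1, 'misc': {'vals': {'x': [3.43836093884876]}},
     'result': {'status': 'ok', 'loss': 11.822325945800927}},
]


def summarize(trial):
    """Flatten one trial record into a (tid, x, loss) tuple."""
    return trial['tid'], trial['misc']['vals']['x'][0], trial['result']['loss']


# Keep successful trials only, then pick the one with the lowest loss.
rows = [summarize(t) for t in records if t['result']['status'] == 'ok']
best_row = min(rows, key=lambda row: row[2])

print(rows)
print('lowest loss:', best_row)
```

Note that each sampled value is stored as a one-element list under misc.vals, which is why the index [0] is needed to get a plain number.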
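3 Applying Hyperopt
3.1 K-nearest neighbors
Note that we are trying to maximize cross-validation accuracy, while hyperopt only knows how to minimize a function, so the accuracy has to be negated. Maximizing a function f is equivalent to minimizing -f.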
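The sign flip can be checked on a tiny example. The accuracy values below are made up for illustration; the point is only that the k with the lowest negated accuracy is the k with the highest accuracy.

```python
# Hypothetical cross-validation accuracies for three values of k.
accuracy_by_k = {1: 0.95, 5: 0.98, 50: 0.90}


def loss(k):
    """Loss handed to a minimizer: the negated accuracy."""
    return -accuracy_by_k[k]


k_min_loss = min(accuracy_by_k, key=loss)
k_max_acc = max(accuracy_by_k, key=accuracy_by_k.get)

print(k_min_loss, k_max_acc)  # 5 5
```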
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from matplotlib import pyplot as plt
iris=load_iris()
X=iris.data
y=iris.target
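def hyperopt_train(params):
    clf = KNeighborsClassifier(**params)
    # mean accuracy over the cross-validation folds
    return cross_val_score(clf, X, y).mean()
space_knn = {'n_neighbors': hp.choice('n_neighbors', range(1, 100))}
def f(params):
    acc = hyperopt_train(params)
    # negate the accuracy because fmin minimizes
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space_knn, algo=tpe.suggest, max_evals=100, trials=trials)
print('best', best)
Output: best {'n_neighbors': 4}
Note that for hp.choice, fmin reports the index of the selected option, so 4 here refers to the fifth entry of range(1, 100).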
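For a parameter defined with hp.choice, the number in the dictionary returned by fmin is a position in the list of options, not the option itself. A minimal sketch of the lookup, using the result shown above:

```python
options = list(range(1, 100))  # the options passed to hp.choice
best = {'n_neighbors': 4}      # the dictionary returned by fmin above

# Translate the reported index back into the actual parameter value.
best_k = options[best['n_neighbors']]
print(best_k)  # 5
```

The same offset applies to the values stored under misc.vals in each trial, which is what the scatter plots read.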
f, ax = plt.subplots(1)#, figsize=(10,10))
xs = [t['misc']['vals']['n_neighbors'] for t in trials.trials]
ys = [-t['result']['loss'] for t in trials.trials]
ax.scatter(xs, ys, s=20, linewidth=0.01, alpha=0.5)
ax.set_title('Iris Dataset - KNN', fontsize=18)
ax.set_xlabel('n_neighbors', fontsize=12)
ax.set_ylabel('cross validation accuracy', fontsize=12)
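Accuracy drops sharply once k exceeds 63. This is because of the number of instances per class in the dataset: each of the three classes has only 50 instances. So let us restrict 'n_neighbors' to smaller values and explore further.
'n_neighbors': hp.choice('n_neighbors', range(1,50))
Rerunning with this space produces an updated plot.
Now we can see clearly that the best reported value for k is 4. Because hp.choice reports the index of the selected option, this corresponds to n_neighbors = 5.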
3.2 Support vector machine (SVM)
Since this is a classification task, we use sklearn's SVC class. The code is as follows:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from matplotlib import pyplot as plt
from sklearn.svm import SVC
import numpy as np
iris=load_iris()
X=iris.data
y=iris.target
def hyperopt_train_test(params):
    clf = SVC(**params)
    return cross_val_score(clf, X, y).mean()
space_svm = {
    'C': hp.uniform('C', 0, 20),
    'kernel': hp.choice('kernel', ['linear', 'sigmoid', 'poly', 'rbf']),
    'gamma': hp.uniform('gamma', 0, 20),
}
def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space_svm, algo=tpe.suggest, max_evals=100, trials=trials)
print('best:', best)
# one scatter plot of accuracy per hyperparameter
parameters = ['C', 'kernel', 'gamma']
cols = len(parameters)
fig, axes = plt.subplots(nrows=1, ncols=cols, figsize=(20,5))
cmap = plt.cm.jet
for i, val in enumerate(parameters):
    xs = np.array([t['misc']['vals'][val] for t in trials.trials]).ravel()
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, s=20, linewidth=0.01, alpha=0.25, color=cmap(float(i)/len(parameters)))
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])
Output: best: {'kernel': 3, 'C': 3.6332677642526985, 'gamma': 2.0192849151350796}
Here 'kernel': 3 is the index of the selected option in the kernel list, that is, 'rbf'.
3.3 Decision tree
We will try to optimize only a few of the decision tree's parameters. The code is as follows.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeClassifier
import numpy as np
iris=load_iris()
X=iris.data
y=iris.target
def hyperopt_train_test(params):
    clf = DecisionTreeClassifier(**params)
    return cross_val_score(clf, X, y).mean()
space_dt = {
    'max_depth': hp.choice('max_depth', range(1,20)),
    'max_features': hp.choice('max_features', range(1,5)),
    'criterion': hp.choice('criterion', ["gini", "entropy"]),
}
def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space_dt, algo=tpe.suggest, max_evals=300, trials=trials)
print('best:', best)
# one scatter plot of accuracy per hyperparameter
parameters = ['max_depth', 'max_features', 'criterion'] # decision tree
cols = len(parameters)
fig, axes = plt.subplots(nrows=1, ncols=cols, figsize=(20,5))
cmap = plt.cm.jet
for i, val in enumerate(parameters):
    xs = np.array([t['misc']['vals'][val] for t in trials.trials]).ravel()
    ys = [-t['result']['loss'] for t in trials.trials]
    axes[i].scatter(xs, ys, s=20, linewidth=0.01, alpha=0.25, color=cmap(float(i)/len(parameters)))
    axes[i].set_title(val)
    axes[i].set_ylim([0.9, 1.0])
Output: best: {'max_features': 1, 'criterion': 1, 'max_depth': 13}
These are hp.choice indices, so they correspond to max_features=2, criterion='entropy', and max_depth=14.
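3.4 Random forest
Let us see what happens with an ensemble classifier, the random forest. A random forest is simply a collection of decision trees trained on different partitions of the data; each tree votes on the output class, and the class chosen by the majority becomes the prediction.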
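The voting step described above can be sketched in a few lines. This only illustrates majority voting over per-tree predictions; it is not how sklearn implements RandomForestClassifier, and majority_vote and the sample predictions are made up for the example.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single sample.
tree_predictions = ['setosa', 'versicolor', 'setosa', 'setosa', 'virginica']
print(majority_vote(tree_predictions))  # setosa
```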
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from matplotlib import pyplot as plt
from sklearn.ensemble import RandomForestClassifier
import numpy as np
iris=load_iris()
X=iris.data
y=iris.target
def hyperopt_train_test(params):
    clf = RandomForestClassifier(**params)
    return cross_val_score(clf, X, y).mean()
space4rf = {
    'max_depth': hp.choice('max_depth', range(1,20)),
    'max_features': hp.choice('max_features', range(1,5)),
    'n_estimators': hp.choice('n_estimators', range(1,20)),
    'criterion': hp.choice('criterion', ["gini", "entropy"]),
}
best_acc = 0  # best accuracy seen so far, kept separate from fmin's result
def f(params):
    global best_acc
    acc = hyperopt_train_test(params)
    if acc > best_acc:
        best_acc = acc
        print('new best:', best_acc, params)
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space4rf, algo=tpe.suggest, max_evals=300, trials=trials)
print('best:', best)
# one scatter plot of accuracy per hyperparameter
parameters = ['n_estimators', 'max_depth', 'max_features', 'criterion']
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20,5))
cmap = plt.cm.jet
for i, val in enumerate(parameters):
    print(i, val)
    xs = np.array([t['misc']['vals'][val] for t in trials.trials]).ravel()
    ys = np.array([-t['result']['loss'] for t in trials.trials])
    axes[i].scatter(xs, ys, s=20, linewidth=0.01, alpha=0.25, color=cmap(float(i)/len(parameters)))
    axes[i].set_title(val)
Output: best: {'max_features': 3, 'n_estimators': 11, 'criterion': 1, 'max_depth': 2}
These are hp.choice indices, so they correspond to max_features=4, n_estimators=12, criterion='entropy', and max_depth=3.
4 Tuning across multiple models
Find the best model and its parameters among many models and many parameters.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
from hyperopt import hp,STATUS_OK,Trials,fmin,tpe
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC
iris=load_iris()
X=iris.data
y=iris.target
def hyperopt_train_test(params):
    t = params.pop('type')
    # Assumption: 'scale' is a preprocessing switch, not an estimator argument
    X_ = scale(X) if params.pop('scale', 0) == 1 else X
    if t == 'naive_bayes':
        clf = BernoulliNB(**params)
    elif t == 'svm':
        clf = SVC(**params)
    elif t == 'dtree':
        clf = DecisionTreeClassifier(**params)
    elif t == 'randomforest':
        clf = RandomForestClassifier(**params)
    elif t == 'knn':
        clf = KNeighborsClassifier(**params)
    else:
        return 0
    return cross_val_score(clf, X_, y).mean()
space = hp.choice('classifier_type', [
{
'type': 'naive_bayes',
'alpha': hp.uniform('alpha', 0.0, 2.0)
},
{
'type': 'svm',
'C': hp.uniform('C', 0, 10.0),
'kernel': hp.choice('kernel', ['linear', 'rbf']),
'gamma': hp.uniform('gamma', 0, 20.0)
},
{
'type': 'randomforest',
'max_depth': hp.choice('max_depth', range(1,20)),
'max_features': hp.choice('max_features', range(1,5)),
'n_estimators': hp.choice('n_estimators', range(1,20)),
'criterion': hp.choice('criterion', ["gini", "entropy"]),
'scale': hp.choice('scale', [0, 1])
},
{
'type': 'knn',
'n_neighbors': hp.choice('knn_n_neighbors', range(1,50))
}
])
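count = 0
best_acc = 0  # best accuracy seen so far, kept separate from fmin's result
def f(params):
    global best_acc, count
    count += 1
    # pass a copy so the training helper can modify it freely
    acc = hyperopt_train_test(params.copy())
    if acc > best_acc:
        print('new best:', acc, 'using', params['type'])
        best_acc = acc
    if count % 50 == 0:
        print('iters:', count, ', acc:', acc, 'using', params)
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space, algo=tpe.suggest, max_evals=1500, trials=trials)
print('best:', best)
Output: best: {'kernel': 0, 'C': 1.4211568317201784, 'classifier_type': 1, 'gamma': 8.74017707300719}
Here 'classifier_type': 1 is the index of the SVM entry in the space, and 'kernel': 0 is the index of 'linear'.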