Hyperopt

A hyperparameter tuning framework:

Hyperopt
Hyperopt is a Python library which, combined with MongoDB, can run distributed hyperparameter search and quickly find relatively good parameters. To use simulated-annealing search you need to install the dev version; brute-force (grid) and random search strategies are also supported.
Installation is straightforward:
pip install hyperopt
hyperopt uses bson (a binary JSON format used by PyMongo). Install bson:
pip install bson
python -m pip install pymongo (MongoDB is required for storage; see http://api.mongodb.com/python/current/installation.html)
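Once installed, the simulated-annealing suggester mentioned above can be selected like any other search algorithm. A minimal sketch, assuming your installed version ships the hyperopt.anneal module:

import hyperopt.anneal
from hyperopt import fmin, hp

# minimize (x - 3)^2 with the simulated-annealing suggester
best = fmin(lambda x: (x - 3) ** 2,
            space=hp.uniform('x', -10, 10),
            algo=hyperopt.anneal.suggest,
            max_evals=100)
print(best)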
Test code, test.py:
# define an objective function
def objective(args):
    case, val = args
    if case == 'case 1':
        return val
    else:
        return val ** 2

# define a search space
from hyperopt import hp
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# minimize the objective over the space
from hyperopt import fmin, tpe
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)

print(best)
# -> {'a': 1, 'c2': 0.01420615366247227}
import hyperopt
print(hyperopt.space_eval(space, best))
# -> ('case 2', 0.01420615366247227)

Algorithms
Currently two algorithms are implemented in hyperopt:
  • Random Search (algo=hyperopt.rand.suggest)
  • Tree of Parzen Estimators (TPE) (algo=hyperopt.tpe.suggest)
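Switching between them is just a matter of the algo argument; a minimal sketch:

from hyperopt import fmin, hp, rand, tpe

space = hp.uniform('x', -10, 10)
# random search
print(fmin(lambda x: x ** 2, space, algo=rand.suggest, max_evals=50))
# Tree of Parzen Estimators
print(fmin(lambda x: x ** 2, space, algo=tpe.suggest, max_evals=50))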
Hyperopt has been designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these are not currently implemented.
All algorithms can be run either serially, or in parallel by communicating via MongoDB.
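A minimal sketch of the parallel setup, assuming a MongoDB server is reachable at localhost:1234 (the database name and exp_key are placeholders):

import math
from hyperopt import fmin, hp, tpe
from hyperopt.mongoexp import MongoTrials

# the objective must be importable by the workers,
# so a named function (here math.sin) is used rather than a lambda
trials = MongoTrials('mongo://localhost:1234/exp_db/jobs', exp_key='exp1')
best = fmin(math.sin,
            space=hp.uniform('x', -2, 2),
            algo=tpe.suggest,
            max_evals=10,
            trials=trials)

# in one or more separate shells, start the workers that
# actually evaluate the objective:
#   hyperopt-mongo-worker --mongo=localhost:1234/exp_db --poll-interval=0.1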

Tutorial documentation:

fmin's arguments:
The way to use hyperopt is to describe:
  • the objective function to minimize
  • the space over which to search
  • the database in which to store all the point evaluations of the search
  • the search algorithm to use

Search for the minimum of x^2 on the interval [-10, 10]:
from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(x):
    return {'loss': x ** 2, 'status': STATUS_OK}

best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100)
print(best)
#{'x': 0.028499577187215047}

You can also return extra fields from the objective to record more detailed information:
The Trials Object
To really see the purpose of returning a dictionary, let's modify the objective function to return some more things, and pass an explicit trials argument to fmin.
import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments': {'time_module': pickle.dumps(time.time)}
    }

trials = Trials()
best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)
print(best)

Printing the following shows information about the intermediate steps of the search:
  • trials.trials - a list of dictionaries representing everything about the search
  • trials.results - a list of dictionaries returned by 'objective' during the search
  • trials.losses() - a list of losses (float for each 'ok' trial)
  • trials.statuses() - a list of status strings
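For example, after the Trials run above:

# inspect the search after fmin returns
print(trials.losses()[:5])               # losses of the first five trials
print(trials.statuses()[:5])             # e.g. ['ok', 'ok', 'ok', 'ok', 'ok']
print(min(trials.losses()))              # best loss found
print(trials.trials[0]['misc']['vals'])  # parameter values of the first trial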

Backend v1: pyspark + auto-sklearn; the results were not satisfactory.
Backend model implementation v2:
(1) Preprocessing:
Missing-value imputation: numeric features use the mean or median; categorical features are mapped to 1, 2, 3 (apparently not one-hot).
Log-transformed variables are added.
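A minimal sketch of this preprocessing, assuming the data arrives as a pandas DataFrame (the helper name and column handling are assumptions, not the actual backend code):

import numpy as np
import pandas as pd

def preprocess(df):
    df = df.copy()
    for col in list(df.columns):
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())          # or .mean()
            df['log_' + col] = np.log1p(df[col].clip(lower=0))  # added log variable
        else:
            # enumerate categories as 1, 2, 3, ... (integer codes, not one-hot);
            # missing values become code 0
            df[col] = df[col].astype('category').cat.codes + 1
    return df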
(2) Model: grid search
for model_type in ['xgboost', 'extraTree', 'randomForest']:
    # Outer loop over three tree models: xgboost plus two sklearn ensembles.
    # __get_model returns the model class and the parameter grid to search;
    # __train then iterates over the grid.
    model_class, grid = self.__get_model(model_type, self.task_type,
                                         self.random_seed)  # e.g. grid = ParameterGrid({"max_depth": [3, 5, 10]})
    score, param = self.__train(grid=grid, model_class=model_class, X=X, y=y,
                                index=index)  # best score and the corresponding param (for param in grid)
    results[score] = (model_class, param)
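A hypothetical sketch of what the __train loop over the grid might look like (the real method is private; cross_val_score stands in for whatever scoring is actually used):

import numpy as np
from sklearn.model_selection import cross_val_score

def train(grid, model_class, X, y, cv=3):
    # grid is a sklearn ParameterGrid; keep the best cross-validated score
    best_score, best_param = -np.inf, None
    for param in grid:
        model = model_class(**param)
        score = cross_val_score(model, X, y, cv=cv).mean()
        if score > best_score:
            best_score, best_param = score, param
    return best_score, best_param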
Optimizations implemented so far:
  • GridSearch: exhaustive search over models and parameters.
  • Optimizing the number of trees in the random forest: start with, say, 1000 trees, record the accuracy (or R^2) of the first 100, first 200, ... trees, and pick the setting with the best result and the fewest trees (the same idea applies to other parameters); see the sketch after this list.
  • Cross-validation shortcut: if accuracy on the first folds is low, the remaining folds are skipped.
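A minimal sketch of the tree-count optimization above, using sklearn's warm_start so the forest is grown once and scored incrementally (the X_train/X_val names are assumptions):

from sklearn.ensemble import RandomForestClassifier

# warm_start=True makes each fit() call add trees instead of refitting from scratch
model = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
scores = {}
for n in range(100, 1001, 100):
    model.set_params(n_estimators=n)
    model.fit(X_train, y_train)           # fits only the newly added trees
    scores[n] = model.score(X_val, y_val)  # accuracy with the first n trees

# best accuracy, ties broken in favor of fewer trees
best_n = max(sorted(scores), key=lambda n: (scores[n], -n))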