Hyperopt

A hyperparameter tuning framework:

Hyperopt
Hyperopt is a Python library which, combined with MongoDB, can run distributed hyperparameter search and quickly find relatively good parameters. To use simulated-annealing search you need to install the dev version; brute-force (grid) and random search strategies are also supported.
Installation is straightforward:
pip install hyperopt
hyperopt uses bson (a binary JSON format used by PyMongo). Install bson:
pip install bson
python -m pip install pymongo (MongoDB is required for storage; see http://api.mongodb.com/python/current/installation.html)
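Once installed, the simulated-annealing suggester mentioned above can be selected like any other search algorithm. A minimal sketch, assuming your installed version ships the hyperopt.anneal module:

import hyperopt.anneal
from hyperopt import fmin, hp

# minimize (x - 3)^2 with the simulated-annealing suggester
best = fmin(lambda x: (x - 3) ** 2,
            space=hp.uniform('x', -10, 10),
            algo=hyperopt.anneal.suggest,
            max_evals=100)
print(best)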
Test code, test.py:
# define an objective function
def objective(args):
    case, val = args
    if case == 'case 1':
        return val
    else:
        return val ** 2

# define a search space
from hyperopt import hp
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# minimize the objective over the space
from hyperopt import fmin, tpe
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)

print(best)
# -> {'a': 1, 'c2': 0.01420615366247227}
import hyperopt
print(hyperopt.space_eval(space, best))
# -> ('case 2', 0.01420615366247227)

Algorithms
Currently two algorithms are implemented in hyperopt:
  • Random Search (algo=hyperopt.rand.suggest)
  • Tree of Parzen Estimators (TPE) (algo=hyperopt.tpe.suggest)
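Switching between them is just a matter of the algo argument; a minimal sketch:

from hyperopt import fmin, hp, rand, tpe

space = hp.uniform('x', -10, 10)
# random search
print(fmin(lambda x: x ** 2, space, algo=rand.suggest, max_evals=50))
# Tree of Parzen Estimators
print(fmin(lambda x: x ** 2, space, algo=tpe.suggest, max_evals=50))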
Hyperopt has been designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these are not currently implemented.
All algorithms can be run either serially, or in parallel by communicating via MongoDB.
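A minimal sketch of the parallel setup, assuming a MongoDB server is reachable at localhost:1234 (the database name and exp_key are placeholders):

import math
from hyperopt import fmin, hp, tpe
from hyperopt.mongoexp import MongoTrials

# the objective must be importable by the workers,
# so a named function (here math.sin) is used rather than a lambda
trials = MongoTrials('mongo://localhost:1234/exp_db/jobs', exp_key='exp1')
best = fmin(math.sin,
            space=hp.uniform('x', -2, 2),
            algo=tpe.suggest,
            max_evals=10,
            trials=trials)

# in one or more separate shells, start the workers that
# actually evaluate the objective:
#   hyperopt-mongo-worker --mongo=localhost:1234/exp_db --poll-interval=0.1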

Tutorial documentation:

fmin's arguments:
The way to use hyperopt is to describe:
  • the objective function to minimize
  • the space over which to search
  • the database in which to store all the point evaluations of the search
  • the search algorithm to use

Search for the minimum of x^2 on the interval [-10, 10]:
from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(x):
    return {'loss': x ** 2, 'status': STATUS_OK}

best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100)
print(best)
#{'x': 0.028499577187215047}

You can also return extra fields from the objective to record more detailed information:
The Trials Object
To really see the purpose of returning a dictionary, let's modify the objective function to return some more things, and pass an explicit trials argument to fmin.
import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments': {'time_module': pickle.dumps(time.time)}
    }

trials = Trials()
best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100,
            trials=trials)
print(best)

Printing the following shows information about the intermediate steps of the search:
  • trials.trials - a list of dictionaries representing everything about the search
  • trials.results - a list of dictionaries returned by 'objective' during the search
  • trials.losses() - a list of losses (float for each 'ok' trial)
  • trials.statuses() - a list of status strings
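For example, after the Trials run above:

# inspect the search after fmin returns
print(trials.losses()[:5])               # losses of the first five trials
print(trials.statuses()[:5])             # e.g. ['ok', 'ok', 'ok', 'ok', 'ok']
print(min(trials.losses()))              # best loss found
print(trials.trials[0]['misc']['vals'])  # parameter values of the first trial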

Backend v1: pyspark + auto-sklearn; the results were not satisfactory.
Backend model implementation v2:
(1) Preprocessing:
Missing-value imputation: numeric features use the mean or median; categorical features are mapped to 1, 2, 3 (apparently not one-hot).
Log-transformed variables are added.
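A minimal sketch of this preprocessing, assuming the data arrives as a pandas DataFrame (the helper name and column handling are assumptions, not the actual backend code):

import numpy as np
import pandas as pd

def preprocess(df):
    df = df.copy()
    for col in list(df.columns):
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())          # or .mean()
            df['log_' + col] = np.log1p(df[col].clip(lower=0))  # added log variable
        else:
            # enumerate categories as 1, 2, 3, ... (integer codes, not one-hot);
            # missing values become code 0
            df[col] = df[col].astype('category').cat.codes + 1
    return df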
(2) Model: grid search
for model_type in ['xgboost', 'extraTree', 'randomForest']:
    # Outer loop over three tree models: xgboost plus two sklearn ensembles.
    # __get_model returns the model class and the parameter grid to search;
    # __train then iterates over the grid.
    model_class, grid = self.__get_model(model_type, self.task_type,
                                         self.random_seed)  # e.g. grid = ParameterGrid({"max_depth": [3, 5, 10]})
    score, param = self.__train(grid=grid, model_class=model_class, X=X, y=y,
                                index=index)  # best score and the corresponding param (for param in grid)
    results[score] = (model_class, param)
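A hypothetical sketch of what the __train loop over the grid might look like (the real method is private; cross_val_score stands in for whatever scoring is actually used):

import numpy as np
from sklearn.model_selection import cross_val_score

def train(grid, model_class, X, y, cv=3):
    # grid is a sklearn ParameterGrid; keep the best cross-validated score
    best_score, best_param = -np.inf, None
    for param in grid:
        model = model_class(**param)
        score = cross_val_score(model, X, y, cv=cv).mean()
        if score > best_score:
            best_score, best_param = score, param
    return best_score, best_param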
Optimizations implemented so far:
  • GridSearch: exhaustive search over models and parameters.
  • Optimizing the number of trees in the random forest: start with, say, 1000 trees, record the accuracy (or R^2) of the first 100, first 200, ... trees, and pick the setting with the best result and the fewest trees (the same idea applies to other parameters); see the sketch after this list.
  • Cross-validation shortcut: if accuracy on the first folds is low, the remaining folds are skipped.
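A minimal sketch of the tree-count optimization above, using sklearn's warm_start so the forest is grown once and scored incrementally (the X_train/X_val names are assumptions):

from sklearn.ensemble import RandomForestClassifier

# warm_start=True makes each fit() call add trees instead of refitting from scratch
model = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
scores = {}
for n in range(100, 1001, 100):
    model.set_params(n_estimators=n)
    model.fit(X_train, y_train)           # fits only the newly added trees
    scores[n] = model.score(X_val, y_val)  # accuracy with the first n trees

# best accuracy, ties broken in favor of fewer trees
best_n = max(sorted(scores), key=lambda n: (scores[n], -n))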