Comparing multi-core speedups for automated machine learning with TPOT (code included)

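TPOT is an automated machine learning (AutoML) library that uses genetic programming to build machine learning pipelines automatically. You supply the training data and configure the parameters of the evolutionary search, and TPOT searches for a pipeline for you. The code in this post calls TPOT in three ways (serially, in parallel, and with dask) and is shared for reference and discussion.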

import time

from dask.distributed import Client, LocalCluster
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# load the iris dataset
iris = load_iris()
# split into training and test sets (75% / 25%)
X_train, X_test, y_train, y_test = train_test_split(iris.data,
    iris.target, train_size=0.75, test_size=0.25, random_state=42)

#=========================>> Run TPOT serially
# configure a small TPOT search on a single core (n_jobs=1)
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=0, random_state=42, n_jobs=1, use_dask=False)
time_start = time.time()
#use tpot to train
tpot.fit(X_train, y_train)
time_end = time.time()
print("time to use tpot on 1 core without dask is ", time_end - time_start, " s.\n")

print("Fitting score : ", tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')

#=========================>> Run TPOT in parallel; n_jobs=-1 means use all cores on the machine
# configure the same TPOT search, now on all cores
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=0, random_state=42, n_jobs=-1, use_dask=False)
time_start = time.time()
#use tpot to train
tpot.fit(X_train, y_train)
time_end = time.time()
print("time to use tpot on cores without dask is ", time_end - time_start, " s.\n")

print("Fitting score : ", tpot.score(X_test, y_test))
# export under a separate name so the serial run's pipeline file is not overwritten
tpot.export('tpot_parallel_iris_pipeline.py')

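#==========================>> Run TPOT with dask
# start a local dask cluster; processes=False runs the 4 single-threaded workers as threads in this process
client = Client(LocalCluster(processes=False, threads_per_worker=1, n_workers=4))

# configure the same TPOT search with dask enabled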
tpot_d = TPOTClassifier(generations=5, population_size=50, verbosity=0, random_state=42, n_jobs=-1, use_dask=True)
time_start = time.time()
#use tpot to train
tpot_d.fit(X_train, y_train)
time_end = time.time()
print("time to use tpot on cores with dask is ", time_end - time_start, " s.\n")

print("Fitting score : ", tpot_d.score(X_test, y_test))
tpot_d.export('tpot_dask_iris_pipeline.py')
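
The three runs above repeat the same start/stop timing pattern around `fit`. As a minimal sketch (not part of the original script), that pattern could be factored into a small context manager. The name `timed` is hypothetical, and the workload below is a stand-in so the snippet runs without TPOT installed; in the script it would wrap `tpot.fit(X_train, y_train)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the wall-clock time spent inside the with-block and print it."""
    result = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Record the elapsed time even if the block raises.
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.2f} s")


# Stand-in workload (assumption); replace with tpot.fit(X_train, y_train).
with timed("demo workload") as t:
    total = sum(i * i for i in range(100_000))
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic clock intended for measuring intervals.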

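Below is the output from a run on my machine (an Intel Core i7-1165G7, which has 4 cores and 8 threads). With all cores in use, training was noticeably faster than on a single core (about 62 s versus 98 s), but the run that used dask was slower than both (about 152 s). A single run on a small dataset is not enough to conclude that dask hurts performance in general: the local cluster here ran four single-threaded workers inside one process (processes=False), and dask's main use case may well be cluster computing rather than a single laptop.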

time to use tpot on 1 core without dask is  97.58220362663269  s.
Fitting score :  0.9736842105263158

time to use tpot on cores without dask is  62.37632417678833  s.
Fitting score :  0.9736842105263158

time to use tpot on cores with dask is  151.76679229736328  s.
Fitting score :  1.0
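
To put the three timings in proportion, the speedup of each run relative to the serial run can be computed directly from the numbers reported above. This is a small sketch for illustration; the dictionary keys are just labels chosen here, and the figures come from a single run, so they should be read as rough indications only.

```python
# Wall-clock times in seconds, copied from the output reported in this post.
timings = {
    "serial (n_jobs=1)": 97.58220362663269,
    "parallel (n_jobs=-1)": 62.37632417678833,
    "dask (use_dask=True)": 151.76679229736328,
}

# Speedup = serial time / run time; values below 1 mean slower than serial.
baseline = timings["serial (n_jobs=1)"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x relative to the serial run")
```

On these numbers the n_jobs=-1 run comes out at roughly 1.56x the serial speed, while the dask run comes out at roughly 0.64x, that is, slower than serial.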
