TensorFlow Distributed Training

weixin_30646505

Tags: Artificial Intelligence

Original link: http://www.cnblogs.com/xiaoniu-666/p/10916360.html

Distributed training workflow in TF

1. Create the cluster: ClusterSpec & Server
    cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)
2. Set up the ps tasks
    tf.train.replica_device_setter(cluster=cluster)  # places variables on the ps tasks
    server.join()  # the ps process blocks here and serves variables
3. Set up the worker tasks
    designate the chief (by convention, worker task 0)
4. Synchronous training configuration [optional]
    synchronous updates
        tf.train.SyncReplicasOptimizer
    synchronous updates & chief
        chief_queue_runner
5. Train: session
    # sv = tf.train.Supervisor
    # sess = sv.prepare_or_wait_for_session(server.target)
    tf.train.MonitoredTrainingSession()
    # tf.train.Supervisor is deprecated
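
The five steps above can be put together roughly as follows. This is a minimal sketch, not the code from the linked demo: the helper names (`parse_hosts`, `build_cluster_dict`, `is_chief`, `run_task`), the toy linear-regression model, the learning rate, and the "worker task 0 is the chief" rule are all assumptions made for illustration. It uses `tensorflow.compat.v1` so that the TF 1.x calls from the outline also resolve on a TF 2.x install; on a recent TF 1.x release, `import tensorflow as tf` should work the same way.

```python
def parse_hosts(csv):
    """Split a comma-separated "host:port" string into a list."""
    return [h.strip() for h in csv.split(",") if h.strip()]


def build_cluster_dict(ps_hosts, worker_hosts):
    """Job-name -> address-list mapping in the shape ClusterSpec expects."""
    return {"ps": list(ps_hosts), "worker": list(worker_hosts)}


def is_chief(job_name, task_index):
    """Assumed convention: worker task 0 acts as the chief."""
    return job_name == "worker" and task_index == 0


def run_task(job_name, task_index, cluster_dict, sync=False, max_steps=100):
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Step 1: every process describes the same cluster and starts its own server.
    cluster = tf.train.ClusterSpec(cluster_dict)
    server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

    # Step 2: a ps task only hosts variables, so it blocks here.
    if job_name == "ps":
        server.join()
        return

    # Step 3: variables are placed on ps tasks, compute ops on this worker.
    device_fn = tf.train.replica_device_setter(
        worker_device="/job:worker/task:%d" % task_index, cluster=cluster)
    chief = is_chief(job_name, task_index)
    with tf.device(device_fn):
        x = tf.random_normal([32, 1])
        y = 3.0 * x + 1.0  # toy regression target (illustrative only)
        w = tf.get_variable("w", shape=[1, 1], initializer=tf.zeros_initializer())
        b = tf.get_variable("b", shape=[1], initializer=tf.zeros_initializer())
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))
        global_step = tf.train.get_or_create_global_step()
        opt = tf.train.GradientDescentOptimizer(0.1)
        hooks = [tf.train.StopAtStepHook(last_step=max_steps)]

        # Step 4 (optional): aggregate gradients from all workers per update.
        if sync:
            n = cluster.num_tasks("worker")
            opt = tf.train.SyncReplicasOptimizer(
                opt, replicas_to_aggregate=n, total_num_replicas=n)
        train_op = opt.minimize(loss, global_step=global_step)
        if sync:
            hooks.append(opt.make_session_run_hook(chief))

    # Step 5: the chief initializes variables; other workers wait for it.
    with tf.train.MonitoredTrainingSession(
            master=server.target, is_chief=chief, hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)
```

Each process in the cluster would call `run_task` with the same `cluster_dict` but its own `job_name` and `task_index`, for example `run_task("ps", 0, cluster_dict)` in one process and `run_task("worker", 0, cluster_dict)` in another. With `MonitoredTrainingSession`, the synchronization hook takes over the role that `chief_queue_runner` plays in the Supervisor-based flow.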

2. Example

MNIST demo:

    https://github.com/novav/mnist_distributed/tree/master

Reposted from: https://www.cnblogs.com/xiaoniu-666/p/10916360.html
