-
How the TensorFlow distributed script is launched:
https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md -
The distributed program uses multiprocessing to start the ps, master, and worker tasks separately:
multiprocessing.Process(target=start_dist, args=(params, ps_index, 'ps', '')).start()
time.sleep(10.0)  # adding this delay after each start resolved the error
The following error is reported during startup:
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /root/Deles/pipeline.config (inode 59875): File does not exist. Holder DFSClient_NONMAPREDUCE_184200389_1 does not have any open files.
-
Analysis: a likely cause is that multiple processes read and create the same directory at the same time: https://www.cnblo
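The staggered launch described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `start_dist` here is a hypothetical stand-in for the real task entry point (its fourth argument is assumed to be a shared directory, which the original call does not confirm), `launch_staggered` and its `ctx` parameter are names invented for this sketch, and a local directory stands in for the HDFS path.

```python
import multiprocessing
import os
import time


def start_dist(params, task_index, job_name, shared_dir):
    """Hypothetical stand-in for the real per-task entry point."""
    # exist_ok=True keeps directory creation safe if several tasks reach it.
    os.makedirs(shared_dir, exist_ok=True)
    marker = os.path.join(shared_dir, f"{job_name}_{task_index}.started")
    with open(marker, "w") as f:
        f.write(str(params))


def launch_staggered(params, tasks, shared_dir, delay=10.0, ctx=None):
    """Start one process per (job_name, task_index), pausing between starts.

    The pause gives each task time to finish touching the shared directory
    before the next one begins, which is the workaround noted above.
    """
    ctx = ctx or multiprocessing  # assumption: caller may pass a context
    procs = []
    for job_name, task_index in tasks:
        p = ctx.Process(
            target=start_dist,
            args=(params, task_index, job_name, shared_dir),
        )
        p.start()
        procs.append(p)
        time.sleep(delay)
    return procs
```

A call such as `launch_staggered(params, [("ps", 0), ("master", 0), ("worker", 0)], shared_dir)` should be placed under an `if __name__ == "__main__":` guard. A fixed sleep only narrows the race window; having a single process create the shared directory and files before any task starts would likely be a more robust option.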