6. Parameter Server

weixin_zdpau

Category: Ray. Tags: deep learning, python, machine learning

Article link: https://blog.csdn.net/weixin_43938663/article/details/118685692

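This document walks through how to use actors to implement simple synchronous and asynchronous parameter servers. To run the application, first install some dependencies.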

pip install tensorflow

You can view the code for this example (link). The examples can be run as follows.

# Run the asynchronous parameter server.
python ray/examples/parameter_server/async_parameter_server.py --num-workers=4

# Run the synchronous parameter server.
python ray/examples/parameter_server/sync_parameter_server.py --num-workers=4

Note that these examples use distributed actor handles, which are still considered experimental.

Asynchronous Parameter Server

The asynchronous parameter server itself is implemented as an actor, which exposes the methods push and pull.

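import ray


@ray.remote
class ParameterServer(object):
    def __init__(self, keys, values):
        # Copy the values so the actor owns mutable copies that
        # push can update in place.
        values = [value.copy() for value in values]
        self.weights = dict(zip(keys, values))

    def push(self, keys, values):
        # Add each update to the stored value for its key.
        for key, value in zip(keys, values):
            self.weights[key] += value

    def pull(self, keys):
        # Return the current values for the requested keys.
        return [self.weights[key] for key in keys]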
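
To make the push and pull semantics concrete without starting Ray, here is a minimal plain-Python sketch. It assumes each value is a list of floats rather than an array, and LocalParameterServer is a hypothetical stand-in name, not part of Ray or of the example code.

```python
class LocalParameterServer:
    """Plain-Python stand-in for the actor above (hypothetical, no Ray)."""

    def __init__(self, keys, values):
        # Copy the initial values so later pushes do not mutate the caller's lists.
        self.weights = {key: list(value) for key, value in zip(keys, values)}

    def push(self, keys, values):
        # Add each update elementwise to the stored weights.
        for key, update in zip(keys, values):
            stored = self.weights[key]
            for i, delta in enumerate(update):
                stored[i] += delta

    def pull(self, keys):
        # Return copies of the current weights for the requested keys.
        return [list(self.weights[key]) for key in keys]


initial = [[0.0, 0.0], [1.0]]
ps = LocalParameterServer(["w", "b"], initial)
ps.push(["w", "b"], [[0.5, -0.5], [2.0]])
ps.push(["w"], [[0.5, 0.5]])
print(ps.pull(["w", "b"]))  # [[1.0, 0.0], [3.0]]
print(initial)              # [[0.0, 0.0], [1.0]]
```

Note that push accumulates deltas instead of overwriting the weights, so updates from several workers combine regardless of the order in which they arrive.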

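We then define a worker task, which takes a parameter server as an argument and submits tasks to it. The structure of the code looks as follows.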

@ray.remote
def worker_task(ps):
    while True:
        # Get the latest weights from the parameter server.
        weights = ray.get(ps.pull.remote(keys))

        # Compute an update.
        ...

        # Push the update to the parameter server.
        ps.push.remote(keys, update)

Then we can create a parameter server and start training as follows.

ps = ParameterServer.remote(keys, initial_values)
worker_tasks = [worker_task.remote(ps) for _ in range(4)]
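
The asynchronous pattern can be imitated locally with threads, as a rough sketch rather than the Ray implementation. The assumptions here: a lock stands in for an actor executing one method call at a time, the "update" is a constant placeholder instead of a real gradient, and the loop is bounded so the script terminates. The class name and the num_steps parameter are hypothetical.

```python
import threading


class LocalParameterServer:
    """Thread-safe stand-in for the parameter server actor (hypothetical)."""

    def __init__(self, keys, values):
        self.weights = dict(zip(keys, values))
        # The lock mimics an actor running one method call at a time.
        self.lock = threading.Lock()

    def push(self, keys, values):
        with self.lock:
            for key, value in zip(keys, values):
                self.weights[key] += value

    def pull(self, keys):
        with self.lock:
            return [self.weights[key] for key in keys]


def worker_task(ps, keys, num_steps):
    # Unlike the `while True` loop above, this sketch stops after num_steps.
    for _ in range(num_steps):
        weights = ps.pull(keys)          # Get the latest weights.
        update = [1.0 for _ in weights]  # Placeholder update, not a gradient.
        ps.push(keys, update)            # Push without waiting for other workers.


keys = ["a", "b"]
ps = LocalParameterServer(keys, [0.0, 0.0])
threads = [threading.Thread(target=worker_task, args=(ps, keys, 100))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ps.pull(keys))  # [400.0, 400.0]
```

No worker waits for another, so a worker may compute its update from weights that are already stale by the time it pushes. That is the trade-off of the asynchronous design.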

Synchronous Parameter Server

The parameter server is implemented as an actor, which exposes the methods apply_gradients and get_weights. A constant linear scaling rule is applied by scaling the learning rate by the number of workers.

@ray.remote
class ParameterServer(object):
    def __init__(self, learning_rate):
        self.net = model.SimpleCNN(learning_rate=learning_rate)

    def apply_gradients(self, *gradients):
        self.net.apply_gradients(np.mean(gradients, axis=0))
        return self.net.variables.get_flat()

    def get_weights(self):
        return self.net.variables.get_flat()
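
The gradient averaging and the linear scaling rule can be sketched in plain Python. This assumes a plain SGD step (w minus learning rate times the mean gradient); the optimizer inside model.SimpleCNN is not shown in this document and may differ. The function names and base_lr are hypothetical.

```python
def mean_gradient(gradients):
    # Elementwise mean over workers, like np.mean(gradients, axis=0).
    num_workers = len(gradients)
    return [sum(column) / num_workers for column in zip(*gradients)]


def scaled_learning_rate(base_lr, num_workers):
    # Linear scaling rule: the learning rate grows with the number of workers.
    return base_lr * num_workers


def apply_gradients(weights, gradients, learning_rate):
    # One plain SGD step using the averaged gradient (an assumed update rule).
    avg = mean_gradient(gradients)
    return [w - learning_rate * g for w, g in zip(weights, avg)]


gradients = [[1.0, 2.0], [3.0, 6.0]]  # one flat gradient per worker
lr = scaled_learning_rate(base_lr=0.25, num_workers=len(gradients))
print(mean_gradient(gradients))                    # [2.0, 4.0]
print(lr)                                          # 0.5
print(apply_gradients([1.0, 1.0], gradients, lr))  # [0.0, -1.0]
```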

Workers are actors which expose the method compute_gradients.

@ray.remote
class Worker(object):
    def __init__(self, worker_index, batch_size=50):
        self.worker_index = worker_index
        self.batch_size = batch_size
        self.mnist = input_data.read_data_sets("MNIST_data", one_hot=True,
                                               seed=worker_index)
        self.net = model.SimpleCNN()

    def compute_gradients(self, weights):
        self.net.variables.set_flat(weights)
        xs, ys = self.mnist.train.next_batch(self.batch_size)
        return self.net.compute_gradients(xs, ys)

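Training alternates between computing the gradients given the current weights from the parameter server and updating the parameter server's weights with the resulting gradients.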

while True:
    gradients = [worker.compute_gradients.remote(current_weights)
                 for worker in workers]
    current_weights = ps.apply_gradients.remote(*gradients)
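
The same alternation can be run locally on a toy problem to see it converge. This is only a sketch under stated assumptions: each hypothetical worker minimizes (w - target)^2 for a single scalar weight, the server takes a plain SGD step on the averaged gradient, and the loop is bounded. None of these classes are the Ray actors from the example.

```python
class LocalWorker:
    """Hypothetical worker whose loss is (w - target) ** 2 for a scalar w."""

    def __init__(self, target):
        self.target = target

    def compute_gradients(self, weights):
        # d/dw (w - target)^2 = 2 * (w - target), one entry per weight.
        return [2.0 * (w - self.target) for w in weights]


class LocalParameterServer:
    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate

    def apply_gradients(self, *gradients):
        # Average across workers, then take one SGD step.
        n = len(gradients)
        for i, column in enumerate(zip(*gradients)):
            self.weights[i] -= self.learning_rate * sum(column) / n
        return list(self.weights)

    def get_weights(self):
        return list(self.weights)


workers = [LocalWorker(target=1.0), LocalWorker(target=3.0)]
ps = LocalParameterServer(weights=[0.0], learning_rate=0.25)
current_weights = ps.get_weights()
for _ in range(50):  # bounded, unlike the `while True` loop above
    gradients = [worker.compute_gradients(current_weights) for worker in workers]
    current_weights = ps.apply_gradients(*gradients)
print(current_weights)  # close to [2.0], the mean of the two targets
```

Every iteration waits for all workers before the weights change, so each gradient in a round is computed from the same weights. This is what distinguishes the synchronous scheme from the asynchronous one.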

Both of these examples implement the parameter server using a single actor, but they can easily be extended to shard the parameters across multiple actors.
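
One possible way to shard, sketched in plain Python: assign each key to a shard by a stable hash and route push and pull calls accordingly. The Shard and ShardedParameterServer classes are hypothetical, the values are scalars that default to 0.0 for simplicity, and in a Ray version each shard would presumably be its own actor.

```python
import zlib


class Shard:
    """Hypothetical stand-in for one parameter server actor."""

    def __init__(self):
        self.weights = {}

    def push(self, keys, values):
        for key, value in zip(keys, values):
            self.weights[key] = self.weights.get(key, 0.0) + value

    def pull(self, keys):
        return [self.weights.get(key, 0.0) for key in keys]


class ShardedParameterServer:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_index(self, key):
        # crc32 is stable across runs, unlike Python's built-in str hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def push(self, keys, values):
        # Route each key's update to the shard that owns it.
        for key, value in zip(keys, values):
            self.shards[self.shard_index(key)].push([key], [value])

    def pull(self, keys):
        # Collect values key by key so the output order matches `keys`.
        return [self.shards[self.shard_index(key)].pull([key])[0] for key in keys]


keys = ["conv1", "conv2", "fc1", "fc2"]
ps = ShardedParameterServer(num_shards=2)
ps.push(keys, [1.0, 2.0, 3.0, 4.0])
ps.push(["fc1"], [0.5])
print(ps.pull(keys))  # [1.0, 2.0, 3.5, 4.0]
```

Because each key always maps to the same shard, workers can compute the routing themselves and talk to the shards directly, which spreads the load that a single parameter server would otherwise carry.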
