Principles of Parallel Computation in Deep Learning, and Multi-GPU Parallelism with Keras

_刘文凯_

Last modified: 2023-02-10 16:32:57

Category: Deep Learning. Tags: deep learning, tensorflow, artificial intelligence

First published: 2021-11-05 14:31:16

Article link: https://blog.csdn.net/qq_24211837/article/details/121161725

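When several GPUs are available, parallel training falls into two categories: data parallelism and model parallelism.

Model parallelism: different GPUs train different parts of the model. It is a good fit when the amount of computation per neuron activation is high, because activations are what the GPUs have to exchange.
Data parallelism: different GPUs train on different data examples. It is a good fit when the amount of computation per weight is high, because weights (or their gradients) are what the GPUs have to exchange.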

1. Data parallelism

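Data parallelism is the simpler of the two. With 2 GPUs, for example, each step reads one batch of batch_size samples, splits it evenly into 2 shards, and feeds one shard to each GPU. The gradients computed on the two GPUs are averaged, and the averaged gradient is used to update the parameters on both GPUs. Typically one GPU is designated the master GPU and performs the gradient aggregation:
Image source: https://blog.csdn.net/xsc_c/article/details/42420167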
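
The split, average, and update cycle described above can be sketched without any GPU at all. The following is a minimal pure-Python illustration, not the code of any framework: the linear model, the toy data, and the names `mse_gradient` and `data_parallel_step` are all assumptions made for this example. It simulates the workers sequentially and checks that averaging the per-shard gradients gives the same update as a single-device step on the whole batch.

```python
# Toy data-parallel SGD step in pure Python (no GPUs involved).
# The linear model, the data, and all function names are illustrative assumptions.

def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of a linear model y_hat = w . x."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One SGD step where each simulated worker handles an equal shard of the batch."""
    assert len(xs) % num_workers == 0, "this sketch assumes an evenly divisible batch"
    shard = len(xs) // num_workers
    worker_grads = []
    for k in range(num_workers):
        lo, hi = k * shard, (k + 1) * shard
        worker_grads.append(mse_gradient(w, xs[lo:hi], ys[lo:hi]))
    # Average the per-worker gradients (the role played by the master GPU).
    avg = [sum(g[j] for g in worker_grads) / num_workers for j in range(len(w))]
    # Every worker applies the same update, so all replicas stay identical.
    return [wi - lr * gi for wi, gi in zip(w, avg)]


xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
ys = [1.0, 0.0, 2.0, -1.0]
w0 = [0.1, -0.2]

w_parallel = data_parallel_step(w0, xs, ys, num_workers=2, lr=0.1)
w_single = [wi - 0.1 * gi for wi, gi in zip(w0, mse_gradient(w0, xs, ys))]
print(w_parallel, w_single)
```

The two results agree up to floating-point rounding because the shards have equal size, so the mean of the shard means equals the mean over the full batch.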

Keras implementation:

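# Legacy API: multi_gpu_model is deprecated and is no longer available in
# recent TensorFlow/Keras releases; use it only with older Keras versions.
from keras.utils import multi_gpu_model

# `model` is assumed to be an already-built Keras model, and `x`, `y` the training data.
# Returns a data-parallel version of the model replicated on 8 GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 256 / 8 = 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)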
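
The arithmetic in the comment above (256 samples over 8 GPUs gives 32 per GPU) can be written as a small helper. This is only a sketch: `per_replica_batch_sizes` is a hypothetical name, and the way it spreads a remainder over the first replicas is an assumption for illustration, not a description of how Keras slices a batch.

```python
def per_replica_batch_sizes(global_batch_size, num_replicas):
    """Split a global batch into per-replica sizes that sum to the global size.

    Assumption for this sketch: any remainder is handed out one sample at a
    time to the first replicas.
    """
    base, remainder = divmod(global_batch_size, num_replicas)
    return [base + 1 if i < remainder else base for i in range(num_replicas)]


# The case from the snippet above: batch_size=256 on 8 GPUs.
sizes = per_replica_batch_sizes(256, 8)
print(sizes)
```

Picking a batch size that is a multiple of the GPU count keeps every replica equally loaded, which is why 256 is a convenient choice for 8 GPUs.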

Alternatively, with tf.distribute.MirroredStrategy:

import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and keeps the
# replicas in sync by all-reducing the gradients.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Build and compile inside the scope so the variables are mirrored.
    # This could be any kind of model -- Functional, subclass...
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.GlobalMaxPooling2D(),
        tf.keras.layers.Dense(10)
    ])
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# `train_dataset` (a batched tf.data.Dataset) and `callbacks` (a list of Keras
# callbacks) are assumed to be defined elsewhere.
model.fit(train_dataset, epochs=12, callbacks=callbacks)

2. Model parallelism

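A single model is split across different GPUs, each of which computes part of it:

Image source: https://blog.csdn.net/xsc_c/article/details/42420167
Implementation with Keras and TensorFlow: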

import tensorflow as tf
import keras

# Model where a shared LSTM is used to encode two different sequences in parallel
input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))

shared_lstm = keras.layers.LSTM(64)

# tf.device pins the ops created inside the block to the named device.
# Process the first sequence on one GPU
with tf.device('/GPU:0'):
    encoded_a = shared_lstm(input_a)
# Process the next sequence on another GPU
with tf.device('/GPU:1'):
    encoded_b = shared_lstm(input_b)

# Concatenate results on CPU
with tf.device('/CPU:0'):
    merged_vector = keras.layers.concatenate(
        [encoded_a, encoded_b], axis=-1)
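
To make the idea of splitting one model across devices more concrete, here is a minimal pure-Python sketch of one common form of model parallelism: the output neurons of a dense layer are partitioned across devices, each device computes its own slice of the activations, and the slices are concatenated. The devices are simulated sequentially, and the names `matvec` and `split_dense_forward` as well as the toy weights are assumptions made for this example, not part of Keras or TensorFlow.

```python
# Toy model-parallel dense layer in pure Python (devices are simulated).
# All names and values here are illustrative assumptions.

def matvec(weights, x):
    """Multiply a weight matrix, stored as a list of rows, by the vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weights]


def split_dense_forward(weights, x, num_devices):
    """Forward pass of a dense layer whose output neurons are split across devices."""
    rows_per_device = (len(weights) + num_devices - 1) // num_devices
    partial_outputs = []
    for d in range(num_devices):
        # Each simulated device stores only its own slice of the weight matrix.
        device_weights = weights[d * rows_per_device:(d + 1) * rows_per_device]
        # Every device needs the full input activation vector x.
        partial_outputs.append(matvec(device_weights, x))
    # Concatenating the partial activations reassembles the layer output.
    return [value for part in partial_outputs for value in part]


weights = [[1.0, 0.0, 2.0],
           [0.5, -1.0, 0.0],
           [0.0, 3.0, 1.0],
           [2.0, 2.0, -2.0]]
x = [1.0, 2.0, 3.0]

full_output = matvec(weights, x)
split_output = split_dense_forward(weights, x, num_devices=2)
print(full_output, split_output)
```

Note what moves between devices in this scheme: the input and output activations, not the weights. Each device keeps its weight slice local, which is the opposite of data parallelism, where every device holds all the weights and gradients are exchanged.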

References:
https://blog.csdn.net/Lin_RD/article/details/97675912
Image source:
https://blog.csdn.net/xsc_c/article/details/42420167
