Training Deeper Models by GPU Memory Optimization on TensorFlow
With the advent of big data, readily available GPGPUs, and progress in neural network modeling techniques, training deep learning models on GPUs has become a popular choice. However, due to the inherent complexity of deep learning models and the limited memory resources of modern GPUs, training deep models remains nontrivial, especially when the model size is too big for a single GPU. In this paper, we propose a general dataflow-graph-based GPU memory optimization strategy, "swap-out/in", which uses host memory as a larger memory pool to overcome the limitation of GPU memory. In addition, dedicated optimization strategies are proposed for the memory-consuming sequence-to-sequence (Seq2Seq) models. These strategies are integrated into TensorFlow seamlessly and without accuracy loss. In extensive experiments, significant reductions in memory usage are observed. The maximum training batch size can be increased by a factor of 2 to 30 for a fixed model and system configuration.
2020-01-10
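A minimal sketch of the swap-out/in idea from the abstract above, written by hand with explicit tf.device placements; the paper inserts such copies automatically by rewriting the dataflow graph, and the tensor names and sizes here are made up for illustration:

```python
import tensorflow as tf

# Hedged sketch (requires a GPU): keep a large intermediate activation in
# host memory between its producer and consumer, trading PCIe transfer
# time for GPU memory headroom.
x = tf.random.normal([4096, 4096])

with tf.device("/GPU:0"):
    h = tf.nn.relu(tf.matmul(x, x))   # producer runs on the GPU

with tf.device("/CPU:0"):
    h_host = tf.identity(h)           # "swap out": copy the activation to host RAM

with tf.device("/GPU:0"):
    y = tf.matmul(h_host, x)          # "swap in": TF copies it back on demand
```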
Distributed TensorFlow with MPI.pdf
Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important for analyzing the large volumes of data generated by simulations, experiments, and mobile devices. With increasing data volume, distributed-memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important for designing in-memory and massively parallel MLDM algorithms. Yet the majority of open-source MLDM software is limited to sequential execution, with only a few packages supporting multi-core/many-core execution. In this paper, we extend the recently proposed Google TensorFlow for execution on large-scale clusters using the Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime, making the proposed implementation generic and readily usable by the growing TensorFlow user base. We evaluate our implementation using an InfiniBand cluster and several well-known datasets, and the evaluation indicates the efficiency of the proposed implementation.
2020-01-09
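A hedged sketch of the data-parallel pattern such an MPI port enables: each rank computes gradients on its own data shard, and an MPI allreduce averages them so every rank applies the same update. Plain NumPy least squares stands in for a TensorFlow model; all names and data below are illustrative.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)          # each rank gets its own shard
X = rng.normal(size=(256, 10))
y = rng.normal(size=256)
w = np.zeros(10)

for step in range(100):
    grad = 2.0 * X.T @ (X @ w - y) / len(y)     # local gradient on this shard
    avg = np.empty_like(grad)
    comm.Allreduce(grad, avg, op=MPI.SUM)       # sum gradients across all ranks
    w -= 0.01 * (avg / size)                    # apply the averaged gradient

if rank == 0:
    print("final loss:", np.mean((X @ w - y) ** 2))
```

Launched with, e.g., `mpirun -np 4 python train.py`, every rank ends the loop holding identical weights.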
CUDA优化2.pptx
CUDA memory optimization: minimize CPU-GPU data transfers. If transfers are not reduced, porting CPU code to the GPU may yield no speedup; batch small transfers into larger ones, and overlap memory transfers with computation.
2020-01-09
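A hedged sketch of the transfer/compute overlap mentioned above, expressed from Python with Numba CUDA (the chunk count, sizes, and toy kernel are made up): each chunk's copies and kernel run on their own stream, so transfers for one chunk can overlap with compute on another.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= factor

n, chunks = 1 << 20, 4
host_in = cuda.pinned_array(n, dtype=np.float32)    # pinned memory enables async copies
host_in[:] = np.arange(n, dtype=np.float32)
host_out = cuda.pinned_array(n, dtype=np.float32)

streams = [cuda.stream() for _ in range(chunks)]
step = n // chunks
for k, s in enumerate(streams):
    lo, hi = k * step, (k + 1) * step
    d_chunk = cuda.to_device(host_in[lo:hi], stream=s)   # async H2D on stream s
    scale[(step + 255) // 256, 256, s](d_chunk, 2.0)     # kernel on the same stream
    d_chunk.copy_to_host(host_out[lo:hi], stream=s)      # async D2H on stream s

cuda.synchronize()
```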
虚拟与离散变量回归模型.pdf
The regression models studied in the previous five chapters all involve variables that take actual numerical values, generally continuous ones. In practice, one often encounters variables that take discrete values, and their regression models require special treatment. In economic analysis, the dependent variable is often not numerical at all, for example buy vs. sell, rise vs. fall, presence vs. absence, or profit vs. loss. Such cases can be handled by introducing a dummy variable and assigning it numerical codes, which gives the regression a distinctive character. This chapter studies this class of regression models.
2020-01-10
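A hedged sketch of the dummy-variable idea the chapter describes: a 0/1 indicator (e.g., 1 = "profit", 0 = "loss") enters an ordinary least-squares design matrix like any other regressor, and its coefficient measures the shift between the two groups. All data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                    # continuous regressor
d = rng.integers(0, 2, size=n)            # dummy regressor: group 0 or 1
y = 1.0 + 2.0 * x + 3.0 * d + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x, d])   # [intercept, x, dummy]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope, dummy shift:", beta)   # approx [1, 2, 3]
```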
Tensorflow XLA详解.pdf
XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations.
XLA uses JIT compilation to analyze the TensorFlow graph the user creates, specializes it at run time to the actual dimensions and types, fuses multiple operations together, and generates efficient native code for them, targeting devices such as CPUs and GPUs as well as custom accelerators (for example, Google's TPU).
2020-01-14
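A minimal example of requesting the JIT path described above in TensorFlow 2.x via tf.function(jit_compile=True); the toy function and shapes are made up:

```python
import tensorflow as tf

# With jit_compile=True, XLA can fuse the matmul, add, and relu below
# into fewer kernels instead of launching one per op.
@tf.function(jit_compile=True)
def fused_op(x, w, b):
    return tf.nn.relu(x @ w + b)

x = tf.random.normal([128, 256])
w = tf.random.normal([256, 64])
b = tf.zeros([64])
print(fused_op(x, w, b).shape)   # (128, 64)
```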