MXNet Code Analysis: The Executor

mydear_11000

Category: mxnet

Article link: https://blog.csdn.net/mydear_11000/article/details/51191590


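enum DataEntryType

kBindByExternal /// the data block is bound externally
kTobeBindByExternal /// the data block will be bound externally
kInternalAllocated /// the data block is allocated internally and must be released after execution
kNotInitialized /// the data block is managed internally by the executor and is not yet initialized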

/// ------------------------------------------------------
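struct DataEntryInfo

NDArray data; /// entry point to the actual data storage
OpReqType op_req; /// write mode for the data: kNullOp, kWriteTo, kWriteInplace, kAddTo
int inplace_op_id; /// index of the operator node that operates in place on this data block
DataEntryType type; /// type of the data entry
TShape shape; /// shape of the data entry
int type_flag; /// element type of the data entry
GraphStorageAllocator::StorageID storage_id; /// storage id, recorded when the data block is allocated internally
uint32_t temp_ref_count; /// reference counter used during initialization to record how many other symbols use this data block
uint32_t ref_count; /// the actual reference counter of the data block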

/// ------------------------------------------------------
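struct OpExecEntry

Engine::AsyncFn exec_fun; /// entry point of the function that executes the operation
std::vector<Engine::VarHandle> use_vars; /// read-only variables
std::vector<Engine::VarHandle> mutate_vars; /// variables that are read and written; access must be serialized with other operations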
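
The split between use_vars and mutate_vars is what lets the engine decide which operations may overlap. The following is a minimal sketch of that rule, not MXNet's engine code: VarId, ExecEntrySketch, Contains and MustSerialize are hypothetical names introduced here, and the real Engine::VarHandle is an opaque handle rather than an integer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Engine::VarHandle (assumption: an integer id is enough here).
using VarId = int;

// Simplified mirror of OpExecEntry that keeps only the two dependency lists.
struct ExecEntrySketch {
  std::vector<VarId> use_vars;     // variables the operation only reads
  std::vector<VarId> mutate_vars;  // variables the operation writes
};

// True if `v` appears in `vars`.
inline bool Contains(const std::vector<VarId>& vars, VarId v) {
  return std::find(vars.begin(), vars.end(), v) != vars.end();
}

// Two operations must be serialized if either one writes a variable that the
// other one reads or writes. Two operations that only read a shared variable
// do not conflict.
inline bool MustSerialize(const ExecEntrySketch& a, const ExecEntrySketch& b) {
  for (VarId v : a.mutate_vars) {
    if (Contains(b.use_vars, v) || Contains(b.mutate_vars, v)) return true;
  }
  for (VarId v : b.mutate_vars) {
    if (Contains(a.use_vars, v)) return true;
  }
  return false;
}
```

Under this rule, two operators that both read the same weight can run concurrently, while an update that writes the weight has to wait for both of them.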

/// ------------------------------------------------------
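struct OpNode

bool activated; /// whether this node is activated; non-activated nodes need no actual computation
Context ctx; /// runtime context of the operator node, cpu/gpu
std::vector<DataEntryInfo> outputs; /// output data entries of the operator node; for a backward node, the number of outputs equals the number of input arguments of the forward computation
std::vector<DataEntryInfo> aux_states; /// auxiliary states of the operator node
std::shared_ptr<Operator> op; /// pointer to the actual operator
OpContext op_ctx; /// operator context for Forward and Backward; this is a superset of RunContext
OpExecEntry cached_exec; /// cached execution entry for this node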

/// ------------------------------------------------------
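class GraphExecutor

StaticGraph graph_; /// computation graph, analogous to the net definition in Caffe; it contains every node involved in the forward and backward passes (variable nodes and operator nodes)
std::vector<uint32_t> topo_order_; /// topological order of all nodes, forward and backward; backward nodes are guaranteed to come after the forward nodes
bool enable_inplace_allocation_; /// whether in-place operations such as kWriteInplace are allowed
size_t total_allocated_bytes_; /// total storage allocated, in bytes
size_t total_allocated_temp_; /// total temporary storage allocated, in bytes
size_t num_forward_nodes_; /// number of forward nodes in the network
std::vector<uint32_t> head_grad_nodes_; /// for the backward pass, the set of nodes that need gradients
std::map<uint32_t, uint32_t> mirror_source_map_; /// node mapping for the mirrored network structure
std::vector<StaticGraph::DataEntry> arg_grads_; /// for the backward pass, the variable entries needed by the gradient computation
std::vector<OpNode> op_nodes_; /// all compute nodes of the network, forward and backward; the order is given by topo_order_
std::vector<NDArray> heads_ndarray_; /// the internally allocated storage of the output nodes
std::shared_ptr<GraphStoragePool> shared_mem_; /// shared storage pool

This class is the executor for a symbol, that is, for a network.

Function notes: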
1 Init(Symbol symbol /// the network computation graph defined in Python
         , const Context& default_ctx /// default context
         , const std::map<std::string, Context>& ctx_map
         , const std::vector<NDArray> &in_args /// input symbol variables
         , const std::vector<NDArray> &arg_grad_store /// gradient variables
         , const std::vector<OpReqType> &grad_req_type /// memory write mode used for gradient computation, one per entry of arg_grad_store
         , const std::vector<NDArray> &aux_states /// auxiliary states
         , Executor* shared_exec = nullptr):

/// ------------------------------------------
/// Build the complete computation graph, including forward and backward propagation
1.1 InitGraph(const Symbol &symbol
                       , const Context& default_ctx
                       , const std::map<std::string, Context>& ctx_map
                       , const std::vector<NDArray> &in_args
                       , const std::vector<NDArray> &arg_grad_store
                       , const std::vector<OpReqType> &grad_req_type
                       , bool need_backward)

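1.1.0 Mainly builds the topo_order list and assigns a context to every node.
1.1.1 Create the StaticGraph from the symbol.
1.1.2 If a backward pass is needed, create the mirror symbol and set up mirror_source_map_.
1.1.3 Assign a context to every variable by calling AssignContext. The context of a backward gradient variable should match the context of its forward node. If an input node and the operator node that consumes it have different contexts, an auxiliary copy node has to be generated.
1.1.4 Topologically sort all nodes to produce topo_order.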
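
Step 1.1.4 can be illustrated with a post-order depth-first traversal, which emits a node only after all of its inputs. This is a sketch under assumptions rather than the code of StaticGraph: the graph is reduced to a hypothetical InputLists type, Visit and TopoOrder are names made up for this example, and the graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified graph: inputs[i] lists the node ids that node i reads from.
using InputLists = std::vector<std::vector<uint32_t>>;

// Post-order depth-first visit: a node is appended to `order` only after
// every node it reads from has been appended. Assumes the graph has no cycles.
inline void Visit(const InputLists& inputs, uint32_t nid,
                  std::vector<bool>* visited, std::vector<uint32_t>* order) {
  if ((*visited)[nid]) return;
  (*visited)[nid] = true;
  for (uint32_t src : inputs[nid]) Visit(inputs, src, visited, order);
  order->push_back(nid);
}

// Returns node ids ordered so that producers come before their consumers.
inline std::vector<uint32_t> TopoOrder(const InputLists& inputs) {
  std::vector<bool> visited(inputs.size(), false);
  std::vector<uint32_t> order;
  order.reserve(inputs.size());
  for (uint32_t nid = 0; nid < inputs.size(); ++nid) {
    Visit(inputs, nid, &visited, &order);
  }
  return order;
}
```

Because a backward node takes forward outputs as inputs, any order produced this way places it after the forward nodes it depends on.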

/// ------------------------------------------
/// The following steps allocate and bind storage resources
/// arg nodes (input, arg_grad)
1.2 InitDataEntryInfo(const std::vector<NDArray> &in_args /// forward input variable nodes
                                    , const std::vector<NDArray> &arg_grad_store /// gradient variables
                                    , const std::vector<OpReqType> &grad_req_type /// gradient write modes
                                    , const std::vector<NDArray> &aux_states) /// other auxiliary states

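1.2.0 Sets and infers the attributes of all input and output variables, including reference counters, activation state and so on.
1.2.1 bind inputs: for every input variable node, the output data entry gets type kBindByExternal and data = in_args[i]; the node is marked as activated and the reference counts are tallied.
1.2.2 bind grads: for every gradient variable, the output data entry gets type kBindByExternal and data = arg_grad_store[i]; the node is marked as activated and the reference counts are tallied.
1.2.3 Traverse topo_order_ in reverse. If a node is activated, mark all of its input nodes as activated and increase the reference counter of each input by 1.
1.2.4 Infer the shape, type and related information of all nodes.
1.2.5 bind aux args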
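
The reverse traversal in step 1.2.3 can be sketched as follows. This is a simplified illustration, not the executor's code: NodeSketch and PropagateActivation are hypothetical names, and the reference count is kept per node here, whereas the executor keeps it per output data entry.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified node (assumption: one output per node).
struct NodeSketch {
  std::vector<uint32_t> inputs;  // ids of the nodes this node reads from
  bool activated = false;        // true once some requested result depends on this node
  uint32_t ref_count = 0;        // number of activated consumers that read this node
};

// `topo_order` lists node ids with producers before consumers.
// `heads` are the node ids whose results are requested.
inline void PropagateActivation(std::vector<NodeSketch>* nodes,
                                const std::vector<uint32_t>& topo_order,
                                const std::vector<uint32_t>& heads) {
  for (uint32_t h : heads) (*nodes)[h].activated = true;
  // Consumers are visited first, so the activation flag of a node is final
  // by the time its own inputs are examined.
  for (auto it = topo_order.rbegin(); it != topo_order.rend(); ++it) {
    const NodeSketch& node = (*nodes)[*it];
    if (!node.activated) continue;
    for (uint32_t src : node.inputs) {
      (*nodes)[src].activated = true;
      ++(*nodes)[src].ref_count;
    }
  }
}
```

A node that no requested result depends on stays inactive, so it neither runs nor keeps its inputs alive.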

/// output variables
1.3 InitDataEntryMemory()

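1.3.0 Allocates space for the outputs of operator nodes.
1.3.1 Allocate space for every activated non-variable node; all inputs of the node should already be initialized, and the type of every output entry must be != kInternalAllocated.
1.3.2 Check whether the operation can run in place; if the conditions are met, set the output to an in-place operation.
1.3.3 Mark every remaining uninitialized output variable as kInternalAllocated.
1.3.4 Release Input/output memory that is no longer needed (reference counter is 0 and type is kInternalAllocated).
1.3.5 Perform the actual memory allocation for the output variables of all nodes (type kInternalAllocated) and build heads_ndarray_.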
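
Steps 1.3.3 and 1.3.4 amount to planning memory with a pool: a block whose reference count has dropped to zero goes back to the pool and can serve a later request. The sketch below shows only that idea. StoragePoolSketch and its methods are hypothetical and much simpler than GraphStorageAllocator, which presumably also has to take the context and data type into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified pool that tracks block sizes only.
class StoragePoolSketch {
 public:
  using StorageID = int;

  // Hand out a previously released block that is large enough, or create a new one.
  StorageID Request(size_t bytes) {
    for (size_t i = 0; i < free_.size(); ++i) {
      StorageID id = free_[i];
      if (sizes_[static_cast<size_t>(id)] >= bytes) {
        free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
        return id;
      }
    }
    sizes_.push_back(bytes);
    return static_cast<StorageID>(sizes_.size() - 1);
  }

  // Called when the reference count of the entry using this block reaches zero.
  void Release(StorageID id) { free_.push_back(id); }

  // Sum of the sizes of all blocks ever created.
  size_t TotalBytes() const {
    size_t total = 0;
    for (size_t s : sizes_) total += s;
    return total;
  }

 private:
  std::vector<size_t> sizes_;    // size of every block ever created
  std::vector<StorageID> free_;  // blocks currently available for reuse
};
```

With this scheme, a 100-byte block that has been released can satisfy a later 80-byte request without increasing the total.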

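/// temporary storage
1.4 InitResources()

1.4.0 Requests additional temporary storage for operator nodes.
1.4.1 Request temporary storage for every node that is activated and is not a variable.
1.4.2 Build the graph for high concurrency; nodes that execute serially try to share temporary storage in order to save memory.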

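/// ------------------------------------------
/// Create the actual operator nodes
1.5 InitOpNodes()

1.5.0 Creates the actual operator for every operator node.
1.5.1 For every operator node, infer output_shape and output_type from input_shape and input_type.
1.5.2 Bind the actual operator function to its input and output variables; if the operator supports cached execution, create a cached execution entry from the execution engine.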

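/// ------------------------------------------------------
2 void RunOps(bool is_train, size_t topo_start, size_t topo_end)

2.0 Performs the actual forward and backward computation (from topo_start to topo_end).
2.1 If the current operation is a copy operation, call the NDArray CopyFromTo function, which copies the data asynchronously between devices. Note that a copy operation supports exactly one input and one output.
2.2 Push the actual operation to the execution engine.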
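
The loop inside RunOps can be sketched as follows. Everything here is a stand-in introduced for illustration: EngineSketch only queues callables and runs them later in order, OpNodeSketch keeps just an activation flag and a callable, and the range is assumed to be half-open, [topo_start, topo_end). The copy-node branch from step 2.1 is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine: work is queued and executed later, in order.
class EngineSketch {
 public:
  void Push(std::function<void()> fn) { pending_.push(std::move(fn)); }
  void WaitForAll() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical, simplified operator node.
struct OpNodeSketch {
  bool activated;
  std::function<void(bool is_train)> exec;  // empty for variable nodes
};

// Push every activated operator whose position in topo_order lies in
// [topo_start, topo_end). Nothing is executed here; the engine runs it later.
inline void RunOpsSketch(EngineSketch* engine,
                         const std::vector<OpNodeSketch>& nodes,
                         const std::vector<size_t>& topo_order,
                         bool is_train, size_t topo_start, size_t topo_end) {
  assert(topo_start <= topo_end && topo_end <= topo_order.size());
  for (size_t i = topo_start; i < topo_end; ++i) {
    const OpNodeSketch& node = nodes[topo_order[i]];
    if (!node.activated || !node.exec) continue;  // skip inactive and variable nodes
    auto fn = node.exec;  // copy so the queued task owns its callable
    engine->Push([fn, is_train]() { fn(is_train); });
  }
}
```

With a layout like this, a forward pass would cover the first num_forward_nodes_ positions of the order and a backward pass the remaining ones.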
