### -------------------------------------------------------------------------
Basic Class:
1 shape: kDimension, kSubDim, shape_[]
size(), slice(), subShape()
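The shape item above can be sketched as a minimal stand-in for mshadow's `Shape<dim>` template; member and method names follow the notes, but this is a simplified illustration, not the library's exact code:

```cpp
#include <cstddef>

// Minimal sketch of mshadow's Shape<dim>: a fixed array of dimension
// sizes plus compile-time constants kDimension and kSubDim.
template <int dimension>
struct Shape {
  static const int kDimension = dimension;   // number of dimensions
  static const int kSubDim = dimension - 1;  // rank of a sub-shape
  size_t shape_[kDimension];                 // shape_[0] is the highest dim

  // total number of elements = product of all dimension sizes
  size_t Size() const {
    size_t n = 1;
    for (int i = 0; i < kDimension; ++i) n *= shape_[i];
    return n;
  }
  // drop the highest dimension, keeping the lower kSubDim dimensions
  Shape<kSubDim> SubShape() const {
    Shape<kSubDim> s;
    for (int i = 1; i < kDimension; ++i) s.shape_[i - 1] = shape_[i];
    return s;
  }
};
```

For example, a `Shape<3>` of 2x3x4 has `Size() == 24` and its `SubShape()` is the 3x4 `Shape<2>`.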
2 stream: <gpu> stream_, blas_handle_, blas_handle_ownership_
dnn_handle_, dnn_handle_ownership_
Wait(): cudaStreamSynchronize(stream_)
          CheckIdle(): cudaStreamQuery(stream_)
GetStream():
3 Tensor: kDevCPU, kSubDim, *dptr_, shape_, stride_, stream_
MemSize()
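`MemSize()` accounts for padding: because the lowest dimension is stored with `stride_` elements rather than its logical width, the allocated element count is the product of the higher dimensions times the stride. A sketch of that computation (a free function here for illustration; the real method lives on `Tensor`):

```cpp
#include <cstddef>

// Sketch of the MemSize() computation: with a padded lowest dimension,
// the memory footprint (in elements) is stride * product of higher dims.
template <int dim>
size_t MemSize(const size_t shape[], size_t stride) {
  size_t n = stride;                  // padded lowest dimension
  for (int i = 0; i < dim - 1; ++i)   // remaining (higher) dimensions
    n *= shape[i];
  return n;
}
```

E.g. a 2x3x5 tensor padded to stride 8 occupies 2*3*8 = 48 elements, versus 30 unpadded.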
4 Saver: Save(DType &a, DType b) {a = b}
plusto, minusto, multo, divto
5 Red(reduce): sum, maximum, minimum
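The saver and reducer items boil down to small structs with a static `Save()` or `Reduce()` that describe how a computed value is combined into the destination. A sketch following the names in the notes (simplified relative to the real headers):

```cpp
#include <algorithm>

// Savers: how to store a computed value b into destination a.
namespace sv {
struct saveto  { template<typename DType> static void Save(DType &a, DType b) { a = b;  } };
struct plusto  { template<typename DType> static void Save(DType &a, DType b) { a += b; } };
struct minusto { template<typename DType> static void Save(DType &a, DType b) { a -= b; } };
struct multo   { template<typename DType> static void Save(DType &a, DType b) { a *= b; } };
struct divto   { template<typename DType> static void Save(DType &a, DType b) { a /= b; } };
}  // namespace sv

// Reducers: how to fold a new value src into the running result dst.
namespace red {
struct sum     { template<typename DType> static void Reduce(DType &dst, DType src) { dst += src; } };
struct maximum { template<typename DType> static void Reduce(DType &dst, DType src) { dst = std::max(dst, src); } };
struct minimum { template<typename DType> static void Reduce(DType &dst, DType src) { dst = std::min(dst, src); } };
}  // namespace red
```

Passing the saver/reducer as a template parameter lets one kernel implement `=`, `+=`, `-=` etc. without branching at runtime.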
6 Expression engine:
6.1 type: kRValue, kMapper, kChainer, kComplex
6.2 ScalarExp
TypecastExp
TransposeExp
RValueExp
DotExp: Eval() --> BLASEngine<xpu, type>gemm()
BLASEngine: gemm(), gemv(), ger(), dot()
BLASEngine<cpu, float>: gemm(), gemv(), ger(), dot()
BLASEngine<cpu, double>: gemm(), gemv(), ger(), dot()
BLASEngine<gpu, float>: gemm(), gemv(), ger(), dot()
        BLASEngine<gpu, double>: gemm(), gemv(), ger(), dot()
BinaryMapExp:
UnaryMapExp:
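The core idea behind `BinaryMapExp`/`UnaryMapExp` can be shown with a stripped-down expression template: operators build a lightweight tree of `kMapper`-style nodes, and evaluation only happens element by element on assignment. This is a toy sketch (float-only, 1-D, names simplified), not mshadow's actual engine:

```cpp
#include <cstddef>

// A binary expression node: holds references to its operands and
// evaluates OP::Map lazily, one element at a time.
template <typename OP, typename TLhs, typename TRhs>
struct BinaryMapExp {
  const TLhs &lhs;
  const TRhs &rhs;
  BinaryMapExp(const TLhs &l, const TRhs &r) : lhs(l), rhs(r) {}
  float Eval(size_t i) const { return OP::Map(lhs.Eval(i), rhs.Eval(i)); }
};

// The mapper op: a stateless struct with a static Map().
struct plus {
  static float Map(float a, float b) { return a + b; }
};

// A minimal "tensor": evaluation of the whole expression tree happens
// in operator=, so no temporaries are materialized for a + b + c.
struct Vec {
  float *dptr;
  size_t len;
  float Eval(size_t i) const { return dptr[i]; }
  template <typename E>
  Vec &operator=(const E &e) {
    for (size_t i = 0; i < len; ++i) dptr[i] = e.Eval(i);
    return *this;
  }
};

// operator+ builds an expression node instead of computing anything.
template <typename TLhs, typename TRhs>
BinaryMapExp<plus, TLhs, TRhs> operator+(const TLhs &l, const TRhs &r) {
  return BinaryMapExp<plus, TLhs, TRhs>(l, r);
}
```

With this, `c = a + b` compiles to a single fused loop; `DotExp` is the exception, dispatching to `BLASEngine::gemm()` in its `Eval()` instead of mapping elementwise.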
7 IStream: Read()
Write()
SaveBinary()
LoadBinary()
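The IStream item is a two-method serialization interface; `SaveBinary()`/`LoadBinary()` helpers are layered on top of `Read()`/`Write()`. A sketch with a hypothetical in-memory implementation (`MemoryStream` is my own name for illustration):

```cpp
#include <algorithm>
#include <cstring>
#include <vector>
#include <cstddef>

// The interface: everything serializable goes through these two calls.
class IStream {
 public:
  virtual size_t Read(void *buf, size_t size) = 0;
  virtual void Write(const void *buf, size_t size) = 0;
  virtual ~IStream() {}
};

// Hypothetical in-memory backend, useful for testing serialization.
class MemoryStream : public IStream {
 public:
  MemoryStream() : pos_(0) {}
  size_t Read(void *buf, size_t size) {
    size_t n = std::min(size, data_.size() - pos_);
    std::memcpy(buf, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }
  void Write(const void *buf, size_t size) {
    const char *p = static_cast<const char *>(buf);
    data_.insert(data_.end(), p, p + size);
  }
 private:
  std::vector<char> data_;
  size_t pos_;
};
```

In mshadow, `SaveBinary(stream, tensor)` writes the shape followed by the raw data, and `LoadBinary` reads them back in the same order.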
8 TensorContainer: pad_, data_
9 TShape: ndim_, num_heap_allocated_, data_stack_[kStackCache], kStackCache, *data_heap_
10 TBlob: *dptr_, shape_, stride_, dev_mask_, type_flag_
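The `TShape` field list describes a small-buffer optimization: shapes with few dimensions live in the inline `data_stack_[kStackCache]` array, and only larger shapes allocate `data_heap_`. A simplified sketch of that layout (the real class adds copy/assignment handling, iteration, and conversion to `Shape<dim>`):

```cpp
#include <cstddef>

// Sketch of TShape's stack/heap hybrid storage.
struct TShape {
  static const size_t kStackCache = 4;  // dims stored inline
  size_t ndim_;
  size_t num_heap_allocated_;           // capacity of data_heap_
  size_t data_stack_[kStackCache];
  size_t *data_heap_;

  TShape() : ndim_(0), num_heap_allocated_(0), data_heap_(0) {}
  ~TShape() { delete[] data_heap_; }

  // grow to ndim dimensions, switching to heap storage if needed
  void SetDim(size_t ndim) {
    if (ndim > kStackCache && ndim > num_heap_allocated_) {
      delete[] data_heap_;
      data_heap_ = new size_t[ndim];
      num_heap_allocated_ = ndim;
    }
    ndim_ = ndim;
  }
  // active storage: stack for small shapes, heap otherwise
  size_t *data() { return ndim_ <= kStackCache ? data_stack_ : data_heap_; }
};
```

Since most tensors are rank 4 or lower, the common case never touches the allocator; `TBlob` pairs such a dynamic shape with a type-erased `dptr_` plus `dev_mask_`/`type_flag_` so tensors of any rank, device, and dtype fit one runtime container.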
### -------------------------------------------------------------------------
namespace mshadow-ps
1 ThreadPQueue: use_fifo, pqueue_, fqueue_, lock_, counter_
push(), pop(), abort()
2 ThreadSafeMap: lock, map
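The `ThreadSafeMap` item is simply a mutex-guarded map: every access takes the lock. A sketch under that assumption (method names `Set`/`Get` are illustrative, not the library's exact API):

```cpp
#include <map>
#include <mutex>

// Sketch of ThreadSafeMap: serialize all access through one lock.
template <typename K, typename V>
class ThreadSafeMap {
 public:
  void Set(const K &key, const V &value) {
    std::lock_guard<std::mutex> guard(lock_);
    map_[key] = value;
  }
  // returns false (leaving *out untouched) if the key is absent
  bool Get(const K &key, V *out) {
    std::lock_guard<std::mutex> guard(lock_);
    typename std::map<K, V>::iterator it = map_.find(key);
    if (it == map_.end()) return false;
    *out = it->second;
    return true;
  }
 private:
  std::mutex lock_;
  std::map<K, V> map_;
};
```

`ThreadPQueue` follows the same pattern, pairing its queues with `lock_` and a counter so `push()`/`pop()` can block safely and `abort()` can wake waiters.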
### -------------------------------------------------------------------------
Some public methods:
1 InitTensorEngine(int device_id = 0):
cpu: nothing to do
      gpu: check that device_id is valid, then cudaSetDevice(device_id)
2 ShutdownTensorEngine(void): nothing
3 SetDevice(int devid):
4 NewStream<Device>(true, false):
5 DeleteStream(Stream<Device> *stream):
6 AllocSpace(cpu/gpu): watch the pitch! (the gpu path uses cudaMallocPitch, so stride_ may exceed the row width)
7 FreeSpace(cpu/gpu):
8 NewTensor(shape, initv): allocate a tensor of the given shape and fill it with initv
9 Copy(cpu <-> gpu): wraps cudaMemcpy2D()
10 Softmax(cpu/gpu):
11 SoftmaxGrad(cpu/gpu):
12 MapExp(cpu/gpu):
13 MapReduceKeepLowest(cpu/gpu):
14 MapReduceKeepHighDim(cpu/gpu):
15 VectorDot(dst, lhs, rhs):
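Of the methods above, the CPU path of Softmax is worth sketching: subtract the row maximum for numerical stability, exponentiate, then normalize. A single-row sketch (the real function maps this over a 2-D tensor, with the gpu path as a kernel):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Sketch of one row of the CPU Softmax: max-shift, exp, normalize.
void Softmax(float *dst, const float *src, size_t n) {
  float mmax = src[0];
  for (size_t i = 1; i < n; ++i) mmax = std::max(mmax, src[i]);
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    dst[i] = std::exp(src[i] - mmax);  // shifted exp avoids overflow
    sum += dst[i];
  }
  for (size_t i = 0; i < n; ++i) dst[i] /= sum;
}
```

The max-shift leaves the result unchanged mathematically but keeps `exp()` from overflowing for large logits.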
### -------------------------------------------------------------------------