Introduction to MXNet's Dependency Libraries and Their Basic Usage

This article introduces the key open-source libraries that the MXNet deep learning framework depends on, including OpenBLAS, DLPack, MShadow, DMLC-Core, TVM, OpenCV, CUDA, and cuDNN. Concrete test code demonstrates the functionality and usage of these libraries, which is valuable for understanding MXNet's internals and optimization strategies.


MXNet is an open-source deep learning framework whose core is implemented in C++. Building it from source requires several other open-source libraries. This post gives a brief description of each of MXNet's dependencies:

1. OpenBLAS: short for Open Basic Linear Algebra Subprograms, an open-source library of basic linear algebra subprograms. It is an optimized, high-performance multi-core BLAS library covering matrix-matrix, matrix-vector, and vector-vector operations. Its license is BSD-3-Clause, so commercial use is allowed; the latest release at the time of writing is 0.3.3. The source code is hosted on GitHub and is continuously maintained by Zhang Xianyi and others.

OpenBLAS is a high-performance open-source BLAS implementation based on GotoBLAS2 1.13 (BSD version), initiated by the Parallel Software and Computational Science Laboratory of the Institute of Software, Chinese Academy of Sciences.

BLAS is an application programming interface (API) standard that specifies numerical libraries for basic linear algebra operations (such as vector and matrix multiplication). The routines were first published in 1979 and are used to build larger numerical packages (such as LAPACK). BLAS is widely used in high-performance computing.
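
Before the full test program below, here is a minimal sketch of the CBLAS interface, assuming nothing beyond a standard cblas.h: it multiplies a 2 x 3 matrix by a 3 x 2 matrix with cblas_dgemm, one of the matrix-matrix routines mentioned above (the function name test_cblas_dgemm is only for illustration):

#include <cblas.h>
#include <cstdio>

// minimal sketch: C = A * B, where A is 2x3 and B is 3x2, all row-major
int test_cblas_dgemm()
{
	const double A[6] = { 1, 2, 3, 4, 5, 6 };    // 2x3
	const double B[6] = { 7, 8, 9, 10, 11, 12 }; // 3x2
	double C[4] = { 0. };                        // 2x2 result

	// C = 1.0 * A * B + 0.0 * C; the integer arguments are M, N, K and the leading dimensions
	cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
		2, 2, 3, 1.0, A, 3, B, 2, 0.0, C, 2);

	fprintf(stdout, "%.1f %.1f\n%.1f %.1f\n", C[0], C[1], C[2], C[3]); // 58.0 64.0 / 139.0 154.0
	return 0;
}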

The test code is as follows (openblas_test.cpp):

#include "openblas_test.hpp"
#include <iostream>
#include <cblas.h>

int test_openblas_1()
{
	int th_model = openblas_get_parallel();
	switch (th_model) {
	case OPENBLAS_SEQUENTIAL:
		printf("OpenBLAS is compiled sequentially.\n");
		break;
	case OPENBLAS_THREAD:
		printf("OpenBLAS is compiled using the normal threading model\n");
		break;
	case OPENBLAS_OPENMP:
		printf("OpenBLAS is compiled using OpenMP\n");
		break;
	}

	int n = 2;
	double* x = (double*)malloc(n*sizeof(double));
	// packed storage for the upper triangle: n*(n+1)/2 elements
	double* upperTriangleResult = (double*)malloc(n*(n + 1)*sizeof(double) / 2);

	for (int j = 0; j<n*(n + 1) / 2; j++)
		upperTriangleResult[j] = 0;

	x[0] = 1; x[1] = 3;

	// symmetric packed rank-1 update: A := alpha * x * x^T + A (row-major, upper triangle)
	cblas_dspr(CblasRowMajor, CblasUpper, n, 1.0, x, 1, upperTriangleResult);
	double* A = upperTriangleResult;
	std::cout << A[0] << "\t" << A[1] << std::endl << "*\t" << A[2] << std::endl;

	free(upperTriangleResult);
	free(x);

	return 0;
}

The execution result: the OpenBLAS threading model is printed first, followed by the packed upper triangle of x*x^T, i.e. 1 and 3 on the first row and 9 on the second.

2. DLPack: consists of a single header file, dlpack.h. DLPack is an open in-memory tensor structure for sharing tensors between frameworks such as TensorFlow, PyTorch, and MXNet, without any data copying.

The dlpack.h file contains two enumerations and four structs:

Enumeration DLDeviceType: the supported device types, including CPU, CUDA GPU, OpenCL, Apple GPU, AMD GPU, and so on.

Enumeration DLDataTypeCode: the supported data type codes: signed int, unsigned int, and float.

Struct DLContext: a device context for tensors and operators; its members are the device type and the device id.

Struct DLDataType: the data type the tensor holds. Its members are: code, the base type, which must be one of the DLDataTypeCode values; bits, the number of bits, which can be 8, 16, or 32; and lanes, the number of lanes of the type.

Struct DLTensor: a plain tensor object that does not manage memory. Its members are the data pointer (void*), a DLContext, the number of dimensions, a DLDataType, the tensor's shape, the tensor's strides, and the byte offset to the start of the data relative to the data pointer.

Struct DLManagedTensor: manages the memory of a DLTensor; it wraps a DLTensor together with a manager context and a deleter callback.
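
As an illustration of these structures, here is a minimal sketch that views an existing float buffer as a 2 x 3 DLTensor without copying any data. It assumes the dlpack.h version described above (newer releases rename DLContext to DLDevice), and the function name test_dlpack_tensor is only for illustration:

#include <dlpack/dlpack.h>
#include <cstdint>
#include <cstdio>

// minimal sketch: wrap an existing buffer as a DLTensor; the tensor does not own the memory
int test_dlpack_tensor()
{
	static float data[6] = { 0.f, 1.f, 2.f, 3.f, 4.f, 5.f };
	static int64_t shape[2] = { 2, 3 };

	DLTensor tensor;
	tensor.data = data;                 // raw data pointer, not owned
	tensor.ctx.device_type = kDLCPU;    // device type
	tensor.ctx.device_id = 0;           // device id
	tensor.dtype.code = kDLFloat;       // base type from DLDataTypeCode
	tensor.dtype.bits = 32;             // number of bits
	tensor.dtype.lanes = 1;             // number of lanes
	tensor.ndim = 2;
	tensor.shape = shape;
	tensor.strides = nullptr;           // nullptr means compact row-major layout
	tensor.byte_offset = 0;

	fprintf(stdout, "ndim=%d, shape=%lld x %lld\n", tensor.ndim,
		(long long)tensor.shape[0], (long long)tensor.shape[1]);
	return 0;
}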

3. MShadow: short for Matrix Shadow, a lightweight CPU/GPU matrix and tensor template library implemented in C++/CUDA. All of its files are .h or .cuh headers, so it can be used by simply including them. Note: if MSHADOW_STAND_ALONE is not added to the project's preprocessor definitions, additional CBLAS/MKL/CUDA support is required. Conversely, defining MSHADOW_STAND_ALONE to avoid depending on other libraries leaves some functions unimplemented; for example, in dot_engine-inl.h some function bodies contain the statement: LOG(FATAL) << "Not implemented!";

For the tests here, the MSHADOW_STAND_ALONE macro is not enabled; only the MSHADOW_USE_CBLAS macro is enabled, as shown in the sketch below.
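
As a sketch, the equivalent configuration can also be written as #define lines before including the header instead of in the project properties (MSHADOW_USE_CBLAS, MSHADOW_USE_MKL, and MSHADOW_USE_CUDA are mshadow's own configuration macros):

// enable the CBLAS backend; leave MKL and CUDA off, and do not define MSHADOW_STAND_ALONE
#define MSHADOW_USE_CBLAS 1
#define MSHADOW_USE_MKL 0
#define MSHADOW_USE_CUDA 0
#include "mshadow/tensor.h"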

The test code is as follows (mshadow_test.cpp):

#include "mshadow_test.hpp"
#include <iostream>
#include <cmath>
#include "mshadow/tensor.h"

// reference: mshadow source code: mshadow/guide

int test_mshadow_1()
{
	// initialize the tensor engine before using tensor operations
	mshadow::InitTensorEngine<mshadow::cpu>();

	// assume we have a float space
	float data[20];
	// create a 2 x 5 x 2 tensor, from existing space
	mshadow::Tensor<mshadow::cpu, 3> ts(data, mshadow::Shape3(2, 5, 2));
	// take first subscript of the tensor
	mshadow::Tensor<mshadow::cpu, 2> mat = ts[0];
	// a Tensor object is only a handle; assignment means both refer to the same data
	// we can specify the content type of a Tensor; if not specified, it is float by default
	mshadow::Tensor<mshadow::cpu, 2, float> mat2 = mat;
	mat = mshadow::Tensor<mshadow::cpu, 1>(data, mshadow::Shape1(10)).FlatTo2D();

	// shape of the matrix; note the size order is the same as numpy
	fprintf(stdout, "%u X %u matrix\n", mat.size(0), mat.size(1));

	// initialize all element to zero
	mat = 0.0f;
	// assign some values
	mat[0][1] = 1.0f; mat[1][0] = 2.0f;
	// elementwise operations
	mat += (mat + 10.0f) / 10.0f + 2.0f;

	// print out the matrix; note: mat and mat2 are handles (pointers to the same data)
	for (mshadow::index_t i = 0; i < mat.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mat.size(1); ++j) {
			fprintf(stdout, "%.2f ", mat2[i][j]);
		}
		fprintf(stdout, "\n");
	}

	mshadow::TensorContainer<mshadow::cpu, 2> lhs(mshadow::Shape2(2, 3)), rhs(mshadow::Shape2(2, 3)), ret(mshadow::Shape2(2, 2));
	lhs = 1.0;
	rhs = 1.0;
	ret = mshadow::expr::implicit_dot(lhs, rhs.T());
	mshadow::VectorDot(ret[0].Slice(0, 1), lhs[0], rhs[0]);
	fprintf(stdout, "vdot=%f\n", ret[0][0]);
	int cnt = 0;
	for (mshadow::index_t i = 0; i < ret.size(0); ++i) {
		for (mshadow::index_t j = 0; j < ret.size(1); ++j) {
			fprintf(stdout, "%.2f ", ret[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");

	for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {
			lhs[i][j] = cnt++;
			fprintf(stdout, "%.2f ", lhs[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");

	mshadow::TensorContainer<mshadow::cpu, 1> index(mshadow::Shape1(2)), choosed(mshadow::Shape1(2));
	index[0] = 1; index[1] = 2;
	choosed = mshadow::expr::mat_choose_row_element(lhs, index);
	for (mshadow::index_t i = 0; i < choosed.size(0); ++i) {
		fprintf(stdout, "%.2f ", choosed[i]);
	}
	fprintf(stdout, "\n");

	mshadow::TensorContainer<mshadow::cpu, 2> recover_lhs(mshadow::Shape2(2, 3)), small_mat(mshadow::Shape2(2, 3));
	small_mat = -100.0f;
	recover_lhs = mshadow::expr::mat_fill_row_element(small_mat, choosed, index);
	for (mshadow::index_t i = 0; i < recover_lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < recover_lhs.size(1); ++j) {
			fprintf(stdout, "%.2f ", recover_lhs[i][j] - lhs[i][j]);
		}
	}
	fprintf(stdout, "\n");

	rhs = mshadow::expr::one_hot_encode(index, 3);

	for (mshadow::index_t i = 0; i < lhs.size(0); ++i) {
		for (mshadow::index_t j = 0; j < lhs.size(1); ++j) {
			fprintf(stdout, "%.2f ", rhs[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n");
	mshadow::TensorContainer<mshadow::cpu, 1> idx(mshadow::Shape1(3));
	idx[0] = 8;
	idx[1] = 0;
	idx[2] = 1;

	mshadow::TensorContainer<mshadow::cpu, 2> weight(mshadow::Shape2(10, 5));
	mshadow::TensorContainer<mshadow::cpu, 2> embed(mshadow::Shape2(3, 5));

	for (mshadow::index_t i = 0; i < weight.size(0); ++i) {
		for (mshadow::index_t j = 0; j < weight.size(1); ++j) {
			weight[i][j] = i;
		}
	}
	embed = mshadow::expr::take(idx, weight);
	for (mshadow::index_t i = 0; i < embed.size(0); ++i) {
		for (mshadow::index_t j = 0; j < embed.size(1); ++j) {
			fprintf(stdout, "%.2f ", embed[i][j]);
		}
		fprintf(stdout, "\n");
	}
	fprintf(stdout, "\n\n");
	weight = mshadow::expr::take_grad(idx, embed, 10);
	for (mshadow::index_t i = 0; i < weight.size(0); ++i) {
		for (mshadow::index_t j = 0; j < weight.size(1); ++j) {
			fprintf(stdout, "%.2f ", weight[i][j]);
		}
		fprintf(stdout, "\n");
	}

	fprintf(stdout, "upsampling\n");

#ifdef small
#undef small
#endif

	mshadow::TensorContainer<mshadow::cpu, 2> small(mshadow::Shape2(2, 2));
	small[0][0] = 1.0f;
	small[0][1] = 2.0f;
	small[1][0] = 3.0f;
	small[1][1] = 4.0f;
	mshadow::TensorContainer<mshadow::cpu, 2> large(mshadow::Shape2(6, 6));
	large = mshadow::expr::upsampling_nearest(small, 3);
	for (mshadow::index_t i = 0; i < large.size(0); ++i) {
		for (mshadow::index_t j = 0; j < large.size(1); ++j) {
			fprintf(stdout, "%.2f ", large[i][j]);
		}
		fprintf(stdout, "\n");
	}
	// sum-pool the upsampled matrix back down with a 3x3 kernel and stride 3
	small = mshadow::expr::pool<mshadow::red::sum>(large, small.shape_, 3, 3, 3, 3);
	for (mshadow::index_t i = 0; i < small.size(0); ++i) {
		for (mshadow::index_t j = 0; j < small.size(1); ++j) {
			fprintf(stdout, "%.2f ", small[i][j]);
		}
		fprintf(stdout, "\n");
	}

	fprintf(stdout, "mask\n");
	mshadow::TensorContainer<mshadow::cpu, 2> mask_data(mshadow::Shape2(6, 8));
	mshadow::TensorContainer<mshadow::cpu, 2> mask_out(mshadow::Shape2(6, 8));
	mshadow::TensorContainer<mshadow::cpu, 1> mask_src(mshadow::Shape1(6));

	mask_data = 1.0f;
	for (int i = 0; i < 6; ++i) {
		mask_src[i] = static_cast<float>(i);
	}
	mask_out = mshadow::expr::mask(mask_src, mask_data);
	for (mshadow::index_t i = 0; i < mask_out.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mask_out.size(1); ++j) {
			fprintf(stdout, "%.2f ", mask_out[i][j]);
		}
		fprintf(stdout, "\n");
	}

	// shutdown the tensor engine after usage
	mshadow::ShutdownTensorEngine<mshadow::cpu>();

	return 0;
}

// user-defined unary operator addone
struct addone {
	// Map can be a template function
	template<typename DType>
	MSHADOW_XINLINE static DType Map(DType a) {
		return  a + static_cast<DType>(1);
	}
};
// user-defined binary operator: max of two
struct maxoftwo {
	// Map can also be a normal (non-template) function;
	// however, it can then only be applied to float tensors
	MSHADOW_XINLINE static float Map(float a, float b) {
		if (a > b) return a;
		else return b;
	}
};

int test_mshadow_2()
{
	// initialize the tensor engine before using tensor operations; needed for cuBLAS
	mshadow::InitTensorEngine<mshadow::cpu>();
	// create a stream for tensor computations
	mshadow::Stream<mshadow::cpu> *stream_ = mshadow::NewStream<mshadow::cpu>(0);
	mshadow::Tensor<mshadow::cpu, 2, float> mat = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);
	mshadow::Tensor<mshadow::cpu, 2, float> mat2 = mshadow::NewTensor<mshadow::cpu>(mshadow::Shape2(2, 3), 0.0f, stream_);

	mat[0][0] = -2.0f;
	mat = mshadow::expr::F<maxoftwo>(mshadow::expr::F<addone>(mat) + 0.5f, mat2);

	for (mshadow::index_t i = 0; i < mat.size(0); ++i) {
		for (mshadow::index_t j = 0; j < mat.size(1); ++j) {
			fprintf(stdout, "%.2f ", mat[i][j]);
		}
		fprintf(stdout, "\n");
	}

	mshadow::FreeSpace(&mat); mshadow::FreeSpace(&mat2);
	mshadow::DeleteStream(stream_);
	// shutdown the tensor engine after usage
	mshadow::ShutdownTensorEngine<mshadow::cpu>();

	return 0;
}

The execution result of test_mshadow_2 is a 2 x 3 matrix whose first element is 0.00 and whose remaining elements are all 1.50.

4. DMLC-Core: short for Distributed Machine Learning Common Codebase, the common base module that supports all DMLC projects, used to build efficient and scalable distributed machine learning libraries.

The test code is as follows (dmlc_test.cpp):

#include "dmlc_test.hpp"
#include <iostream>
#include <cstdio>
#include <functional>
#include <dmlc/parameter.h>
#include <dmlc/registry.h>

// reference: dmlc-core/example and dmlc-core/test

struct MyParam : public dmlc::Parameter<MyParam> {
	float learning_rate;
	int num_hidden;
	int activation;
	std::string name;
	// declare parameters in header file
	DMLC_DECLARE_PARAMETER(MyParam) {
		DMLC_DECLARE_FIELD(num_hidden).set_range(0, 1000)
			.describe("Number of hidden unit in the fully connected layer.");
		DMLC_DECLARE_FIELD(learning_rate).set_default(0.01f)
			.describe("Learning rate of SGD optimization.");
		DMLC_DECLARE_FIELD(activation).add_enum("relu", 1).add_enum("sigmoid", 2)
			.describe("Activation function type.");
		DMLC_DECLARE_FIELD(name).set_default("mnet")
			.describe("Name of the net.");

		// user can also set nhidden besides num_hidden
		DMLC_DECLARE_ALIAS(num_hidden, nhidden);
		DMLC_DECLARE_ALIAS(activation, act);
	}
};

// register it in cc file
DMLC_REGISTER_PARAMETER(MyParam);

int test_dmlc_parameter()
{
	int argc = 4;
	const char* argv[4] = {
#ifdef _DEBUG
		"E:/GitCode/MXNet_Test/lib/dbg/x64/ThirdPartyLibrary_Test.exe",
#else
		"E:/GitCode/MXNet_Test/lib/rel/x64/ThirdPartyLibrary_Test.exe",
#endif
		"num_hidden=100",
		"name=aaa",
		"activation=relu"
	};

	MyParam param;
	std::map<std::string, std::string> kwargs;
	for (int i = 0; i < argc; ++i) {
		char name[256], val[256];
		if (sscanf(argv[i], "%[^=]=%[^\n]", name, val) == 2) {
			kwargs[name] = val;
		}
	}
	fprintf(stdout, "Docstring\n---------\n%s", MyParam::__DOC__().c_str());

	fprintf(stdout, "start to set parameters ...\n");
	param.Init(kwargs);
	fprintf(stdout, "-----\n");
	fprintf(stdout, "param.num_hidden=%d\n", param.num_hidden);
	fprintf(stdout, "param.learning_rate=%f\n", param.learning_rate);
	fprintf(stdout, "param.name=%s\n", param.name.c_str());
	fprintf(stdout, "param.activation=%d\n", param.activation);

	return 0;
}

namespace tree {
	struct Tree {
		virtual void Print() = 0;
		virtual ~Tree() {}
	};

	struct BinaryTree : public Tree {
		virtual void Print() {
			printf("I am binary tree\n");
		}
	};

	struct AVLTree : public Tree {
		virtual void Print() {
			printf("I am AVL tree\n");
		}
	};
	// registry to get the trees
	struct TreeFactory
		: public dmlc::FunctionRegEntryBase<TreeFactory, std::function<Tree*()> > {
	};

#define REGISTER_TREE(Name)                                             \
  DMLC_REGISTRY_REGISTER(::tree::TreeFactory, TreeFactory, Name)        \
  .set_body([]() { return new Name(); } )

	DMLC_REGISTRY_FILE_TAG(my_tree);

}  // namespace tree

// usually this sits in a separate file
namespace dmlc {
	DMLC_REGISTRY_ENABLE(tree::TreeFactory);
}

namespace tree {
	// Register the trees; these can be in separate files
	REGISTER_TREE(BinaryTree)
		.describe("This is a binary tree.");

	REGISTER_TREE(AVLTree);

	DMLC_REGISTRY_LINK_TAG(my_tree);
}

int test_dmlc_registry()
{
	// construct a binary tree
	tree::Tree *binary = dmlc::Registry<tree::TreeFactory>::Find("BinaryTree")->body();
	binary->Print();
	// construct an AVL tree
	tree::Tree *avl = dmlc::Registry<tree::TreeFactory>::Find("AVLTree")->body();
	avl->Print();

	delete binary;
	delete avl;

	return 0;
}

The execution result of test_dmlc_parameter: the generated docstring is printed first, followed by the parsed parameter values (num_hidden=100, learning_rate=0.010000, name=aaa, activation=1).

5. TVM: a compiler stack for deep learning systems. It aims to close the gap between deep learning frameworks and performance- and efficiency-focused hardware backends, and it works together with the frameworks to provide end-to-end compilation to different backends. Besides dlpack and dmlc-core, TVM also depends on HalideIR. Moreover, compiling TVM produced a pile of C2440 and C2664 errors, i.e. MSVC errors about being unable to convert one type to another. Since compiling the MXNet source currently only requires the files in the c_api, core, and pass directories under nnvm/src of the TVM tree, debugging the TVM library is deferred until later.

6. OpenCV: optional; the build process is described at: https://blog.csdn.net/fengbingchun/article/details/84030309

7. CUDA and cuDNN: optional; the build process is described at: https://blog.csdn.net/fengbingchun/article/details/53892997

GitHub: https://github.com/fengbingchun/MXNet_Test
