MxNet
MxNet learning and discussion
城俊BLOG
From now on, write code properly..
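mxnet: cleaning face sample-pair images by similarity
Background: each ID has ID images and prob images.
Approach: load the model, extract features, compute the average similarity between the id and prob images, sort, and save.
import numpy as np
import os
import mxnet as mx
from menpo.visualize import print_progress
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])
def constructmodel(prefix, ctx, epo
Original · 2021-06-21 17:22:49 · 204 views · 2 comments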
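The "average similarity, then sort" step above can be sketched with plain numpy. This is only an illustration under assumptions: the excerpt does not say which similarity is used, so cosine similarity is assumed here, and the names `mean_similarity`, `rank_ids`, and the toy features are hypothetical. Feature extraction with the loaded model is left out.

```python
import numpy as np

def l2_normalize(feats):
    # Scale each row (one feature vector per image) to unit length.
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def mean_similarity(id_feats, prob_feats):
    # Average cosine similarity over every (ID image, prob image) pair.
    id_n = l2_normalize(id_feats)
    prob_n = l2_normalize(prob_feats)
    return float((id_n @ prob_n.T).mean())

def rank_ids(features_by_id):
    # Lowest average similarity first: these are the cleaning candidates.
    scores = {k: mean_similarity(i, p) for k, (i, p) in features_by_id.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy 2-D features standing in for real embeddings (hypothetical data).
features_by_id = {
    "person_a": (np.array([[1.0, 0.0]]), np.array([[2.0, 0.0], [1.0, 0.0]])),
    "person_b": (np.array([[1.0, 0.0]]), np.array([[0.0, 3.0]])),
}
ranking = rank_ids(features_by_id)
print(ranking)
```

Sorting ascending puts the most suspicious IDs at the top of the saved list; reverse the order if you would rather review the cleanest ones first.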
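Speeding up face recognition model training
On the Turing architecture, set fp16=True; this may affect accuracy.
Use partial fc: set config.sample_rate < 1, for example 0.5 or 0.1.
mxnet: module.init_optimizer(kvstore='device')
Increase batch_size until the training speed (samples/second) stops increasing and gpu-util is around 90%...
Original · 2021-06-01 16:31:35 · 231 views · 0 comments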
joint learning
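Purpose: a way to train jointly on datasets whose domains differ substantially.
Method: one shared backbone with different classifiers. Train n batches on the large dataset, then 1 batch on the small dataset.
Choosing the lr steps: this depends on the implementation. One option is to sum the sample counts of the two datasets to get the total data size, then compute the number of training steps before each lr drop from epochs 10, 20, and 30.
The loss nan problem:...
Original · 2021-06-01 10:36:46 · 686 views · 0 comments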
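The lr-step option described above (sum the two sample counts, then convert epochs 10, 20, 30 into step counts) works out as follows. This is a minimal sketch: the dataset sizes and batch size are made-up numbers, `lr_steps` is a hypothetical helper, and a trailing partial batch is dropped via floor division.

```python
def lr_steps(num_samples_a, num_samples_b, batch_size, epochs=(10, 20, 30)):
    # Treat both datasets as one pool when counting an "epoch".
    total = num_samples_a + num_samples_b
    steps_per_epoch = total // batch_size  # floor: ignore a trailing partial batch
    return [epoch * steps_per_epoch for epoch in epochs]

# Hypothetical sizes: 500,000 + 12,000 samples, batch size 512.
steps = lr_steps(500_000, 12_000, batch_size=512)
print(steps)
```

If your training loop counts steps per dataset rather than over the combined pool, the numbers need to be rescaled accordingly, which is why the result depends on the implementation.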
pytorch mxnet ValueError: too many dimensions ‘NDArray‘
Error:
Traceback (most recent call last):
  File "topFAR_COX_py2_fc1_pytorch_PY3.py", line 199, in <module>
    IDimage_features_dict = getfeatures_dict(model, IDimage_list, IDimage_path, featurelen)
  File "topFAR_COX_py2_fc1_pytorch_PY3.py", line 106,
Original · 2021-04-09 11:36:38 · 569 views · 0 comments
mxnet: error when visualizing model weights with mxboard: No handlers could be found for logger "mxboard.event_file_writer"
No handlers could be found for logger "mxboard.event_file_writer"
Fix:
$ pip install tensorboard
Add to the code:
import logging
logging.basicConfig(level=logging.DEBUG)
https://github.com/reminisce/mxboard-demo/blob/master/train_mnist.py
Original · 2021-03-29 17:11:58 · 122 views · 0 comments
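Debugging NN training: reading problems off the loss
1. The loss quickly drops to near 0 and then stops moving. Validation accuracy stays around 50%-60%.
Figure:
Problem: the gt labels passed in may be wrong, for example the same label being passed in every time. This needs checking.
Original · 2021-03-12 19:58:14 · 156 views · 0 comments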
mxnet error: TypeError: type <class 'mxnet.initializer.InitDesc'> not supported
When predicting on a single image with hand-written mxnet inference code, this error is raised:
Traceback (most recent call last):
  File "/home/user1/pjs/frvt/arcface_Siamese_offline/recognition/tools/eval_on_train_set.py", line 163, in <module>
    fc7_mod.init_params(arg_params=fc7_overall, aux_params=None)
  File
Original · 2021-03-11 23:17:37 · 157 views · 3 comments
mxnet error: Check failed: dshp.ndim() == 4U (3 vs. 4) : Input data should be 4D in batch-num_filter-y-x
Error:
mxnet.base.MXNetError: Error in operator conv0: [17:40:27] src/operator/nn/convolution.cc:152: Check failed: dshp.ndim() == 4U (3 vs. 4) : Input data should be 4D in batch-num_filter-y-x
The input data is clearly 4-dimensional, so why the error? Because the data is carried in a collections.namedtuple, and for the forward pass the data was not wrapped in
Original · 2021-03-11 23:11:49 · 637 views · 0 comments
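The excerpt is cut off before it names what should wrap the data, so the following is a guess: the usual culprit with a `Batch` namedtuple is passing the array itself instead of a list containing it. A numpy array stands in for the mxnet NDArray here, and the "consumer" simply reads element 0 of `data`, the way code that expects a list of arrays would.

```python
from collections import namedtuple
import numpy as np

Batch = namedtuple('Batch', ['data'])

# Stand-in for one preprocessed input: (batch, channel, height, width).
img = np.zeros((1, 3, 112, 112), dtype=np.float32)

bare = Batch(img)       # data is the array itself
wrapped = Batch([img])  # data is a list holding the array

# Code that treats `data` as a list of arrays takes element 0.
bare_shape = bare.data[0].shape        # indexing eats the batch axis: 3 dims left
wrapped_shape = wrapped.data[0].shape  # the full 4-D array comes back
print(bare_shape, wrapped_shape)
```

With the bare array, the first axis is consumed by the indexing, which would match the "3 vs. 4" in the error message.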
mxnet/module/base_module.py“, line 855, in forward raise NotImplementedError()
Traceback (most recent call last):
  File "/home/user1/pjs/frvt/mask/arcface_Siamese_offline/recognition/train_0305.py", line 497, in <module>
    main()
  File "/home/user1/pjs/frvt/mask/arcface_Siamese_offline/recognition/train_0305.py", line 493,
Original · 2021-03-10 17:41:19 · 186 views · 0 comments
pycharm mxnet src/base.cc:49: GPU context requested, but no GPUs found.
Error:
src/base.cc:49: GPU context requested, but no GPUs found.
src/storage/storage.cc???? Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: no CUDA-capable device is detected
[14:07:55] src/base.cc:49: GPU context requested, but no GPUs
Original · 2021-03-10 13:50:43 · 779 views · 0 comments
mxnet normalization variants: batch norm, l2 norm, and mxnet.nd.L2Normalization
https://mxnet.apache.org/versions/1.6/api/r/docs/api/mx.nd.norm.html
https://mxnet.apache.org/versions/1.6/api/r/docs/api/mx.nd.L2Normalization.html
Original · 2021-03-09 22:35:32 · 627 views · 0 comments
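mxnet loss nan, accuracy 0; features extracted by the model are all 0
nan means the result blew up, typically after a value overflowed to positive or negative infinity.
Possible causes:
1. The learning rate is too high; for example, it should be changed from 0.1 to 0.01 or 0.001. If that still does not help, the code is probably at fault.
2. The loss is computed incorrectly, which makes it too large.
3. The gradient is computed incorrectly. If the loss is built by hand, derive the gradient formula by hand and implement it correctly...
Original · 2021-03-08 18:05:32 · 464 views · 0 comments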
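Computing gradients by hand: partial derivative of the 2-norm, partial derivative of the squared 2-norm
Partial derivative of the 2-norm
Partial derivative of the squared 2-norm
d(||x||^2)/d(x) = 2x
Original · 2021-03-08 16:27:10 · 4794 views · 0 comments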
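A hand-derived gradient is easy to sanity-check numerically. The sketch below compares central finite differences against the closed forms: 2x for the squared 2-norm (as stated above) and x/||x|| for the plain 2-norm (the standard result for x != 0; the excerpt's own formula for it did not survive). `numerical_grad` and the test vector are illustrative only.

```python
import numpy as np

def numerical_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

x = np.array([3.0, -4.0, 12.0])  # ||x|| = 13

# Closed forms to verify.
grad_norm = x / np.linalg.norm(x)  # d||x|| / dx
grad_sq = 2 * x                    # d||x||^2 / dx

ok_norm = np.allclose(numerical_grad(np.linalg.norm, x), grad_norm, atol=1e-6)
ok_sq = np.allclose(numerical_grad(lambda v: np.dot(v, v), x), grad_sq, atol=1e-6)
print(ok_norm, ok_sq)
```

The same check works for any hand-written loss: if the numerical and analytic gradients disagree, the derivation or its implementation is the first place to look.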
mxnet: IndexError: list index out of range when loading a model
loading ['/data/user1/log/frvt/models/r34_webface_mask/siamese/siamese34-arcface-webfaceSiamese/model-0009.params']
Traceback (most recent call last):
  File "train_0305.py", line 447, in <module>
    main()
  File "train_0305.py", line 443, in main
Original · 2021-03-07 21:53:35 · 292 views · 0 comments
mxnet stream_gpu-inl.h:62: Check failed: e == cudaSuccess: CUDA: unspecified launch failure Stack tr
Traceback (most recent call last):
  File "train_parall_fc7.py", line 409, in <module>
    main()
  File "train_parall_fc7.py", line 406, in main
    train_net(args)
  File "train_parall_fc7.py", line 401, in train_net
    epoch_end_callback = epoch_
Original · 2021-02-18 23:10:20 · 365 views · 1 comment
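mxnet: the graph you build differs from the graph that is generated; visualizing model structure in mxnet
My infuriating colleague's code logic is a mess.
When mxnet finally generates the computation graph, it automatically prunes any branch whose output ends up unused, even if you built it earlier. So you run into something baffling: you clearly built that part of the graph, yet it is missing when you visualize it or call get_symbol!...
Original · 2021-02-05 00:39:58 · 276 views · 0 comments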
import mxnet error: ImportError: cannot import name _LIB, ImportError: cannot import name Union
The errors vary, and differ depending on the directory you run from. One of them: ImportError: cannot import name _LIB
$ python
Python 2.7.18 (default, Aug 4 2020, 11:16:42)
[GCC 9.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Traceback (most re
Original · 2021-02-03 17:59:21 · 3096 views · 0 comments
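mxnet: visualizing intermediate-layer feature map outputs
Note: the model input is 112x112. The saved images may come out white (not fixed yet), but when running in pycharm they can be viewed in the scientific tool window.
# Build a helper function for preprocessing. Note that mxnet uses the channels-first format, i.e. BCHW, so the channel dimension has to be rearranged on input.
# The pretrained model uses standardization preprocessing: subtract the mean and divide by the standard deviation (mean and std from the imagenet dataset: [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
# mxnet uses its own data type, nd.array
import cv2
from
Translated · 2021-01-30 20:53:30 · 404 views · 0 comments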
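The preprocessing described in those comments can be sketched in numpy alone. Assumptions, since the original helper is cut off: the image arrives as an HWC uint8 array in RGB order (cv2 loads BGR, so a channel flip would be needed first), and pixels are scaled to [0, 1] before the mean/std are applied. `preprocess` is a hypothetical name; converting the result to an mxnet nd.array is left out.

```python
import numpy as np

# imagenet statistics quoted in the post, per RGB channel.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    # Scale to [0, 1]; the mean/std above are expressed on that range.
    x = img_hwc_uint8.astype(np.float32) / 255.0
    # Standardize; broadcasting applies MEAN/STD along the last (channel) axis.
    x = (x - MEAN) / STD
    # HWC -> CHW, then add the batch axis to get BCHW.
    x = x.transpose(2, 0, 1)
    return x[np.newaxis]

# A black 112x112 image as a stand-in for a real face crop.
batch = preprocess(np.zeros((112, 112, 3), dtype=np.uint8))
print(batch.shape)
```

When saving a feature map as an image, remember the values are no longer in 0..255 after this step; rescaling each map to that range before writing is one thing to try if the saved images look blank.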
cv2.warpPerspective error: (-215:Assertion failed) _src.total() > 0
The matrix you passed in is empty; the file path may be wrong. Check it again.
https://github.com/penincillin/DREAM/issues/5
Translated · 2021-01-04 15:47:21 · 2742 views · 0 comments
Extracting image arrays from an mxnet .rec file, error: TypeError: __call__(): incompatible function arguments. The following argum
Goal: unpack the image arrays from an mxnet train.rec file and feed them to dlib for face detection.
Error:
TypeError: __call__(): incompatible function arguments. The following argument types are supported:
1. (self: _dlib_pybind11.fhog_object_detector, image: array, upsample_num_times: int=0) -> _dlib_pybind11.rec
Original · 2021-01-04 15:18:27 · 4209 views · 1 comment
mxnet ndarray ValueError: setting an array element with a sequence.
ValueError: setting an array element with a sequence.
Details:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2020.2.1\plugins\python\helpers\pydev\pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)
Original · 2020-12-10 13:54:02 · 661 views · 0 comments
mxnet src/imperative/./imperative_utils.h:72: Check failed: inputs[i]->ctx().dev_mask() == ctx.dev_m
mxnet 1.6, custom OP (computing a metric), training error:
src/imperative/./imperative_utils.h:72: Check failed: inputs[i]->ctx().dev_mask() == ctx.dev_mask() (1 vs. 2) : Operator broadcast_add require all inputs live on the same context. But the first argument is on gpu(0) while
Original · 2020-12-09 15:27:34 · 310 views · 0 comments
mxnet OSError: libnvrtc.so.11.0: cannot open shared object file: No such file or directory
import mxnet error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user1/.local/lib/python3.6/site-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, c
Original · 2020-11-26 23:55:06 · 5814 views · 1 comment
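A pitfall in mx.metric.CrossEntropy()
By default it adds a very small constant so that the argument of the logarithm is never 0. Ordinary inputs are unaffected, but when the input is extremely small the output changes a great deal.
The two snippets below are equivalent for ordinary inputs (probabilities that are not extremely small).
from mxnet import nd
loss = nd.mean(-nd.pick(prob, label).log())
and
import mxnet as mx
ce = mx.metric.CrossEntropy()
ce.update(global_label, prob_softmax)
loss = ce.get()[1]
For extremely small inputs they are not equivalent.
Original · 2020-11-26 11:40:25 · 297 views · 0 comments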
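The effect of that small constant is easy to reproduce without mxnet. The sketch assumes the constant is 1e-12, which is the `eps` default as far as I recall for mx.metric.CrossEntropy; check your version. `ce_plain` and `ce_eps` are illustrative stand-ins for the two snippets, not the library code.

```python
import numpy as np

EPS = 1e-12  # assumed value of the constant added inside the metric

def ce_plain(p):
    # Cross-entropy of the picked probability, no smoothing.
    return -np.log(p)

def ce_eps(p):
    # Same, with the small constant added before the log.
    return -np.log(p + EPS)

ordinary = 0.3  # a typical predicted probability
tiny = 1e-20    # an extremely small predicted probability

gap_ordinary = abs(ce_plain(ordinary) - ce_eps(ordinary))
gap_tiny = abs(ce_plain(tiny) - ce_eps(tiny))
print(gap_ordinary, gap_tiny)
```

For the ordinary probability the two values agree to many decimal places; for the tiny one the constant dominates the sum, so the smoothed loss is capped near -log(EPS) while the plain loss keeps growing.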
mxnet softmaxOutput softmaxActivation softmax_cross_entropy
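This is dizzying: a pile of APIs with nearly identical names. I really don't know what they were thinking. If you ship a new one, please deprecate the old one....
mx.symbol.SoftmaxOutput
Computes the gradient of the cross-entropy loss with respect to the softmax output.
1). Compute the softmax output from the network output (the softmax function, which is built on exponentiation, turns the network output into a probability distribution between 0 and 1)
2). Compute the cross-entropy loss from the softmax output and the label (cross-entropy between the two probability distributions)
3). From the cross-entropy loss and the label
mx.symbol.SoftmaxActivation
Takes s
Original · 2020-11-12 11:10:30 · 647 views · 7 comments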
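The three steps listed for SoftmaxOutput can be followed by hand in numpy. This is a sketch of the textbook math, not mxnet's implementation: step 3 is cut off in the excerpt, so the code uses the well-known result that the gradient of the cross-entropy with respect to the network output (the softmax input) is the softmax output minus the one-hot label. All function names are illustrative.

```python
import numpy as np

def softmax(z):
    # Step 1: exponentiate (shifted by the max for stability) and normalize.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, label):
    # Step 2: cross-entropy against a one-hot label is -log of the picked prob.
    return -np.log(p[label])

def grad_wrt_logits(p, label):
    # Step 3 (standard result): softmax output minus the one-hot label.
    g = p.copy()
    g[label] -= 1.0
    return g

logits = np.array([0.0, 0.0, 0.0])  # toy network output for 3 classes
label = 1

p = softmax(logits)
loss = cross_entropy(p, label)
grad = grad_wrt_logits(p, label)
print(p, loss, grad)
```

Because the gradient has this simple closed form, a layer can produce it directly in the backward pass without ever materializing the loss value.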
MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expe
Error: MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [51249,512], got [59643,512]
Odd errors like this are a real headache; the message mxnet gives makes it very hard to locate the specific code. Details:
Traceback (most recent call last):
  File "train_0723.py", line 494, in
Original · 2020-11-09 20:43:58 · 1279 views · 0 comments
mxnet 1.7.0 ubuntu 18.04 training error: AttributeError: 'FaceImageIter' object has no attribute 'provide_data'
Description: Ubuntu 18.04.5 LTS
Codename: bionic
python: 3.7.7
Abbreviated code for the failing part:
from image_iter import FaceImageIter
train_dataiter = FaceImageIter(...)
train_dataiter = mx.io.PrefetchingIter(train_dataiter)
There are two errors: AttributeError: 'FaceImageIter' obj
Original · 2020-11-07 23:49:11 · 287 views · 0 comments
mxnet simple_bind error inconsistent shape
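After building the model, bind fails and reports a wrong shape.
Error in operator fullyconnected0: Shape inconsistent, Provided=[1024,xxx], inferred shape=(1024,xxx)
Provided: what is defined in the model
inferred: what is produced at actual run time (you may only learn the value at infer time)
You may have defined a weight for the layer, and the weight's shape is what is wrong. In that case, adjust the shape of the layer's weight so that the second entry, xxx, in the provided and inferred shapes stays
Original · 2020-11-05 16:34:46 · 851 views · 0 comments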
mxnet InferShape pass cannot decide shapes for the following arguments (xxx means unknown dimensions
Error when building a model in mxnet: InferShape pass cannot decide shapes for the following arguments (xxx means unknown dimensions). Please consider providing them as inputs: xxx
Code that triggers the error: sym_input = mx.symbol.Variable('sym_input',)
Cause: the placeholder you defined with mx.symbol.Variable did not specify an empty shape
Fix
Original · 2020-11-05 14:46:43 · 314 views · 0 comments
mxnet recordio reading a rec file: pos = ctypes.c_size_t(self.. raises KeyError
Traceback (most recent call last):
  File "image_iter.py", line 42, in <module>
    C = FaceImageIter(20, (3,112,112), path_imgrec='/home/user1/data/deepglint/train.rec')
  File "image_iter.py", line 39, in __init__
    s = self.imgrec.read_idx(0)
Original · 2020-11-04 23:05:25 · 673 views · 0 comments
mxnet ubuntu18.04 python3.7.7 cuda10.1 AttributeError: module ‘mxnet‘ has no attribute xxx
Error:
$ python
Python 3.7.7 (default, Mar 26 2020, 15:48:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> mx.cpu
AttributeError: module 'mxnet' has
Original · 2020-10-26 16:28:04 · 1113 views · 0 comments
ModuleNotFoundError: No module named ‘mxnet‘
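mxnet is clearly installed, yet it simply cannot be found:
>>> import mxnet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'mxnet'
Cause: you may have installed a library such as mxnet-mkl earlier and then not uninstalled it cleanly, which corrupted the existing mxnet-cu90. For example:
$ pip uninstall mxnet-mk
Original · 2020-09-17 11:41:51 · 5397 views · 0 comments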
mxnet Segmentation fault: 11 libmxnet.so(+0x40c6b50)
Segmentation fault: 11
Stack trace:
  [bt] (0) /data/user1/pkgs/conda/envs/drc/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x40c6b50) [0x7f2dfe4f8b50]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f2edc06b4b0]
  [bt] (2) /lib/x86_64-linux-gn
Original · 2020-09-07 16:37:01 · 1044 views · 0 comments
mxnet sklearn ValueError: Input contains NaN, infinity or a value too large for dtype(‘float64‘).
Traceback (most recent call last):
  File "train_0723.py", line 455, in <module>
    main()
  File "train_0723.py", line 451, in main
    train_net(args)
  File "train_0723.py", line 445, in train_net
    epoch_end_callback=epoch_cb)
  File "/home/us
Original · 2020-08-17 11:39:41 · 555 views · 0 comments
mxnet training hangs; still not resolved after a reboot
3480 root 20 0 0 0 0 R 100.0 0.0 11:53.46 UVM GPU1 BH
1724 root -51 0 0 0 0 R 100.0 0.0 11:59.19 irq/88-nvidia
Original · 2020-08-17 11:15:13 · 1104 views · 0 comments
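MxNet base.h:459: Check failed: e == cudaSuccess (30 vs. 0) : CUDA: unknown error; nvidia-smi shows GPU ERR!
The command output is slow and reports ERR.
1. Reboot the machine
2. Reinstall the driver
3. Repair or replace the GPU
Original · 2020-08-09 11:33:27 · 1221 views · 0 comments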
mxnet mshadow/./stream_gpu-inl.h:62: Check failed: e == cudaSuccess: CUDA: an illegal memory access
Error:
Traceback (most recent call last):
  File "train_0723.py", line 455, in <module>
    main()
  File "train_0723.py", line 451, in main
    train_net(args)
  File "train_0723.py", line 445, in train_net
    epoch_end_callback=epoch_cb)
  File "/hom
Original · 2020-08-03 09:56:50 · 745 views · 0 comments
mxnet: warning when loading a generated json model: src/nnvm/legacy_json_util.cc:204: Warning: loading symbol saved by MXNet versio
After generating a model with mxnet, load warns:
src/nnvm/legacy_json_util.cc:204: Warning: loading symbol saved by MXNet version 10600 with lower version of MXNet
Model generation code: save_model_json.py
import sys
sys.path.append(r'/home/user1/recognition')
from config import config
import os
impor
Original · 2020-07-31 11:33:56 · 594 views · 0 comments
mxnet Check failed CUDA: unknown error simple_bind
mxnet 1.6.0 runtime error:
...cuda error:
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: unknown error....
Traceback (most recent call last):
File "/home/user1/anaconda3/lib/python3.x/site-packages/mxnet/symbol/symbol.py", line 1488, in s
Original · 2020-07-31 11:26:38 · 679 views · 0 comments
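src/storage/storage.cc: Compile with USE_CUDA=1 to enable GPU usage [MXNET GPU version]
[Environment] Win10
I. Trying to run MXNET on the GPU gives the following error:
MXNetError: [23:23:48] src/storage/storage.cc:xxx: Compile with USE_CUDA=1 to enable GPU usage
[Cause] The cpu build of mxnet is installed, not the gpu build.
[Fix] Uninstall the existing cpu build of mxnet (if my guess is right, you installed it via pip insta...
Original · 2019-05-11 15:11:29 · 12132 views · 12 comments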