When I used the TensorBoard callback in TF 2.2 to save a profile for analysis, training reported the following errors:
Error during training
E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1408] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_NOT_INITIALIZED
function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_NOT_INITIALIZED
function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
When I opened TensorBoard, the profile contained no kernel information.
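The log above mixes several CUPTI error codes; the one that points to the permission problem discussed below is CUPTI_ERROR_INSUFFICIENT_PRIVILEGES. A minimal sketch of a hypothetical helper that counts the codes in a captured training log (the names cupti_error_counts and is_permission_problem are illustrative, not part of TensorFlow). It collapses whitespace first, assuming a wrapped log may split a message across lines:

```python
import re
from collections import Counter

# Matches the "failed with error CUPTI_ERROR_..." suffix of the profiler log lines.
CUPTI_ERROR = re.compile(r"failed with error\s+(CUPTI_ERROR_[A-Z_]+)")


def cupti_error_counts(log_text: str) -> Counter:
    """Count each CUPTI error code that appears in a captured training log."""
    # Wrapped lines may split "failed with error" from the code, so flatten first.
    flat = " ".join(log_text.split())
    return Counter(CUPTI_ERROR.findall(flat))


def is_permission_problem(log_text: str) -> bool:
    """True if the log contains the privilege error described in this note."""
    return cupti_error_counts(log_text)["CUPTI_ERROR_INSUFFICIENT_PRIVILEGES"] > 0
```

Passing the text of a saved training log to is_permission_problem gives a quick yes/no before trying either workaround.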
Environment:
Linux Ubuntu 16.04
TensorFlow 2.2.0 (installed with conda install)
Python 3.6.12
CUDA 10.1
cuDNN 7.6.5
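I found some posts about this.
In short, the NVIDIA driver restricts access to GPU performance counters to prevent information from leaking, and does not grant that permission to ordinary (non-admin) 'user' accounts.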
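To check whether this restriction is active on a given machine, the sketch below parses the driver's parameter dump. It assumes, following NVIDIA's documentation for this permission error, that the driver reports the setting as RmProfilingAdminOnly in /proc/driver/nvidia/params (1 meaning admin only); please verify this on your driver version. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: the NVIDIA Linux driver lists its module parameters in this file.
NVIDIA_PARAMS = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Return True/False for RmProfilingAdminOnly, or None if the key is absent."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def check_this_machine() -> Optional[bool]:
    """Read the real file if present (Linux with the NVIDIA driver loaded)."""
    if not NVIDIA_PARAMS.exists():
        return None
    return profiling_admin_only(NVIDIA_PARAMS.read_text())


if __name__ == "__main__":
    state = check_this_machine()
    if state is None:
        print("Could not determine the profiling restriction on this machine.")
    elif state:
        print("Profiling is restricted to admin users; expect INSUFFICIENT_PRIVILEGES.")
    else:
        print("Non-admin users may access GPU performance counters.")
```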
The following two methods are the most convenient (the first did not work for me; the second did):
1. Adding
    options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
to /etc/modprobe.d/nvidia-kernel-common.conf and rebooting should resolve the permission issue. (This did not work for me: nvidia-kernel-common.conf is a read-only file.)
2. Run the training in sudo mode, calling the environment's Python by its full path (this worked):
    sudo /opt/anaconda3/envs/TF2_2/bin/python
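Both workarounds can be scripted. A minimal sketch, with hypothetical helper names: write_profiling_conf puts the method 1 options line, with straight ASCII quotes, into a file of its own rather than editing the read-only nvidia-kernel-common.conf (the file name nvidia-profiling.conf is an arbitrary choice). sudo_command builds the method 2 command from sys.executable. The assumption here is that sudo on Ubuntu usually resets PATH (the secure_path setting in sudoers), so a bare `sudo python` may find the system interpreter instead of the conda env's; this is presumably why the full path is needed.

```python
import shlex
import sys
import tempfile
from pathlib import Path
from typing import List

# Method 1: the options line with straight ASCII quotes (curly quotes copied
# from a web page are likely not understood by modprobe).
OPTION_LINE = 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"\n'


def write_profiling_conf(conf_dir: Path, name: str = "nvidia-profiling.conf") -> Path:
    """Write OPTION_LINE to its own file; the default name is hypothetical."""
    target = conf_dir / name
    target.write_text(OPTION_LINE)
    return target


# Method 2: re-run a training script as root with the *current* interpreter.
def sudo_command(script: str, *args: str) -> List[str]:
    """sys.executable is the absolute path of the running Python (inside the
    TF2_2 env above, /opt/anaconda3/envs/TF2_2/bin/python), so sudo's reset
    PATH cannot substitute a different python."""
    return ["sudo", sys.executable, script] + list(args)


if __name__ == "__main__":
    # Dry run of method 1 into a temporary directory; for real use, pass
    # Path("/etc/modprobe.d") while running as root, then reboot.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_profiling_conf(Path(tmp)).read_text(), end="")
    # "train.py" is a hypothetical script name; shlex.join needs 3.8, so quote manually.
    print(" ".join(shlex.quote(part) for part in sudo_command("train.py")))
```

On some distributions the nvidia module is loaded from the initramfs, in which case the initramfs may also need to be regenerated (for example with sudo update-initramfs -u on Ubuntu) before rebooting; whether this explains the failure of method 1 here has not been checked.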