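Scene-Graph-Benchmark Environment Setup (Ubuntu 18.04, CUDA 11.1, cuDNN 8.0.5, torch 1.8.1)
Installing CUDA 11.1 and cuDNN 8.0.5
It is already 2025: CUDA releases that are too old do not support newer GPUs, and releases that are too new do not work with this older codebase. After running into many pitfalls, I settled on the combination above, configured on Ubuntu 18.04. I will not cover installing CUDA 11.1 and cuDNN 8.0.5 here; see https://blog.csdn.net/m0_71087087/article/details/135828903
Installing torch 1.8.1 and torchvision 0.9.1
Create a virtual environment with Anaconda. This is optional but recommended.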
conda create -n sgbm python=3.7
Then add the following line to the end of your .bashrc so that every new terminal activates the environment automatically.
conda activate sgbm
Downloading the two wheels below with a download manager such as Xunlei (Thunder) and installing them offline is faster.
https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
https://download.pytorch.org/whl/cu111/torchvision-0.9.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
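The two URLs above differ from the local wheel filenames only in that `+` is percent-encoded as `%2B`. The sketch below rebuilds both from the version numbers; the helper names `wheel_filename` and `wheel_url` are hypothetical, and the tags (`cu111`, `cp37`, `cp37m`) are simply the ones used in this post.

```python
from urllib.parse import quote

BASE_URL = "https://download.pytorch.org/whl/cu111/"


def wheel_filename(package, version, cuda_tag="cu111", py_tag="cp37", abi_tag="cp37m"):
    """Build the local wheel filename, e.g. torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl."""
    return f"{package}-{version}+{cuda_tag}-{py_tag}-{abi_tag}-linux_x86_64.whl"


def wheel_url(package, version):
    """Build the download URL; quote() turns '+' into %2B and leaves '-', '.', '_' alone."""
    return BASE_URL + quote(wheel_filename(package, version))


for pkg, ver in [("torch", "1.8.1"), ("torchvision", "0.9.1")]:
    print(wheel_url(pkg, ver))
```

The filename form (with `+`) is what you pass to pip after downloading; the URL form (with `%2B`) is what you give to the download manager.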
Then run the following command in the directory that contains the downloaded wheels.
pip install torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl torchvision-0.9.1+cu111-cp37-cp37m-linux_x86_64.whl torchaudio==0.8.1 torchtext==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple
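The -i https://pypi.tuna.tsinghua.edu.cn/simple option uses the Tsinghua mirror to speed up downloads.
If you do not need torchaudio==0.8.1 and torchtext==0.9.1, simply delete them from the command.
At this point the torch 1.8.1 and torchvision 0.9.1 environment is essentially ready.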
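A quick way to confirm the install is to compare the reported versions against the ones this post targets. The sketch below is one possible check; `split_local_version` and `check_install` are hypothetical helper names, and the expected versions are the ones from the pip command above.

```python
import importlib


def split_local_version(version):
    """Split '1.8.1+cu111' into ('1.8.1', 'cu111'); the local tag may be empty."""
    base, _, local = version.partition("+")
    return base, local


def check_install(expected):
    """Return a list of problems; an empty list means every package matches."""
    problems = []
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"{name}: not importable")
            continue
        found = getattr(module, "__version__", "")
        if split_local_version(found) != wanted:
            problems.append(f"{name}: found {found!r}, expected {'+'.join(wanted)!r}")
    return problems


if __name__ == "__main__":
    expected = {"torch": ("1.8.1", "cu111"), "torchvision": ("0.9.1", "cu111")}
    for line in check_install(expected) or ["torch and torchvision match the expected versions"]:
        print(line)
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```

If `CUDA available` prints `False` even though the versions match, the driver or the CUDA environment variables are the next thing to check.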
Configuring Scene-Graph-Benchmark
The CUDA 11.1 guide linked above does not seem to set CUDA_HOME. The CUDA-related lines in my .bashrc are:
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-11.1 #/usr/local/cuda
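To check that the three exports took effect in a new terminal, something like the following can be run inside the environment. It is only a sketch: `cuda_env_problems` is a hypothetical helper, and it assumes the Linux convention of `:`-separated path lists.

```python
import os


def cuda_env_problems(env):
    """Return a list of problems found in a mapping of environment variables."""
    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    problems = []
    # Linux separates PATH-style entries with ':'
    if cuda_home + "/bin" not in env.get("PATH", "").split(":"):
        problems.append("PATH does not contain $CUDA_HOME/bin")
    if cuda_home + "/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH does not contain $CUDA_HOME/lib64")
    return problems


print(cuda_env_problems(os.environ) or "CUDA environment variables look consistent")
```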
- Install the dependencies
pip install ipython scipy h5py ninja yacs cython matplotlib tqdm opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install overrides -i https://pypi.tuna.tsinghua.edu.cn/simple
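Don't ask why overrides has to be installed separately. I don't know; installing it together with the others raised an error.
- Install cocoapi. There are many detailed tutorials online; I simply used the command below.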
pip install pycocotools -i https://pypi.tuna.tsinghua.edu.cn/simple
If you prefer the details, see https://blog.csdn.net/gaoqing_dream163/article/details/112554621
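Because several of these pip packages are imported under a different module name, a small check can confirm that everything landed in the active environment. This is a sketch; the name mapping below is my own summary of the packages listed above, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# pip package name -> importable module name (they differ for a few packages)
PIP_TO_MODULE = {
    "ipython": "IPython",
    "scipy": "scipy",
    "h5py": "h5py",
    "ninja": "ninja",
    "yacs": "yacs",
    "cython": "Cython",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "overrides": "overrides",
    "pycocotools": "pycocotools",
}


def missing_packages(pip_to_module):
    """Return the pip names whose module cannot be found in the current environment."""
    return [pip for pip, mod in pip_to_module.items() if find_spec(mod) is None]


print("missing:", missing_packages(PIP_TO_MODULE) or "none")
```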
Installing apex
cd into the directory where you want to put apex
git clone https://github.com/NVIDIA/apex.git
cd apex
git reset --hard 3fe10b5597ba14a748ebb271a6ab97c09c5701ac
python setup.py install --cuda_ext --cpp_ext
At this point you may see the following error
File "/home/ps/anaconda3/envs/sgbm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Run the command below in a terminal to open the file that raises the error, then change command = ['ninja', '-v'] (around line 1631) to command = ['ninja', '--version'].
gedit /home/ps/anaconda3/envs/sgbm/lib/python3.7/site-packages/torch/utils/cpp_extension.py
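Run python setup.py install --cuda_ext --cpp_ext again and the installation succeeds.
The environment setup is now complete.
Building and installing Scene-Graph-Benchmark
cd into the directory where you want to put the project, then run: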
git clone https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch.git
cd Scene-Graph-Benchmark.pytorch
python setup.py build develop
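Some people will hit the error shown below at this step. I solved it on one machine but forgot to record how, and I could not solve it on a second machine. As a workaround, I copied everything under /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc from the working machine into the same directory on the other machine, after which python setup.py build develop succeeded. A plausible but unconfirmed cause is the cpp_extension.py edit made for apex: with command = ['ninja', '--version'], ninja only prints its version and compiles nothing, so the .o files that g++ tries to link are never created. If you know a proper fix, please leave a comment; I would be very grateful.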
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cpu/nms_cpu.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/ROIPool_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_conv_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_pool_kernel_cuda.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/nms.o: No such file or directory
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/vision.o: No such file or directory
error: command '/usr/bin/g++' failed with exit code 1
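Scene-Graph-Benchmark runtime errors
Before running the program you also need to prepare the datasets and related files as the project requires; I will not repeat that here.
Error 1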
/home/ps/anaconda3/envs/sgbm/bin/python /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/my_relation_train_net.py
Traceback (most recent call last):
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/my_relation_train_net.py", line 8, in <module>
from maskrcnn_benchmark.utils.env import setup_environment # noqa F401 isort:skip
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/env.py", line 4, in <module>
from maskrcnn_benchmark.utils.imports import import_file
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/imports.py", line 4, in <module>
if torch._six.PY3:
AttributeError: module 'torch._six' has no attribute 'PY3'
Fix 1
Open "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/imports.py" and change torch._six.PY3 to torch._six.PY37.
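Since this environment is Python 3.7, the Python 2 branch guarded by that check is never needed, so an alternative to renaming the attribute is to drop the check altogether. The sketch below shows one way a Python-3-only import_file helper could look using only the standard library. It is an illustration rather than a copy of the repository's imports.py; the `make_importable` parameter in particular is an assumption.

```python
import importlib.util
import sys


def import_file(module_name, file_path, make_importable=False):
    """Load a Python source file as a module without touching torch._six."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually runs the file's top-level code
    if make_importable:
        # register the module so a later `import module_name` finds it
        sys.modules[module_name] = module
    return module
```

With this version the file no longer depends on which attributes a given torch release exposes in torch._six.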
Error 2
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 178, in _forward_train
anchors, objectness, rpn_box_regression, targets
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 38, in __call__
positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
Fix 2
First, add the following at the very beginning of the code
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
Run the program again, and the traceback now points to the line that actually fails
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 53, in __call__
neg_idx_per_image = negative[perm2]
RuntimeError: CUDA error: device-side assert triggered
The error is raised here: something goes wrong when perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg] is generated.
# randomly select positive and negative examples
perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
pos_idx_per_image = positive[perm1]
neg_idx_per_image = negative[perm2]
Replacing perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg] with the following solved the problem.
# perm2 = torch.randperm(min(negative.numel(), 20485), device=negative.device)[:num_neg]
if negative.numel() > 20480:
    perm2 = torch.randperm(negative.numel(), device='cpu').to(negative.device)[:num_neg]
else:
    perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
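The same workaround can be factored into a small helper so that perm1 and perm2 share one code path. This is only a sketch of the fix above: `randperm_device` and `sample_indices` are hypothetical names, and the 20480 threshold is simply the value used in the fix, not a documented limit.

```python
CPU_FALLBACK_THRESHOLD = 20480  # threshold taken from the fix above


def randperm_device(numel, device, threshold=CPU_FALLBACK_THRESHOLD):
    """Pick where torch.randperm should run: CPU for large inputs, else the given device."""
    return "cpu" if numel > threshold else device


def sample_indices(candidates, num_samples):
    """Randomly pick up to num_samples entries from a 1-D index tensor."""
    import torch  # imported lazily so randperm_device stays usable without torch

    device = randperm_device(candidates.numel(), candidates.device)
    perm = torch.randperm(candidates.numel(), device=device)[:num_samples]
    # move the permutation back to the tensor's device before indexing
    return candidates[perm.to(candidates.device)]
```

Inside the sampler this would read pos_idx_per_image = sample_indices(positive, num_pos) and neg_idx_per_image = sample_indices(negative, num_neg).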
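Other problems or solutions are welcome in the comments below.
For training I used the config file e2e_relation_X_101_32_8_FPN_1x.yaml. With the settings below, "MotifPredictor", "VCTreePredictor", and "TransformerPredictor" all gave good results. TransformerPredictor did not give good results at first, and it took a fair amount of experimentation with its training parameters to get there.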
INPUT:
  MIN_SIZE_TRAIN: (600,)
  MAX_SIZE_TRAIN: 1000
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1000
MODEL:
  PRETRAINED_DETECTOR_CKPT: "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/checkpoints/pretrained_faster_rcnn/model_final.pth"
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
  BACKBONE:
    CONV_BODY: "R-101-FPN" # VGG-16
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    STRIDE_IN_1X1: False
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
  RELATION_ON: True
  ATTRIBUTE_ON: False
  FLIP_AUG: False # if there is any left-right relation, FLIP AUG should be false
  RPN:
    USE_FPN: True
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.23232838, 0.63365731, 1.28478321, 3.15089189) # from neural-motifs
    PRE_NMS_TOP_N_TRAIN: 6000
    PRE_NMS_TOP_N_TEST: 6000
    POST_NMS_TOP_N_TRAIN: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_PER_BATCH: False
    RPN_MID_CHANNEL: 256
  ROI_HEADS:
    USE_FPN: True
    POSITIVE_FRACTION: 0.5
    BG_IOU_THRESHOLD: 0.3
    BATCH_SIZE_PER_IMAGE: 256
    DETECTIONS_PER_IMG: 80
    NMS_FILTER_DUPLICATES: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 151 # 151 for VG, 1201 for GQA
    MLP_HEAD_DIM: 4096
  ROI_ATTRIBUTE_HEAD:
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    USE_BINARY_LOSS: True # choose binary, because cross_entropy loss deteriorates the box head, even with 0.1 weight
    POS_WEIGHT: 50.0
    ATTRIBUTE_LOSS_WEIGHT: 1.0
    NUM_ATTRIBUTES: 201 # 201 for VG, 501 for GQA
    MAX_ATTRIBUTES: 10
    ATTRIBUTE_BGFG_SAMPLE: True
    ATTRIBUTE_BGFG_RATIO: 3
  ROI_RELATION_HEAD:
    USE_GT_BOX: False # for choosing sgdet, sgcls, predcls
    USE_GT_OBJECT_LABEL: False # for choosing sgdet, sgcls, predcls
    REQUIRE_BOX_OVERLAP: False # for sgdet, during training, only train pairs with overlap
    ADD_GTBOX_TO_PROPOSAL_IN_TRAIN: True # for sgdet only, in case some gt boxes are missing
    NUM_CLASSES: 51 # 51 for VG, 201 for GQA (not contain "to the left of" & "to the right of")
    BATCH_SIZE_PER_IMAGE: 1024 # sample as much as possible
    POSITIVE_FRACTION: 0.25
    CONTEXT_POOLING_DIM: 4096
    CONTEXT_HIDDEN_DIM: 512 # 1024 for VCTree, 512 for others
    POOLING_ALL_LEVELS: True
    LABEL_SMOOTHING_LOSS: False
    FEATURE_EXTRACTOR: "RelationFeatureExtractor"
    #################### Select Relationship Model ####################
    # PREDICTOR: "MotifPredictor"
    # PREDICTOR: "VCTreePredictor"
    PREDICTOR: "TransformerPredictor"
    # PREDICTOR: "VtransePredictor"
    # PREDICTOR: "CausalAnalysisPredictor"
    ################# Parameters for Motif Predictor ##################
    CONTEXT_OBJ_LAYER: 1
    CONTEXT_REL_LAYER: 1
    ############# Parameters for Causal Unbias Predictor ##############
    ### Implementation for paper "Unbiased Scene Graph Generation from Biased Training"
    CAUSAL:
      EFFECT_TYPE: 'none' # candidates: 'TDE', 'NIE', 'TE', 'none'
      FUSION_TYPE: 'sum' # candidates: 'sum', 'gate'
      SEPARATE_SPATIAL: False # separate spatial in union feature
      CONTEXT_LAYER: "motifs" # candidates: motifs, vctree, vtranse
      SPATIAL_FOR_VISION: True
      EFFECT_ANALYSIS: True
    ############### Parameters for Transformer Predictor ##############
    TRANSFORMER:
      DROPOUT_RATE: 0.1
      OBJ_LAYER: 4
      REL_LAYER: 2
      NUM_HEAD: 8
      KEY_DIM: 64
      VAL_DIM: 64
      INNER_DIM: 2048
DATASETS:
  TRAIN: ("VG_stanford_filtered_with_attribute_train",)
  VAL: ("VG_stanford_filtered_with_attribute_val",)
  TEST: ("VG_stanford_filtered_with_attribute_test",) # VG_stanford_filtered_with_attribute_test
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BIAS_LR_FACTOR: 2 # 2 for Transformer, 1 for Motif, VCTree
  BASE_LR: 0.001 # 0.0001 for Transformer, 0.01 for Motif, VCTree
  WARMUP_FACTOR: 0.01 # 0.01 for Transformer, 0.1 for Motif, VCTree
  WEIGHT_DECAY: 0.0001 # 0.01 for Transformer, 0.0001 for Motif, VCTree
  MOMENTUM: 0.9
  GRAD_NORM_CLIP: 1.0 # 1 for Transformer, 5.0 for Motif, VCTree
  STEPS: (8000, 12000, 16000) # (8000, 12000, 16000) for Transformer, (10000, 16000) for Motif, VCTree
  MAX_ITER: 20000 # 20000 for Transformer, 40000 for Motif, VCTree
  VAL_PERIOD: 2000
  CHECKPOINT_PERIOD: 1000
  PRINT_GRAD_FREQ: 1000 # 1000 for Transformer, 4000 for Motif, VCTree
  IMS_PER_BATCH: 16 # 16 for Transformer, 64 for Motif, VCTree
  PRE_VAL: False
  SCHEDULE:
    # the following parameters are only used for WarmupReduceLROnPlateau
    TYPE: "WarmupMultiStepLR" # WarmupMultiStepLR for TransformerPredictor, WarmupReduceLROnPlateau for Motif, VCTree
    PATIENCE: 2
    THRESHOLD: 0.001
    COOLDOWN: 0
    FACTOR: 0.1
    MAX_DECAY_STEP: 3
OUTPUT_DIR: '/home/ps/Disk2/sgbm_model'
TEST:
  ALLOW_LOAD_FROM_CACHE: False
  RELATION:
    SYNC_GATHER: True # turning this on slows down evaluation but avoids the sgdet test out-of-memory problem
    REQUIRE_OVERLAP: False
    LATER_NMS_PREDICTION_THRES: 0.5
  CUSTUM_EVAL: False # eval SGDet model on custom images, output a json
  CUSTUM_PATH: '.' # the folder that contains the custom images, only jpg files are allowed
  IMS_PER_BATCH: 8 # me
DTYPE: "float16" # me
GLOVE_DIR: '/home/ps/glove'
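The per-predictor solver settings in the comments above can also be kept in one place and turned into command-line overrides instead of editing the YAML each time. The sketch below assumes the training script accepts trailing KEY VALUE pairs that are merged into the config, as yacs `merge_from_list` does. The Motif/VCTree values come from the inline comments; the Transformer values are the active ones in the file, including BASE_LR and WEIGHT_DECAY where the active value and its comment differ. `SOLVER_PRESETS` and `to_opts` are hypothetical names.

```python
import shlex

TRANSFORMER = {
    "BIAS_LR_FACTOR": 2,
    "BASE_LR": 0.001,
    "WARMUP_FACTOR": 0.01,
    "WEIGHT_DECAY": 0.0001,
    "GRAD_NORM_CLIP": 1.0,
    "STEPS": (8000, 12000, 16000),
    "MAX_ITER": 20000,
    "PRINT_GRAD_FREQ": 1000,
    "IMS_PER_BATCH": 16,
    "SCHEDULE.TYPE": "WarmupMultiStepLR",
}
MOTIF_VCTREE = {
    "BIAS_LR_FACTOR": 1,
    "BASE_LR": 0.01,
    "WARMUP_FACTOR": 0.1,
    "WEIGHT_DECAY": 0.0001,
    "GRAD_NORM_CLIP": 5.0,
    "STEPS": (10000, 16000),
    "MAX_ITER": 40000,
    "PRINT_GRAD_FREQ": 4000,
    "IMS_PER_BATCH": 64,
    "SCHEDULE.TYPE": "WarmupReduceLROnPlateau",
}
SOLVER_PRESETS = {
    "TransformerPredictor": TRANSFORMER,
    "MotifPredictor": MOTIF_VCTREE,
    "VCTreePredictor": MOTIF_VCTREE,
}


def to_opts(predictor):
    """Flatten one preset into the KEY VALUE list used for config overrides."""
    opts = ["MODEL.ROI_RELATION_HEAD.PREDICTOR", predictor]
    for key, value in SOLVER_PRESETS[predictor].items():
        opts += [f"SOLVER.{key}", str(value)]
    return opts


# shlex.quote keeps values such as "(10000, 16000)" intact when pasted into a shell
print(" ".join(shlex.quote(opt) for opt in to_opts("MotifPredictor")))
```

Note that CONTEXT_HIDDEN_DIM (1024 for VCTree, 512 otherwise) lives under MODEL.ROI_RELATION_HEAD and is not covered by this solver-only sketch.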
The parameters that deserve the most attention are:
MODEL:
  PRETRAINED_DETECTOR_CKPT: "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/checkpoints/pretrained_faster_rcnn/model_final.pth"
  ROI_RELATION_HEAD:
    USE_GT_BOX: False # for choosing sgdet, sgcls, predcls
    USE_GT_OBJECT_LABEL: False # for choosing sgdet, sgcls, predcls
    REQUIRE_BOX_OVERLAP: False # for sgdet, during training, only train pairs with overlap
    ADD_GTBOX_TO_PROPOSAL_IN_TRAIN: True # for sgdet only, in case some gt boxes are missing
    # PREDICTOR: "MotifPredictor"
    # PREDICTOR: "VCTreePredictor"
    PREDICTOR: "TransformerPredictor"
    # PREDICTOR: "VtransePredictor"
    # PREDICTOR: "CausalAnalysisPredictor"
SOLVER:
  BIAS_LR_FACTOR: 2 # 2 for Transformer, 1 for Motif, VCTree
  BASE_LR: 0.001 # 0.0001 for Transformer, 0.01 for Motif, VCTree
  WARMUP_FACTOR: 0.01 # 0.01 for Transformer, 0.1 for Motif, VCTree
  WEIGHT_DECAY: 0.0001 # 0.01 for Transformer, 0.0001 for Motif, VCTree
  MOMENTUM: 0.9
  GRAD_NORM_CLIP: 1.0 # 1 for Transformer, 5.0 for Motif, VCTree
  STEPS: (8000, 12000, 16000) # (8000, 12000, 16000) for Transformer, (10000, 16000) for Motif, VCTree
  MAX_ITER: 20000 # 20000 for Transformer, 40000 for Motif, VCTree
  VAL_PERIOD: 2000
  CHECKPOINT_PERIOD: 1000
  PRINT_GRAD_FREQ: 1000 # 1000 for Transformer, 4000 for Motif, VCTree
  IMS_PER_BATCH: 16 # 16 for Transformer, 64 for Motif, VCTree
  SCHEDULE:
    # the following parameters are only used for WarmupReduceLROnPlateau
    TYPE: "WarmupMultiStepLR" # WarmupMultiStepLR for TransformerPredictor, WarmupReduceLROnPlateau for Motif, VCTree
OUTPUT_DIR: '/home/ps/Disk2/sgbm_model'
TEST:
  ALLOW_LOAD_FROM_CACHE: False
  RELATION:
    SYNC_GATHER: True # turning this on slows down evaluation but avoids the sgdet test out-of-memory problem
    REQUIRE_OVERLAP: False
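The two USE_GT_* flags together select which of the three benchmark tasks is being trained: both True is predicate classification (PredCls), ground-truth boxes without labels is scene graph classification (SGCls), and both False is scene graph detection (SGDet). A minimal sketch of that mapping, with `task_mode` as a hypothetical helper name:

```python
def task_mode(use_gt_box, use_gt_object_label):
    """Map the two ROI_RELATION_HEAD flags to the task they select."""
    if use_gt_box and use_gt_object_label:
        return "predcls"  # boxes and labels given, only predicates are predicted
    if use_gt_box:
        return "sgcls"  # boxes given, labels and predicates are predicted
    if not use_gt_object_label:
        return "sgdet"  # everything is predicted from the detector output
    # labels without boxes is not one of the three standard settings
    raise ValueError("USE_GT_OBJECT_LABEL=True requires USE_GT_BOX=True")


print(task_mode(False, False))  # the setting used in the config above
```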