Setting up the Scene-Graph-Benchmark environment (Ubuntu 18.04, CUDA 11.1, cuDNN 8.0.5, torch 1.8.1)

Installing CUDA 11.1 and cuDNN 8.0.5

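It is already 2025: CUDA versions that are too old do not support newer GPUs, and versions that are too new cannot build this older codebase. After running into many pitfalls I settled on the combination below, configured on Ubuntu 18.04. I will not cover installing CUDA 11.1 and cuDNN 8.0.5 here; see https://blog.csdn.net/m0_71087087/article/details/135828903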

Installing torch 1.8.1 and torchvision 0.9.1

Create a virtual environment with Anaconda. This is optional but recommended.

conda create -n sgbm python=3.7

Then append the following line to the end of your .bashrc so that every new terminal automatically activates the environment you just created.

conda activate sgbm

You can fetch these two packages with a download manager such as Xunlei (Thunder) and install them offline, which is faster.

https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
https://download.pytorch.org/whl/cu111/torchvision-0.9.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
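
The two wheel file names follow a fixed pattern: package, version, CUDA tag, Python tag, ABI tag, platform. The sketch below rebuilds the URLs above from those parts, which may help if you need to look for a wheel for a different Python version. The helper name `wheel_url` and its parameters are my own for illustration and are not part of any PyTorch tooling; whether a wheel exists for a given combination still has to be checked on the download site.

```python
from urllib.parse import quote

BASE = "https://download.pytorch.org/whl"


def wheel_url(package, version, cuda_tag="cu111", py_tag="cp37",
              abi_tag="cp37m", platform="linux_x86_64"):
    """Build the download URL of a PyTorch wheel (hypothetical helper)."""
    filename = f"{package}-{version}+{cuda_tag}-{py_tag}-{abi_tag}-{platform}.whl"
    # quote() percent-encodes the "+" of the local version as %2B.
    return f"{BASE}/{cuda_tag}/{quote(filename)}"


if __name__ == "__main__":
    print(wheel_url("torch", "1.8.1"))
    print(wheel_url("torchvision", "0.9.1"))
```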

Then run the following command in the directory that contains the downloaded wheels.

pip install torch-1.8.1+cu111-cp37-cp37m-linux_x86_64.whl torchvision-0.9.1+cu111-cp37-cp37m-linux_x86_64.whl torchaudio==0.8.1 torchtext==0.9.1  -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple

The option -i https://pypi.tuna.tsinghua.edu.cn/simple uses the Tsinghua mirror to speed up downloads.
If you do not need torchaudio==0.8.1 and torchtext==0.9.1, simply delete that part of the command.

At this point the torch 1.8.1 and torchvision 0.9.1 environment is essentially done.
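
To confirm the installation, a quick check along the following lines can be run inside the sgbm environment. This is only a sketch: `check_versions` and `EXPECTED` are names I made up, and the expected strings assume the cu111 wheels listed above report their versions as `1.8.1+cu111` and `0.9.1+cu111`.

```python
# Expected version strings for the wheels installed above (assumption).
EXPECTED = {"torch": "1.8.1+cu111", "torchvision": "0.9.1+cu111"}


def check_versions(found, expected=EXPECTED):
    """Return a list of mismatches between found and expected versions."""
    problems = []
    for name, want in expected.items():
        have = found.get(name)
        if have != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    try:
        import torch
        import torchvision
    except ImportError as exc:
        print("import failed:", exc)
    else:
        found = {"torch": torch.__version__,
                 "torchvision": torchvision.__version__}
        print(check_versions(found) or "versions OK")
        print("CUDA version torch was built with:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
```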

Configuring Scene-Graph-Benchmark

The CUDA 11.1 guide linked above does not seem to set CUDA_HOME. The CUDA-related settings in my .bashrc look like this:

export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-11.1 #/usr/local/cuda
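
Because the extension builds later on depend on these variables, it can be worth checking them from Python before compiling anything. The following is a minimal sketch; `check_cuda_env` is a hypothetical helper, and it only verifies that the paths are consistent with each other, not that the CUDA toolkit itself works.

```python
import os


def check_cuda_env(env=None):
    """Return a list of problems with the CUDA-related environment variables."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    problems = []
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append(f"nvcc not found at {nvcc}")
    path_dirs = env.get("PATH", "").split(os.pathsep)
    if os.path.join(cuda_home, "bin") not in path_dirs:
        problems.append("CUDA bin directory is not on PATH")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if os.path.join(cuda_home, "lib64") not in lib_dirs:
        problems.append("CUDA lib64 directory is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    print(check_cuda_env() or "CUDA environment looks consistent")
```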
  1. Install the dependencies
pip install ipython scipy h5py ninja yacs cython matplotlib tqdm opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install overrides -i https://pypi.tuna.tsinghua.edu.cn/simple

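Do not ask why overrides has to be installed separately; I do not know. Installing it together with the other packages produced an error.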

  2. Install cocoapi. There are many detailed tutorials online; I simply used the command below.
pip install pycocotools -i https://pypi.tuna.tsinghua.edu.cn/simple

For a more detailed walkthrough, see https://blog.csdn.net/gaoqing_dream163/article/details/112554621
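
After both steps, a short script can confirm that every dependency is importable. Note that some pip package names differ from the module name you import (for example opencv-python is imported as cv2). This is a sketch only; `MODULES` and `missing_packages` are names I chose for illustration.

```python
import importlib.util

# pip package name -> importable module name
MODULES = {
    "ipython": "IPython",
    "scipy": "scipy",
    "h5py": "h5py",
    "ninja": "ninja",
    "yacs": "yacs",
    "cython": "Cython",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "opencv-python": "cv2",
    "overrides": "overrides",
    "pycocotools": "pycocotools",
}


def missing_packages(modules=MODULES):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(pkg for pkg, mod in modules.items()
                  if importlib.util.find_spec(mod) is None)


if __name__ == "__main__":
    print(missing_packages() or "all dependencies found")
```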

Installing apex

cd into the directory where you want to keep apex

git clone https://github.com/NVIDIA/apex.git
cd apex
git reset --hard 3fe10b5597ba14a748ebb271a6ab97c09c5701ac
python setup.py install --cuda_ext --cpp_ext

At this point the build may fail with an error like this:

  File "/home/ps/anaconda3/envs/sgbm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Run the command below in a terminal to open the file where the error is raised, then change command = ['ninja', '-v'] (around line 1631) to command = ['ninja', '--version'].

gedit /home/ps/anaconda3/envs/sgbm/lib/python3.7/site-packages/torch/utils/cpp_extension.py

Run python setup.py install --cuda_ext --cpp_ext again and the installation succeeds.
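
For context on that edit: `ninja -v` runs the build with verbose output, whereas `ninja --version` only prints the version number and exits without compiling anything. Before touching cpp_extension.py it may therefore be worth checking whether ninja can be run at all. The sketch below does that; `ninja_version` is a hypothetical helper name, and it assumes ninja was installed with pip as in the dependency step.

```python
import shutil
import subprocess


def ninja_version(executable="ninja"):
    """Return ninja's version string, or None if it cannot be run."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.decode().strip()


if __name__ == "__main__":
    version = ninja_version()
    print(f"ninja {version}" if version else "ninja is not available")
```

If ninja itself is the problem, PyTorch's `BuildExtension.with_options(use_ninja=False)` in a project's setup.py falls back to the slower distutils build. I have not tried this with apex or this repository.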

The environment setup is now complete.

Building and installing Scene-Graph-Benchmark

cd into the directory where you want to keep the project, then run:

git clone https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch.git
cd Scene-Graph-Benchmark.pytorch
python setup.py build develop

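Some people hit the error shown below at this step (the localized g++ message means "No such file or directory"). I solved it on one machine but forgot to record how, and I could not solve it on a second machine. As a workaround I copied everything under /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc from the working machine into the corresponding directory on the other one, after which python setup.py build develop succeeded. A plausible but unverified cause: if cpp_extension.py was edited to run ninja --version as in the apex step, ninja only prints its version and compiles nothing, so the .o files the linker expects are never produced. If you know the proper fix, please leave a comment; I would be very grateful.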

g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cpu/ROIAlign_cpu.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cpu/nms_cpu.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/ROIPool_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/SigmoidFocalLoss_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_conv_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_conv_kernel_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/deform_pool_kernel_cuda.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/cuda/nms.o: 没有那个文件或目录
g++: error: /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/build/temp.linux-x86_64-cpython-37/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/csrc/vision.o: 没有那个文件或目录
error: command '/usr/bin/g++' failed with exit code 1
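
To see exactly which object files were never produced, the layout visible in the error messages can be used: each object sits under the build/temp directory at the absolute path of its source file, with the extension replaced by .o. The sketch below relies on that inferred layout, which is an assumption drawn only from the log above; `expected_object` and `missing_objects` are hypothetical helper names.

```python
import os


def expected_object(build_temp, source):
    """Map a source file to the object path the linker looks for."""
    stem, _ = os.path.splitext(os.path.abspath(source))
    # The absolute source path is re-rooted under the build/temp directory.
    return os.path.join(build_temp, stem.lstrip(os.sep) + ".o")


def missing_objects(build_temp, sources):
    """Return the sources whose object file does not exist yet."""
    return [s for s in sources
            if not os.path.isfile(expected_object(build_temp, s))]


if __name__ == "__main__":
    # Example paths are placeholders; adjust them to your own checkout.
    build_temp = os.path.abspath("build/temp.linux-x86_64-cpython-37")
    sources = ["maskrcnn_benchmark/csrc/vision.cpp",
               "maskrcnn_benchmark/csrc/cpu/nms_cpu.cpp"]
    for src in missing_objects(build_temp, sources):
        print("not compiled:", src)
```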

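Runtime errors in Scene-Graph-Benchmark

Before running the program, prepare the dataset and related files as the project requires; I will not repeat those steps here.

Error 1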

/home/ps/anaconda3/envs/sgbm/bin/python /home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/my_relation_train_net.py 
Traceback (most recent call last):
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/my_relation_train_net.py", line 8, in <module>
    from maskrcnn_benchmark.utils.env import setup_environment  # noqa F401 isort:skip
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/env.py", line 4, in <module>
    from maskrcnn_benchmark.utils.imports import import_file
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/imports.py", line 4, in <module>
    if torch._six.PY3:
AttributeError: module 'torch._six' has no attribute 'PY3'

Solution 1
Open "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/utils/imports.py" and change torch._six.PY3 to torch._six.PY37.
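
Since this environment only ever runs Python 3, another option is to stop depending on torch._six for this check altogether. The sketch below shows what a torch-independent `import_file` could look like using only the standard library. It assumes the Python 3 branch of imports.py does roughly this, which I have not verified line by line against the repository.

```python
import importlib.util
import sys


def import_file(module_name, file_path, make_importable=False):
    """Load a Python file as a module without consulting torch._six."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if make_importable:
        # Register the module so a later "import module_name" finds it.
        sys.modules[module_name] = module
    return module
```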

Error 2

  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 178, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 38, in __call__
    positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
RuntimeError: CUDA error: device-side assert triggered

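Solution 2
CUDA kernels run asynchronously, so the traceback above does not necessarily point at the line that actually failed. First, add the following at the very beginning of the script, before anything initializes CUDA, to make kernel launches synchronous: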

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

Run the program again and the traceback now points to the real location of the error:

  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
  File "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 53, in __call__
    neg_idx_per_image = negative[perm2]
RuntimeError: CUDA error: device-side assert triggered

The error is here: generating perm2 with perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg] is what goes wrong.

# randomly select positive and negative examples
perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

pos_idx_per_image = positive[perm1]
neg_idx_per_image = negative[perm2]

Replacing perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg] with the following resolved the problem.

# perm2 = torch.randperm(min(negative.numel(), 20485), device=negative.device)[:num_neg]
if negative.numel() > 20480:
    perm2 = torch.randperm(negative.numel(), device='cpu').to(negative.device)[:num_neg]
else:
    perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
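
The same workaround can be wrapped in a small helper so that perm1 and perm2 share one code path. This is a sketch under the assumption that the 20480 threshold from the fix above is what matters on your setup; `randperm_device` and `safe_randperm` are hypothetical names, and torch is imported lazily so the device-selection logic can be checked without a GPU.

```python
def randperm_device(numel, device, threshold=20480):
    """Pick where to generate the permutation: CPU for large sizes."""
    return "cpu" if numel > threshold else device


def safe_randperm(numel, device, threshold=20480):
    """Random permutation of range(numel), returned on `device`."""
    import torch  # lazy import: only needed when a permutation is requested
    gen_device = randperm_device(numel, device, threshold)
    # Generate on the chosen device, then move the result where it is used.
    return torch.randperm(numel, device=gen_device).to(device)
```

Inside the sampler this would be used as perm2 = safe_randperm(negative.numel(), negative.device)[:num_neg].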

If you run into other problems or have other solutions, feel free to comment below.

For training I used the config file e2e_relation_X_101_32_8_FPN_1x.yaml. The settings below gave good results with "MotifPredictor", "VCTreePredictor" and "TransformerPredictor". TransformerPredictor did not give good results at first, and I spent quite a while tuning its training parameters.

INPUT:
  MIN_SIZE_TRAIN: (600,)
  MAX_SIZE_TRAIN: 1000
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1000
MODEL:
  PRETRAINED_DETECTOR_CKPT: "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/checkpoints/pretrained_faster_rcnn/model_final.pth"
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
  BACKBONE:
    CONV_BODY: "R-101-FPN" # VGG-16
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    STRIDE_IN_1X1: False
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
  RELATION_ON: True
  ATTRIBUTE_ON: False
  FLIP_AUG: False            # if there is any left-right relation, FLIP AUG should be false
  RPN:
    USE_FPN: True
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.23232838, 0.63365731, 1.28478321, 3.15089189)   # from neural-motifs
    PRE_NMS_TOP_N_TRAIN: 6000
    PRE_NMS_TOP_N_TEST: 6000
    POST_NMS_TOP_N_TRAIN: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_PER_BATCH: False
    RPN_MID_CHANNEL: 256
  ROI_HEADS:
    USE_FPN: True
    POSITIVE_FRACTION: 0.5
    BG_IOU_THRESHOLD: 0.3
    BATCH_SIZE_PER_IMAGE: 256
    DETECTIONS_PER_IMG: 80
    NMS_FILTER_DUPLICATES: True
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    NUM_CLASSES: 151                # 151 for VG, 1201 for GQA
    MLP_HEAD_DIM: 4096
  ROI_ATTRIBUTE_HEAD:
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
    USE_BINARY_LOSS: True           # choose binary, because cross_entropy loss deteriorate the box head, even with 0.1 weight
    POS_WEIGHT: 50.0
    ATTRIBUTE_LOSS_WEIGHT: 1.0
    NUM_ATTRIBUTES: 201             # 201 for VG, 501 for GQA
    MAX_ATTRIBUTES: 10             
    ATTRIBUTE_BGFG_SAMPLE: True    
    ATTRIBUTE_BGFG_RATIO: 3        
  ROI_RELATION_HEAD:
    USE_GT_BOX: False                       # for choose sgdet, sgcls, precls
    USE_GT_OBJECT_LABEL: False              # for choose sgdet, sgcls, precls
    REQUIRE_BOX_OVERLAP: False              # for sgdet, during training, only train pairs with overlap
    ADD_GTBOX_TO_PROPOSAL_IN_TRAIN: True    # for sgdet only, in case some gt boxes are missing
    NUM_CLASSES: 51                 # 51 for VG, 201 for GQA (not contain "to the left of" & "to the right of")
    BATCH_SIZE_PER_IMAGE: 1024      # sample as much as possible
    POSITIVE_FRACTION: 0.25
    CONTEXT_POOLING_DIM: 4096
    CONTEXT_HIDDEN_DIM: 512         #1024 for VCTree 512 for Others
    POOLING_ALL_LEVELS: True
    LABEL_SMOOTHING_LOSS: False
    FEATURE_EXTRACTOR: "RelationFeatureExtractor"
    #################### Select Relationship Model ####################
#    PREDICTOR: "MotifPredictor"
#    PREDICTOR: "VCTreePredictor"
    PREDICTOR: "TransformerPredictor"
#    PREDICTOR: "VtransePredictor"
#    PREDICTOR: "CausalAnalysisPredictor"
    ################# Parameters for Motif Predictor ##################
    CONTEXT_OBJ_LAYER: 1
    CONTEXT_REL_LAYER: 1
    ############# Parameters for Causal Unbias Predictor ##############
    ### Implementation for paper "Unbiased Scene Graph Generation from Biased Training"
    CAUSAL:
      EFFECT_TYPE: 'none'             # candidates: 'TDE', 'NIE', 'TE', 'none'
      FUSION_TYPE: 'sum'              # candidates: 'sum', 'gate'
      SEPARATE_SPATIAL: False         # separate spatial in union feature
      CONTEXT_LAYER: "motifs"         # candidates: motifs, vctree, vtranse
      SPATIAL_FOR_VISION: True
      EFFECT_ANALYSIS: True
    ############### Parameters for Transformer Predictor ##############
    TRANSFORMER:
      DROPOUT_RATE: 0.1
      OBJ_LAYER: 4
      REL_LAYER: 2
      NUM_HEAD: 8
      KEY_DIM: 64
      VAL_DIM: 64
      INNER_DIM: 2048
DATASETS:
  TRAIN: ("VG_stanford_filtered_with_attribute_train",)
  VAL: ("VG_stanford_filtered_with_attribute_val",)
  TEST: ("VG_stanford_filtered_with_attribute_test",) # VG_stanford_filtered_with_attribute_test
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BIAS_LR_FACTOR: 2     # 2 for Transformer, 1 for Motif, VCTree
  BASE_LR: 0.001        # 0.0001 for Transformer, 0.01 for Motif, VCTree
  WARMUP_FACTOR: 0.01    # 0.01 for Transformer, 0.1 for Motif, VCTree
  WEIGHT_DECAY: 0.0001  # 0.01 for Transformer, 0.0001 for Motif, VCTree
  MOMENTUM: 0.9
  GRAD_NORM_CLIP: 1.0   # 1 for Transformer, 5.0 for Motif, VCTree
  STEPS: (8000, 12000, 16000) # (8000, 12000, 16000) for Transformer, (10000, 16000) for Motif, VCTree
  MAX_ITER: 20000       # 20000 for Transformer, 40000 for Motif, VCTree
  VAL_PERIOD: 2000
  CHECKPOINT_PERIOD: 1000
  PRINT_GRAD_FREQ: 1000 # 1000 for Transformer, 4000 for Motif, VCTree
  IMS_PER_BATCH: 16     # 16 for Transformer, 64 for Motif, VCTree
  PRE_VAL: False
  SCHEDULE:
    # the following parameters are only used for WarmupReduceLROnPlateau
    TYPE: "WarmupMultiStepLR"    # WarmupMultiStepLR for TransformerPredictor, WarmupReduceLROnPlateau for Motif, VCTree
    PATIENCE: 2
    THRESHOLD: 0.001
    COOLDOWN: 0
    FACTOR: 0.1
    MAX_DECAY_STEP: 3
OUTPUT_DIR: '/home/ps/Disk2/sgbm_model'
TEST:
  ALLOW_LOAD_FROM_CACHE: False
  RELATION:
    SYNC_GATHER: True      # turn on will slow down the evaluation to solve the sgdet test out of memory problem
    REQUIRE_OVERLAP: False
    LATER_NMS_PREDICTION_THRES: 0.5
  CUSTUM_EVAL: False       # eval SGDet model on custum images, output a json
  CUSTUM_PATH: '.'         # the folder that contains the custum images, only jpg files are allowed
  IMS_PER_BATCH: 8  # me
DTYPE: "float16"  # me
GLOVE_DIR: '/home/ps/glove'
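
Because the inline comments describe two sets of solver values, it can be convenient to keep them as per-predictor profiles and turn them into KEY VALUE overrides instead of editing the YAML each time. The sketch below assumes the training script accepts yacs-style overrides after the config file, as maskrcnn_benchmark-derived scripts usually do. The Motif/VCTree values are taken from the comments above; for Transformer, where a comment and the value disagree (BASE_LR, WEIGHT_DECAY), the sketch uses the value actually set in the file. `PROFILES` and `to_opts` are names I made up.

```python
import shlex

PREFIX = "MODEL.ROI_RELATION_HEAD."

TRANSFORMER = {
    "SOLVER.BIAS_LR_FACTOR": 2, "SOLVER.BASE_LR": 0.001,
    "SOLVER.WARMUP_FACTOR": 0.01, "SOLVER.WEIGHT_DECAY": 0.0001,
    "SOLVER.GRAD_NORM_CLIP": 1.0, "SOLVER.STEPS": (8000, 12000, 16000),
    "SOLVER.MAX_ITER": 20000, "SOLVER.PRINT_GRAD_FREQ": 1000,
    "SOLVER.IMS_PER_BATCH": 16, "SOLVER.SCHEDULE.TYPE": "WarmupMultiStepLR",
    PREFIX + "CONTEXT_HIDDEN_DIM": 512,
}
MOTIF = {
    "SOLVER.BIAS_LR_FACTOR": 1, "SOLVER.BASE_LR": 0.01,
    "SOLVER.WARMUP_FACTOR": 0.1, "SOLVER.WEIGHT_DECAY": 0.0001,
    "SOLVER.GRAD_NORM_CLIP": 5.0, "SOLVER.STEPS": (10000, 16000),
    "SOLVER.MAX_ITER": 40000, "SOLVER.PRINT_GRAD_FREQ": 4000,
    "SOLVER.IMS_PER_BATCH": 64,
    "SOLVER.SCHEDULE.TYPE": "WarmupReduceLROnPlateau",
    PREFIX + "CONTEXT_HIDDEN_DIM": 512,
}
# VCTree shares the Motif solver values but uses a larger hidden size.
VCTREE = dict(MOTIF, **{PREFIX + "CONTEXT_HIDDEN_DIM": 1024})

PROFILES = {"TransformerPredictor": TRANSFORMER,
            "MotifPredictor": MOTIF,
            "VCTreePredictor": VCTREE}


def to_opts(predictor):
    """Flatten one profile into a KEY VALUE KEY VALUE ... list of strings."""
    opts = [PREFIX + "PREDICTOR", predictor]
    for key, value in PROFILES[predictor].items():
        opts += [key, str(value)]
    return opts


if __name__ == "__main__":
    # Quote each item so the tuple for SOLVER.STEPS survives the shell.
    print(" ".join(shlex.quote(o) for o in to_opts("VCTreePredictor")))
```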

Among these, the parameters that deserve the most attention are:

MODEL:
  PRETRAINED_DETECTOR_CKPT: "/home/ps/MyProject/Scene-Graph-Benchmark.pytorch-master/checkpoints/pretrained_faster_rcnn/model_final.pth"

  ROI_RELATION_HEAD:
    USE_GT_BOX: False                       # for choose sgdet, sgcls, precls
    USE_GT_OBJECT_LABEL: False              # for choose sgdet, sgcls, precls
    REQUIRE_BOX_OVERLAP: False              # for sgdet, during training, only train pairs with overlap
    ADD_GTBOX_TO_PROPOSAL_IN_TRAIN: True    # for sgdet only, in case some gt boxes are missing
    
#    PREDICTOR: "MotifPredictor"
#    PREDICTOR: "VCTreePredictor"
    PREDICTOR: "TransformerPredictor"
#    PREDICTOR: "VtransePredictor"
#    PREDICTOR: "CausalAnalysisPredictor"

SOLVER:
  BIAS_LR_FACTOR: 2     # 2 for Transformer, 1 for Motif, VCTree
  BASE_LR: 0.001        # 0.0001 for Transformer, 0.01 for Motif, VCTree
  WARMUP_FACTOR: 0.01    # 0.01 for Transformer, 0.1 for Motif, VCTree
  WEIGHT_DECAY: 0.0001  # 0.01 for Transformer, 0.0001 for Motif, VCTree
  MOMENTUM: 0.9
  GRAD_NORM_CLIP: 1.0   # 1 for Transformer, 5.0 for Motif, VCTree
  STEPS: (8000, 12000, 16000) # (8000, 12000, 16000) for Transformer, (10000, 16000) for Motif, VCTree
  MAX_ITER: 20000       # 20000 for Transformer, 40000 for Motif, VCTree
  VAL_PERIOD: 2000
  CHECKPOINT_PERIOD: 1000
  PRINT_GRAD_FREQ: 1000 # 1000 for Transformer, 4000 for Motif, VCTree
  IMS_PER_BATCH: 16     # 16 for Transformer, 64 for Motif, VCTree
  SCHEDULE:
    # the following parameters are only used for WarmupReduceLROnPlateau
    TYPE: "WarmupMultiStepLR"    # WarmupMultiStepLR for TransformerPredictor, WarmupReduceLROnPlateau for Motif, VCTree

OUTPUT_DIR: '/home/ps/Disk2/sgbm_model'
TEST:
  ALLOW_LOAD_FROM_CACHE: False
  RELATION:
    SYNC_GATHER: True      # turn on will slow down the evaluation to solve the sgdet test out of memory problem
    REQUIRE_OVERLAP: False
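
The two flags USE_GT_BOX and USE_GT_OBJECT_LABEL together select the evaluation protocol. As far as I understand the upstream project, the mapping is the one sketched below (the config comments spell the first mode "precls"); `MODES` and `mode_of` are hypothetical names used only for illustration.

```python
# Protocol name -> values for MODEL.ROI_RELATION_HEAD.* flags
MODES = {
    "predcls": {"USE_GT_BOX": True, "USE_GT_OBJECT_LABEL": True},
    "sgcls": {"USE_GT_BOX": True, "USE_GT_OBJECT_LABEL": False},
    "sgdet": {"USE_GT_BOX": False, "USE_GT_OBJECT_LABEL": False},
}


def mode_of(use_gt_box, use_gt_object_label):
    """Return the protocol name that a pair of flag values selects."""
    flags = {"USE_GT_BOX": use_gt_box,
             "USE_GT_OBJECT_LABEL": use_gt_object_label}
    for name, expected in MODES.items():
        if expected == flags:
            return name
    raise ValueError(f"unsupported flag combination: {flags}")


if __name__ == "__main__":
    # The configuration shown above sets both flags to False.
    print(mode_of(False, False))
```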

The models I trained can be downloaded here; the resulting metrics are not exactly the same as those reported in other papers.
