PPYOLOE training issues

07/19 22:35:09 - mmengine - INFO - Epoch(train)  [4][ 60/580]  base_lr: 2.5000e-04 lr: 1.5509e-04  eta: 4 days, 0:55:13  time: 6.2245  data_time: 0.4387  memory: 1211  loss: 4.0579  loss_cls: 0.9532  loss_state: 1.2636  loss_bbox: 0.3440  loss_dfl: 1.4971
07/19 22:36:52 - mmengine - INFO - Epoch(train)  [4][ 70/580]  base_lr: 2.5000e-04 lr: 1.5595e-04  eta: 4 days, 1:04:14  time: 7.6080  data_time: 0.4719  memory: 5738  loss: 4.0051  loss_cls: 0.9280  loss_state: 1.2617  loss_bbox: 0.3368  loss_dfl: 1.4786
07/19 22:37:58 - mmengine - INFO - Epoch(train)  [4][ 80/580]  base_lr: 2.5000e-04 lr: 1.5681e-04  eta: 4 days, 0:58:00  time: 8.1634  data_time: 0.0477  memory: 1211  loss: 4.0580  loss_cls: 0.9506  loss_state: 1.2706  loss_bbox: 0.3783  loss_dfl: 1.4584
07/19 22:39:41 - mmengine - INFO - Epoch(train)  [4][ 90/580]  base_lr: 2.5000e-04 lr: 1.5767e-04  eta: 4 days, 1:06:24  time: 9.7714  data_time: 0.0519  memory: 1211  loss: 4.0740  loss_cls: 0.9612  loss_state: 1.2761  loss_bbox: 0.3790  loss_dfl: 1.4578
07/19 22:41:32 - mmengine - INFO - Epoch(train)  [4][100/580]  base_lr: 2.5000e-04 lr: 1.5853e-04  eta: 4 days, 1:18:20  time: 10.1447  data_time: 0.0393  memory: 1299  loss: 4.0076  loss_cls: 0.9446  loss_state: 1.2327  loss_bbox: 0.3803  loss_dfl: 1.4501
07/19 22:43:03 - mmengine - INFO - Epoch(train)  [4][110/580]  base_lr: 2.5000e-04 lr: 1.5940e-04  eta: 4 days, 1:21:53  time: 9.4774  data_time: 0.0465  memory: 1299  loss: 4.0742  loss_cls: 0.9750  loss_state: 1.2301  loss_bbox: 0.4177  loss_dfl: 1.4514
07/19 22:43:44 - mmengine - INFO - Epoch(train)  [4][120/580]  base_lr: 2.5000e-04 lr: 1.6026e-04  eta: 4 days, 1:05:55  time: 8.2485  data_time: 0.0239  memory: 1123  loss: 4.2037  loss_cls: 1.0055  loss_state: 1.2765  loss_bbox: 0.4376  loss_dfl: 1.4840
07/19 22:44:31 - mmengine - INFO - Epoch(train)  [4][130/580]  base_lr: 2.5000e-04 lr: 1.6112e-04  eta: 4 days, 0:52:08  time: 7.8639  data_time: 0.0401  memory: 1299  loss: 4.1204  loss_cls: 0.9918  loss_state: 1.2751  loss_bbox: 0.4120  loss_dfl: 1.4415
07/19 22:45:00 - mmengine - INFO - Epoch(train)  [4][140/580]  base_lr: 2.5000e-04 lr: 1.6198e-04  eta: 4 days, 0:31:01  time: 6.3823  data_time: 0.0468  memory: 1299  loss: 4.1752  loss_cls: 1.0122  loss_state: 1.2948  loss_bbox: 0.4150  loss_dfl: 1.4531
07/19 22:45:53 - mmengine - INFO - Epoch(train)  [4][150/580]  base_lr: 2.5000e-04 lr: 1.6284e-04  eta: 4 days, 0:20:00  time: 5.2226  data_time: 0.0467  memory: 1211  loss: 4.1133  loss_cls: 0.9960  loss_state: 1.2773  loss_bbox: 0.3973  loss_dfl: 1.4427
07/19 22:46:23 - mmengine - INFO - Epoch(train)  [4][160/580]  base_lr: 2.5000e-04 lr: 1.6371e-04  eta: 3 days, 23:59:52  time: 4.0014  data_time: 0.0396  memory: 1299  loss: 4.1325  loss_cls: 0.9886  loss_state: 1.2972  loss_bbox: 0.3905  loss_dfl: 1.4561
07/19 22:47:54 - mmengine - INFO - Epoch(train)  [4][170/580]  base_lr: 2.5000e-04 lr: 1.6457e-04  eta: 4 days, 0:04:03  time: 4.9978  data_time: 0.0289  memory: 1123  loss: 4.0423  loss_cls: 0.9726  loss_state: 1.2667  loss_bbox: 0.3852  loss_dfl: 1.4178
07/19 22:49:26 - mmengine - INFO - Epoch(train)  [4][180/580]  base_lr: 2.5000e-04 lr: 1.6543e-04  eta: 4 days, 0:07:57  time: 5.8808  data_time: 0.0185  memory: 1299  loss: 4.1775  loss_cls: 1.0039  loss_state: 1.2959  loss_bbox: 0.3878  loss_dfl: 1.4899
07/19 22:49:50 - mmengine - INFO - Epoch(train)  [4][190/580]  base_lr: 2.5000e-04 lr: 1.6629e-04  eta: 3 days, 23:46:10  time: 5.8069  data_time: 0.0079  memory: 1211  loss: 4.1025  loss_cls: 0.9986  loss_state: 1.2759  loss_bbox: 0.3874  loss_dfl: 1.4406
07/19 22:51:30 - mmengine - INFO - Epoch(train)  [4][200/580]  base_lr: 2.5000e-04 lr: 1.6716e-04  eta: 3 days, 23:53:28  time: 6.7419  data_time: 0.0106  memory: 1299  loss: 4.0270  loss_cls: 0.9840  loss_state: 1.2463  loss_bbox: 0.3891  loss_dfl: 1.4076
07/19 23:02:37 - mmengine - INFO - Epoch(train)  [4][210/580]  base_lr: 2.5000e-04 lr: 1.6802e-04  eta: 4 days, 3:35:58  time: 19.4827  data_time: 0.0172  memory: 1299  loss: 3.9365  loss_cls: 0.9562  loss_state: 1.1972  loss_bbox: 0.3909  loss_dfl: 1.3922
07/19 23:02:56 - mmengine - INFO - Epoch(train)  [4][220/580]  base_lr: 2.5000e-04 lr: 1.6888e-04  eta: 4 days, 3:11:23  time: 18.0313  data_time: 0.0172  memory: 1123  loss: 3.9469  loss_cls: 0.9466  loss_state: 1.1939  loss_bbox: 0.3961  loss_dfl: 1.4104
07/19 23:04:27 - mmengine - INFO - Epoch(train)  [4][230/580]  base_lr: 2.5000e-04 lr: 1.6974e-04  eta: 4 days, 3:14:05  time: 18.0290  data_time: 0.0515  memory: 1809  loss: 3.8146  loss_cls: 0.8968  loss_state: 1.1688  loss_bbox: 0.3726  loss_dfl: 1.3764
07/19 23:06:09 - mmengine - INFO - Epoch(train)  [4][240/580]  base_lr: 2.5000e-04 lr: 1.7060e-04  eta: 4 days, 3:20:38  time: 19.5705  data_time: 0.0513  memory: 1299  loss: 3.8529  loss_cls: 0.8854  loss_state: 1.1818  loss_bbox: 0.3745  loss_dfl: 1.4112
07/19 23:07:30 - mmengine - INFO - Epoch(train)  [4][250/580]  base_lr: 2.5000e-04 lr: 1.7147e-04  eta: 4 days, 3:19:33  time: 19.1939  data_time: 0.0699  memory: 1123  loss: 3.9280  loss_cls: 0.9007  loss_state: 1.2024  loss_bbox: 0.3729  loss_dfl: 1.4521
07/19 23:09:59 - mmengine - INFO - Exp name: ppyoloe_plus_s_fast_8xb8-80e_coco_20240719_184020
07/19 23:09:59 - mmengine - INFO - Epoch(train)  [4][260/580]  base_lr: 2.5000e-04 lr: 1.7233e-04  eta: 4 days, 3:43:49  time: 8.8543  data_time: 0.0840  memory: 1299  loss: 4.0175  loss_cls: 0.9366  loss_state: 1.2415  loss_bbox: 0.3666  loss_dfl: 1.4728
07/19 23:10:29 - mmengine - INFO - Epoch(train)  [4][270/580]  base_lr: 2.5000e-04 lr: 1.7319e-04  eta: 4 days, 3:23:24  time: 9.0525  data_time: 0.0855  memory: 1125  loss: 3.9124  loss_cls: 0.9142  loss_state: 1.1960  loss_bbox: 0.3407  loss_dfl: 1.4615
07/19 23:11:15 - mmengine - INFO - Epoch(train)  [4][280/580]  base_lr: 2.5000e-04 lr: 1.7405e-04  eta: 4 days, 3:09:43  time: 8.1679  data_time: 0.0544  memory: 1299  loss: 3.9072  loss_cls: 0.9227  loss_state: 1.1882  loss_bbox: 0.3505  loss_dfl: 1.4458
07/19 23:12:09 - mmengine - INFO - Epoch(train)  [4][290/580]  base_lr: 2.5000e-04 lr: 1.7491e-04  eta: 4 days, 2:58:39  time: 7.2122  data_time: 0.0544  memory: 1139  loss: 3.8454  loss_cls: 0.9033  loss_state: 1.1660  loss_bbox: 0.3427  loss_dfl: 1.4334
07/19 23:14:07 - mmengine - INFO - Epoch(train)  [4][300/580]  base_lr: 2.5000e-04 lr: 1.7578e-04  eta: 4 days, 3:10:56  time: 7.9462  data_time: 0.0460  memory: 1299  loss: 3.7560  loss_cls: 0.8613  loss_state: 1.1528  loss_bbox: 0.3260  loss_dfl: 1.4159
07/19 23:16:20 - mmengine - INFO - Epoch(train)  [4][310/580]  base_lr: 2.5000e-04 lr: 1.7664e-04  eta: 4 days, 3:28:22  time: 7.6028  data_time: 0.0254  memory: 1211  loss: 3.7026  loss_cls: 0.8481  loss_state: 1.1537  loss_bbox: 0.3134  loss_dfl: 1.3874
07/19 23:16:56 - mmengine - INFO - Epoch(train)  [4][320/580]  base_lr: 2.5000e-04 lr: 1.7750e-04  eta: 4 days, 3:10:56  time: 7.7392  data_time: 0.0238  memory: 1299  loss: 3.7546  loss_cls: 0.8488  loss_state: 1.1982  loss_bbox: 0.3319  loss_dfl: 1.3758
07/19 23:18:29 - mmengine - INFO - Epoch(train)  [4][330/580]  base_lr: 2.5000e-04 lr: 1.7836e-04  eta: 4 days, 3:14:21  time: 8.6788  data_time: 0.0149  memory: 963  loss: 3.8670  loss_cls: 0.8823  loss_state: 1.2236  loss_bbox: 0.3470  loss_dfl: 1.4140
07/19 23:19:29 - mmengine - INFO - Epoch(train)  [4][340/580]  base_lr: 2.5000e-04 lr: 1.7922e-04  eta: 4 days, 3:05:36  time: 8.7994  data_time: 0.0149  memory: 1044  loss: 3.8786  loss_cls: 0.8796  loss_state: 1.2530  loss_bbox: 0.3386  loss_dfl: 1.4073
07/19 23:22:00 - mmengine - INFO - Epoch(train)  [4][350/580]  base_lr: 2.5000e-04 lr: 1.8009e-04  eta: 4 days, 3:29:13  time: 9.4643  data_time: 0.0053  memory: 1299  loss: 4.1052  loss_cls: 0.9432  loss_state: 1.3361  loss_bbox: 0.3719  loss_dfl: 1.4541
07/19 23:22:51 - mmengine - INFO - Epoch(train)  [4][360/580]  base_lr: 2.5000e-04 lr: 1.8095e-04  eta: 4 days, 3:17:08  time: 7.8188  data_time: 0.0102  memory: 963  loss: 4.0853  loss_cls: 0.9371  loss_state: 1.3109  loss_bbox: 0.3897  loss_dfl: 1.4476
07/19 23:25:08 - mmengine - INFO - Epoch(train)  [4][370/580]  base_lr: 2.5000e-04 lr: 1.8181e-04  eta: 4 days, 3:35:30  time: 9.8407  data_time: 0.0102  memory: 1299  loss: 4.1099  loss_cls: 0.9707  loss_state: 1.3064  loss_bbox: 0.3791  loss_dfl: 1.4537
07/19 23:26:01 - mmengine - INFO - Epoch(train)  [4][380/580]  base_lr: 2.5000e-04 lr: 1.8267e-04  eta: 4 days, 3:24:44  time: 9.0422  data_time: 0.0134  memory: 1419  loss: 3.9697  loss_cls: 0.9292  loss_state: 1.2594  loss_bbox: 0.3538  loss_dfl: 1.4273
[E ProcessGroupNCCL.cpp:587] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807354 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807440 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807458 milliseconds before timing out.
Traceback (most recent call last):
  File "./tools/train.py", line 126, in <module>
    main()
  File "./tools/train.py", line 122, in main
    runner.train()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1746, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
    results = self(**data, mode=mode)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "/home/lsw/hl/mi/mmdet/models/detectors/single_stage.py", line 77, in loss
    x = self.extract_feat(batch_inputs)
  File "/home/lsw/hl/mi/mmdet/models/detectors/single_stage.py", line 157, in extract_feat
    x = self.backbone(batch_inputs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/backbones/base_backbone.py", line 221, in forward
    x = layer(x)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 1295, in forward
    y2 = self.blocks(self.conv2(x))
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 1154, in forward
    y = self.conv2(y)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 252, in forward
    self.rbr_dense(inputs) +
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmcv/cnn/bricks/conv_module.py", line 281, in forward
    x = self.norm(x)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 749, in forward
    return sync_batch_norm.apply(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/_functions.py", line 42, in forward
    dist._all_gather_base(combined_flat, combined, process_group, async_op=False)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2070, in _all_gather_base
    work = group._allgather_base(output_tensor, input_tensor)
RuntimeError: NCCL communicator was aborted on rank 0.  Original reason for failure was: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807354 milliseconds before timing out.
RuntimeError: NCCL communicator was aborted on rank 3.  Original reason for failure was: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807440 milliseconds before timing out.
Traceback (most recent call last):
  File "./tools/train.py", line 126, in <module>
    main()
  File "./tools/train.py", line 122, in main
    runner.train()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1746, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
    results = self(**data, mode=mode)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmdet/models/detectors/base.py", line 92, in forward
    return self.loss(inputs, data_samples)
  File "/home/lsw/hl/mi/mmdet/models/detectors/single_stage.py", line 77, in loss
    x = self.extract_feat(batch_inputs)
  File "/home/lsw/hl/mi/mmdet/models/detectors/single_stage.py", line 157, in extract_feat
    x = self.backbone(batch_inputs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/backbones/base_backbone.py", line 221, in forward
    x = layer(x)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 1295, in forward
    y2 = self.blocks(self.conv2(x))
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 1154, in forward
    y = self.conv2(y)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/hl/mi/mmyolo/models/layers/yolo_bricks.py", line 252, in forward
    self.rbr_dense(inputs) +
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmcv/cnn/bricks/conv_module.py", line 281, in forward
    x = self.norm(x)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 749, in forward
    return sync_batch_norm.apply(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/_functions.py", line 42, in forward
    dist._all_gather_base(combined_flat, combined, process_group, async_op=False)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2070, in _all_gather_base
    work = group._allgather_base(output_tensor, input_tensor)
RuntimeError: NCCL communicator was aborted on rank 2.  Original reason for failure was: [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1807458 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13725 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13724) of binary: /home/lsw/miniconda3/envs/mi/bin/python
Traceback (most recent call last):
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <modul
e>
    main()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in 
__call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in 
launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./tools/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-07-19_23:57:21
  host      : lswPlus
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 13726)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-07-19_23:57:21
  host      : lswPlus
  rank      : 3 (local_rank: 3)
  exitcode  : -6 (pid: 13727)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 13727
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-07-19_23:57:21
  host      : lswPlus
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13724)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

To avoid the watchdog timeout above, the NCCL process-group timeout (30 minutes by default, i.e. the 1800000 ms in the watchdog messages) can be raised when initializing distributed training:

dist.init_process_group(backend='nccl', init_method='env://', timeout=datetime.timedelta(seconds=5400))
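
For completeness, a self-contained sketch of that one-liner with the imports it needs. Two assumptions: the process group is created manually before the MMEngine Runner is built (so the Runner reuses the existing group instead of initializing its own), and 5400 s is only an example value.

import datetime

import torch.distributed as dist

# Raise the collective timeout from the 30-minute default so a long stall on one
# rank (slow data loading, a hung worker, evaluation) does not trip the NCCL
# watchdog and abort the communicator.
dist.init_process_group(
    backend='nccl',
    init_method='env://',
    timeout=datetime.timedelta(seconds=5400))

Alternatively, if MMEngine is left to do the initialization, the same timeout could presumably be forwarded through env_cfg['dist_cfg'], whose entries are passed on to init_process_group; that route has not been verified here.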

07/20 17:37:28 - mmengine - INFO - Epoch(train)  [3][2460/4634]  base_lr: 1.2500e-04 lr: 6.3266e-05  eta: 15:32:14  time: 0.1281  data_time: 0.0223  memory: 654  loss: 4.5739  loss_cls: 1.0988  loss_state: 1.4604  loss_bbox: 0.6126  loss_dfl: 1.4021
Traceback (most recent call last):
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/queue.py", line 175, in get
    while not self._qsize():
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/queue.py", line 209, in _qsize
    return len(self.queue)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 8104) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lsw/HL/othercode/mi/tools/train.py", line 125, in <module>
    main()
  File "/home/lsw/HL/othercode/mi/tools/train.py", line 121, in main
    runner.train()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1746, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 111, in run_epoch
    for idx, data_batch in enumerate(self.dataloader):
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1142, in _get_data
    success, data = self._try_get_data()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 8104) exited unexpectedly
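
"killed by signal: Killed" means the OS reaped the worker (typically the out-of-memory killer or exhausted shared memory), not a Python-level failure. A minimal mitigation sketch, assuming host RAM or /dev/shm is the bottleneck, is to take the worker processes out of the picture; only the keys shown differ from the dumped config further down.

# Hypothetical dataloader override: slower loading, but nothing for the OS to reap.
train_dataloader = dict(
    batch_size=1,
    num_workers=0,             # load in the main process instead of worker processes
    persistent_workers=False,  # must be False when num_workers == 0
    pin_memory=False,          # keep pinned host memory to a minimum
    # dataset / sampler / collate_fn exactly as in the full config below
)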

07/20 21:31:35 - mmengine - INFO - Epoch(train)  [1][ 260/4634]  base_lr: 1.2500e-04 lr: 1.3973e-06  eta: 14:31:26  time: 0.1365  data_time: 0.0374  memory: 715  loss: 7.9384  loss_cls: 2.2571  loss_state: 3.1435  loss_bbox: 0.7062  loss_dfl: 1.8315
Traceback (most recent call last):
  File "/home/lsw/HL/othercode/mi/tools/train.py", line 125, in <module>
    main()
  File "/home/lsw/HL/othercode/mi/tools/train.py", line 121, in main
    runner.train()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1746, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 113, in train_step
    data = self.data_preprocessor(data, True)
  File "/home/lsw/miniconda3/envs/mi/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsw/HL/othercode/mi/mmyolo/models/data_preprocessors/data_preprocessor.py", line 163, in forward
    _input = _input[[2, 1, 0], ...]
RuntimeError: CUDA out of memory. Tried to allocate 5.77 GiB (GPU 0; 11.91 GiB total capacity; 5.89 GiB already allocated; 4.86 GiB free; 5.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
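
Two things stand out in the OOM message: the max_split_size_mb hint comes from PyTorch itself, and 5.77 GiB for a single channel-swapped input at batch_size=1 looks far too large for a normal image, which may point at an oversized sample in the dataset. A quick diagnostic sketch under that assumption (the threshold and allocator value are illustrative, not verified on this run):

import json

# Allocator hint suggested by the error message itself; set before launching training:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Scan the COCO annotations for images large enough to explain a multi-GiB tensor.
with open('data/coco/annotations/instances_train2017.json') as f:
    coco = json.load(f)

for img in coco['images']:
    if img['width'] * img['height'] > 4096 * 4096:
        print(img['file_name'], img['width'], img['height'])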

Config

_backend_args = None
_multiscale_resize_transforms = [
    dict(
        transforms=[
            dict(scale=(
                640,
                640,
            ), type='YOLOv5KeepRatioResize'),
            dict(
                allow_scale_up=False,
                pad_val=dict(img=114),
                scale=(
                    640,
                    640,
                ),
                type='LetterResize'),
        ],
        type='Compose'),
    dict(
        transforms=[
            dict(scale=(
                320,
                320,
            ), type='YOLOv5KeepRatioResize'),
            dict(
                allow_scale_up=False,
                pad_val=dict(img=114),
                scale=(
                    320,
                    320,
                ),
                type='LetterResize'),
        ],
        type='Compose'),
    dict(
        transforms=[
            dict(scale=(
                960,
                960,
            ), type='YOLOv5KeepRatioResize'),
            dict(
                allow_scale_up=False,
                pad_val=dict(img=114),
                scale=(
                    960,
                    960,
                ),
                type='LetterResize'),
        ],
        type='Compose'),
]
backend_args = None
# base_lr = 0.001
base_lr = 0.000125
custom_hooks = [
    dict(
        ema_type='ExpMomentumEMA',
        momentum=0.0002,
        priority=49,
        strict_load=False,
        type='EMAHook',
        update_buffers=True),
]
data_root = 'data/coco/'
dataset_type = 'YOLOv5CocoDataset'
deepen_factor = 0.33
default_hooks = dict(
    checkpoint=dict(
        interval=5,
        max_keep_ckpts=16,
        save_best='auto',
        type='CheckpointHook'),
    logger=dict(interval=10, type='LoggerHook'),
    param_scheduler=dict(
        min_lr_ratio=0.0,
        start_factor=0.0,
        total_epochs=96,
        type='PPYOLOEParamSchedulerHook',
        warmup_epochs=5,
        warmup_min_iter=1000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='mmdet.DetVisualizationHook'))
default_scope = 'mmyolo'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_scale = (
    640,
    640,
)
img_scales = [
    (
        640,
        640,
    ),
    (
        320,
        320,
    ),
    (
        960,
        960,
    ),
]
launcher = 'none'
load_from = '/home/lsw/HL/othercode/mi/checkpoints/ppyoloe_plus_s_obj365_pretrained-bcfe8478.pth'
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50)
max_epochs = 80
metainfo = dict(
    classes=(
        'humanchild',
        'humanpregnant_woman',
        'humanold',
        'humanhandicapped',
    ),
    palette=[
        (
            20,
            220,
            60,
        ),
    ])
model = dict(
    backbone=dict(
        act_cfg=dict(inplace=True, type='SiLU'),
        attention_cfg=dict(
            act_cfg=dict(type='HSigmoid'), type='EffectiveSELayer'),
        block_cfg=dict(
            shortcut=True, type='PPYOLOEBasicBlock', use_alpha=True),
        deepen_factor=0.33,
        norm_cfg=dict(eps=1e-05, momentum=0.1, type='BN'),
        type='PPYOLOECSPResNet',
        use_large_stem=True,
        widen_factor=0.5),
    bbox_head=dict(
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        head_module=dict(
            act_cfg=dict(inplace=True, type='SiLU'),
            featmap_strides=[
                8,
                16,
                32,
            ],
            in_channels=[
                192,
                384,
                768,
            ],
            norm_cfg=dict(eps=1e-05, momentum=0.1, type='BN'),
            num_base_priors=1,
            num_classes=4,
            num_state=4,
            reg_max=16,
            type='PPYOLOEHeadModule',
            widen_factor=0.5),
        loss_bbox=dict(
            bbox_format='xyxy',
            iou_mode='giou',
            loss_weight=2.5,
            reduction='mean',
            return_iou=False,
            type='IoULoss'),
        loss_cls=dict(
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0,
            reduction='sum',
            type='mmdet.VarifocalLoss',
            use_sigmoid=True),
        loss_dfl=dict(
            loss_weight=0.125,
            reduction='mean',
            type='mmdet.DistributionFocalLoss'),
        loss_state=dict(
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0,
            reduction='sum',
            type='mmdet.VarifocalLoss',
            use_sigmoid=True),
        prior_generator=dict(
            offset=0.5, strides=[
                8,
                16,
                32,
            ], type='mmdet.MlvlPointGenerator'),
        type='PPYOLOEHead'),
    data_preprocessor=dict(
        batch_augments=[
            dict(
                interval=1,
                keep_ratio=False,
                random_interp=True,
                random_size_range=(
                    320,
                    800,
                ),
                size_divisor=32,
                type='PPYOLOEBatchRandomResize'),
        ],
        bgr_to_rgb=True,
        mean=[
            0.0,
            0.0,
            0.0,
        ],
        pad_size_divisor=32,
        std=[
            255.0,
            255.0,
            255.0,
        ],
        type='PPYOLOEDetDataPreprocessor'),
    neck=dict(
        act_cfg=dict(inplace=True, type='SiLU'),
        block_cfg=dict(
            shortcut=False, type='PPYOLOEBasicBlock', use_alpha=False),
        deepen_factor=0.33,
        drop_block_cfg=None,
        in_channels=[
            256,
            512,
            1024,
        ],
        norm_cfg=dict(eps=1e-05, momentum=0.1, type='BN'),
        num_blocks_per_layer=3,
        num_csplayer=1,
        out_channels=[
            192,
            384,
            768,
        ],
        type='PPYOLOECSPPAFPN',
        use_spp=True,
        widen_factor=0.5),
    test_cfg=dict(
        max_per_img=300,
        multi_label=True,
        nms=dict(iou_threshold=0.7, type='nms'),
        nms_pre=1000,
        score_thr=0.01),
    train_cfg=dict(
        assigner=dict(
            alpha=1,
            beta=6,
            eps=1e-09,
            num_classes=4,
            topk=13,
            type='mi_BatchTaskAlignedAssigner'),
        initial_assigner=dict(
            iou_calculator=dict(type='mmdet.BboxOverlaps2D'),
            num_classes=4,
            topk=9,
            type='BatchATSSAssigner'),
        initial_epoch=30),
    type='YOLODetector')
num_classes = 4
num_state = 4
optim_wrapper = dict(
    optimizer=dict(
        lr=0.000125,
        momentum=0.9,
        nesterov=False,
        type='SGD',
        weight_decay=0.0005),
    paramwise_cfg=dict(norm_decay_mult=0.0),
    type='OptimWrapper')
param_scheduler = None
persistent_workers = True
resume = False
save_epoch_intervals = 80
strides = [
    8,
    16,
    32,
]
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='images/val2017/'),
        data_root='data/coco/',
        filter_cfg=dict(filter_empty_gt=True, min_size=0),
        metainfo=dict(
            classes=(
                'humanchild',
                'humanpregnant_woman',
                'humanold',
                'humanhandicapped',
            ),
            palette=[
                (
                    20,
                    220,
                    60,
                ),
            ]),
        pipeline=[
            dict(backend_args=None, type='LoadImageFromFile'),
            dict(
                height=640,
                interpolation='bicubic',
                keep_ratio=False,
                type='mmdet.FixShapeResize',
                width=640),
            dict(_scope_='mmdet', type='LoadAnnotations', with_bbox=True),
            dict(
                meta_keys=(
                    'img_id',
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='mmdet.PackDetInputs'),
        ],
        test_mode=True,
        type='YOLOv5CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    ann_file='data/coco/annotations/instances_val2017.json',
    metric='bbox',
    proposal_nums=(
        100,
        1,
        10,
    ),
    type='mmdet.CocoMetric')
test_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(
        height=640,
        interpolation='bicubic',
        keep_ratio=False,
        type='mmdet.FixShapeResize',
        width=640),
    dict(_scope_='mmdet', type='LoadAnnotations', with_bbox=True),
    dict(
        meta_keys=(
            'img_id',
            'img_path',
            'ori_shape',
            'img_shape',
            'scale_factor',
        ),
        type='mmdet.PackDetInputs'),
]
train_batch_size_per_gpu = 1
train_cfg = dict(max_epochs=80, type='EpochBasedTrainLoop', val_interval=80)
train_dataloader = dict(
    batch_size=1,
    collate_fn=dict(type='yolov5_collate', use_ms_training=True),
    dataset=dict(
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='images/train2017/'),
        data_root='data/coco/',
        filter_cfg=dict(filter_empty_gt=True, min_size=0),
        metainfo=dict(
            classes=(
                'humanchild',
                'humanpregnant_woman',
                'humanold',
                'humanhandicapped',
            ),
            palette=[
                (
                    20,
                    220,
                    60,
                ),
            ]),
        pipeline=[
            dict(backend_args=None, type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='PPYOLOERandomDistort'),
            dict(mean=(
                103.53,
                116.28,
                123.675,
            ), type='mmdet.Expand'),
            dict(type='PPYOLOERandomCrop'),
            dict(prob=0.5, type='mmdet.RandomFlip'),
            dict(
                meta_keys=(
                    'img_id',
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'flip',
                    'flip_direction',
                ),
                type='mmdet.PackDetInputs'),
        ],
        type='YOLOv5CocoDataset'),
    num_workers=1,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_num_workers = 1
train_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PPYOLOERandomDistort'),
    dict(mean=(
        103.53,
        116.28,
        123.675,
    ), type='mmdet.Expand'),
    dict(type='PPYOLOERandomCrop'),
    dict(prob=0.5, type='mmdet.RandomFlip'),
    dict(
        meta_keys=(
            'img_id',
            'img_path',
            'ori_shape',
            'img_shape',
            'flip',
            'flip_direction',
        ),
        type='mmdet.PackDetInputs'),
]
tta_model = dict(
    tta_cfg=dict(max_per_img=300, nms=dict(iou_threshold=0.65, type='nms')),
    type='mmdet.DetTTAModel')
tta_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(
        transforms=[
            [
                dict(
                    transforms=[
                        dict(scale=(
                            640,
                            640,
                        ), type='YOLOv5KeepRatioResize'),
                        dict(
                            allow_scale_up=False,
                            pad_val=dict(img=114),
                            scale=(
                                640,
                                640,
                            ),
                            type='LetterResize'),
                    ],
                    type='Compose'),
                dict(
                    transforms=[
                        dict(scale=(
                            320,
                            320,
                        ), type='YOLOv5KeepRatioResize'),
                        dict(
                            allow_scale_up=False,
                            pad_val=dict(img=114),
                            scale=(
                                320,
                                320,
                            ),
                            type='LetterResize'),
                    ],
                    type='Compose'),
                dict(
                    transforms=[
                        dict(scale=(
                            960,
                            960,
                        ), type='YOLOv5KeepRatioResize'),
                        dict(
                            allow_scale_up=False,
                            pad_val=dict(img=114),
                            scale=(
                                960,
                                960,
                            ),
                            type='LetterResize'),
                    ],
                    type='Compose'),
            ],
            [
                dict(prob=1.0, type='mmdet.RandomFlip'),
                dict(prob=0.0, type='mmdet.RandomFlip'),
            ],
            [
                dict(type='mmdet.LoadAnnotations', with_bbox=True),
            ],
            [
                dict(
                    meta_keys=(
                        'img_id',
                        'img_path',
                        'ori_shape',
                        'img_shape',
                        'scale_factor',
                        'pad_param',
                        'flip',
                        'flip_direction',
                    ),
                    type='mmdet.PackDetInputs'),
            ],
        ],
        type='TestTimeAug'),
]
val_batch_size_per_gpu = 1
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='images/val2017/'),
        data_root='data/coco/',
        filter_cfg=dict(filter_empty_gt=True, min_size=0),
        metainfo=dict(
            classes=(
                'humanchild',
                'humanpregnant_woman',
                'humanold',
                'humanhandicapped',
            ),
            palette=[
                (
                    20,
                    220,
                    60,
                ),
            ]),
        pipeline=[
            dict(backend_args=None, type='LoadImageFromFile'),
            dict(
                height=640,
                interpolation='bicubic',
                keep_ratio=False,
                type='mmdet.FixShapeResize',
                width=640),
            dict(_scope_='mmdet', type='LoadAnnotations', with_bbox=True),
            dict(
                meta_keys=(
                    'img_id',
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='mmdet.PackDetInputs'),
        ],
        test_mode=True,
        type='YOLOv5CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    ann_file='data/coco/annotations/instances_val2017.json',
    metric='bbox',
    proposal_nums=(
        100,
        1,
        10,
    ),
    type='mmdet.CocoMetric')
val_num_workers = 1
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='mmdet.DetLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
widen_factor = 0.5
work_dir = 'HL'
