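The reason: bfloat16 computation is really, really slow, with apparently no optimization whatsoever.
And this is already on CUDA 11.3 with driver version 511.23.
In the StyleGAN2 log from a GTX 1070, switching to bfloat16 means 1000 images take a whopping 6 minutes or so,
while the same run in float16 needs only about 2 minutes.
Moving to an RTX 3090, 1000 images in bfloat16 still take 3.6 minutes, whereas switching to float32 takes only 30 seconds.
The speed of bfloat16 computation is truly hopeless.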
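To put the figures quoted above side by side, here is a minimal sketch that turns them into slowdown factors. The function name `slowdown` and the variable names are made up for illustration; the only inputs are the timings stated in the text.

```python
def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times slower the first timing is than the second."""
    return slow_seconds / fast_seconds

# Timings per 1000 images, as quoted in the text (converted to seconds).
gtx1070_bf16 = 6 * 60   # bfloat16 on the GTX 1070
gtx1070_fp16 = 2 * 60   # float16 on the GTX 1070
rtx3090_bf16 = 216      # 3.6 minutes of bfloat16 on the RTX 3090
rtx3090_fp32 = 30       # float32 on the RTX 3090

gtx1070_factor = slowdown(gtx1070_bf16, gtx1070_fp16)
rtx3090_factor = slowdown(rtx3090_bf16, rtx3090_fp32)

print(f"GTX 1070: bfloat16 is {gtx1070_factor:.1f}x slower than float16")
print(f"RTX 3090: bfloat16 is {rtx3090_factor:.1f}x slower than float32")
```

By these numbers bfloat16 comes out roughly 3x slower than float16 on the GTX 1070 and roughly 7x slower than float32 on the RTX 3090. Note that the two cards are compared against different baselines, so the two factors are not directly comparable to each other.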
This is the log from the GTX 1070 using bfloat16:
2022-03-19 18:12:26.821 | INFO | Generator Parameters Buffers Output shape Datatype
2022-03-19 18:12:26.821 | INFO | --- --- --- --- ---
2022-03-19 18:12:26.821 | INFO | mapping.layers.0 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.1 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.2 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.3 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.4 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.5 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.6 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping.layers.7 262656 - [16, 512] float32
2022-03-19 18:12:26.836 | INFO | mapping - 512 [16, 12, 512] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b4.conv1 721408 16 [16, 256, 4, 4] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b4 4096 16 [16, 256, 4, 4] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b8.skip 65536 16 [16, 256, 8, 8] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b8.conv0 721408 16 [16, 256, 8, 8] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b8.conv1 721408 16 [16, 256, 8, 8] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b8 - 16 [16, 256, 8, 8] float32
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b16.skip 65536 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:26.852 | INFO | synthesis.blocks.b16.conv0 721408 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b16.conv1 721408 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b16 - 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b32.skip 65536 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b32.conv0 721408 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b32.conv1 721408 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b32 - 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b64.skip 32768 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:26.868 | INFO | synthesis.blocks.b64.conv0 426368 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b64.conv1 213248 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b64 - 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128.skip 8192 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128.conv0 139456 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128.conv1 69760 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128.torgb 33027 - [16, 3, 128, 128] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128:0 - 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:26.883 | INFO | synthesis.blocks.b128:1 - - [16, 3, 128, 128] bfloat16
2022-03-19 18:12:26.899 | INFO | --- --- --- --- ---
2022-03-19 18:12:26.899 | INFO | Total 8274627 864 - -
2022-03-19 18:12:26.899 | INFO |
2022-03-19 18:12:27.243 | INFO |
2022-03-19 18:12:27.243 | INFO | Discriminator Parameters Buffers Output shape Datatype
2022-03-19 18:12:27.243 | INFO | --- --- --- --- ---
2022-03-19 18:12:27.258 | INFO | blocks.b128.fromrgb 256 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b128.skip 8192 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b128.conv0 36928 16 [16, 64, 128, 128] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b128.conv1 73856 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b128 - 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b64.skip 32768 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b64.conv0 147584 16 [16, 128, 64, 64] bfloat16
2022-03-19 18:12:27.258 | INFO | blocks.b64.conv1 295168 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b64 - 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b32.skip 65536 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b32.conv0 590080 16 [16, 256, 32, 32] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b32.conv1 590080 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b32 - 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b16.skip 65536 16 [16, 256, 8, 8] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b16.conv0 590080 16 [16, 256, 16, 16] bfloat16
2022-03-19 18:12:27.274 | INFO | blocks.b16.conv1 590080 16 [16, 256, 8, 8] bfloat16
2022-03-19 18:12:27.289 | INFO | blocks.b16 - 16 [16, 256, 8, 8] bfloat16
2022-03-19 18:12:27.289 | INFO | blocks.b8.skip 65536 16 [16, 256, 4, 4] float32
2022-03-19 18:12:27.289 | INFO | blocks.b8.conv0 590080 16 [16, 256, 8, 8] float32
2022-03-19 18:12:27.289 | INFO | blocks.b8.conv1 590080 16 [16, 256, 4, 4] float32
2022-03-19 18:12:27.289 | INFO | blocks.b8 - 16 [16, 256, 4, 4] float32
2022-03-19 18:12:27.289 | INFO | b4.mbstd - - [16, 257, 4, 4] float32
2022-03-19 18:12:27.289 | INFO | b4.conv 592384 16 [16, 256, 4, 4] float32
2022-03-19 18:12:27.289 | INFO | b4.fc 1048832 - [16, 256] float32
2022-03-19 18:12:27.305 | INFO | b4.out 257 - [16, 1] float32
2022-03-19 18:12:27.305 | INFO | --- --- --- --- ---
2022-03-19 18:12:27.305 | INFO | Total 5973313 352 - -
2022-03-19 18:12:27.305 | INFO |
2022-03-19 18:12:27.305 | INFO | Setting up training phases...
2022-03-19 18:12:27.305 | INFO | Exporting sample images...
2022-03-19 18:12:40.073 | INFO | Training for 25000 total_kimg...
2022-03-19 18:12:40.073 | INFO |
2022-03-19 18:12:40.089 | INFO | Start summary writer.
2022-03-19 18:12:57.750 | INFO | tick 0 total_kimg 0.0 time 40.333200 sec/tick 17.7 sec/total_kimg 1103.83 maintenance 22.7 gpumem 1.55 reserved 3.10
2022-03-19 18:13:11.575 | INFO | Evaluating metrics...
2022-03-19 18:14:33.151 | INFO | fid: 286.04832986921974
2022-03-19 18:21:31.793 | INFO | tick 1 total_kimg 1.0 time 554.376297 sec/tick 418.6 sec/total_kimg 415.32 maintenance 95.4 gpumem 1.62 reserved 3.10
2022-03-19 18:28:43.830 | INFO | tick 2 total_kimg 2.0 time 986.413076 sec/tick 432.0 sec/total_kimg 428.61 maintenance 0.0 gpumem 1.62 reserved 3.10
2022-03-19 18:35:51.396 | INFO | tick 3 total_kimg 3.0 time 1413.979094 sec/tick 427.6 sec/total_kimg 424.17 maintenance 0.0 gpumem 1.62 reserved 3.10
2022-03-19 18:43:02.733 | INFO | tick 4 total_kimg 4.0 time 1845.316453 sec/tick 431.3 sec/total_kimg 427.91 maintenance 0.0 gpumem 1.62 reserved 3.10
2022-03-19 18:43:16.580 | INFO | Evaluating metrics...
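The tick lines in the bfloat16 log above can be reduced to a single steady-state number. The following is a rough sketch that pulls `sec/total_kimg` out of each tick line and averages it; the regular expression, the function name `mean_sec_per_kimg`, and the choice to skip the first logged tick (which includes startup overhead) are my own assumptions, not part of the training code.

```python
import re
from statistics import mean

# Tick lines copied from the bfloat16 log above.
BF16_LOG = """\
2022-03-19 18:12:57.750 | INFO | tick 0 total_kimg 0.0 time 40.333200 sec/tick 17.7 sec/total_kimg 1103.83 maintenance 22.7 gpumem 1.55 reserved 3.10
2022-03-19 18:21:31.793 | INFO | tick 1 total_kimg 1.0 time 554.376297 sec/tick 418.6 sec/total_kimg 415.32 maintenance 95.4 gpumem 1.62 reserved 3.10
2022-03-19 18:28:43.830 | INFO | tick 2 total_kimg 2.0 time 986.413076 sec/tick 432.0 sec/total_kimg 428.61 maintenance 0.0 gpumem 1.62 reserved 3.10
2022-03-19 18:35:51.396 | INFO | tick 3 total_kimg 3.0 time 1413.979094 sec/tick 427.6 sec/total_kimg 424.17 maintenance 0.0 gpumem 1.62 reserved 3.10
2022-03-19 18:43:02.733 | INFO | tick 4 total_kimg 4.0 time 1845.316453 sec/tick 431.3 sec/total_kimg 427.91 maintenance 0.0 gpumem 1.62 reserved 3.10
"""

# Matches "| tick <n> ... sec/total_kimg <seconds>" within a single line.
TICK_RE = re.compile(r"\| tick\s+(\d+)\s.*?sec/total_kimg\s+([\d.]+)")

def mean_sec_per_kimg(log_text: str, skip_first: bool = True) -> float:
    """Average sec/total_kimg over the tick lines found in log_text."""
    values = [float(m.group(2)) for m in TICK_RE.finditer(log_text)]
    if skip_first:
        values = values[1:]  # the first logged tick includes startup overhead
    return mean(values)

bf16_sec = mean_sec_per_kimg(BF16_LOG)
print(f"{bf16_sec:.1f} s per kimg, {bf16_sec / 60:.2f} min, {1000 / bf16_sec:.2f} img/s")
```

Averaged over ticks 1 to 4 this gives about 424 seconds per 1000 images, which is closer to 7 minutes than to 6.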
This is the log from the GTX 1070 using float16:
2022-03-19 23:00:21.011 | INFO | Generator Parameters Buffers Output shape Datatype
2022-03-19 23:00:21.013 | INFO | --- --- --- --- ---
2022-03-19 23:00:21.016 | INFO | mapping.layers.0 262656 - [16, 512] float32
2022-03-19 23:00:21.018 | INFO | mapping.layers.1 262656 - [16, 512] float32
2022-03-19 23:00:21.019 | INFO | mapping.layers.2 262656 - [16, 512] float32
2022-03-19 23:00:21.021 | INFO | mapping.layers.3 262656 - [16, 512] float32
2022-03-19 23:00:21.023 | INFO | mapping.layers.4 262656 - [16, 512] float32
2022-03-19 23:00:21.025 | INFO | mapping.layers.5 262656 - [16, 512] float32
2022-03-19 23:00:21.028 | INFO | mapping.layers.6 262656 - [16, 512] float32
2022-03-19 23:00:21.030 | INFO | mapping.layers.7 262656 - [16, 512] float32
2022-03-19 23:00:21.032 | INFO | mapping - 512 [16, 12, 512] float32
2022-03-19 23:00:21.035 | INFO | synthesis.blocks.b4.conv1 721408 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.037 | INFO | synthesis.blocks.b4 4096 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.040 | INFO | synthesis.blocks.b8.skip 65536 16 [16, 256, 8, 8] float32
2022-03-19 23:00:21.042 | INFO | synthesis.blocks.b8.conv0 721408 16 [16, 256, 8, 8] float32
2022-03-19 23:00:21.044 | INFO | synthesis.blocks.b8.conv1 721408 16 [16, 256, 8, 8] float32
2022-03-19 23:00:21.046 | INFO | synthesis.blocks.b8 - 16 [16, 256, 8, 8] float32
2022-03-19 23:00:21.047 | INFO | synthesis.blocks.b16.skip 65536 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.049 | INFO | synthesis.blocks.b16.conv0 721408 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.052 | INFO | synthesis.blocks.b16.conv1 721408 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.054 | INFO | synthesis.blocks.b16 - 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.056 | INFO | synthesis.blocks.b32.skip 65536 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.059 | INFO | synthesis.blocks.b32.conv0 721408 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.061 | INFO | synthesis.blocks.b32.conv1 721408 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.063 | INFO | synthesis.blocks.b32 - 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.064 | INFO | synthesis.blocks.b64.skip 32768 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.066 | INFO | synthesis.blocks.b64.conv0 426368 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.068 | INFO | synthesis.blocks.b64.conv1 213248 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.070 | INFO | synthesis.blocks.b64 - 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.072 | INFO | synthesis.blocks.b128.skip 8192 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.075 | INFO | synthesis.blocks.b128.conv0 139456 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.077 | INFO | synthesis.blocks.b128.conv1 69760 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.079 | INFO | synthesis.blocks.b128.torgb 33027 - [16, 3, 128, 128] float16
2022-03-19 23:00:21.081 | INFO | synthesis.blocks.b128:0 - 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.082 | INFO | synthesis.blocks.b128:1 - - [16, 3, 128, 128] float16
2022-03-19 23:00:21.085 | INFO | --- --- --- --- ---
2022-03-19 23:00:21.088 | INFO | Total 8274627 864 - -
2022-03-19 23:00:21.090 | INFO |
2022-03-19 23:00:21.108 | INFO |
2022-03-19 23:00:21.111 | INFO | Discriminator Parameters Buffers Output shape Datatype
2022-03-19 23:00:21.113 | INFO | --- --- --- --- ---
2022-03-19 23:00:21.115 | INFO | blocks.b128.fromrgb 256 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.117 | INFO | blocks.b128.skip 8192 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.120 | INFO | blocks.b128.conv0 36928 16 [16, 64, 128, 128] float16
2022-03-19 23:00:21.123 | INFO | blocks.b128.conv1 73856 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.125 | INFO | blocks.b128 - 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.127 | INFO | blocks.b64.skip 32768 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.130 | INFO | blocks.b64.conv0 147584 16 [16, 128, 64, 64] float16
2022-03-19 23:00:21.132 | INFO | blocks.b64.conv1 295168 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.134 | INFO | blocks.b64 - 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.136 | INFO | blocks.b32.skip 65536 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.137 | INFO | blocks.b32.conv0 590080 16 [16, 256, 32, 32] float16
2022-03-19 23:00:21.139 | INFO | blocks.b32.conv1 590080 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.141 | INFO | blocks.b32 - 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.143 | INFO | blocks.b16.skip 65536 16 [16, 256, 8, 8] float16
2022-03-19 23:00:21.145 | INFO | blocks.b16.conv0 590080 16 [16, 256, 16, 16] float16
2022-03-19 23:00:21.146 | INFO | blocks.b16.conv1 590080 16 [16, 256, 8, 8] float16
2022-03-19 23:00:21.148 | INFO | blocks.b16 - 16 [16, 256, 8, 8] float16
2022-03-19 23:00:21.151 | INFO | blocks.b8.skip 65536 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.154 | INFO | blocks.b8.conv0 590080 16 [16, 256, 8, 8] float32
2022-03-19 23:00:21.156 | INFO | blocks.b8.conv1 590080 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.158 | INFO | blocks.b8 - 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.160 | INFO | b4.mbstd - - [16, 257, 4, 4] float32
2022-03-19 23:00:21.162 | INFO | b4.conv 592384 16 [16, 256, 4, 4] float32
2022-03-19 23:00:21.164 | INFO | b4.fc 1048832 - [16, 256] float32
2022-03-19 23:00:21.165 | INFO | b4.out 257 - [16, 1] float32
2022-03-19 23:00:21.168 | INFO | --- --- --- --- ---
2022-03-19 23:00:21.170 | INFO | Total 5973313 352 - -
2022-03-19 23:00:21.173 | INFO |
2022-03-19 23:00:21.175 | INFO | Setting up training phases...
2022-03-19 23:00:21.178 | INFO | Training for 25000 total_kimg...
2022-03-19 23:00:21.180 | INFO |
2022-03-19 23:00:21.182 | INFO | Start summary writer.
2022-03-19 23:00:21.699 | INFO | Prepare fid ref stat cache.
2022-03-19 23:02:34.278 | INFO | tick 12 total_kimg 13.1 time 142.557234 sec/tick 133.1 sec/total_kimg 132.04 maintenance 9.5 gpumem 3.24 reserved 6.61
2022-03-19 23:02:45.651 | INFO | Evaluating metrics...
2022-03-19 23:04:10.113 | INFO | ema fid: 289.9442185138049 | fid: 276.2114226784195
2022-03-19 23:06:24.014 | INFO | tick 13 total_kimg 14.1 time 372.293017 sec/tick 133.9 sec/total_kimg 132.83 maintenance 95.8 gpumem 3.24 reserved 6.61
2022-03-19 23:08:35.925 | INFO | tick 14 total_kimg 15.1 time 504.203275 sec/tick 131.9 sec/total_kimg 130.86 maintenance 0.0 gpumem 3.25 reserved 6.61
2022-03-19 23:10:46.356 | INFO | tick 15 total_kimg 16.1 time 634.636633 sec/tick 130.4 sec/total_kimg 129.39 maintenance 0.0 gpumem 3.25 reserved 6.61
2022-03-19 23:13:00.914 | INFO | tick 16 total_kimg 17.2 time 769.193491 sec/tick 134.6 sec/total_kimg 133.48 maintenance 0.0 gpumem 3.25 reserved 6.61
2022-03-19 23:13:12.126 | INFO | Evaluating metrics...
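To see what the per-tick difference means for the full run, here is a back-of-the-envelope projection for the 25000 kimg that both logs announce. The `sec/total_kimg` values are copied from the two logs (ticks 1 to 4 for bfloat16, ticks 13 to 16 for float16); the function name `projected_days` is hypothetical, and the estimate ignores metric evaluation and maintenance time.

```python
from statistics import mean

# sec/total_kimg values copied from the tick lines of the two logs.
BF16_SEC_PER_KIMG = [415.32, 428.61, 424.17, 427.91]  # bfloat16, ticks 1-4
FP16_SEC_PER_KIMG = [132.83, 130.86, 129.39, 133.48]  # float16, ticks 13-16

TOTAL_KIMG = 25000        # "Training for 25000 total_kimg..." in both logs
SECONDS_PER_DAY = 86400

def projected_days(samples, total_kimg=TOTAL_KIMG):
    """Extrapolate total training time in days from sec/kimg samples."""
    return mean(samples) * total_kimg / SECONDS_PER_DAY

bf16_days = projected_days(BF16_SEC_PER_KIMG)
fp16_days = projected_days(FP16_SEC_PER_KIMG)
measured_ratio = mean(BF16_SEC_PER_KIMG) / mean(FP16_SEC_PER_KIMG)

print(f"bfloat16: about {bf16_days:.0f} days")
print(f"float16:  about {fp16_days:.0f} days")
print(f"ratio:    {measured_ratio:.2f}x")
```

On these samples the projection is roughly 123 days for bfloat16 against roughly 38 days for float16 on the GTX 1070, a ratio of about 3.2x. One caveat: the two logs report different `gpumem` values (1.62 vs 3.25), so the runs may not be identical in every setting apart from the datatype.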