Distributed computing learning, pitfall #2: cuDNN

On the chief I set TF_CONFIG and launch the ResNet CIFAR example with the multi_worker_mirrored strategy:

andrew@1manjaro:~/mount/arch/TensorFlowOnSpark/examples/resnet# export TF_CONFIG='{"cluster": { "chief": ["localhost:2222"], "worker": ["localhost:2223"]}, "task": {"type": "chief", "index": 0}}'
python resnet_cifar_main.py --data_dir=${CIFAR_DATA} --num_gpus=1 --ds=multi_worker_mirrored --train_epochs=100
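
As a side note, here is a minimal sketch of how this TF_CONFIG is consumed (assuming TF 2.x, where MultiWorkerMirroredStrategy still lives under tf.distribute.experimental; this is not the resnet_cifar_main.py code). The second process needs a matching TF_CONFIG with "task": {"type": "worker", "index": 0}. The chief's log follows.

import json
import os

import tensorflow as tf

# Same cluster spec as the export above.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"chief": ["localhost:2222"], "worker": ["localhost:2223"]},
    "task": {"type": "chief", "index": 0},  # second process: {"type": "worker", "index": 0}
})

# The strategy reads TF_CONFIG when it is constructed and starts the gRPC server.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)  # 2: one GPU on the chief + one on the worker
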
2020-05-26 10:43:02.240568: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-05-26 10:43:03.329536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-26 10:43:03.332103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.332337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:1c:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-26 10:43:03.332359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-05-26 10:43:03.333363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-26 10:43:03.334473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-26 10:43:03.334654: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-26 10:43:03.335746: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-26 10:43:03.336382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-26 10:43:03.338684: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 10:43:03.338811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.339096: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.339297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-26 10:43:03.339909: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2020-05-26 10:43:03.360884: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3701195000 Hz
2020-05-26 10:43:03.361383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5586666e7300 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-26 10:43:03.361399: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-26 10:43:03.682688: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.682977: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558665a43420 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-26 10:43:03.683005: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-05-26 10:43:03.683211: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.683432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:1c:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-26 10:43:03.683458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-05-26 10:43:03.683481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-26 10:43:03.683491: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-26 10:43:03.683500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-26 10:43:03.683510: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-26 10:43:03.683519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-26 10:43:03.683528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 10:43:03.683571: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.683789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.683974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-26 10:43:03.683995: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-05-26 10:43:03.959819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-26 10:43:03.959852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-05-26 10:43:03.959860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-05-26 10:43:03.960066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.960320: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.960529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 35 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:1c:00.0, compute capability: 7.5)
2020-05-26 10:43:03.960938: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-05-26 10:43:03.961343: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.961546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:1c:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.725GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-26 10:43:03.961570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2020-05-26 10:43:03.961593: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-26 10:43:03.961604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-26 10:43:03.961614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-26 10:43:03.961624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-26 10:43:03.961634: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-26 10:43:03.961644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 10:43:03.961686: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.961902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.962084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-26 10:43:03.962100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-26 10:43:03.962106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-05-26 10:43:03.962111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-05-26 10:43:03.962169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.962391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-26 10:43:03.962579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:chief/replica:0/task:0/device:GPU:0 with 35 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:1c:00.0, compute capability: 7.5)
2020-05-26 10:43:03.964986: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job chief -> {0 -> localhost:2222}
2020-05-26 10:43:03.965006: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2223}
2020-05-26 10:43:03.965380: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:390] Started server with target: grpc://localhost:2222
INFO:tensorflow:Enabled multi-worker collective ops with available devices: ['/job:chief/replica:0/task:0/device:CPU:0', '/job:chief/replica:0/task:0/device:XLA_CPU:0', '/job:chief/replica:0/task:0/device:XLA_GPU:0', '/job:chief/replica:0/task:0/device:GPU:0']
I0526 10:43:03.965880 140302475241280 collective_all_reduce_strategy.py:303] Enabled multi-worker collective ops with available devices: ['/job:chief/replica:0/task:0/device:CPU:0', '/job:chief/replica:0/task:0/device:XLA_CPU:0', '/job:chief/replica:0/task:0/device:XLA_GPU:0', '/job:chief/replica:0/task:0/device:GPU:0']
INFO:tensorflow:Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
I0526 10:43:03.966263 140302475241280 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
INFO:tensorflow:MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
I0526 10:43:03.966428 140302475241280 collective_all_reduce_strategy.py:344] MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.433851 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.437643 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.444913 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.447288 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.500257 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.502656 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.508196 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.510025 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.543776 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:04.545943 140302475241280 cross_device_ops.py:1059] Collective batch_all_reduce: 1 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Running Distribute Coordinator with mode = 'independent_worker', cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, environment = None, rpc_layer = 'grpc'
I0526 10:43:06.832544 140302475241280 distribute_coordinator.py:773] Running Distribute Coordinator with mode = 'independent_worker', cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, environment = None, rpc_layer = 'grpc'
WARNING:tensorflow:`eval_fn` is not passed in. The `worker_fn` will be used if an "evaluator" task exists in the cluster.
W0526 10:43:06.832681 140302475241280 distribute_coordinator.py:825] `eval_fn` is not passed in. The `worker_fn` will be used if an "evaluator" task exists in the cluster.
WARNING:tensorflow:`eval_strategy` is not passed in. No distribution strategy will be used for evaluation.
W0526 10:43:06.832730 140302475241280 distribute_coordinator.py:829] `eval_strategy` is not passed in. No distribution strategy will be used for evaluation.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
I0526 10:43:06.833080 140302475241280 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
INFO:tensorflow:MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
I0526 10:43:06.833179 140302475241280 collective_all_reduce_strategy.py:344] MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
INFO:tensorflow:Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
I0526 10:43:06.833521 140302475241280 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:chief/task:0/device:GPU:0',)
INFO:tensorflow:MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
I0526 10:43:06.833611 140302475241280 collective_all_reduce_strategy.py:344] MultiWorkerMirroredStrategy with cluster_spec = {'chief': ['localhost:2222'], 'worker': ['localhost:2223']}, task_type = 'chief', task_id = 0, num_workers = 2, local_devices = ('/job:chief/task:0/device:GPU:0',), communication = CollectiveCommunication.AUTO
Epoch 1/100
INFO:tensorflow:Collective batch_all_reduce: 176 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:08.031161 140302475241280 cross_device_ops.py:1054] Collective batch_all_reduce: 176 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
INFO:tensorflow:Collective batch_all_reduce: 176 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
I0526 10:43:11.108297 140302475241280 cross_device_ops.py:1054] Collective batch_all_reduce: 176 all-reduces, num_workers = 2, communication_hint = AUTO, num_packs = 1
2020-05-26 10:43:15.515986: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-26 10:43:15.864391: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 10:43:16.010903: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-05-26 10:43:16.021040: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "resnet_cifar_main.py", line 288, in <module>
    app.run(main)
  File "/usr/lib/python3.8/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/lib/python3.8/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "resnet_cifar_main.py", line 282, in main
    return run(flags.FLAGS)
  File "resnet_cifar_main.py", line 251, in run
    history = model.fit(train_input_dataset,
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 72, in _method_wrapper
    return dc.run_distribute_coordinator(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_coordinator.py", line 852, in run_distribute_coordinator
    return _run_single_worker(worker_fn, strategy, cluster_spec, task_type,
  File "/usr/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_coordinator.py", line 360, in _run_single_worker
    return worker_fn(strategy)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 73, in <lambda>
    lambda _: method(self, *args, **kwargs),
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node resnet56/conv1/Conv2D (defined at /threading.py:932) ]]
	 [[GroupCrossDeviceControlEdges_0/Identity_2/_35]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node resnet56/conv1/Conv2D (defined at /threading.py:932) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_19079]

Function call stack:
train_function -> train_function

2020-05-26 10:43:16.566656: W tensorflow/core/common_runtime/eager/context.cc:447] Unable to destroy server_ object, so releasing instead. Servers don't support clean shutdown.

The error is caused by running out of GPU memory: the chief and the worker share the same RTX 2070, so by the time this process creates its TensorFlow device only about 35 MB is left (see the "Created TensorFlow device ... with 35 MB memory" lines above), which is not enough for cuDNN to initialize. Capping how much of the GPU each process may take fixes it:

import tensorflow as tf

# Run this before the model / distribution strategy is created: limit this
# process to 40% of the GPU so the other process has room for its own session.
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))
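
If you prefer the native TF 2 API to the compat.v1 session, the same cap can be applied with the tf.config utilities. This is an alternative sketch, not part of the original fix; the 3 GB figure is an assumption for an 8 GB card shared by two processes.

import tensorflow as tf

# Must run before anything touches the GPU (i.e. before the model/strategy is built).
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Option 1: grow GPU memory on demand instead of reserving it all up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # Option 2 (use instead of option 1): hard-cap this process at ~3 GB so the
    # chief and the worker both fit on the 8 GB RTX 2070.
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=3072)])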
