torch.cuda.OutOfMemoryError: CUDA out of memory.

Stable diffusion model failed to load
Loading weights [6ce0161689] from E:\SD\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: E:\SD\stable-diffusion-webui\configs\v1-inference.yaml
loading stable diffusion model: OutOfMemoryError
Traceback (most recent call last):
  File "E:\program\anaconda3\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "E:\program\anaconda3\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "E:\SD\stable-diffusion-webui\modules\ui.py", line 1298, in <lambda>
    update_image_cfg_scale_visibility = lambda: gr.update(visible=shared.sd_model and shared.sd_model.cond_stage_key == "edit")
  File "E:\SD\stable-diffusion-webui\modules\shared_items.py", line 110, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "E:\SD\stable-diffusion-webui\modules\sd_models.py", line 499, in get_sd_model
    load_model()
  File "E:\SD\stable-diffusion-webui\modules\sd_models.py", line 626, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "E:\SD\stable-diffusion-webui\modules\sd_models.py", line 353, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "E:\SD\stable-diffusion-webui\modules\sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "E:\SD\stable-diffusion-webui\modules\sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 4 more times]
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2009, in load
    module._load_from_state_dict(
  File "E:\SD\stable-diffusion-webui\modules\sd_disable_initialization.py", line 226, in <lambda>
    conv2d_load_from_state_dict = self.replace(torch.nn.Conv2d, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(conv2d_load_from_state_dict, *args, **kwargs))
  File "E:\SD\stable-diffusion-webui\modules\sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\_meta_registrations.py", line 1780, in zeros_like
    return aten.empty_like.default(
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "E:\SD\stable-diffusion-webui\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43 GiB already allocated; 0 bytes free; 3.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This error message indicates that there is not enough free memory on the GPU to complete the operation that triggered it. The GPU has a total capacity of 4.00 GiB; PyTorch has already allocated 3.43 GiB for tensors and reserved 3.48 GiB in total, leaving 0 bytes free. The attempted allocation of just 20.00 MiB therefore fails.
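The figures in the message map directly onto PyTorch's memory statistics, so you can inspect them yourself. A minimal diagnostic sketch (assuming torch is installed and a CUDA device is present):

import torch

# Query the driver and the caching allocator for the same numbers
# that appear in the OutOfMemoryError message.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()    # driver-level free/total bytes
    allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()    # bytes cached by the allocator
    gib = 1024 ** 3
    print(f"total:     {total / gib:.2f} GiB")
    print(f"free:      {free / gib:.2f} GiB")
    print(f"allocated: {allocated / gib:.2f} GiB")
    print(f"reserved:  {reserved / gib:.2f} GiB")

A large gap between reserved and allocated is the fragmentation case the error message itself points at with its max_split_size_mb hint.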

The solution is to add the following two lines to webui-user.bat (the WebUI's Windows launch script):

set COMMANDLINE_ARGS=--precision full --no-half --lowvram --always-batch-cond-uncond --xformers
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512

set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512

This command configures PyTorch's CUDA caching allocator. The garbage collection threshold controls when the allocator actively reclaims cached GPU memory blocks, while the max split size limits which cached blocks the allocator is allowed to split, which helps combat memory fragmentation.

The value "garbage_collection_threshold:0.9" sets the garbage collection threshold to 90%, which means that PyTorch will attempt to free up 90% of the memory it no longer needs before moving objects to the CPU. This can help reduce the amount of memory required by PyTorch and improve performance.

The value "max_split_size_mb:512" sets the maximum size of shared memory allocations on the GPU to 512 MB. This can help prevent fragmentation of the GPU memory and improve performance. However, if you find that this value is too small for your application, you may need to increase it.

set COMMANDLINE_ARGS=--precision full --no-half --lowvram --always-batch-cond-uncond --xformers

This command sets the arguments that webui-user.bat passes to the Stable Diffusion WebUI launcher (these are WebUI options, not PyTorch options). Here is what each argument does:

  • --precision full: This argument makes all computations run in full precision (float32), which gives better numerical stability but requires more memory.

  • --no-half: This argument keeps the model weights in float32 instead of converting them to half precision (float16). Like --precision full, this helps on GPUs with unreliable fp16 support, at the cost of more memory; the sketch after this list illustrates the trade-off.

  • --lowvram: This argument enables the WebUI's aggressive low-VRAM optimization: the model is split into modules, and only the module currently executing is kept on the GPU while the rest stay in system RAM. This greatly reduces VRAM usage at the cost of speed.

  • --always-batch-cond-uncond: With --lowvram or --medvram, the conditional and unconditional prompts are normally processed in separate passes to lower peak memory; this flag forces them back into a single batched pass, which is faster but uses more VRAM.

  • --xformers: This argument enables the xformers library's memory-efficient attention in the model's attention layers, which lowers VRAM usage and usually speeds up generation.
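To make the precision/memory trade-off behind --no-half concrete, here is a small illustration (a standalone sketch, not part of the WebUI; the tensor size is arbitrary and CPU tensors are used for simplicity):

import torch

# A tensor roughly the size of one large weight matrix.
# float32 uses 4 bytes per element, float16 uses 2, so halving
# the precision halves the memory the weights occupy.
n = 4096 * 4096
w32 = torch.zeros(n, dtype=torch.float32)
w16 = torch.zeros(n, dtype=torch.float16)

print(w32.element_size() * w32.nelement() / 2**20, "MiB in float32")  # 64.0 MiB
print(w16.element_size() * w16.nelement() / 2**20, "MiB in float16")  # 32.0 MiB

This is why --no-half and --precision full, despite improving stability, roughly double the model's memory footprint, so on a 4 GiB card the --lowvram and --xformers flags are doing the memory saving.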
