CUDA error types

Enumerator:
    cudaSuccess     The API call returned with no errors. In the case of query calls, this can also mean that the operation being queried is complete (see cudaEventQuery() and cudaStreamQuery()).
    cudaErrorMissingConfiguration     The device function being invoked (usually via cudaLaunch()) was not previously configured via the cudaConfigureCall() function.
    cudaErrorMemoryAllocation     The API call failed because it was unable to allocate enough memory to perform the requested operation.
    cudaErrorInitializationError     The API call failed because the CUDA driver and runtime could not be initialized.
    cudaErrorLaunchFailure     An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.
    cudaErrorPriorLaunchFailure     This indicated that a previous kernel launch failed. This was previously used for device emulation of kernel launches.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorLaunchTimeout     This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled - see the device property kernelExecTimeoutEnabled for more information. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.
    cudaErrorLaunchOutOfResources     This indicates that a launch did not occur because it did not have appropriate resources. Although this error is similar to cudaErrorInvalidConfiguration, this error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel's register count.
    cudaErrorInvalidDeviceFunction     The requested device function does not exist or is not compiled for the proper device architecture.
    cudaErrorInvalidConfiguration     This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.
    cudaErrorInvalidDevice     This indicates that the device ordinal supplied by the user does not correspond to a valid CUDA device.
    cudaErrorInvalidValue     This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
    cudaErrorInvalidPitchValue     This indicates that one or more of the pitch-related parameters passed to the API call is not within the acceptable range for pitch.
    cudaErrorInvalidSymbol     This indicates that the symbol name/identifier passed to the API call is not a valid name or identifier.
    cudaErrorMapBufferObjectFailed     This indicates that the buffer object could not be mapped.
    cudaErrorUnmapBufferObjectFailed     This indicates that the buffer object could not be unmapped.
    cudaErrorInvalidHostPointer     This indicates that at least one host pointer passed to the API call is not a valid host pointer.
    cudaErrorInvalidDevicePointer     This indicates that at least one device pointer passed to the API call is not a valid device pointer.
    cudaErrorInvalidTexture     This indicates that the texture passed to the API call is not a valid texture.
    cudaErrorInvalidTextureBinding     This indicates that the texture binding is not valid. This occurs if you call cudaGetTextureAlignmentOffset() with an unbound texture.
    cudaErrorInvalidChannelDescriptor     This indicates that the channel descriptor passed to the API call is not valid. This occurs if the format is not one of the formats specified by cudaChannelFormatKind, or if one of the dimensions is invalid.
    cudaErrorInvalidMemcpyDirection     This indicates that the direction of the memcpy passed to the API call is not one of the types specified by cudaMemcpyKind.
    cudaErrorAddressOfConstant     This indicated that the user has taken the address of a constant variable, which was forbidden up until the CUDA 3.1 release.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Variables in constant memory may now have their address taken by the runtime via cudaGetSymbolAddress().

    cudaErrorTextureFetchFailed     This indicated that a texture fetch was not able to be performed. This was previously used for device emulation of texture operations.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorTextureNotBound     This indicated that a texture was not bound for access. This was previously used for device emulation of texture operations.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorSynchronizationError     This indicated that a synchronization operation had failed. This was previously used for some device emulation functions.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorInvalidFilterSetting     This indicates that a non-float texture was being accessed with linear filtering. This is not supported by CUDA.
    cudaErrorInvalidNormSetting     This indicates that an attempt was made to read a non-float texture as a normalized float. This is not supported by CUDA.
    cudaErrorMixedDeviceExecution     Mixing of device and device emulation code was not allowed.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorCudartUnloading     This indicates that a CUDA Runtime API call cannot be executed because it is being called during process shut down, at a point in time after CUDA driver has been unloaded.
    cudaErrorUnknown     This indicates that an unknown internal error has occurred.
    cudaErrorNotYetImplemented     This indicates that the API call is not yet implemented. Production releases of CUDA will never return this error.

    Deprecated:
        This error return is deprecated as of CUDA 4.1.

    cudaErrorMemoryValueTooLarge     This indicated that an emulated device pointer exceeded the 32-bit address range.

    Deprecated:
        This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

    cudaErrorInvalidResourceHandle     This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like cudaStream_t and cudaEvent_t.
    cudaErrorNotReady     This indicates that asynchronous operations issued previously have not completed yet. This result is not actually an error, but must be indicated differently than cudaSuccess (which indicates completion). Calls that may return this value include cudaEventQuery() and cudaStreamQuery().
    cudaErrorInsufficientDriver     This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration. Users should install an updated NVIDIA display driver to allow the application to run.
    cudaErrorSetOnActiveProcess     This indicates that the user has called cudaSetValidDevices(), cudaSetDeviceFlags(), cudaD3D9SetDirect3DDevice(), cudaD3D10SetDirect3DDevice(), cudaD3D11SetDirect3DDevice(), or cudaVDPAUSetVDPAUDevice() after initializing the CUDA runtime by calling non-device management operations (allocating memory and launching kernels are examples of non-device management operations). This error can also be returned if using runtime/driver interoperability and there is an existing CUcontext active on the host thread.
    cudaErrorInvalidSurface     This indicates that the surface passed to the API call is not a valid surface.
    cudaErrorNoDevice     This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
    cudaErrorECCUncorrectable     This indicates that an uncorrectable ECC error was detected during execution.
    cudaErrorSharedObjectSymbolNotFound     This indicates that a link to a shared object failed to resolve.
    cudaErrorSharedObjectInitFailed     This indicates that initialization of a shared object failed.
    cudaErrorUnsupportedLimit     This indicates that the cudaLimit passed to the API call is not supported by the active device.
    cudaErrorDuplicateVariableName     This indicates that multiple global or constant variables (across separate CUDA source files in the application) share the same string name.
    cudaErrorDuplicateTextureName     This indicates that multiple textures (across separate CUDA source files in the application) share the same string name.
    cudaErrorDuplicateSurfaceName     This indicates that multiple surfaces (across separate CUDA source files in the application) share the same string name.
    cudaErrorDevicesUnavailable     This indicates that all CUDA devices are busy or unavailable at the current time. Devices are often busy/unavailable due to use of cudaComputeModeExclusive, cudaComputeModeProhibited or when long running CUDA kernels have filled up the GPU and are blocking new work from starting. They can also be unavailable due to memory constraints on a device that already has active CUDA work being performed.
    cudaErrorInvalidKernelImage     This indicates that the device kernel image is invalid.
    cudaErrorNoKernelImageForDevice     This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
    cudaErrorIncompatibleDriverContext     This indicates that the current context is not compatible with the CUDA Runtime. This can only occur if you are using CUDA Runtime/Driver interoperability and have created an existing Driver context using the driver API. The Driver context may be incompatible either because the Driver context was created using an older version of the API, because the Runtime API call expects a primary driver context and the Driver context is not primary, or because the Driver context has been destroyed. Please see "Interactions with the CUDA Driver API" for more information.
    cudaErrorPeerAccessAlreadyEnabled     This error indicates that a call to cudaDeviceEnablePeerAccess() is trying to re-enable peer addressing from a context which has already had peer addressing enabled.
    cudaErrorPeerAccessNotEnabled     This error indicates that cudaDeviceDisablePeerAccess() is trying to disable peer addressing which has not been enabled yet via cudaDeviceEnablePeerAccess().
    cudaErrorDeviceAlreadyInUse     This indicates that a call tried to access an exclusive-thread device that is already in use by a different thread.
    cudaErrorProfilerDisabled     This indicates profiler has been disabled for this run and thus runtime APIs cannot be used to profile subsets of the program. This can happen when the application is running with external profiling tools like visual profiler.
    cudaErrorProfilerNotInitialized     This indicates that the profiler has not been initialized yet. cudaProfilerInitialize() must be called to initialize the profiler before calling cudaProfilerStart() or cudaProfilerStop().
    cudaErrorProfilerAlreadyStarted     This indicates profiler is already started. This error can be returned if cudaProfilerStart() is called multiple times without subsequent call to cudaProfilerStop().
    cudaErrorProfilerAlreadyStopped     This indicates profiler is already stopped. This error can be returned if cudaProfilerStop() is called without starting profiler using cudaProfilerStart().
    cudaErrorAssert     An assert triggered in device code during kernel execution. The device cannot be used again until cudaThreadExit() is called. All existing allocations are invalid and must be reconstructed if the program is to continue using CUDA.
    cudaErrorTooManyPeers     This error indicates that the hardware resources required to enable peer access have been exhausted for one or more of the devices passed to cudaDeviceEnablePeerAccess().
    cudaErrorHostMemoryAlreadyRegistered     This error indicates that the memory range passed to cudaHostRegister() has already been registered.
    cudaErrorHostMemoryNotRegistered     This error indicates that the pointer passed to cudaHostUnregister() does not correspond to any currently registered memory region.
    cudaErrorOperatingSystem     This error indicates that an OS call failed.
    cudaErrorStartupFailure     This indicates an internal startup failure in the CUDA runtime.
    cudaErrorApiFailureBase     Any unhandled CUDA driver error is added to this value and returned via the runtime. Production releases of CUDA should not return such errors.

    Deprecated:

        This error return is deprecated as of CUDA 4.1. 
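
The list above treats a few return values specially: cudaErrorNotReady is described as not actually an error, three codes leave the device unusable until cudaThreadExit() is called, and several are marked deprecated. A minimal sketch of how calling code might group the names accordingly is shown below. The grouping, the category labels, and the function name `classify` are illustrative assumptions and not part of the CUDA API; only the enumerator names come from the list.

```python
# Enumerator names are copied from the list above. The grouping and the
# category labels ("ok", "pending", ...) are hypothetical, for illustration.

# Query calls return this while work is still in flight; not a failure.
PENDING = {"cudaErrorNotReady"}

# The list states the device cannot be used until cudaThreadExit() is called
# and that existing device allocations become invalid.
DEVICE_UNUSABLE = {
    "cudaErrorLaunchFailure",
    "cudaErrorLaunchTimeout",
    "cudaErrorAssert",
}

# Entries carrying a "Deprecated:" note in the list.
DEPRECATED = {
    "cudaErrorPriorLaunchFailure",
    "cudaErrorAddressOfConstant",
    "cudaErrorTextureFetchFailed",
    "cudaErrorTextureNotBound",
    "cudaErrorSynchronizationError",
    "cudaErrorMixedDeviceExecution",
    "cudaErrorNotYetImplemented",
    "cudaErrorMemoryValueTooLarge",
    "cudaErrorApiFailureBase",
}


def classify(name):
    """Map an enumerator name to a coarse handling category."""
    if name == "cudaSuccess":
        return "ok"
    if name in PENDING:
        return "pending"
    if name in DEVICE_UNUSABLE:
        return "device-unusable"
    if name in DEPRECATED:
        return "deprecated"
    return "error"


if __name__ == "__main__":
    for n in ("cudaSuccess", "cudaErrorNotReady", "cudaErrorAssert",
              "cudaErrorInvalidValue"):
        print(n, "->", classify(n))
```

A caller could poll again on "pending", tear down and rebuild device state on "device-unusable", and report everything else as an ordinary failure.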


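### Resolving ONNX Runtime CUDA errors

An error such as `RuntimeError: CUDA error: no kernel image is available for execution on the device` usually points to a problem with the CUDA configuration of the runtime environment. Possible causes include, but are not limited to, a CUDA version mismatch, an improperly installed PyTorch, and unsuitable data type settings.

#### Verify CUDA compatibility and the driver version

Make sure the CUDA toolkit version matches the GPU driver. Different versions can be incompatible with one another, which can prevent kernels from loading or executing. Check that the system's CUDA version and graphics driver are up to date, and confirm that the two are compatible[^1].

#### Install a suitable PyTorch version

If you are using PyTorch, consider reinstalling a specific PyTorch version. For example, Conda can install a PyTorch build with CUDA support:

```bash
conda install pytorch=0.4.1 cuda92 -c pytorch
```

This helps rule out conflicts caused by corrupted library files or other problems[^4].

#### Adjust the choice of loss function

In some cases, a loss function that does not suit the task can also trigger similar exceptions. For example, a binary classification task should use `nn.BCELoss()` rather than the multi-class cross-entropy loss `nn.CrossEntropyLoss()`. Adjusting the parameters used during model training may therefore be necessary[^3].

#### Check data type consistency

Another common problem is an operation that fails because PyTorch's default data type (dtype) differs from the expected one. This applies to floating-point precision in particular: by default PyTorch uses single precision (`float32`) rather than double precision (`float64`). The official documentation explains how to set a tensor's data type to suit a specific use case[^5].

#### Enable synchronous mode to aid debugging

To locate more precisely where the error actually occurs, set the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching the script. This makes CUDA calls blocking, which makes it easier to find the place that really triggers the exception.

In summary, CUDA errors under ONNX Runtime can be investigated step by step from each of the angles above.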
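
The "no kernel image" error comes down to whether the installed binaries contain code for the GPU's compute capability. As a rough sketch of that check: the helper names `has_kernel_image` and `check_current_device` are hypothetical, and the code assumes architecture strings of the form `sm_XY` or `compute_XY`, which is the form returned by `torch.cuda.get_arch_list()`. The matching rule is simplified: an `sm_` binary is assumed to run on a device with the same major version and an equal or higher minor version, and `compute_` PTX is assumed to be JIT-compilable for any equal or newer device.

```python
import re

# Last digit is the minor version, the digits before it the major version,
# so "sm_86" -> (8, 6) and "sm_100" -> (10, 0).
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def has_kernel_image(capability, arch_list):
    """Return True if any entry in arch_list can run on `capability`.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list:  strings such as "sm_80" or "compute_80".
    """
    major, minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            # Simplification: suffixed entries such as "sm_90a" are skipped.
            continue
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm":
            # Binary code: same major version, minor not newer than the device.
            if a_major == major and a_minor <= minor:
                return True
        elif (a_major, a_minor) <= (major, minor):
            # PTX: can be JIT-compiled for an equal or newer device.
            return True
    return False


def check_current_device():
    """Run the check against the local PyTorch build, or return None
    if PyTorch or a CUDA device is not available."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return has_kernel_image(torch.cuda.get_device_capability(),
                            torch.cuda.get_arch_list())


if __name__ == "__main__":
    print("kernel image available:", check_current_device())
```

If the check returns `False`, the installed build was most likely compiled without support for the local GPU, and installing a build that targets that architecture is the probable fix.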