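Xid is NVIDIA's error-code scheme for GPUs and is used to locate GPU-related problems, while SXid is the error-code scheme for NVSwitch and is used to locate NVSwitch-related problems.
NVIDIA GPU Diagnostic Guide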
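The following is NVIDIA's troubleshooting flow for GPU errors. The figure contains three paths, so GPU errors can be divided into three categories.
The first category is errors reported through Xid, most of which are hardware related. The second is errors reported by system monitoring, typically GPU power, temperature, or network problems. The third is error logs emitted while the driver is running, which usually indicate software problems in the driver, the runtime, or the GPU application.
This article focuses on the first category, GPU hardware errors. These error reports were designed into the GPU from the start, so, conversely, they show which hardware faults a GPU can encounter.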
Xid
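An Xid message is a log line printed by the NVIDIA driver. Such a message usually means the GPU has hit a hardware-related error. The cause may be that the driver configured the GPU incorrectly, that the hardware has failed, or that a program running on the GPU triggered it. The Xid message reports the hardware's error state and gives a starting direction for fault diagnosis.
Xid Format
The Xid format is shown below. The NVRM prefix marks the line as a log from the NVIDIA kernel-mode driver (KMD).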
NVRM: GPU at PCI:<gpu_pci_bdf>: <gpu_uuid>
NVRM: GPU Board Serial Number: <gpu_serial_number>
NVRM: Xid (PCI:<gpu_pci_bdf>): <Xid_Value>, <raw error information>
The following is an example of an Xid error log:
[...] NVRM: GPU at PCI:0000:34:00: GPU-c43f0536-e751-7211-d7a7-78c95249ee7d
[...] NVRM: GPU Board Serial Number: 0323618040756
[...] NVRM: Xid (PCI:0000:34:00): 45, Ch 00000010
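To make the format concrete, here is a minimal C sketch that pulls the PCI BDF, the Xid value, and the raw error information out of one such line. It assumes the exact layout shown above, and driver versions may format lines slightly differently. `struct xid_record` and `parse_xid_line` are hypothetical names chosen for this example and are not part of any NVIDIA API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one "NVRM: Xid" line (hypothetical helper type). */
struct xid_record {
    char pci_bdf[32];   /* PCI address of the GPU, e.g. "0000:34:00" */
    int  xid;           /* Xid value, e.g. 45 */
    char raw[256];      /* raw error information that follows the value */
};

/* Parses a line laid out as
 *   NVRM: Xid (PCI:<gpu_pci_bdf>): <Xid_Value>, <raw error information>
 * Returns 1 on success, 0 if the line is not an Xid line. */
int parse_xid_line(const char *line, struct xid_record *out)
{
    static const char prefix[] = "NVRM: Xid (PCI:";
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, prefix);
    if (p == NULL)
        return 0;
    p += sizeof prefix - 1;   /* skip the prefix, excluding its '\0' */

    int consumed = 0;
    /* %31[^)] reads the BDF up to ')', %d reads the Xid value, and %n
     * records how far parsing got so the remainder can be copied. */
    if (sscanf(p, "%31[^)]): %d,%n", out->pci_bdf, &out->xid, &consumed) != 2)
        return 0;

    out->raw[0] = '\0';
    if (consumed > 0) {
        const char *rest = p + consumed;
        while (*rest == ' ')
            rest++;
        snprintf(out->raw, sizeof out->raw, "%s", rest);
        out->raw[strcspn(out->raw, "\r\n")] = '\0';   /* drop trailing newline */
    }
    return 1;
}
```

Applied to the sample line above, this yields BDF `0000:34:00`, Xid 45, and raw information `Ch 00000010`. Lines that do not contain the `NVRM: Xid` prefix, such as the serial-number line, return 0, so the function can be run over every line of kernel log output (for example, text read from `dmesg`) to filter out Xid events.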
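Xid Error Sources
According to the table in the Xid documentation, Xid error sources fall into seven categories: HW Error, Driver Error, User App Error, System Memory Corruption, Bus Error, Thermal Issue, and FB Corruption.
Driver Error and User App Error are caused by software programming the GPU incorrectly. System Memory Corruption is triggered by an unexpected write to host-side memory. Bus Error means an error occurred on the PCIe bus. Thermal Issue is caused by an abnormal device temperature. FB Corruption is triggered by an unexpected write to the device-side frame buffer (FB). HW Error can be further subdivided into errors from specific hardware modules, such as the Display module, the PBDMA module, the Copy Engine module, the NVLink port module, and the DRAM module.
Xid Table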
Category | Error Code | ID |
---|---|---|
Graphic | ROBUST_CHANNEL_GR_EXCEPTION | 13 |
FAKE | ROBUST_CHANNEL_FAKE_ERROR | 14 |
VBLANK | ROBUST_CHANNEL_VBLANK_CALLBACK_TIMEOUT | 16 |
Display | ROBUST_CHANNEL_DISP_MISSED_NOTIFIER | 19 |
Mpeg | ROBUST_CHANNEL_MPEG_ERROR_SW_METHOD | 20 |
Motion Estimation | ROBUST_CHANNEL_ME_ERROR_SW_METHOD | 21 |
Video Process | ROBUST_CHANNEL_VP_ERROR_SW_METHOD | 22 |
RC | ROBUST_CHANNEL_RC_LOGGING_ENABLED | 23 |
Video Process | ROBUST_CHANNEL_VP_ERROR | 27 |
Video Process | ROBUST_CHANNEL_VP2_ERROR | 28 |
BSP | ROBUST_CHANNEL_BSP_ERROR | 29 |
Reserved | ROBUST_CHANNEL_UNUSED_ERROR_30 | 30 |
MMU | ROBUST_CHANNEL_FIFO_ERROR_MMU_ERR_FLT | 31 |
PBDMA | ROBUST_CHANNEL_PBDMA_ERROR | 32 |
Security | ROBUST_CHANNEL_SEC_ERROR | 33 |
MSVLD | ROBUST_CHANNEL_MSVLD_ERROR | 34 |
MSPDEC | ROBUST_CHANNEL_MSPDEC_ERROR | 35 |
MSPPP | ROBUST_CHANNEL_MSPPP_ERROR | 36 |
Copy Engine | ROBUST_CHANNEL_CE0_ERROR | 39 |
Copy Engine | ROBUST_CHANNEL_CE1_ERROR | 40 |
Copy Engine | ROBUST_CHANNEL_CE2_ERROR | 41 |
VIC | ROBUST_CHANNEL_VIC_ERROR | 42 |
RESET | ROBUST_CHANNEL_RESETCHANNEL_VERIF_ERROR | 43 |
Graphic | ROBUST_CHANNEL_GR_FAULT_DURING_CTXSW | 44 |
PREEMPTIVE | ROBUST_CHANNEL_PREEMPTIVE_REMOVAL | 45 |
Video Encode | ROBUST_CHANNEL_NVENC0_ERROR | 47 |
Memory | ROBUST_CHANNEL_GPU_ECC_DBE | 48 |
Memory | FB_MEMORY_ERROR | 58 |
PMU | PMU_ERROR | 59 |
Security | ROBUST_CHANNEL_SEC2_ERROR | 60 |
PMU | PMU_BREAKPOINT | 61 |
PMU | PMU_HALT_ERROR | 62 |
Memory | INFOROM_PAGE_RETIREMENT_EVENT | 63 |
Memory | INFOROM_DRAM_RETIREMENT_EVENT | 63 (alias of INFOROM_PAGE_RETIREMENT_EVENT) |
Memory | INFOROM_PAGE_RETIREMENT_FAILURE | 64 |
Memory | INFOROM_DRAM_RETIREMENT_FAILURE | 64 (alias of INFOROM_PAGE_RETIREMENT_FAILURE) |
Video Encode | ROBUST_CHANNEL_NVENC1_ERROR | 65 |
Video Decode | ROBUST_CHANNEL_NVDEC0_ERROR | 68 |
Graphic | ROBUST_CHANNEL_GR_CLASS_ERROR | 69 |
Copy Engine | ROBUST_CHANNEL_CE3_ERROR | 70 |
Copy Engine | ROBUST_CHANNEL_CE4_ERROR | 71 |
Copy Engine | ROBUST_CHANNEL_CE5_ERROR | 72 |
Video Encode | ROBUST_CHANNEL_NVENC2_ERROR | 73 |
NVLink | NVLINK_ERROR | 74 |
Copy Engine | ROBUST_CHANNEL_CE6_ERROR | 75 |
Copy Engine | ROBUST_CHANNEL_CE7_ERROR | 76 |
Copy Engine | ROBUST_CHANNEL_CE8_ERROR | 77 |
Virtualization | VGPU_START_ERROR | 78 |
PCIe | ROBUST_CHANNEL_GPU_HAS_FALLEN_OFF_THE_BUS | 79 |
PBDMA | PBDMA_PUSHBUFFER_CRC_MISMATCH | 80 |
Display | ROBUST_CHANNEL_VGA_SUBSYSTEM_ERROR | 81 |
Jpeg | ROBUST_CHANNEL_NVJPG0_ERROR | 82 |
Video Decode | ROBUST_CHANNEL_NVDEC1_ERROR | 83 |
Video Decode | ROBUST_CHANNEL_NVDEC2_ERROR | 84 |
Copy Engine | ROBUST_CHANNEL_CE9_ERROR | 85 |
OFA | ROBUST_CHANNEL_OFA0_ERROR | 86 |
DRIVER | NVTELEMETRY_DRIVER_REPORT | 87 |
Video Decode | ROBUST_CHANNEL_NVDEC3_ERROR | 88 |
Video Decode | ROBUST_CHANNEL_NVDEC4_ERROR | 89 |
LTC | LTC_ERROR | 90 |
Reserved | RESERVED_XID | 91 |
SBE | EXCESSIVE_SBE_INTERRUPTS | 92 |
Timeout | INFOROM_ERASE_LIMIT_EXCEEDED | 93 |
Contained | ROBUST_CHANNEL_CONTAINED_ERROR | 94 |
Uncontained | ROBUST_CHANNEL_UNCONTAINED_ERROR | 95 |
Video Decode | ROBUST_CHANNEL_NVDEC5_ERROR | 96 |
Video Decode | ROBUST_CHANNEL_NVDEC6_ERROR | 97 |
Video Decode | ROBUST_CHANNEL_NVDEC7_ERROR | 98 |
Jpeg | ROBUST_CHANNEL_NVJPG1_ERROR | 99 |
Jpeg | ROBUST_CHANNEL_NVJPG2_ERROR | 100 |
Jpeg | ROBUST_CHANNEL_NVJPG3_ERROR | 101 |
Jpeg | ROBUST_CHANNEL_NVJPG4_ERROR | 102 |
Jpeg | ROBUST_CHANNEL_NVJPG5_ERROR | 103 |
Jpeg | ROBUST_CHANNEL_NVJPG6_ERROR | 104 |
Jpeg | ROBUST_CHANNEL_NVJPG7_ERROR | 105 |
MMU | DESTINATION_FLA_TRANSLATION_ERROR | 108 |
Security | SEC_FAULT_ERROR | 110 |
Timeout | GSP_RPC_TIMEOUT | 119 |
GSP | GSP_ERROR | 120 |
C2C | C2C_ERROR | 121 |
PMU | SPI_PMU_RPC_READ_FAIL | 122 |
PMU | SPI_PMU_RPC_WRITE_FAIL | 123 |
PMU | SPI_PMU_RPC_ERASE_FAIL | 124 |
FS | INFOROM_FS_ERROR | 125 |
Copy Engine | ROBUST_CHANNEL_CE10_ERROR | 126 |
Copy Engine | ROBUST_CHANNEL_CE11_ERROR | 127 |
Copy Engine | ROBUST_CHANNEL_CE12_ERROR | 128 |
Copy Engine | ROBUST_CHANNEL_CE13_ERROR | 129 |
Copy Engine | ROBUST_CHANNEL_CE14_ERROR | 130 |
Copy Engine | ROBUST_CHANNEL_CE15_ERROR | 131 |
Copy Engine | ROBUST_CHANNEL_CE16_ERROR | 132 |
Copy Engine | ROBUST_CHANNEL_CE17_ERROR | 133 |
Copy Engine | ROBUST_CHANNEL_CE18_ERROR | 134 |
Copy Engine | ROBUST_CHANNEL_CE19_ERROR | 135 |
ALI | ALI_TRAINING_FAIL | 136 |
NVLink | NVLINK_FLA_PRIV_ERR | 137 |
DLA | ROBUST_CHANNEL_DLA_ERROR | 138 |
OFA | ROBUST_CHANNEL_OFA1_ERROR | 139 |
Memory | UNRECOVERABLE_ECC_ERROR_ESCAPE | 140 |
Fast Path | ROBUST_CHANNEL_FAST_PATH_ERROR | 141 |
GPU | GPU_INIT_ERROR | 143 |
NVLink | NVLINK_SAW_ERROR | 144 |
NVLink | NVLINK_RLW_ERROR | 145 |
NVLink | NVLINK_TLW_ERROR | 146 |
NVLink | NVLINK_TREX_ERROR | 147 |
NVLink | NVLINK_NVLPW_CTRL_ERROR | 148 |
NVLink | NVLINK_NETIR_ERROR | 149 |
NVLink | NVLINK_MSE_ERROR | 150 |
Security | ROBUST_CHANNEL_KEY_ROTATION_ERROR | 151 |
Reserved | RESERVED7_ERROR | 152 |
Reserved | RESERVED8_ERROR | 153 |
DRIVER | ROBUST_CHANNEL_LAST_ERROR | 153 |
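The table shows how GPU hardware errors are organized: each hardware module reports its own error status, and each module or engine instance has its own Xid codes.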
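Every Xid belongs to a module category, so a log-processing tool can map a numeric value back to its category and error-code name. The C sketch below does this with the standard `bsearch()` over a sorted excerpt of the table above. The excerpt covers only a handful of IDs. `struct xid_info`, `lookup_xid`, and `xid_in_category` are hypothetical names for illustration. A real tool would need the full list for the driver version in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct xid_info {
    int id;
    const char *category;
    const char *name;
};

/* Excerpt of the Xid table, kept sorted by id so bsearch() can be used. */
static const struct xid_info XID_TABLE[] = {
    {  13, "Graphic",     "ROBUST_CHANNEL_GR_EXCEPTION" },
    {  31, "MMU",         "ROBUST_CHANNEL_FIFO_ERROR_MMU_ERR_FLT" },
    {  32, "PBDMA",       "ROBUST_CHANNEL_PBDMA_ERROR" },
    {  43, "RESET",       "ROBUST_CHANNEL_RESETCHANNEL_VERIF_ERROR" },
    {  45, "PREEMPTIVE",  "ROBUST_CHANNEL_PREEMPTIVE_REMOVAL" },
    {  48, "Memory",      "ROBUST_CHANNEL_GPU_ECC_DBE" },
    {  63, "Memory",      "INFOROM_PAGE_RETIREMENT_EVENT" },
    {  64, "Memory",      "INFOROM_PAGE_RETIREMENT_FAILURE" },
    {  74, "NVLink",      "NVLINK_ERROR" },
    {  79, "PCIe",        "ROBUST_CHANNEL_GPU_HAS_FALLEN_OFF_THE_BUS" },
    {  92, "SBE",         "EXCESSIVE_SBE_INTERRUPTS" },
    {  94, "Contained",   "ROBUST_CHANNEL_CONTAINED_ERROR" },
    {  95, "Uncontained", "ROBUST_CHANNEL_UNCONTAINED_ERROR" },
    { 119, "Timeout",     "GSP_RPC_TIMEOUT" },
    { 120, "GSP",         "GSP_ERROR" },
    { 140, "Memory",      "UNRECOVERABLE_ECC_ERROR_ESCAPE" },
};

#define XID_COUNT (sizeof XID_TABLE / sizeof XID_TABLE[0])

/* Returns 1 if the excerpt is strictly ascending by id. */
int xid_table_is_sorted(void)
{
    for (size_t i = 1; i < XID_COUNT; i++)
        if (XID_TABLE[i - 1].id >= XID_TABLE[i].id)
            return 0;
    return 1;
}

static int cmp_xid(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int id = ((const struct xid_info *)elem)->id;
    return (k > id) - (k < id);
}

/* Returns the table entry for `id`, or NULL if it is not in the excerpt. */
const struct xid_info *lookup_xid(int id)
{
    assert(xid_table_is_sorted());   /* bsearch() requires sorted input */
    return bsearch(&id, XID_TABLE, XID_COUNT, sizeof XID_TABLE[0], cmp_xid);
}

/* Returns 1 if the Xid's category in this excerpt equals `category`. */
int xid_in_category(int id, const char *category)
{
    const struct xid_info *info = lookup_xid(id);
    return info != NULL && strcmp(info->category, category) == 0;
}
```

Keeping the table as plain data makes it easy to regenerate from NVIDIA's published list when a new driver adds IDs. The binary search only matters once the full list of about 150 entries is loaded.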
SXid
An SXid message is similar to an Xid message, except that it reports NVSwitch-related errors. SXid is short for Switch Xid.
SXid Format
nvidia-nvswitchX: SXid (PCI:<switch_pci_bdf>): <SXid_Value>, <Fatal or Non-Fatal>, <Link No> <Error Description>
<raw error information for additional troubleshooting>
The following is an example of an SXid error log:
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Non-fatal, Link 46 MC TS crumbstore MCTO (First)
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Severity 0 Engine instance 46 Sub-engine instance 00
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Data {0x00140004, 0x00100000, 0x00140004, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
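The header line of an SXid event carries the switch index, BDF, SXid value, severity, and link number. The lines that follow repeat the SXid value with extra detail. Below is a minimal C sketch that extracts these fields, assuming the layout shown above. `struct sxid_record` and `parse_sxid_line` are hypothetical names. Continuation lines (Severity, Data) are accepted with `fatal` set to -1 and their text kept in `description`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for one SXid line; not an NVIDIA API. */
struct sxid_record {
    int  switch_index;      /* X in "nvidia-nvswitchX" */
    char pci_bdf[32];       /* PCI address of the NVSwitch */
    int  sxid;              /* SXid value, e.g. 28006 */
    int  fatal;             /* 1 = Fatal, 0 = Non-fatal, -1 = no severity field */
    int  link;              /* link number, -1 if absent */
    char description[256];  /* error description or continuation text */
};

/* Returns 1 if `line` is an SXid line (header or continuation), else 0. */
int parse_sxid_line(const char *line, struct sxid_record *out)
{
    assert(line != NULL && out != NULL);

    const char *p = strstr(line, "nvidia-nvswitch");
    if (p == NULL)
        return 0;

    int consumed = 0;
    if (sscanf(p, "nvidia-nvswitch%d: SXid (PCI:%31[^)]): %d,%n",
               &out->switch_index, out->pci_bdf, &out->sxid, &consumed) != 3
        || consumed == 0)
        return 0;

    const char *rest = p + consumed;
    while (*rest == ' ')
        rest++;

    out->fatal = -1;
    out->link = -1;
    /* Default: keep everything after the value (covers continuation lines). */
    snprintf(out->description, sizeof out->description, "%s", rest);
    out->description[strcspn(out->description, "\r\n")] = '\0';

    /* The log writes "Non-fatal" while the format writes "Non-Fatal",
     * so only the "Non-" prefix is compared. */
    if (strncmp(rest, "Non-", 4) == 0)
        out->fatal = 0;
    else if (strncmp(rest, "Fatal", 5) == 0)
        out->fatal = 1;
    else
        return 1;   /* continuation line: no severity or link fields */

    const char *after = strchr(rest, ',');
    if (after == NULL)
        return 1;
    after++;
    while (*after == ' ')
        after++;

    int n = 0;
    if (sscanf(after, "Link %d%n", &out->link, &n) == 1)
        after += n;
    while (*after == ' ')
        after++;

    snprintf(out->description, sizeof out->description, "%s", after);
    out->description[strcspn(out->description, "\r\n")] = '\0';
    return 1;
}
```

For the first sample line, this yields switch 3, BDF `0000:c1:00.0`, SXid 28006, non-fatal, link 46, and description `MC TS crumbstore MCTO (First)`. Grouping consecutive lines with the same BDF and SXid value could then reassemble one event from its header, Severity, and Data lines.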
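SXid Classification
SXids can be classified by severity, broadly into Non-Fatal and Fatal errors. A minority of errors, such as heartbeat timeouts, ECC SBEs (single-bit errors), and replays (retransmissions), are Non-Fatal. Most other errors are internal NVSwitch errors and are Fatal. Examples include invalid commands, crossbar overflow, ECC DBEs (double-bit errors), and buffer overflow or underflow.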
typedef enum nvswitch_err_type
{
NVSWITCH_ERR_NO_ERROR = 0x0,
/*
* These error enumerations are derived from the error bits defined in each
* hardware manual.
*
* NVSwitch errors values should start from 10000 (decimal) to be
* distinguishable from GPU errors.
*/
/* HOST */
NVSWITCH_ERR_HW_HOST = 10000,
NVSWITCH_ERR_HW_HOST_PRIV_ERROR = 10001,
NVSWITCH_ERR_HW_HOST_PRIV_TIMEOUT = 10002,
NVSWITCH_ERR_HW_HOST_UNHANDLED_INTERRUPT = 10003,
NVSWITCH_ERR_HW_HOST_THERMAL_EVENT_START = 10004,
NVSWITCH_ERR_HW_HOST_THERMAL_EVENT_END = 10005,
NVSWITCH_ERR_HW_HOST_THERMAL_SHUTDOWN = 10006,
NVSWITCH_ERR_HW_HOST_IO_FAILURE = 10007,
NVSWITCH_ERR_HW_HOST_FIRMWARE_INITIALIZATION_FAILURE = 10008,
NVSWITCH_ERR_HW_HOST_LAST,
/* NPORT: Ingress errors */
NVSWITCH_ERR_HW_NPORT_INGRESS = 11000,
NVSWITCH_ERR_HW_NPORT_INGRESS_CMDDECODEERR = 11001,
NVSWITCH_ERR_HW_NPORT_INGRESS_BDFMISMATCHERR = 11002,
NVSWITCH_ERR_HW_NPORT_INGRESS_BUBBLEDETECT = 11003,
NVSWITCH_ERR_HW_NPORT_INGRESS_ACLFAIL = 11004,
NVSWITCH_ERR_HW_NPORT_INGRESS_PKTPOISONSET = 11005,
NVSWITCH_ERR_HW_NPORT_INGRESS_ECCSOFTLIMITERR = 11006,
NVSWITCH_ERR_HW_NPORT_INGRESS_ECCHDRDOUBLEBITERR = 11007,
NVSWITCH_ERR_HW_NPORT_INGRESS_INVALIDCMD = 11008,
NVSWITCH_ERR_HW_NPORT_INGRESS_INVALIDVCSET = 11009,
NVSWITCH_ERR_HW_NPORT_INGRESS_ERRORINFO = 11010,
NVSWITCH_ERR_HW_NPORT_INGRESS_REQCONTEXTMISMATCHERR = 11011,
NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_HDR_ECC_LIMIT_ERR = 11012,
NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_HDR_ECC_DBE_ERR = 11013,
NVSWITCH_ERR_HW_NPORT_INGRESS_ADDRBOUNDSERR = 11014,
NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTABCFGERR = 11015,
NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTABCFGERR = 11016,
NVSWITCH_ERR_HW_NPORT_INGRESS_REMAPTAB_ECC_DBE_ERR = 11017,
NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTAB_ECC_DBE_ERR = 11018,
NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTAB_ECC_DBE_ERR = 11019,
NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_PARITY_ERR = 11020,
NVSWITCH_ERR_HW_NPORT_INGRESS_REMAPTAB_ECC_LIMIT_ERR = 11021,
NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTAB_ECC_LIMIT_ERR = 11022,
NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTAB_ECC_LIMIT_ERR = 11023,
NVSWITCH_ERR_HW_NPORT_INGRESS_ADDRTYPEERR = 11024,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_INDEX_ERR = 11025,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_INDEX_ERR = 11026,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_INDEX_ERR = 11027,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ECC_DBE_ERR = 11028,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ECC_DBE_ERR = 11029,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ECC_DBE_ERR = 11030,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_REQCONTEXTMISMATCHERR = 11031,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_REQCONTEXTMISMATCHERR = 11032,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_REQCONTEXTMISMATCHERR = 11033,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ACLFAIL = 11034,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ACLFAIL = 11035,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ACLFAIL = 11036,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ADDRBOUNDSERR = 11037,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ADDRBOUNDSERR = 11038,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ADDRBOUNDSERR = 11039,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ECC_LIMIT_ERR = 11040,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ECC_LIMIT_ERR = 11041,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ECC_LIMIT_ERR = 11042,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCCMDTOUCADDRERR = 11043,
NVSWITCH_ERR_HW_NPORT_INGRESS_READMCREFLECTMEMERR = 11044,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ADDRTYPEERR = 11045,
NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ADDRTYPEERR = 11046,
NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ADDRTYPEERR = 11047,
NVSWITCH_ERR_HW_NPORT_INGRESS_LAST, /* NOTE: Must be last */
/* NPORT: Egress errors */
NVSWITCH_ERR_HW_NPORT_EGRESS = 12000,
NVSWITCH_ERR_HW_NPORT_EGRESS_EGRESSBUFERR = 12001,
NVSWITCH_ERR_HW_NPORT_EGRESS_PKTROUTEERR = 12002,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCSINGLEBITLIMITERR0 = 12003,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCHDRDOUBLEBITERR0 = 12004,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCDATADOUBLEBITERR0 = 12005,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCSINGLEBITLIMITERR1 = 12006,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCHDRDOUBLEBITERR1 = 12007,
NVSWITCH_ERR_HW_NPORT_EGRESS_ECCDATADOUBLEBITERR1 = 12008,
NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCHDRCREDITOVFL = 12009,
NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCDATACREDITOVFL = 12010,
NVSWITCH_ERR_HW_NPORT_EGRESS_ADDRMATCHERR = 12011,
NVSWITCH_ERR_HW_NPORT_EGRESS_TAGCOUNTERR = 12012,
NVSWITCH_ERR_HW_NPORT_EGRESS_FLUSHRSPERR = 12013,
NVSWITCH_ERR_HW_NPORT_EGRESS_DROPNPURRSPERR = 12014,
NVSWITCH_ERR_HW_NPORT_EGRESS_POISONERR = 12015,
NVSWITCH_ERR_HW_NPORT_EGRESS_PACKET_HEADER = 12016,
NVSWITCH_ERR_HW_NPORT_EGRESS_BUFFER_DATA = 12017,
NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOC_CREDITS = 12018,
NVSWITCH_ERR_HW_NPORT_EGRESS_TAG_DATA = 12019,
NVSWITCH_ERR_HW_NPORT_EGRESS_SEQIDERR = 12020,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_ECC_LIMIT_ERR = 12021,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_ECC_DBE_ERR = 12022,
NVSWITCH_ERR_HW_NPORT_EGRESS_RAM_OUT_HDR_ECC_LIMIT_ERR = 12023,
NVSWITCH_ERR_HW_NPORT_EGRESS_RAM_OUT_HDR_ECC_DBE_ERR = 12024,
NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCCREDITOVFL = 12025,
NVSWITCH_ERR_HW_NPORT_EGRESS_REQTGTIDMISMATCHERR = 12026,
NVSWITCH_ERR_HW_NPORT_EGRESS_RSPREQIDMISMATCHERR = 12027,
NVSWITCH_ERR_HW_NPORT_EGRESS_PRIVRSPERR = 12028,
NVSWITCH_ERR_HW_NPORT_EGRESS_HWRSPERR = 12029,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_PARITY_ERR = 12030,
NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOC_CREDIT_PARITY_ERR = 12031,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_FLITTYPE_MISMATCH_ERR = 12032,
NVSWITCH_ERR_HW_NPORT_EGRESS_CREDIT_TIME_OUT_ERR = 12033,
NVSWITCH_ERR_HW_NPORT_EGRESS_INVALIDVCSET_ERR = 12034,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_SIDEBAND_PD_PARITY_ERR = 12035,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_ECC_LIMIT_ERR = 12036,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_ECC_DBE_ERR = 12037,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSPCTRLSTORE_ECC_LIMIT_ERR = 12038,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSPCTRLSTORE_ECC_DBE_ERR = 12039,
NVSWITCH_ERR_HW_NPORT_EGRESS_RBCTRLSTORE_ECC_LIMIT_ERR = 12040,
NVSWITCH_ERR_HW_NPORT_EGRESS_RBCTRLSTORE_ECC_DBE_ERR = 12041,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDSGT_ECC_LIMIT_ERR = 12042,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDSGT_ECC_DBE_ERR = 12043,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDBUF_ECC_LIMIT_ERR = 12044,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDBUF_ECC_DBE_ERR = 12045,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_RAM_HDR_ECC_LIMIT_ERR = 12046,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_RAM_HDR_ECC_DBE_ERR = 12047,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_PARITY_ERR = 12048,
NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_FLITTYPE_MISMATCH_ERR = 12049,
NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_CNT_ERR = 12050,
NVSWITCH_ERR_HW_NPORT_EGRESS_RBRSP_CNT_ERR = 12051,
NVSWITCH_ERR_HW_NPORT_EGRESS_LAST, /* NOTE: Must be last */
/* NPORT: Fstate errors */
NVSWITCH_ERR_HW_NPORT_FSTATE = 13000,
NVSWITCH_ERR_HW_NPORT_FSTATE_TAGPOOLBUFERR = 13001,
NVSWITCH_ERR_HW_NPORT_FSTATE_CRUMBSTOREBUFERR = 13002,
NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_CRUMBSTORE = 13003,
NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_CRUMBSTORE = 13004,
NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_TAGSTORE = 13005,
NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_TAGSTORE = 13006,
NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_FLUSHREQSTORE = 13007,
NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_FLUSHREQSTORE = 13008,
NVSWITCH_ERR_HW_NPORT_FSTATE_LAST, /* NOTE: Must be last */
/* NPORT: Tstate errors */
NVSWITCH_ERR_HW_NPORT_TSTATE = 14000,
NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOLBUFERR = 14001,
NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTOREBUFERR = 14002,
NVSWITCH_ERR_HW_NPORT_TSTATE_SINGLEBITECCLIMITERR_CRUMBSTORE = 14003,
NVSWITCH_ERR_HW_NPORT_TSTATE_UNCORRECTABLEECCERR_CRUMBSTORE = 14004,
NVSWITCH_ERR_HW_NPORT_TSTATE_SINGLEBITECCLIMITERR_TAGSTORE = 14005,
NVSWITCH_ERR_HW_NPORT_TSTATE_UNCORRECTABLEECCERR_TAGSTORE = 14006,
NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOL_ECC_LIMIT_ERR = 14007,
NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOL_ECC_DBE_ERR = 14008,
NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTORE_ECC_LIMIT_ERR = 14009,
NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTORE_ECC_DBE_ERR = 14010,
NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTOREBUFERR = 14011,
NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTORE_ECC_LIMIT_ERR = 14012,
NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTORE_ECC_DBE_ERR = 14013,
NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAMBUFERR = 14014,
NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAM_ECC_LIMIT_ERR = 14015,
NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAM_ECC_DBE_ERR = 14016,
NVSWITCH_ERR_HW_NPORT_TSTATE_ATO_ERR = 14017,
NVSWITCH_ERR_HW_NPORT_TSTATE_CAMRSP_ERR = 14018,
NVSWITCH_ERR_HW_NPORT_TSTATE_LAST, /* NOTE: Must be last */
/* NPORT: Route errors */
NVSWITCH_ERR_HW_NPORT_ROUTE = 15000,
NVSWITCH_ERR_HW_NPORT_ROUTE_ROUTEBUFERR = 15001,
NVSWITCH_ERR_HW_NPORT_ROUTE_NOPORTDEFINEDERR = 15002,
NVSWITCH_ERR_HW_NPORT_ROUTE_INVALIDROUTEPOLICYERR = 15003,
NVSWITCH_ERR_HW_NPORT_ROUTE_ECCLIMITERR = 15004,
NVSWITCH_ERR_HW_NPORT_ROUTE_UNCORRECTABLEECCERR = 15005,
NVSWITCH_ERR_HW_NPORT_ROUTE_TRANSDONERESVERR = 15006,
NVSWITCH_ERR_HW_NPORT_ROUTE_PACKET_HEADER = 15007,
NVSWITCH_ERR_HW_NPORT_ROUTE_GLT_ECC_LIMIT_ERR = 15008,
NVSWITCH_ERR_HW_NPORT_ROUTE_GLT_ECC_DBE_ERR = 15009,
NVSWITCH_ERR_HW_NPORT_ROUTE_PDCTRLPARERR = 15010,
NVSWITCH_ERR_HW_NPORT_ROUTE_NVS_ECC_LIMIT_ERR = 15011,
NVSWITCH_ERR_HW_NPORT_ROUTE_NVS_ECC_DBE_ERR = 15012,
NVSWITCH_ERR_HW_NPORT_ROUTE_CDTPARERR = 15013,
NVSWITCH_ERR_HW_NPORT_ROUTE_MCRID_ECC_LIMIT_ERR = 15014,
NVSWITCH_ERR_HW_NPORT_ROUTE_MCRID_ECC_DBE_ERR = 15015,
NVSWITCH_ERR_HW_NPORT_ROUTE_EXTMCRID_ECC_LIMIT_ERR = 15016,
NVSWITCH_ERR_HW_NPORT_ROUTE_EXTMCRID_ECC_DBE_ERR = 15017,
NVSWITCH_ERR_HW_NPORT_ROUTE_RAM_ECC_LIMIT_ERR = 15018,
NVSWITCH_ERR_HW_NPORT_ROUTE_RAM_ECC_DBE_ERR = 15019,
NVSWITCH_ERR_HW_NPORT_ROUTE_INVALID_MCRID_ERR = 15020,
NVSWITCH_ERR_HW_NPORT_ROUTE_LAST, /* NOTE: Must be last */
/* NPORT: Nport errors */
NVSWITCH_ERR_HW_NPORT = 16000,
NVSWITCH_ERR_HW_NPORT_DATAPOISONED = 16001,
NVSWITCH_ERR_HW_NPORT_UCINTERNAL = 16002,
NVSWITCH_ERR_HW_NPORT_CINTERNAL = 16003,
NVSWITCH_ERR_HW_NPORT_LAST, /* NOTE: Must be last */
/* NVLCTRL: NVCTRL errors */
NVSWITCH_ERR_HW_NVLCTRL = 17000,
NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCSOFTLIMITERR = 17001,
NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCHDRDOUBLEBITERR = 17002,
NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCDATADOUBLEBITERR = 17003,
NVSWITCH_ERR_HW_NVLCTRL_INGRESSBUFFERERR = 17004,
NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCSOFTLIMITERR = 17005,
NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCHDRDOUBLEBITERR = 17006,
NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCDATADOUBLEBITERR = 17007,
NVSWITCH_ERR_HW_NVLCTRL_EGRESSBUFFERERR = 17008,
NVSWITCH_ERR_HW_NVLCTRL_LAST, /* NOTE: Must be last */
/* Nport: Nvlipt errors */
NVSWITCH_ERR_HW_NVLIPT = 18000,
NVSWITCH_ERR_HW_NVLIPT_DLPROTOCOL = 18001,
NVSWITCH_ERR_HW_NVLIPT_DATAPOISONED = 18002,
NVSWITCH_ERR_HW_NVLIPT_FLOWCONTROL = 18003,
NVSWITCH_ERR_HW_NVLIPT_RESPONSETIMEOUT = 18004,
NVSWITCH_ERR_HW_NVLIPT_TARGETERROR = 18005,
NVSWITCH_ERR_HW_NVLIPT_UNEXPECTEDRESPONSE = 18006,
NVSWITCH_ERR_HW_NVLIPT_RECEIVEROVERFLOW = 18007,
NVSWITCH_ERR_HW_NVLIPT_MALFORMEDPACKET = 18008,
NVSWITCH_ERR_HW_NVLIPT_STOMPEDPACKETRECEIVED = 18009,
NVSWITCH_ERR_HW_NVLIPT_UNSUPPORTEDREQUEST = 18010,
NVSWITCH_ERR_HW_NVLIPT_UCINTERNAL = 18011,
NVSWITCH_ERR_HW_NVLIPT_PHYRECEIVER = 18012,
NVSWITCH_ERR_HW_NVLIPT_BADAN0PKT = 18013,
NVSWITCH_ERR_HW_NVLIPT_REPLAYTIMEOUT = 18014,
NVSWITCH_ERR_HW_NVLIPT_ADVISORYERROR = 18015,
NVSWITCH_ERR_HW_NVLIPT_CINTERNAL = 18016,
NVSWITCH_ERR_HW_NVLIPT_HEADEROVERFLOW = 18017,
NVSWITCH_ERR_HW_NVLIPT_RSTSEQ_PHYARB_TIMEOUT = 18018,
NVSWITCH_ERR_HW_NVLIPT_RSTSEQ_PLL_TIMEOUT = 18019,
NVSWITCH_ERR_HW_NVLIPT_CLKCTL_ILLEGAL_REQUEST = 18020,
NVSWITCH_ERR_HW_NVLIPT_LAST, /* NOTE: Must be last */
/* Nport: Nvltlc TX/RX errors */
NVSWITCH_ERR_HW_NVLTLC = 19000,
NVSWITCH_ERR_HW_NVLTLC_TXHDRCREDITOVFERR = 19001,
NVSWITCH_ERR_HW_NVLTLC_TXDATACREDITOVFERR = 19002,
NVSWITCH_ERR_HW_NVLTLC_TXDLCREDITOVFERR = 19003,
NVSWITCH_ERR_HW_NVLTLC_TXDLCREDITPARITYERR = 19004,
NVSWITCH_ERR_HW_NVLTLC_TXRAMHDRPARITYERR = 19005,
NVSWITCH_ERR_HW_NVLTLC_TXRAMDATAPARITYERR = 19006,
NVSWITCH_ERR_HW_NVLTLC_TXUNSUPVCOVFERR = 19007,
NVSWITCH_ERR_HW_NVLTLC_TXSTOMPDET = 19008,
NVSWITCH_ERR_HW_NVLTLC_TXPOISONDET = 19009,
NVSWITCH_ERR_HW_NVLTLC_TARGETERR = 19010,
NVSWITCH_ERR_HW_NVLTLC_TX_PACKET_HEADER = 19011,
NVSWITCH_ERR_HW_NVLTLC_UNSUPPORTEDREQUESTERR = 19012,
NVSWITCH_ERR_HW_NVLTLC_RXDLHDRPARITYERR = 19013,
NVSWITCH_ERR_HW_NVLTLC_RXDLDATAPARITYERR = 19014,
NVSWITCH_ERR_HW_NVLTLC_RXDLCTRLPARITYERR = 19015,
NVSWITCH_ERR_HW_NVLTLC_RXRAMDATAPARITYERR = 19016,
NVSWITCH_ERR_HW_NVLTLC_RXRAMHDRPARITYERR = 19017,
NVSWITCH_ERR_HW_NVLTLC_RXINVALIDAEERR = 19018,
NVSWITCH_ERR_HW_NVLTLC_RXINVALIDBEERR = 19019,
NVSWITCH_ERR_HW_NVLTLC_RXINVALIDADDRALIGNERR = 19020,
NVSWITCH_ERR_HW_NVLTLC_RXPKTLENERR = 19021,
NVSWITCH_ERR_HW_NVLTLC_RSVCMDENCERR = 19022,
NVSWITCH_ERR_HW_NVLTLC_RSVDATLENENCERR = 19023,
NVSWITCH_ERR_HW_NVLTLC_RSVADDRTYPEERR = 19024,
NVSWITCH_ERR_HW_NVLTLC_RSVRSPSTATUSERR = 19025,
NVSWITCH_ERR_HW_NVLTLC_RSVPKTSTATUSERR = 19026,
NVSWITCH_ERR_HW_NVLTLC_RSVCACHEATTRPROBEREQERR = 19027,
NVSWITCH_ERR_HW_NVLTLC_RSVCACHEATTRPROBERSPERR = 19028,
NVSWITCH_ERR_HW_NVLTLC_DATLENGTATOMICREQMAXERR = 19029,
NVSWITCH_ERR_HW_NVLTLC_DATLENGTRMWREQMAXERR = 19030,
NVSWITCH_ERR_HW_NVLTLC_DATLENLTATRRSPMINERR = 19031,
NVSWITCH_ERR_HW_NVLTLC_INVALIDCACHEATTRPOERR = 19032,
NVSWITCH_ERR_HW_NVLTLC_INVALIDCRERR = 19033,
NVSWITCH_ERR_HW_NVLTLC_RXRESPSTATUSTARGETERR = 19034,
NVSWITCH_ERR_HW_NVLTLC_RXRESPSTATUSUNSUPPORTEDREQUESTERR = 19035,
NVSWITCH_ERR_HW_NVLTLC_RXHDROVFERR = 19036,
NVSWITCH_ERR_HW_NVLTLC_RXDATAOVFERR = 19037,
NVSWITCH_ERR_HW_NVLTLC_STOMPDETERR = 19038,
NVSWITCH_ERR_HW_NVLTLC_RXPOISONERR = 19039,
NVSWITCH_ERR_HW_NVLTLC_CORRECTABLEINTERNALERR = 19040,
NVSWITCH_ERR_HW_NVLTLC_RXUNSUPVCOVFERR = 19041,
NVSWITCH_ERR_HW_NVLTLC_RXUNSUPNVLINKCREDITRELERR = 19042,
NVSWITCH_ERR_HW_NVLTLC_RXUNSUPNCISOCCREDITRELERR = 19043,
NVSWITCH_ERR_HW_NVLTLC_RX_PACKET_HEADER = 19044,
NVSWITCH_ERR_HW_NVLTLC_RX_ERR_HEADER = 19045,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_PARITY_ERR = 19046,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_HDR_ECC_DBE_ERR = 19047,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_DAT_ECC_DBE_ERR = 19048,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_ECC_LIMIT_ERR = 19049,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_HW_ERR = 19050,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_UR_ERR = 19051,
NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_PRIV_ERR = 19052,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_NCISOC_PARITY_ERR = 19053,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_HDR_RAM_ECC_DBE_ERR = 19054,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_HDR_RAM_ECC_LIMIT_ERR = 19055,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT0_RAM_ECC_DBE_ERR = 19056,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT0_RAM_ECC_LIMIT_ERR = 19057,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT1_RAM_ECC_DBE_ERR = 19058,
NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT1_RAM_ECC_LIMIT_ERR = 19059,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_HDR_ECC_DBE_ERR = 19060,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_DAT_ECC_DBE_ERR = 19061,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_ECC_LIMIT_ERR = 19062,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_HDR_ECC_DBE_ERR = 19063,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_DAT_ECC_DBE_ERR = 19064,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_ECC_LIMIT_ERR = 19065,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_HDR_ECC_DBE_ERR = 19066,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_DAT_ECC_DBE_ERR = 19067,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_ECC_LIMIT_ERR = 19068,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_HDR_ECC_DBE_ERR = 19069,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_DAT_ECC_DBE_ERR = 19070,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_ECC_LIMIT_ERR = 19071,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC0 = 19072,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC1 = 19073,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC2 = 19074,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC3 = 19075,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC4 = 19076,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC5 = 19077,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC6 = 19078,
NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC7 = 19079,
NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_HW_ERR = 19080,
NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_UR_ERR = 19081,
NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_PRIV_ERR = 19082,
NVSWITCH_ERR_HW_NVLTLC_RX_LNK_INVALID_COLLAPSED_RESPONSE_ERR = 19083,
NVSWITCH_ERR_HW_NVLTLC_RX_LNK_AN1_HEARTBEAT_TIMEOUT_ERR = 19084,
NVSWITCH_ERR_HW_NVLTLC_LAST, /* NOTE: Must be last */
/* DLPL: errors ( SL1 errors too) */
NVSWITCH_ERR_HW_DLPL = 20000,
NVSWITCH_ERR_HW_DLPL_TX_REPLAY = 20001,
NVSWITCH_ERR_HW_DLPL_TX_RECOVERY_SHORT = 20002,
NVSWITCH_ERR_HW_DLPL_TX_RECOVERY_LONG = 20003,
NVSWITCH_ERR_HW_DLPL_TX_FAULT_RAM = 20004,
NVSWITCH_ERR_HW_DLPL_TX_FAULT_INTERFACE = 20005,
NVSWITCH_ERR_HW_DLPL_TX_FAULT_SUBLINK_CHANGE = 20006,
NVSWITCH_ERR_HW_DLPL_RX_FAULT_SUBLINK_CHANGE = 20007,
NVSWITCH_ERR_HW_DLPL_RX_FAULT_DL_PROTOCOL = 20008,
NVSWITCH_ERR_HW_DLPL_RX_SHORT_ERROR_RATE = 20009,
NVSWITCH_ERR_HW_DLPL_RX_LONG_ERROR_RATE = 20010,
NVSWITCH_ERR_HW_DLPL_RX_ILA_TRIGGER = 20011,
NVSWITCH_ERR_HW_DLPL_RX_CRC_COUNTER = 20012,
NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT = 20013,
NVSWITCH_ERR_HW_DLPL_LTSSM_PROTOCOL = 20014,
NVSWITCH_ERR_HW_DLPL_MINION_REQUEST = 20015,
NVSWITCH_ERR_HW_DLPL_FIFO_DRAIN_ERR = 20016,
NVSWITCH_ERR_HW_DLPL_CONST_DET_ERR = 20017,
NVSWITCH_ERR_HW_DLPL_OFF2SAFE_LINK_DET_ERR = 20018,
NVSWITCH_ERR_HW_DLPL_SAFE2NO_LINK_DET_ERR = 20019,
NVSWITCH_ERR_HW_DLPL_SCRAM_LOCK_ERR = 20020,
NVSWITCH_ERR_HW_DLPL_SYM_LOCK_ERR = 20021,
NVSWITCH_ERR_HW_DLPL_SYM_ALIGN_END_ERR = 20022,
NVSWITCH_ERR_HW_DLPL_FIFO_SKEW_ERR = 20023,
NVSWITCH_ERR_HW_DLPL_TRAIN2SAFE_LINK_DET_ERR = 20024,
NVSWITCH_ERR_HW_DLPL_HS2SAFE_LINK_DET_ERR = 20025,
NVSWITCH_ERR_HW_DLPL_FENCE_ERR = 20026,
NVSWITCH_ERR_HW_DLPL_SAFE_NO_LD_ERR = 20027,
NVSWITCH_ERR_HW_DLPL_E2SAFE_LD_ERR = 20028,
NVSWITCH_ERR_HW_DLPL_RC_RXPWR_ERR = 20029,
NVSWITCH_ERR_HW_DLPL_RC_TXPWR_ERR = 20030,
NVSWITCH_ERR_HW_DLPL_RC_DEADLINE_ERR = 20031,
NVSWITCH_ERR_HW_DLPL_TX_HS2LP_ERR = 20032,
NVSWITCH_ERR_HW_DLPL_RX_HS2LP_ERR = 20033,
NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT_UP = 20034,
NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT_DOWN = 20035,
NVSWITCH_ERR_HW_DLPL_PHY_A = 20036,
NVSWITCH_ERR_HW_DLPL_TX_PL_ERROR = 20037,
NVSWITCH_ERR_HW_DLPL_RX_PL_ERROR = 20038,
NVSWITCH_ERR_HW_DLPL_LAST, /* NOTE: Must be last */
/* AFS: errors */
NVSWITCH_ERR_HW_AFS = 21000,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_CREDIT_OVERFLOW = 21001,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_CREDIT_UNDERFLOW = 21002,
NVSWITCH_ERR_HW_AFS_UC_EGRESS_CREDIT_OVERFLOW = 21003,
NVSWITCH_ERR_HW_AFS_UC_EGRESS_CREDIT_UNDERFLOW = 21004,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_NON_BURSTY_PKT_DETECTED = 21005,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_NON_STICKY_PKT_DETECTED = 21006,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_BURST_GT_17_DATA_VC_DETECTED = 21007,
NVSWITCH_ERR_HW_AFS_UC_INGRESS_BURST_GT_1_NONDATA_VC_DETECTED = 21008,
NVSWITCH_ERR_HW_AFS_UC_INVALID_DST = 21009,
NVSWITCH_ERR_HW_AFS_UC_PKT_MISROUTE = 21010,
NVSWITCH_ERR_HW_AFS_LAST, /* NOTE: Must be last */
/* MINION: errors */
NVSWITCH_ERR_HW_MINION = 22000,
NVSWITCH_ERR_HW_MINION_UCODE_IMEM = 22001,
NVSWITCH_ERR_HW_MINION_UCODE_DMEM = 22002,
NVSWITCH_ERR_HW_MINION_HALT = 22003,
NVSWITCH_ERR_HW_MINION_BOOT_ERROR = 22004,
NVSWITCH_ERR_HW_MINION_TIMEOUT = 22005,
NVSWITCH_ERR_HW_MINION_DLCMD_FAULT = 22006,
NVSWITCH_ERR_HW_MINION_DLCMD_TIMEOUT = 22007,
NVSWITCH_ERR_HW_MINION_DLCMD_FAIL = 22008,
NVSWITCH_ERR_HW_MINION_FATAL_INTR = 22009,
NVSWITCH_ERR_HW_MINION_WATCHDOG = 22010,
NVSWITCH_ERR_HW_MINION_EXTERR = 22011,
NVSWITCH_ERR_HW_MINION_FATAL_LINK_INTR = 22012,
NVSWITCH_ERR_HW_MINION_NONFATAL = 22013,
NVSWITCH_ERR_HW_MINION_LAST, /* NOTE: Must be last */
/* NXBAR errors */
NVSWITCH_ERR_HW_NXBAR = 23000,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BUFFER_OVERFLOW = 23001,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BUFFER_UNDERFLOW = 23002,
NVSWITCH_ERR_HW_NXBAR_TILE_EGRESS_CREDIT_OVERFLOW = 23003,
NVSWITCH_ERR_HW_NXBAR_TILE_EGRESS_CREDIT_UNDERFLOW = 23004,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_NON_BURSTY_PKT = 23005,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_NON_STICKY_PKT = 23006,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BURST_GT_9_DATA_VC = 23007,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_PKT_INVALID_DST = 23008,
NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_PKT_PARITY_ERROR = 23009,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BUFFER_OVERFLOW = 23010,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BUFFER_UNDERFLOW = 23011,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CREDIT_OVERFLOW = 23012,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CREDIT_UNDERFLOW = 23013,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_NON_BURSTY_PKT = 23014,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_NON_STICKY_PKT = 23015,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BURST_GT_9_DATA_VC = 23016,
NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CDT_PARITY_ERROR = 23017,
NVSWITCH_ERR_HW_NXBAR_LAST, /* NOTE: Must be last */
/* NPORT: SOURCETRACK errors */
NVSWITCH_ERR_HW_NPORT_SOURCETRACK = 24000,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_CRUMBSTORE_ECC_LIMIT_ERR = 24001,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_TD_CRUMBSTORE_ECC_LIMIT_ERR = 24002,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN1_CRUMBSTORE_ECC_LIMIT_ERR = 24003,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_CRUMBSTORE_ECC_DBE_ERR = 24004,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_TD_CRUMBSTORE_ECC_DBE_ERR = 24005,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN1_CRUMBSTORE_ECC_DBE_ERR = 24006,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_SOURCETRACK_TIME_OUT_ERR = 24007,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_DUP_CREQ_TCEN0_TAG_ERR = 24008,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_INVALID_TCEN0_RSP_ERR = 24009,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_INVALID_TCEN1_RSP_ERR = 24010,
NVSWITCH_ERR_HW_NPORT_SOURCETRACK_LAST, /* NOTE: Must be last */
/* NVLIPT_LNK errors */
NVSWITCH_ERR_HW_NVLIPT_LNK = 25000,
NVSWITCH_ERR_HW_NVLIPT_LNK_ILLEGALLINKSTATEREQUEST = 25001,
NVSWITCH_ERR_HW_NVLIPT_LNK_FAILEDMINIONREQUEST = 25002,
NVSWITCH_ERR_HW_NVLIPT_LNK_RESERVEDREQUESTVALUE = 25003,
NVSWITCH_ERR_HW_NVLIPT_LNK_LINKSTATEWRITEWHILEBUSY = 25004,
NVSWITCH_ERR_HW_NVLIPT_LNK_LINK_STATE_REQUEST_TIMEOUT = 25005,
NVSWITCH_ERR_HW_NVLIPT_LNK_WRITE_TO_LOCKED_SYSTEM_REG_ERR = 25006,
NVSWITCH_ERR_HW_NVLIPT_LNK_SLEEPWHILEACTIVELINK = 25007,
NVSWITCH_ERR_HW_NVLIPT_LNK_RSTSEQ_PHYCTL_TIMEOUT = 25008,
NVSWITCH_ERR_HW_NVLIPT_LNK_RSTSEQ_CLKCTL_TIMEOUT = 25009,
NVSWITCH_ERR_HW_NVLIPT_LNK_ALI_TRAINING_FAIL = 25010,
NVSWITCH_ERR_HW_NVLIPT_LNK_LAST, /* Note: Must be last */
/* SOE errors */
NVSWITCH_ERR_HW_SOE = 26000,
NVSWITCH_ERR_HW_SOE_RESET = 26001,
NVSWITCH_ERR_HW_SOE_BOOTSTRAP = 26002,
NVSWITCH_ERR_HW_SOE_COMMAND_QUEUE = 26003,
NVSWITCH_ERR_HW_SOE_TIMEOUT = 26004,
NVSWITCH_ERR_HW_SOE_SHUTDOWN = 26005,
NVSWITCH_ERR_HW_SOE_HALT = 26006,
NVSWITCH_ERR_HW_SOE_EXTERR = 26007,
NVSWITCH_ERR_HW_SOE_WATCHDOG = 26008,
NVSWITCH_ERR_HW_SOE_LAST, /* Note: Must be last */
/* NPORT: Multicast Tstate errors */
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE = 28000,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_TAGPOOL_ECC_LIMIT_ERR = 28001,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_TAGPOOL_ECC_DBE_ERR = 28002,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_ECC_LIMIT_ERR = 28003,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_ECC_DBE_ERR = 28004,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_BUF_OVERWRITE_ERR = 28005,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_MCTO_ERR = 28006,
NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_LAST, /* Note: Must be last */
/* NPORT: Reduction Tstate errors */
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE = 29000,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_TAGPOOL_ECC_LIMIT_ERR = 29001,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_TAGPOOL_ECC_DBE_ERR = 29002,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_ECC_LIMIT_ERR = 29003,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_ECC_DBE_ERR = 29004,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_BUF_OVERWRITE_ERR = 29005,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_RTO_ERR = 29006,
NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_LAST, /* Note: Must be last */
/* Please update nvswitch_translate_hw_errors with a newly added error class. */
NVSWITCH_ERR_LAST
/* See enum modification guidelines at the top of this file */
} NVSWITCH_ERR_TYPE;
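The enum above assigns each hardware block a range starting at a multiple of 1000: HOST at 10000, NPORT ingress at 11000, and so on up to NPORT reduction Tstate at 29000. Its comment also says NVSwitch values start at 10000 so they stay distinguishable from GPU errors. Assuming this layout holds for the driver version in use, the block that raised an SXid can be read off from the value alone. For example, 28006 in the log above falls in the 28000 range and matches NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_MCTO_ERR. The C sketch below does this mapping. `sxid_block`, `sxid_is_nport`, and the short block names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NVSwitch error values start at 10000, per the enum comment. */
#define SXID_MIN 10000

/* Block name for each 1000-wide range, indexed by sxid / 1000 - 10.
 * Taken from the "= N000" base values in the enum; index 17 (27000)
 * has no base value there and is left NULL. */
static const char *const SXID_BLOCKS[] = {
    [0]  = "HOST",
    [1]  = "NPORT_INGRESS",
    [2]  = "NPORT_EGRESS",
    [3]  = "NPORT_FSTATE",
    [4]  = "NPORT_TSTATE",
    [5]  = "NPORT_ROUTE",
    [6]  = "NPORT",
    [7]  = "NVLCTRL",
    [8]  = "NVLIPT",
    [9]  = "NVLTLC",
    [10] = "DLPL",
    [11] = "AFS",
    [12] = "MINION",
    [13] = "NXBAR",
    [14] = "NPORT_SOURCETRACK",
    [15] = "NVLIPT_LNK",
    [16] = "SOE",
    [18] = "NPORT_MULTICASTTSTATE",
    [19] = "NPORT_REDUCTIONTSTATE",
};

#define SXID_BLOCK_COUNT (sizeof SXID_BLOCKS / sizeof SXID_BLOCKS[0])

/* Returns the hardware block name for an SXid value, or NULL if the
 * value falls outside the ranges defined in this version of the enum. */
const char *sxid_block(int sxid)
{
    if (sxid < SXID_MIN)
        return NULL;   /* below 10000: not an NVSwitch error value */
    size_t idx = (size_t)(sxid / 1000 - SXID_MIN / 1000);
    if (idx >= SXID_BLOCK_COUNT)
        return NULL;
    return SXID_BLOCKS[idx];
}

/* Returns 1 if the SXid comes from one of the NPORT sub-blocks. */
int sxid_is_nport(int sxid)
{
    const char *block = sxid_block(sxid);
    return block != NULL && strncmp(block, "NPORT", 5) == 0;
}
```

A range table is used instead of listing every enumerator because new errors are appended inside an existing block's range. Older mappings keep working when the driver adds codes, though a new block base would still need a new entry.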
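Summary
When a GPU or its NVLink encounters an error, the hardware generates internal error signals, which the driver collects and organizes into Xid error codes. Errors inside an NVSwitch are collected by the NVSwitch driver and organized into SXid error codes. Because the error-reporting mechanism is designed around the microarchitecture of each chip module, these codes make it possible to locate the root cause when a problem occurs.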