A Survey of NVIDIA Xid and SXid Messages

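Xid is NVIDIA's error-code scheme for GPUs and is used to locate GPU-related problems. SXid is the corresponding error-code scheme for NVSwitch and is used to locate NVSwitch-related problems.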

NVIDIA GPU Diagnostic Guide

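The flowchart below shows the triage procedure to follow when an NVIDIA GPU reports an error. It contains three paths, so GPU errors can be divided into three classes.
The first class is errors reported through Xid, most of which are hardware-related. The second class is errors reported by system monitoring, typically GPU power, temperature, or network problems. The third class is errors found in driver and runtime logs, typically software problems in the driver, the runtime, or the GPU application.
[Figure: GPU triage flowchart]
This article focuses on the first class, namely GPU hardware-related problems. These errors were anticipated when the GPU was designed, so studying them in turn shows which hardware errors a GPU may encounter.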

Xid

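An Xid message is a log entry printed by the NVIDIA driver. Such an entry generally means that the GPU has hit a hardware-related error. The cause may be that the driver configured the GPU incorrectly, that the hardware has failed, or that a program running on the GPU triggered the error. The Xid message reports the hardware error state and so gives a direction for fault diagnosis.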

Xid Format

The Xid format is shown below. The NVRM prefix indicates that the entry comes from the NVIDIA KMD (kernel-mode driver).

NVRM: GPU at PCI:<gpu_pci_bdf>: <gpu_uuid>
NVRM: GPU Board Serial Number: <gpu_serial_number>
NVRM: Xid (PCI:<gpu_pci_bdf>): <Xid_Value>, <raw error information>

The following is an example of an Xid error log:
[...] NVRM: GPU at PCI:0000:34:00: GPU-c43f0536-e751-7211-d7a7-78c95249ee7d
[...] NVRM: GPU Board Serial Number: 0323618040756
[...] NVRM: Xid (PCI:0000:34:00): 45, Ch 00000010
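
The format above is regular enough to be extracted mechanically. The following is a minimal sketch, not an official tool: the function name `parse_xid_line` and the regular expression are assumptions made for illustration, and the pattern is only fitted to the `NVRM: Xid (PCI:<gpu_pci_bdf>): <Xid_Value>, <raw error information>` line shown in this section.

```python
import re

# Pattern fitted to the "NVRM: Xid (PCI:<bdf>): <value>, <raw info>" line above.
# It is an illustrative assumption, not a format guaranteed by the driver.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bdf>[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+)(?:, (?P<raw>.*))?$"
)


def parse_xid_line(line):
    """Return (pci_bdf, xid, raw_info) for an Xid log line, or None otherwise."""
    m = XID_RE.search(line.strip())
    if m is None:
        return None
    return m.group("bdf"), int(m.group("xid")), m.group("raw") or ""


# The sample lines are taken from the example log in this section.
log = [
    "[...] NVRM: GPU Board Serial Number: 0323618040756",
    "[...] NVRM: Xid (PCI:0000:34:00): 45, Ch 00000010",
]
events = [e for e in map(parse_xid_line, log) if e is not None]
print(events)
```

Lines that do not carry an Xid value, such as the serial-number line, are simply skipped, so the same function can be run over a whole kernel log.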

Xid Error Sources

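According to the table in the Xid documentation, Xid error sources fall into seven classes: HW Error, Driver Error, User App Error, System Memory Corruption, Bus Error, Thermal Issue, and FB Corruption.
Driver Error and User App Error are caused by software that programs the GPU incorrectly. System Memory Corruption is caused by unintended writes to host memory. Bus Error means that an error occurred on the PCIe bus. Thermal Issue is caused by abnormal device temperature. FB Corruption is caused by unintended writes to the device-side FB (frame buffer). HW Error can be further divided by hardware module, for example the Display, PBDMA, Copy Engine, NVLink port, and DRAM modules.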

Xid Table

| Category | Error Code | ID |
|---|---|---|
| Graphic | ROBUST_CHANNEL_GR_EXCEPTION | 13 |
| FAKE | ROBUST_CHANNEL_FAKE_ERROR | 14 |
| VBLANK | ROBUST_CHANNEL_VBLANK_CALLBACK_TIMEOUT | 16 |
| Display | ROBUST_CHANNEL_DISP_MISSED_NOTIFIER | 19 |
| Mpeg | ROBUST_CHANNEL_MPEG_ERROR_SW_METHOD | 20 |
| Motion Estimation | ROBUST_CHANNEL_ME_ERROR_SW_METHOD | 21 |
| Video Process | ROBUST_CHANNEL_VP_ERROR_SW_METHOD | 22 |
| RC | ROBUST_CHANNEL_RC_LOGGING_ENABLED | 23 |
| Video Process | ROBUST_CHANNEL_VP_ERROR | 27 |
| Video Process | ROBUST_CHANNEL_VP2_ERROR | 28 |
| BSP | ROBUST_CHANNEL_BSP_ERROR | 29 |
| Reserved | ROBUST_CHANNEL_UNUSED_ERROR_30 | 30 |
| MMU | ROBUST_CHANNEL_FIFO_ERROR_MMU_ERR_FLT | 31 |
| PBDMA | ROBUST_CHANNEL_PBDMA_ERROR | 32 |
| Security | ROBUST_CHANNEL_SEC_ERROR | 33 |
| MSVLD | ROBUST_CHANNEL_MSVLD_ERROR | 34 |
| MSPDEC | ROBUST_CHANNEL_MSPDEC_ERROR | 35 |
| MSPPP | ROBUST_CHANNEL_MSPPP_ERROR | 36 |
| Copy Engine | ROBUST_CHANNEL_CE0_ERROR | 39 |
| Copy Engine | ROBUST_CHANNEL_CE1_ERROR | 40 |
| Copy Engine | ROBUST_CHANNEL_CE2_ERROR | 41 |
| VIC | ROBUST_CHANNEL_VIC_ERROR | 42 |
| RESET | ROBUST_CHANNEL_RESETCHANNEL_VERIF_ERROR | 43 |
| Graphic | ROBUST_CHANNEL_GR_FAULT_DURING_CTXSW | 44 |
| PREEMPTIVE | ROBUST_CHANNEL_PREEMPTIVE_REMOVAL | 45 |
| Video Encode | ROBUST_CHANNEL_NVENC0_ERROR | 47 |
| Memory | ROBUST_CHANNEL_GPU_ECC_DBE | 48 |
| Memory | FB_MEMORY_ERROR | 58 |
| PMU | PMU_ERROR | 59 |
| Security | ROBUST_CHANNEL_SEC2_ERROR | 60 |
| PMU | PMU_BREAKPOINT | 61 |
| PMU | PMU_HALT_ERROR | 62 |
| Memory | INFOROM_PAGE_RETIREMENT_EVENT | 63 |
| Memory | INFOROM_DRAM_RETIREMENT_EVENT | INFOROM_PAGE_RETIREMENT_EVENT (alias, i.e. 63) |
| Memory | INFOROM_PAGE_RETIREMENT_FAILURE | 64 |
| Memory | INFOROM_DRAM_RETIREMENT_FAILURE | INFOROM_PAGE_RETIREMENT_FAILURE (alias, i.e. 64) |
| Video Encode | ROBUST_CHANNEL_NVENC1_ERROR | 65 |
| Video Decode | ROBUST_CHANNEL_NVDEC0_ERROR | 68 |
| Graphic | ROBUST_CHANNEL_GR_CLASS_ERROR | 69 |
| Copy Engine | ROBUST_CHANNEL_CE3_ERROR | 70 |
| Copy Engine | ROBUST_CHANNEL_CE4_ERROR | 71 |
| Copy Engine | ROBUST_CHANNEL_CE5_ERROR | 72 |
| Video Encode | ROBUST_CHANNEL_NVENC2_ERROR | 73 |
| NVLink | NVLINK_ERROR | 74 |
| Copy Engine | ROBUST_CHANNEL_CE6_ERROR | 75 |
| Copy Engine | ROBUST_CHANNEL_CE7_ERROR | 76 |
| Copy Engine | ROBUST_CHANNEL_CE8_ERROR | 77 |
| Virtualization | VGPU_START_ERROR | 78 |
| PCIe | ROBUST_CHANNEL_GPU_HAS_FALLEN_OFF_THE_BUS | 79 |
| PBDMA | PBDMA_PUSHBUFFER_CRC_MISMATCH | 80 |
| Display | ROBUST_CHANNEL_VGA_SUBSYSTEM_ERROR | 81 |
| Jpeg | ROBUST_CHANNEL_NVJPG0_ERROR | 82 |
| Video Decode | ROBUST_CHANNEL_NVDEC1_ERROR | 83 |
| Video Decode | ROBUST_CHANNEL_NVDEC2_ERROR | 84 |
| Copy Engine | ROBUST_CHANNEL_CE9_ERROR | 85 |
| OFA | ROBUST_CHANNEL_OFA0_ERROR | 86 |
| DRIVER | NVTELEMETRY_DRIVER_REPORT | 87 |
| Video Decode | ROBUST_CHANNEL_NVDEC3_ERROR | 88 |
| Video Decode | ROBUST_CHANNEL_NVDEC4_ERROR | 89 |
| LTC | LTC_ERROR | 90 |
| Reserved | RESERVED_XID | 91 |
| SBE | EXCESSIVE_SBE_INTERRUPTS | 92 |
| Timeout | INFOROM_ERASE_LIMIT_EXCEEDED | 93 |
| Contained | ROBUST_CHANNEL_CONTAINED_ERROR | 94 |
| Uncontained | ROBUST_CHANNEL_UNCONTAINED_ERROR | 95 |
| Video Decode | ROBUST_CHANNEL_NVDEC5_ERROR | 96 |
| Video Decode | ROBUST_CHANNEL_NVDEC6_ERROR | 97 |
| Video Decode | ROBUST_CHANNEL_NVDEC7_ERROR | 98 |
| Jpeg | ROBUST_CHANNEL_NVJPG1_ERROR | 99 |
| Jpeg | ROBUST_CHANNEL_NVJPG2_ERROR | 100 |
| Jpeg | ROBUST_CHANNEL_NVJPG3_ERROR | 101 |
| Jpeg | ROBUST_CHANNEL_NVJPG4_ERROR | 102 |
| Jpeg | ROBUST_CHANNEL_NVJPG5_ERROR | 103 |
| Jpeg | ROBUST_CHANNEL_NVJPG6_ERROR | 104 |
| Jpeg | ROBUST_CHANNEL_NVJPG7_ERROR | 105 |
| MMU | DESTINATION_FLA_TRANSLATION_ERROR | 108 |
| Security | SEC_FAULT_ERROR | 110 |
| Timeout | GSP_RPC_TIMEOUT | 119 |
| GSP | GSP_ERROR | 120 |
| C2C | C2C_ERROR | 121 |
| PMU | SPI_PMU_RPC_READ_FAIL | 122 |
| PMU | SPI_PMU_RPC_WRITE_FAIL | 123 |
| PMU | SPI_PMU_RPC_ERASE_FAIL | 124 |
| FS | INFOROM_FS_ERROR | 125 |
| Copy Engine | ROBUST_CHANNEL_CE10_ERROR | 126 |
| Copy Engine | ROBUST_CHANNEL_CE11_ERROR | 127 |
| Copy Engine | ROBUST_CHANNEL_CE12_ERROR | 128 |
| Copy Engine | ROBUST_CHANNEL_CE13_ERROR | 129 |
| Copy Engine | ROBUST_CHANNEL_CE14_ERROR | 130 |
| Copy Engine | ROBUST_CHANNEL_CE15_ERROR | 131 |
| Copy Engine | ROBUST_CHANNEL_CE16_ERROR | 132 |
| Copy Engine | ROBUST_CHANNEL_CE17_ERROR | 133 |
| Copy Engine | ROBUST_CHANNEL_CE18_ERROR | 134 |
| Copy Engine | ROBUST_CHANNEL_CE19_ERROR | 135 |
| ALI | ALI_TRAINING_FAIL | 136 |
| NVLink | NVLINK_FLA_PRIV_ERR | 137 |
| DLA | ROBUST_CHANNEL_DLA_ERROR | 138 |
| OFA | ROBUST_CHANNEL_OFA1_ERROR | 139 |
| Memory | UNRECOVERABLE_ECC_ERROR_ESCAPE | 140 |
| Fast Path | ROBUST_CHANNEL_FAST_PATH_ERROR | 141 |
| GPU | GPU_INIT_ERROR | 143 |
| NVLink | NVLINK_SAW_ERROR | 144 |
| NVLink | NVLINK_RLW_ERROR | 145 |
| NVLink | NVLINK_TLW_ERROR | 146 |
| NVLink | NVLINK_TREX_ERROR | 147 |
| NVLink | NVLINK_NVLPW_CTRL_ERROR | 148 |
| NVLink | NVLINK_NETIR_ERROR | 149 |
| NVLink | NVLINK_MSE_ERROR | 150 |
| Security | ROBUST_CHANNEL_KEY_ROTATION_ERROR | 151 |
| Reserved | RESERVED7_ERROR | 152 |
| Reserved | RESERVED8_ERROR | 153 |
| DRIVER | ROBUST_CHANNEL_LAST_ERROR | 153 |
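
A table like this is most useful as a lookup keyed by the Xid value. The sketch below copies only a handful of rows from the table above into a dictionary. The names `XID_TABLE`, `describe_xid`, and `count_by_category` are hypothetical helpers introduced here for illustration; a real tool would need the full table.

```python
from collections import Counter

# Excerpt of the table above: Xid value -> (category, error code).
# Only a few rows are copied here; the rest would be filled in the same way.
XID_TABLE = {
    13: ("Graphic", "ROBUST_CHANNEL_GR_EXCEPTION"),
    31: ("MMU", "ROBUST_CHANNEL_FIFO_ERROR_MMU_ERR_FLT"),
    45: ("PREEMPTIVE", "ROBUST_CHANNEL_PREEMPTIVE_REMOVAL"),
    48: ("Memory", "ROBUST_CHANNEL_GPU_ECC_DBE"),
    63: ("Memory", "INFOROM_PAGE_RETIREMENT_EVENT"),
    74: ("NVLink", "NVLINK_ERROR"),
    79: ("PCIe", "ROBUST_CHANNEL_GPU_HAS_FALLEN_OFF_THE_BUS"),
    119: ("Timeout", "GSP_RPC_TIMEOUT"),
}


def describe_xid(xid):
    """Return a one-line description of an Xid value using the excerpt above."""
    category, name = XID_TABLE.get(xid, ("Unknown", "UNKNOWN_XID"))
    return f"Xid {xid}: {name} [{category}]"


def count_by_category(xids):
    """Count how many observed Xid values fall into each table category."""
    return Counter(XID_TABLE.get(x, ("Unknown", ""))[0] for x in xids)


print(describe_xid(79))
print(count_by_category([48, 63, 79, 999]))
```

Grouping observed Xid values by category gives a quick view of which hardware module is reporting errors most often on a given machine.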

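Summarizing the table above, GPU hardware errors are collected by having each hardware module report its own error state. The general pattern is as follows.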

HW Block Errors
    Function Block Errors
        Graphic: Block Error
        Codec: Block Error
        DMA: Block Error
        DLA: Block Error
        ......: Block Error
    Memory Errors
        DRAM: ECC/Page Fault
        SRAM: ECC
    Interconnect Errors
        PCIe: Off the Bus
        NVLink: HW State Error
        ......

SXid

An SXid message is similar to an Xid message, except that it indicates NVSwitch-related errors. SXid is short for Switch Xid.

SXid Format

nvidia-nvswitchX: SXid (PCI:<switch_pci_bdf>): <SXid_Value>, <Fatal or Non-Fatal>, <Link No> <Error Description>
<raw error information for additional troubleshooting>

The following is an example of an SXid error log:
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Non-fatal, Link 46 MC TS crumbstore MCTO (First)
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Severity 0 Engine instance 46 Sub-engine instance 00
[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Data {0x00140004, 0x00100000, 0x00140004, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
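
An SXid entry can be split into its fields in the same way as an Xid entry. The following is a rough sketch under two assumptions: each log entry has first been joined onto a single line (long entries may be wrapped when displayed), and the summary line always has the shape `<Fatal or Non-Fatal>, Link <n> <description>` seen in the example. The names `parse_sxid_line`, `SXID_RE`, and `SUMMARY_RE` are made up for this illustration.

```python
import re

# Common prefix of every SXid line: switch index, PCI BDF, SXid value.
SXID_RE = re.compile(
    r"nvidia-nvswitch(?P<switch>\d+): SXid \(PCI:(?P<bdf>[0-9a-fA-F:.]+)\): "
    r"(?P<sxid>\d+), (?P<rest>.*)$"
)
# Shape of the first (summary) line only; an assumption based on the example.
SUMMARY_RE = re.compile(
    r"(?P<sev>Fatal|Non-fatal), Link (?P<link>\d+) (?P<desc>.*)$",
    re.IGNORECASE,
)


def parse_sxid_line(line):
    """Parse one SXid log line into a dict, or return None if it is not one."""
    m = SXID_RE.search(line.strip())
    if m is None:
        return None
    event = {
        "switch": int(m.group("switch")),
        "bdf": m.group("bdf"),
        "sxid": int(m.group("sxid")),
    }
    s = SUMMARY_RE.match(m.group("rest"))
    if s is not None:
        # Summary line: severity, link number, and human-readable description.
        event["fatal"] = s.group("sev").lower() == "fatal"
        event["link"] = int(s.group("link"))
        event["description"] = s.group("desc")
    else:
        # Follow-up lines (severity/engine details, raw data) are kept verbatim.
        event["detail"] = m.group("rest")
    return event


summary = parse_sxid_line(
    "[...] nvidia-nvswitch3: SXid (PCI:0000:c1:00.0): 28006, Non-fatal, "
    "Link 46 MC TS crumbstore MCTO (First)"
)
print(summary)
```

Follow-up lines for the same SXid value carry no severity keyword, so the sketch stores their remainder under `detail` rather than guessing at their structure.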

SXid Classification

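SXids can be classified by severity, broadly into Non-Fatal and Fatal errors. A few errors, such as heartbeat timeouts, ECC SBEs, and retransmissions, are Non-Fatal. Most other errors, such as invalid commands, crossbar overflow, ECC DBEs, and buffer overflow or underflow, are internal NVSwitch errors and are Fatal.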
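
Besides severity, the enumeration that follows in this section gives each hardware block its own base value in steps of 1000, starting at 10000, so the block that raised an SXid can be read off from its value. The sketch below illustrates that idea. The dictionary `SXID_BLOCKS` and the function `sxid_block` are hypothetical names, and only the blocks visible in this excerpt of the enumeration (10000 through 21000) are included.

```python
# Base value (in thousands) -> hardware block, taken from the section comments
# and base constants of the enumeration in this section. Blocks outside the
# excerpt are deliberately left out rather than guessed.
SXID_BLOCKS = {
    10: "HOST",
    11: "NPORT ingress",
    12: "NPORT egress",
    13: "NPORT Fstate",
    14: "NPORT Tstate",
    15: "NPORT route",
    16: "NPORT",
    17: "NVLCTRL",
    18: "NVLIPT",
    19: "NVLTLC",
    20: "DLPL",
    21: "AFS",
}


def sxid_block(sxid):
    """Map an SXid value to the hardware block implied by its thousands range."""
    if sxid < 10000:
        # The enumeration states that NVSwitch errors start from 10000.
        return "not an NVSwitch error (below 10000)"
    return SXID_BLOCKS.get(sxid // 1000, "block not listed in this excerpt")


for value in (11013, 20001, 28006):
    print(value, "->", sxid_block(value))
```

For instance, 11013 (`NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_HDR_ECC_DBE_ERR`) falls in the NPORT ingress range, while 28006 from the example log lies beyond the portion of the enumeration reproduced here.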

typedef enum nvswitch_err_type
{
    NVSWITCH_ERR_NO_ERROR                                                = 0x0,

    /*
     * These error enumerations are derived from the error bits defined in each
     * hardware manual.
     *
     * NVSwitch errors values should start from 10000 (decimal) to be
     * distinguishable from GPU errors.
     */

    /* HOST */
    NVSWITCH_ERR_HW_HOST                                               = 10000,
    NVSWITCH_ERR_HW_HOST_PRIV_ERROR                                    = 10001,
    NVSWITCH_ERR_HW_HOST_PRIV_TIMEOUT                                  = 10002,
    NVSWITCH_ERR_HW_HOST_UNHANDLED_INTERRUPT                           = 10003,
    NVSWITCH_ERR_HW_HOST_THERMAL_EVENT_START                           = 10004,
    NVSWITCH_ERR_HW_HOST_THERMAL_EVENT_END                             = 10005,
    NVSWITCH_ERR_HW_HOST_THERMAL_SHUTDOWN                              = 10006,
    NVSWITCH_ERR_HW_HOST_IO_FAILURE                                    = 10007,
    NVSWITCH_ERR_HW_HOST_FIRMWARE_INITIALIZATION_FAILURE               = 10008,
    NVSWITCH_ERR_HW_HOST_LAST,


    /* NPORT: Ingress errors */
    NVSWITCH_ERR_HW_NPORT_INGRESS                                      = 11000,
    NVSWITCH_ERR_HW_NPORT_INGRESS_CMDDECODEERR                         = 11001,
    NVSWITCH_ERR_HW_NPORT_INGRESS_BDFMISMATCHERR                       = 11002,
    NVSWITCH_ERR_HW_NPORT_INGRESS_BUBBLEDETECT                         = 11003,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ACLFAIL                              = 11004,
    NVSWITCH_ERR_HW_NPORT_INGRESS_PKTPOISONSET                         = 11005,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ECCSOFTLIMITERR                      = 11006,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ECCHDRDOUBLEBITERR                   = 11007,
    NVSWITCH_ERR_HW_NPORT_INGRESS_INVALIDCMD                           = 11008,
    NVSWITCH_ERR_HW_NPORT_INGRESS_INVALIDVCSET                         = 11009,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ERRORINFO                            = 11010,
    NVSWITCH_ERR_HW_NPORT_INGRESS_REQCONTEXTMISMATCHERR                = 11011,
    NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_HDR_ECC_LIMIT_ERR             = 11012,
    NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_HDR_ECC_DBE_ERR               = 11013,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ADDRBOUNDSERR                        = 11014,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTABCFGERR                         = 11015,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTABCFGERR                        = 11016,
    NVSWITCH_ERR_HW_NPORT_INGRESS_REMAPTAB_ECC_DBE_ERR                 = 11017,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTAB_ECC_DBE_ERR                   = 11018,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTAB_ECC_DBE_ERR                  = 11019,
    NVSWITCH_ERR_HW_NPORT_INGRESS_NCISOC_PARITY_ERR                    = 11020,
    NVSWITCH_ERR_HW_NPORT_INGRESS_REMAPTAB_ECC_LIMIT_ERR               = 11021,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RIDTAB_ECC_LIMIT_ERR                 = 11022,
    NVSWITCH_ERR_HW_NPORT_INGRESS_RLANTAB_ECC_LIMIT_ERR                = 11023,
    NVSWITCH_ERR_HW_NPORT_INGRESS_ADDRTYPEERR                          = 11024,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_INDEX_ERR               = 11025,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_INDEX_ERR               = 11026,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_INDEX_ERR                 = 11027,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ECC_DBE_ERR             = 11028,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ECC_DBE_ERR             = 11029,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ECC_DBE_ERR               = 11030,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_REQCONTEXTMISMATCHERR   = 11031,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_REQCONTEXTMISMATCHERR   = 11032,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_REQCONTEXTMISMATCHERR     = 11033,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ACLFAIL                 = 11034,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ACLFAIL                 = 11035,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ACLFAIL                   = 11036,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ADDRBOUNDSERR           = 11037,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ADDRBOUNDSERR           = 11038,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ADDRBOUNDSERR             = 11039,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ECC_LIMIT_ERR           = 11040,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ECC_LIMIT_ERR           = 11041,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ECC_LIMIT_ERR             = 11042,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCCMDTOUCADDRERR                     = 11043,
    NVSWITCH_ERR_HW_NPORT_INGRESS_READMCREFLECTMEMERR                  = 11044,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTAREMAPTAB_ADDRTYPEERR             = 11045,
    NVSWITCH_ERR_HW_NPORT_INGRESS_EXTBREMAPTAB_ADDRTYPEERR             = 11046,
    NVSWITCH_ERR_HW_NPORT_INGRESS_MCREMAPTAB_ADDRTYPEERR               = 11047,
    NVSWITCH_ERR_HW_NPORT_INGRESS_LAST, /* NOTE: Must be last */

    /* NPORT: Egress errors */
    NVSWITCH_ERR_HW_NPORT_EGRESS                                       = 12000,
    NVSWITCH_ERR_HW_NPORT_EGRESS_EGRESSBUFERR                          = 12001,
    NVSWITCH_ERR_HW_NPORT_EGRESS_PKTROUTEERR                           = 12002,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCSINGLEBITLIMITERR0                 = 12003,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCHDRDOUBLEBITERR0                   = 12004,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCDATADOUBLEBITERR0                  = 12005,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCSINGLEBITLIMITERR1                 = 12006,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCHDRDOUBLEBITERR1                   = 12007,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ECCDATADOUBLEBITERR1                  = 12008,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCHDRCREDITOVFL                   = 12009,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCDATACREDITOVFL                  = 12010,
    NVSWITCH_ERR_HW_NPORT_EGRESS_ADDRMATCHERR                          = 12011,
    NVSWITCH_ERR_HW_NPORT_EGRESS_TAGCOUNTERR                           = 12012,
    NVSWITCH_ERR_HW_NPORT_EGRESS_FLUSHRSPERR                           = 12013,
    NVSWITCH_ERR_HW_NPORT_EGRESS_DROPNPURRSPERR                        = 12014,
    NVSWITCH_ERR_HW_NPORT_EGRESS_POISONERR                             = 12015,
    NVSWITCH_ERR_HW_NPORT_EGRESS_PACKET_HEADER                         = 12016,
    NVSWITCH_ERR_HW_NPORT_EGRESS_BUFFER_DATA                           = 12017,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOC_CREDITS                        = 12018,
    NVSWITCH_ERR_HW_NPORT_EGRESS_TAG_DATA                              = 12019,
    NVSWITCH_ERR_HW_NPORT_EGRESS_SEQIDERR                              = 12020,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_ECC_LIMIT_ERR               = 12021,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_ECC_DBE_ERR                 = 12022,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RAM_OUT_HDR_ECC_LIMIT_ERR             = 12023,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RAM_OUT_HDR_ECC_DBE_ERR               = 12024,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOCCREDITOVFL                      = 12025,
    NVSWITCH_ERR_HW_NPORT_EGRESS_REQTGTIDMISMATCHERR                   = 12026,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RSPREQIDMISMATCHERR                   = 12027,
    NVSWITCH_ERR_HW_NPORT_EGRESS_PRIVRSPERR                            = 12028,
    NVSWITCH_ERR_HW_NPORT_EGRESS_HWRSPERR                              = 12029,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_HDR_PARITY_ERR                  = 12030,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NCISOC_CREDIT_PARITY_ERR              = 12031,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_FLITTYPE_MISMATCH_ERR           = 12032,
    NVSWITCH_ERR_HW_NPORT_EGRESS_CREDIT_TIME_OUT_ERR                   = 12033,
    NVSWITCH_ERR_HW_NPORT_EGRESS_INVALIDVCSET_ERR                      = 12034,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_SIDEBAND_PD_PARITY_ERR          = 12035,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_ECC_LIMIT_ERR     = 12036,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_ECC_DBE_ERR       = 12037,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSPCTRLSTORE_ECC_LIMIT_ERR          = 12038,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSPCTRLSTORE_ECC_DBE_ERR            = 12039,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RBCTRLSTORE_ECC_LIMIT_ERR             = 12040,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RBCTRLSTORE_ECC_DBE_ERR               = 12041,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDSGT_ECC_LIMIT_ERR                = 12042,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDSGT_ECC_DBE_ERR                  = 12043,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDBUF_ECC_LIMIT_ERR                = 12044,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCREDBUF_ECC_DBE_ERR                  = 12045,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_RAM_HDR_ECC_LIMIT_ERR           = 12046,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_RAM_HDR_ECC_DBE_ERR             = 12047,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_HDR_PARITY_ERR        = 12048,
    NVSWITCH_ERR_HW_NPORT_EGRESS_NXBAR_REDUCTION_FLITTYPE_MISMATCH_ERR = 12049,
    NVSWITCH_ERR_HW_NPORT_EGRESS_MCRSP_CNT_ERR                         = 12050,
    NVSWITCH_ERR_HW_NPORT_EGRESS_RBRSP_CNT_ERR                         = 12051,
    NVSWITCH_ERR_HW_NPORT_EGRESS_LAST, /* NOTE: Must be last */

    /* NPORT: Fstate errors */
    NVSWITCH_ERR_HW_NPORT_FSTATE                                       = 13000,
    NVSWITCH_ERR_HW_NPORT_FSTATE_TAGPOOLBUFERR                         = 13001,
    NVSWITCH_ERR_HW_NPORT_FSTATE_CRUMBSTOREBUFERR                      = 13002,
    NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_CRUMBSTORE       = 13003,
    NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_CRUMBSTORE        = 13004,
    NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_TAGSTORE         = 13005,
    NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_TAGSTORE          = 13006,
    NVSWITCH_ERR_HW_NPORT_FSTATE_SINGLEBITECCLIMITERR_FLUSHREQSTORE    = 13007,
    NVSWITCH_ERR_HW_NPORT_FSTATE_UNCORRECTABLEECCERR_FLUSHREQSTORE     = 13008,
    NVSWITCH_ERR_HW_NPORT_FSTATE_LAST, /* NOTE: Must be last */

    /* NPORT: Tstate errors */
    NVSWITCH_ERR_HW_NPORT_TSTATE                                       = 14000,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOLBUFERR                         = 14001,
    NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTOREBUFERR                      = 14002,
    NVSWITCH_ERR_HW_NPORT_TSTATE_SINGLEBITECCLIMITERR_CRUMBSTORE       = 14003,
    NVSWITCH_ERR_HW_NPORT_TSTATE_UNCORRECTABLEECCERR_CRUMBSTORE        = 14004,
    NVSWITCH_ERR_HW_NPORT_TSTATE_SINGLEBITECCLIMITERR_TAGSTORE         = 14005,
    NVSWITCH_ERR_HW_NPORT_TSTATE_UNCORRECTABLEECCERR_TAGSTORE          = 14006,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOL_ECC_LIMIT_ERR                 = 14007,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TAGPOOL_ECC_DBE_ERR                   = 14008,
    NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTORE_ECC_LIMIT_ERR              = 14009,
    NVSWITCH_ERR_HW_NPORT_TSTATE_CRUMBSTORE_ECC_DBE_ERR                = 14010,
    NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTOREBUFERR                  = 14011,
    NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTORE_ECC_LIMIT_ERR          = 14012,
    NVSWITCH_ERR_HW_NPORT_TSTATE_COL_CRUMBSTORE_ECC_DBE_ERR            = 14013,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAMBUFERR                      = 14014,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAM_ECC_LIMIT_ERR              = 14015,
    NVSWITCH_ERR_HW_NPORT_TSTATE_TD_TID_RAM_ECC_DBE_ERR                = 14016,
    NVSWITCH_ERR_HW_NPORT_TSTATE_ATO_ERR                               = 14017,
    NVSWITCH_ERR_HW_NPORT_TSTATE_CAMRSP_ERR                            = 14018,
    NVSWITCH_ERR_HW_NPORT_TSTATE_LAST, /* NOTE: Must be last */

    /* NPORT: Route errors */
    NVSWITCH_ERR_HW_NPORT_ROUTE                                        = 15000,
    NVSWITCH_ERR_HW_NPORT_ROUTE_ROUTEBUFERR                            = 15001,
    NVSWITCH_ERR_HW_NPORT_ROUTE_NOPORTDEFINEDERR                       = 15002,
    NVSWITCH_ERR_HW_NPORT_ROUTE_INVALIDROUTEPOLICYERR                  = 15003,
    NVSWITCH_ERR_HW_NPORT_ROUTE_ECCLIMITERR                            = 15004,
    NVSWITCH_ERR_HW_NPORT_ROUTE_UNCORRECTABLEECCERR                    = 15005,
    NVSWITCH_ERR_HW_NPORT_ROUTE_TRANSDONERESVERR                       = 15006,
    NVSWITCH_ERR_HW_NPORT_ROUTE_PACKET_HEADER                          = 15007,
    NVSWITCH_ERR_HW_NPORT_ROUTE_GLT_ECC_LIMIT_ERR                      = 15008,
    NVSWITCH_ERR_HW_NPORT_ROUTE_GLT_ECC_DBE_ERR                        = 15009,
    NVSWITCH_ERR_HW_NPORT_ROUTE_PDCTRLPARERR                           = 15010,
    NVSWITCH_ERR_HW_NPORT_ROUTE_NVS_ECC_LIMIT_ERR                      = 15011,
    NVSWITCH_ERR_HW_NPORT_ROUTE_NVS_ECC_DBE_ERR                        = 15012,
    NVSWITCH_ERR_HW_NPORT_ROUTE_CDTPARERR                              = 15013,
    NVSWITCH_ERR_HW_NPORT_ROUTE_MCRID_ECC_LIMIT_ERR                    = 15014,
    NVSWITCH_ERR_HW_NPORT_ROUTE_MCRID_ECC_DBE_ERR                      = 15015,
    NVSWITCH_ERR_HW_NPORT_ROUTE_EXTMCRID_ECC_LIMIT_ERR                 = 15016,
    NVSWITCH_ERR_HW_NPORT_ROUTE_EXTMCRID_ECC_DBE_ERR                   = 15017,
    NVSWITCH_ERR_HW_NPORT_ROUTE_RAM_ECC_LIMIT_ERR                      = 15018,
    NVSWITCH_ERR_HW_NPORT_ROUTE_RAM_ECC_DBE_ERR                        = 15019,
    NVSWITCH_ERR_HW_NPORT_ROUTE_INVALID_MCRID_ERR                      = 15020,
    NVSWITCH_ERR_HW_NPORT_ROUTE_LAST, /* NOTE: Must be last */

    /* NPORT: Nport errors */
    NVSWITCH_ERR_HW_NPORT                                              = 16000,
    NVSWITCH_ERR_HW_NPORT_DATAPOISONED                                 = 16001,
    NVSWITCH_ERR_HW_NPORT_UCINTERNAL                                   = 16002,
    NVSWITCH_ERR_HW_NPORT_CINTERNAL                                    = 16003,
    NVSWITCH_ERR_HW_NPORT_LAST, /* NOTE: Must be last */

    /* NVLCTRL: NVCTRL errors */
    NVSWITCH_ERR_HW_NVLCTRL                                            = 17000,
    NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCSOFTLIMITERR                     = 17001,
    NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCHDRDOUBLEBITERR                  = 17002,
    NVSWITCH_ERR_HW_NVLCTRL_INGRESSECCDATADOUBLEBITERR                 = 17003,
    NVSWITCH_ERR_HW_NVLCTRL_INGRESSBUFFERERR                           = 17004,
    NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCSOFTLIMITERR                      = 17005,
    NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCHDRDOUBLEBITERR                   = 17006,
    NVSWITCH_ERR_HW_NVLCTRL_EGRESSECCDATADOUBLEBITERR                  = 17007,
    NVSWITCH_ERR_HW_NVLCTRL_EGRESSBUFFERERR                            = 17008,
    NVSWITCH_ERR_HW_NVLCTRL_LAST, /* NOTE: Must be last */

    /* Nport: Nvlipt errors */
    NVSWITCH_ERR_HW_NVLIPT                                             = 18000,
    NVSWITCH_ERR_HW_NVLIPT_DLPROTOCOL                                  = 18001,
    NVSWITCH_ERR_HW_NVLIPT_DATAPOISONED                                = 18002,
    NVSWITCH_ERR_HW_NVLIPT_FLOWCONTROL                                 = 18003,
    NVSWITCH_ERR_HW_NVLIPT_RESPONSETIMEOUT                             = 18004,
    NVSWITCH_ERR_HW_NVLIPT_TARGETERROR                                 = 18005,
    NVSWITCH_ERR_HW_NVLIPT_UNEXPECTEDRESPONSE                          = 18006,
    NVSWITCH_ERR_HW_NVLIPT_RECEIVEROVERFLOW                            = 18007,
    NVSWITCH_ERR_HW_NVLIPT_MALFORMEDPACKET                             = 18008,
    NVSWITCH_ERR_HW_NVLIPT_STOMPEDPACKETRECEIVED                       = 18009,
    NVSWITCH_ERR_HW_NVLIPT_UNSUPPORTEDREQUEST                          = 18010,
    NVSWITCH_ERR_HW_NVLIPT_UCINTERNAL                                  = 18011,
    NVSWITCH_ERR_HW_NVLIPT_PHYRECEIVER                                 = 18012,
    NVSWITCH_ERR_HW_NVLIPT_BADAN0PKT                                   = 18013,
    NVSWITCH_ERR_HW_NVLIPT_REPLAYTIMEOUT                               = 18014,
    NVSWITCH_ERR_HW_NVLIPT_ADVISORYERROR                               = 18015,
    NVSWITCH_ERR_HW_NVLIPT_CINTERNAL                                   = 18016,
    NVSWITCH_ERR_HW_NVLIPT_HEADEROVERFLOW                              = 18017,
    NVSWITCH_ERR_HW_NVLIPT_RSTSEQ_PHYARB_TIMEOUT                       = 18018,
    NVSWITCH_ERR_HW_NVLIPT_RSTSEQ_PLL_TIMEOUT                          = 18019,
    NVSWITCH_ERR_HW_NVLIPT_CLKCTL_ILLEGAL_REQUEST                      = 18020,
    NVSWITCH_ERR_HW_NVLIPT_LAST, /* NOTE: Must be last */

    /* Nport: Nvltlc TX/RX errors */
    NVSWITCH_ERR_HW_NVLTLC                                             = 19000,
    NVSWITCH_ERR_HW_NVLTLC_TXHDRCREDITOVFERR                           = 19001,
    NVSWITCH_ERR_HW_NVLTLC_TXDATACREDITOVFERR                          = 19002,
    NVSWITCH_ERR_HW_NVLTLC_TXDLCREDITOVFERR                            = 19003,
    NVSWITCH_ERR_HW_NVLTLC_TXDLCREDITPARITYERR                         = 19004,
    NVSWITCH_ERR_HW_NVLTLC_TXRAMHDRPARITYERR                           = 19005,
    NVSWITCH_ERR_HW_NVLTLC_TXRAMDATAPARITYERR                          = 19006,
    NVSWITCH_ERR_HW_NVLTLC_TXUNSUPVCOVFERR                             = 19007,
    NVSWITCH_ERR_HW_NVLTLC_TXSTOMPDET                                  = 19008,
    NVSWITCH_ERR_HW_NVLTLC_TXPOISONDET                                 = 19009,
    NVSWITCH_ERR_HW_NVLTLC_TARGETERR                                   = 19010,
    NVSWITCH_ERR_HW_NVLTLC_TX_PACKET_HEADER                            = 19011,
    NVSWITCH_ERR_HW_NVLTLC_UNSUPPORTEDREQUESTERR                       = 19012,
    NVSWITCH_ERR_HW_NVLTLC_RXDLHDRPARITYERR                            = 19013,
    NVSWITCH_ERR_HW_NVLTLC_RXDLDATAPARITYERR                           = 19014,
    NVSWITCH_ERR_HW_NVLTLC_RXDLCTRLPARITYERR                           = 19015,
    NVSWITCH_ERR_HW_NVLTLC_RXRAMDATAPARITYERR                          = 19016,
    NVSWITCH_ERR_HW_NVLTLC_RXRAMHDRPARITYERR                           = 19017,
    NVSWITCH_ERR_HW_NVLTLC_RXINVALIDAEERR                              = 19018,
    NVSWITCH_ERR_HW_NVLTLC_RXINVALIDBEERR                              = 19019,
    NVSWITCH_ERR_HW_NVLTLC_RXINVALIDADDRALIGNERR                       = 19020,
    NVSWITCH_ERR_HW_NVLTLC_RXPKTLENERR                                 = 19021,
    NVSWITCH_ERR_HW_NVLTLC_RSVCMDENCERR                                = 19022,
    NVSWITCH_ERR_HW_NVLTLC_RSVDATLENENCERR                             = 19023,
    NVSWITCH_ERR_HW_NVLTLC_RSVADDRTYPEERR                              = 19024,
    NVSWITCH_ERR_HW_NVLTLC_RSVRSPSTATUSERR                             = 19025,
    NVSWITCH_ERR_HW_NVLTLC_RSVPKTSTATUSERR                             = 19026,
    NVSWITCH_ERR_HW_NVLTLC_RSVCACHEATTRPROBEREQERR                     = 19027,
    NVSWITCH_ERR_HW_NVLTLC_RSVCACHEATTRPROBERSPERR                     = 19028,
    NVSWITCH_ERR_HW_NVLTLC_DATLENGTATOMICREQMAXERR                     = 19029,
    NVSWITCH_ERR_HW_NVLTLC_DATLENGTRMWREQMAXERR                        = 19030,
    NVSWITCH_ERR_HW_NVLTLC_DATLENLTATRRSPMINERR                        = 19031,
    NVSWITCH_ERR_HW_NVLTLC_INVALIDCACHEATTRPOERR                       = 19032,
    NVSWITCH_ERR_HW_NVLTLC_INVALIDCRERR                                = 19033,
    NVSWITCH_ERR_HW_NVLTLC_RXRESPSTATUSTARGETERR                       = 19034,
    NVSWITCH_ERR_HW_NVLTLC_RXRESPSTATUSUNSUPPORTEDREQUESTERR           = 19035,
    NVSWITCH_ERR_HW_NVLTLC_RXHDROVFERR                                 = 19036,
    NVSWITCH_ERR_HW_NVLTLC_RXDATAOVFERR                                = 19037,
    NVSWITCH_ERR_HW_NVLTLC_STOMPDETERR                                 = 19038,
    NVSWITCH_ERR_HW_NVLTLC_RXPOISONERR                                 = 19039,
    NVSWITCH_ERR_HW_NVLTLC_CORRECTABLEINTERNALERR                      = 19040,
    NVSWITCH_ERR_HW_NVLTLC_RXUNSUPVCOVFERR                             = 19041,
    NVSWITCH_ERR_HW_NVLTLC_RXUNSUPNVLINKCREDITRELERR                   = 19042,
    NVSWITCH_ERR_HW_NVLTLC_RXUNSUPNCISOCCREDITRELERR                   = 19043,
    NVSWITCH_ERR_HW_NVLTLC_RX_PACKET_HEADER                            = 19044,
    NVSWITCH_ERR_HW_NVLTLC_RX_ERR_HEADER                               = 19045,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_PARITY_ERR                    = 19046,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_HDR_ECC_DBE_ERR               = 19047,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_DAT_ECC_DBE_ERR               = 19048,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_NCISOC_ECC_LIMIT_ERR                 = 19049,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_HW_ERR                   = 19050,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_UR_ERR                   = 19051,
    NVSWITCH_ERR_HW_NVLTLC_TX_SYS_TXRSPSTATUS_PRIV_ERR                 = 19052,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_NCISOC_PARITY_ERR                    = 19053,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_HDR_RAM_ECC_DBE_ERR                  = 19054,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_HDR_RAM_ECC_LIMIT_ERR                = 19055,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT0_RAM_ECC_DBE_ERR                 = 19056,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT0_RAM_ECC_LIMIT_ERR               = 19057,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT1_RAM_ECC_DBE_ERR                 = 19058,
    NVSWITCH_ERR_HW_NVLTLC_RX_SYS_DAT1_RAM_ECC_LIMIT_ERR               = 19059,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_HDR_ECC_DBE_ERR             = 19060,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_DAT_ECC_DBE_ERR             = 19061,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_CREQ_RAM_ECC_LIMIT_ERR               = 19062,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_HDR_ECC_DBE_ERR              = 19063,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_DAT_ECC_DBE_ERR              = 19064,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP_RAM_ECC_LIMIT_ERR                = 19065,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_HDR_ECC_DBE_ERR              = 19066,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_DAT_ECC_DBE_ERR              = 19067,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_COM_RAM_ECC_LIMIT_ERR                = 19068,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_HDR_ECC_DBE_ERR             = 19069,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_DAT_ECC_DBE_ERR             = 19070,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_RSP1_RAM_ECC_LIMIT_ERR               = 19071,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC0                      = 19072,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC1                      = 19073,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC2                      = 19074,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC3                      = 19075,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC4                      = 19076,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC5                      = 19077,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC6                      = 19078,
    NVSWITCH_ERR_HW_NVLTLC_TX_LNK_AN1_TIMEOUT_VC7                      = 19079,
    NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_HW_ERR                   = 19080,
    NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_UR_ERR                   = 19081,
    NVSWITCH_ERR_HW_NVLTLC_RX_LNK_RXRSPSTATUS_PRIV_ERR                 = 19082,
    NVSWITCH_ERR_HW_NVLTLC_RX_LNK_INVALID_COLLAPSED_RESPONSE_ERR       = 19083,
    NVSWITCH_ERR_HW_NVLTLC_RX_LNK_AN1_HEARTBEAT_TIMEOUT_ERR            = 19084,
    NVSWITCH_ERR_HW_NVLTLC_LAST, /* NOTE: Must be last */

    /* DLPL: errors ( SL1 errors too) */
    NVSWITCH_ERR_HW_DLPL                                               = 20000,
    NVSWITCH_ERR_HW_DLPL_TX_REPLAY                                     = 20001,
    NVSWITCH_ERR_HW_DLPL_TX_RECOVERY_SHORT                             = 20002,
    NVSWITCH_ERR_HW_DLPL_TX_RECOVERY_LONG                              = 20003,
    NVSWITCH_ERR_HW_DLPL_TX_FAULT_RAM                                  = 20004,
    NVSWITCH_ERR_HW_DLPL_TX_FAULT_INTERFACE                            = 20005,
    NVSWITCH_ERR_HW_DLPL_TX_FAULT_SUBLINK_CHANGE                       = 20006,
    NVSWITCH_ERR_HW_DLPL_RX_FAULT_SUBLINK_CHANGE                       = 20007,
    NVSWITCH_ERR_HW_DLPL_RX_FAULT_DL_PROTOCOL                          = 20008,
    NVSWITCH_ERR_HW_DLPL_RX_SHORT_ERROR_RATE                           = 20009,
    NVSWITCH_ERR_HW_DLPL_RX_LONG_ERROR_RATE                            = 20010,
    NVSWITCH_ERR_HW_DLPL_RX_ILA_TRIGGER                                = 20011,
    NVSWITCH_ERR_HW_DLPL_RX_CRC_COUNTER                                = 20012,
    NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT                                   = 20013,
    NVSWITCH_ERR_HW_DLPL_LTSSM_PROTOCOL                                = 20014,
    NVSWITCH_ERR_HW_DLPL_MINION_REQUEST                                = 20015,
    NVSWITCH_ERR_HW_DLPL_FIFO_DRAIN_ERR                                = 20016,
    NVSWITCH_ERR_HW_DLPL_CONST_DET_ERR                                 = 20017,
    NVSWITCH_ERR_HW_DLPL_OFF2SAFE_LINK_DET_ERR                         = 20018,
    NVSWITCH_ERR_HW_DLPL_SAFE2NO_LINK_DET_ERR                          = 20019,
    NVSWITCH_ERR_HW_DLPL_SCRAM_LOCK_ERR                                = 20020,
    NVSWITCH_ERR_HW_DLPL_SYM_LOCK_ERR                                  = 20021,
    NVSWITCH_ERR_HW_DLPL_SYM_ALIGN_END_ERR                             = 20022,
    NVSWITCH_ERR_HW_DLPL_FIFO_SKEW_ERR                                 = 20023,
    NVSWITCH_ERR_HW_DLPL_TRAIN2SAFE_LINK_DET_ERR                       = 20024,
    NVSWITCH_ERR_HW_DLPL_HS2SAFE_LINK_DET_ERR                          = 20025,
    NVSWITCH_ERR_HW_DLPL_FENCE_ERR                                     = 20026,
    NVSWITCH_ERR_HW_DLPL_SAFE_NO_LD_ERR                                = 20027,
    NVSWITCH_ERR_HW_DLPL_E2SAFE_LD_ERR                                 = 20028,
    NVSWITCH_ERR_HW_DLPL_RC_RXPWR_ERR                                  = 20029,
    NVSWITCH_ERR_HW_DLPL_RC_TXPWR_ERR                                  = 20030,
    NVSWITCH_ERR_HW_DLPL_RC_DEADLINE_ERR                               = 20031,
    NVSWITCH_ERR_HW_DLPL_TX_HS2LP_ERR                                  = 20032,
    NVSWITCH_ERR_HW_DLPL_RX_HS2LP_ERR                                  = 20033,
    NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT_UP                                = 20034,
    NVSWITCH_ERR_HW_DLPL_LTSSM_FAULT_DOWN                              = 20035,
    NVSWITCH_ERR_HW_DLPL_PHY_A                                         = 20036,
    NVSWITCH_ERR_HW_DLPL_TX_PL_ERROR                                   = 20037,
    NVSWITCH_ERR_HW_DLPL_RX_PL_ERROR                                   = 20038,
    NVSWITCH_ERR_HW_DLPL_LAST, /* NOTE: Must be last */

    /* AFS: errors */
    NVSWITCH_ERR_HW_AFS                                                = 21000,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_CREDIT_OVERFLOW                     = 21001,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_CREDIT_UNDERFLOW                    = 21002,
    NVSWITCH_ERR_HW_AFS_UC_EGRESS_CREDIT_OVERFLOW                      = 21003,
    NVSWITCH_ERR_HW_AFS_UC_EGRESS_CREDIT_UNDERFLOW                     = 21004,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_NON_BURSTY_PKT_DETECTED             = 21005,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_NON_STICKY_PKT_DETECTED             = 21006,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_BURST_GT_17_DATA_VC_DETECTED        = 21007,
    NVSWITCH_ERR_HW_AFS_UC_INGRESS_BURST_GT_1_NONDATA_VC_DETECTED      = 21008,
    NVSWITCH_ERR_HW_AFS_UC_INVALID_DST                                 = 21009,
    NVSWITCH_ERR_HW_AFS_UC_PKT_MISROUTE                                = 21010,
    NVSWITCH_ERR_HW_AFS_LAST, /* NOTE: Must be last */

    /* MINION: errors */
    NVSWITCH_ERR_HW_MINION                                             = 22000,
    NVSWITCH_ERR_HW_MINION_UCODE_IMEM                                  = 22001,
    NVSWITCH_ERR_HW_MINION_UCODE_DMEM                                  = 22002,
    NVSWITCH_ERR_HW_MINION_HALT                                        = 22003,
    NVSWITCH_ERR_HW_MINION_BOOT_ERROR                                  = 22004,
    NVSWITCH_ERR_HW_MINION_TIMEOUT                                     = 22005,
    NVSWITCH_ERR_HW_MINION_DLCMD_FAULT                                 = 22006,
    NVSWITCH_ERR_HW_MINION_DLCMD_TIMEOUT                               = 22007,
    NVSWITCH_ERR_HW_MINION_DLCMD_FAIL                                  = 22008,
    NVSWITCH_ERR_HW_MINION_FATAL_INTR                                  = 22009,
    NVSWITCH_ERR_HW_MINION_WATCHDOG                                    = 22010,
    NVSWITCH_ERR_HW_MINION_EXTERR                                      = 22011,
    NVSWITCH_ERR_HW_MINION_FATAL_LINK_INTR                             = 22012,
    NVSWITCH_ERR_HW_MINION_NONFATAL                                    = 22013,
    NVSWITCH_ERR_HW_MINION_LAST, /* NOTE: Must be last */

    /* NXBAR errors */
    NVSWITCH_ERR_HW_NXBAR                                              = 23000,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BUFFER_OVERFLOW                 = 23001,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BUFFER_UNDERFLOW                = 23002,
    NVSWITCH_ERR_HW_NXBAR_TILE_EGRESS_CREDIT_OVERFLOW                  = 23003,
    NVSWITCH_ERR_HW_NXBAR_TILE_EGRESS_CREDIT_UNDERFLOW                 = 23004,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_NON_BURSTY_PKT                  = 23005,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_NON_STICKY_PKT                  = 23006,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_BURST_GT_9_DATA_VC              = 23007,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_PKT_INVALID_DST                 = 23008,
    NVSWITCH_ERR_HW_NXBAR_TILE_INGRESS_PKT_PARITY_ERROR                = 23009,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BUFFER_OVERFLOW              = 23010,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BUFFER_UNDERFLOW             = 23011,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CREDIT_OVERFLOW               = 23012,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CREDIT_UNDERFLOW              = 23013,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_NON_BURSTY_PKT               = 23014,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_NON_STICKY_PKT               = 23015,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_INGRESS_BURST_GT_9_DATA_VC           = 23016,
    NVSWITCH_ERR_HW_NXBAR_TILEOUT_EGRESS_CDT_PARITY_ERROR              = 23017,
    NVSWITCH_ERR_HW_NXBAR_LAST, /* NOTE: Must be last */

    /* NPORT: SOURCETRACK errors */
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK                                         = 24000,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_CRUMBSTORE_ECC_LIMIT_ERR     = 24001,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_TD_CRUMBSTORE_ECC_LIMIT_ERR  = 24002,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN1_CRUMBSTORE_ECC_LIMIT_ERR     = 24003,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_CRUMBSTORE_ECC_DBE_ERR       = 24004,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN0_TD_CRUMBSTORE_ECC_DBE_ERR    = 24005,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_CREQ_TCEN1_CRUMBSTORE_ECC_DBE_ERR       = 24006,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_SOURCETRACK_TIME_OUT_ERR                = 24007,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_DUP_CREQ_TCEN0_TAG_ERR                  = 24008,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_INVALID_TCEN0_RSP_ERR                   = 24009,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_INVALID_TCEN1_RSP_ERR                   = 24010,
    NVSWITCH_ERR_HW_NPORT_SOURCETRACK_LAST, /* NOTE: Must be last */

    /* NVLIPT_LNK errors */
    NVSWITCH_ERR_HW_NVLIPT_LNK                                         = 25000,
    NVSWITCH_ERR_HW_NVLIPT_LNK_ILLEGALLINKSTATEREQUEST                 = 25001,
    NVSWITCH_ERR_HW_NVLIPT_LNK_FAILEDMINIONREQUEST                     = 25002,
    NVSWITCH_ERR_HW_NVLIPT_LNK_RESERVEDREQUESTVALUE                    = 25003,
    NVSWITCH_ERR_HW_NVLIPT_LNK_LINKSTATEWRITEWHILEBUSY                 = 25004,
    NVSWITCH_ERR_HW_NVLIPT_LNK_LINK_STATE_REQUEST_TIMEOUT              = 25005,
    NVSWITCH_ERR_HW_NVLIPT_LNK_WRITE_TO_LOCKED_SYSTEM_REG_ERR          = 25006,
    NVSWITCH_ERR_HW_NVLIPT_LNK_SLEEPWHILEACTIVELINK                    = 25007,
    NVSWITCH_ERR_HW_NVLIPT_LNK_RSTSEQ_PHYCTL_TIMEOUT                   = 25008,
    NVSWITCH_ERR_HW_NVLIPT_LNK_RSTSEQ_CLKCTL_TIMEOUT                   = 25009,
    NVSWITCH_ERR_HW_NVLIPT_LNK_ALI_TRAINING_FAIL                       = 25010,
    NVSWITCH_ERR_HW_NVLIPT_LNK_LAST, /* Note: Must be last */

    /* SOE errors */
    NVSWITCH_ERR_HW_SOE                                                = 26000,
    NVSWITCH_ERR_HW_SOE_RESET                                          = 26001,
    NVSWITCH_ERR_HW_SOE_BOOTSTRAP                                      = 26002,
    NVSWITCH_ERR_HW_SOE_COMMAND_QUEUE                                  = 26003,
    NVSWITCH_ERR_HW_SOE_TIMEOUT                                        = 26004,
    NVSWITCH_ERR_HW_SOE_SHUTDOWN                                       = 26005,
    NVSWITCH_ERR_HW_SOE_HALT                                           = 26006,
    NVSWITCH_ERR_HW_SOE_EXTERR                                         = 26007,
    NVSWITCH_ERR_HW_SOE_WATCHDOG                                       = 26008,
    NVSWITCH_ERR_HW_SOE_LAST, /* Note: Must be last */

    /* NPORT: Multicast Tstate errors */
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE                              = 28000,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_TAGPOOL_ECC_LIMIT_ERR        = 28001,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_TAGPOOL_ECC_DBE_ERR          = 28002,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_ECC_LIMIT_ERR     = 28003,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_ECC_DBE_ERR       = 28004,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_BUF_OVERWRITE_ERR = 28005,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_CRUMBSTORE_MCTO_ERR          = 28006,
    NVSWITCH_ERR_HW_NPORT_MULTICASTTSTATE_LAST, /* Note: Must be last */

    /* NPORT: Reduction Tstate errors */
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE                              = 29000,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_TAGPOOL_ECC_LIMIT_ERR        = 29001,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_TAGPOOL_ECC_DBE_ERR          = 29002,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_ECC_LIMIT_ERR     = 29003,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_ECC_DBE_ERR       = 29004,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_BUF_OVERWRITE_ERR = 29005,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_CRUMBSTORE_RTO_ERR           = 29006,
    NVSWITCH_ERR_HW_NPORT_REDUCTIONTSTATE_LAST, /* Note: Must be last */

    /* Please update nvswitch_translate_hw_errors with a newly added error class. */
    NVSWITCH_ERR_LAST
    /* See enum modification guidelines at the top of this file */
} NVSWITCH_ERR_TYPE;
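
The enum above groups SXid codes into per-module classes, each starting at a multiple of 1000 (for example 22000 for MINION and 23000 for NXBAR). As a minimal sketch of how a log-processing tool might map a raw SXid value back to its hardware module, the lookup below rounds the code down to its class base. The names `sxid_class`, `sxid_classes`, and `sxid_class_name` are hypothetical and not part of the NVIDIA driver. The table covers only the classes listed in this part of the enum, and the DLPL base of 20000 is inferred from its members starting at 20001.

```c
#include <stddef.h>

/* Hypothetical helper type: one entry per error class in the enum above. */
struct sxid_class {
    unsigned base;      /* first value of the class, e.g. 22000 */
    const char *name;   /* short module label taken from the enum prefix */
};

/* Bases copied from the enum listing; DLPL's base (20000) is an assumption
 * inferred from its members starting at 20001. */
static const struct sxid_class sxid_classes[] = {
    { 20000, "DLPL" },
    { 21000, "AFS" },
    { 22000, "MINION" },
    { 23000, "NXBAR" },
    { 24000, "NPORT_SOURCETRACK" },
    { 25000, "NVLIPT_LNK" },
    { 26000, "SOE" },
    { 28000, "NPORT_MULTICASTTSTATE" },
    { 29000, "NPORT_REDUCTIONTSTATE" },
};

/* Map an SXid value to its module label by rounding down to the nearest
 * multiple of 1000. Returns "UNKNOWN" for classes not in the table. */
static const char *sxid_class_name(unsigned code)
{
    unsigned base = (code / 1000u) * 1000u;
    size_t count = sizeof(sxid_classes) / sizeof(sxid_classes[0]);

    for (size_t i = 0; i < count; i++) {
        if (sxid_classes[i].base == base)
            return sxid_classes[i].name;
    }
    return "UNKNOWN";
}
```

Calling `sxid_class_name(22003)` (the value of `NVSWITCH_ERR_HW_MINION_HALT`) yields the MINION label, while a value in an unlisted range such as 27000 falls through to `"UNKNOWN"`.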

Summary

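Overall, when a GPU or NVLink fails, the hardware raises internal error signals, which the GPU driver collects and reports as Xid error codes. Errors raised inside an NVSwitch are collected by the NVSwitch driver and reported as SXid error codes. Because the error-reporting mechanism is organized around the chip's modules and microarchitecture, an error code points to the failing module and so helps locate the root cause when a problem occurs.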
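
To make the first step of that workflow concrete, the sketch below extracts the PCI address and Xid value from a kernel log line in the `NVRM: Xid (PCI:<gpu_pci_bdf>): <Xid_Value>, <raw error information>` format described in the Xid format section. The `xid_event` struct and `parse_xid_line` function are hypothetical names introduced for illustration only. The sketch assumes the line follows that format exactly and ignores the raw error information after the comma.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields pulled out of one Xid log line. */
struct xid_event {
    char pci_bdf[32];   /* e.g. "0000:34:00" */
    int  xid;           /* e.g. 45 */
};

/* Parse one log line. Returns 1 and fills *out if the line contains an
 * "NVRM: Xid (PCI:...): N" record, otherwise returns 0. */
static int parse_xid_line(const char *line, struct xid_event *out)
{
    /* Skip any timestamp or other prefix before the NVRM tag. */
    const char *p = strstr(line, "NVRM: Xid (PCI:");
    if (p == NULL)
        return 0;

    /* %31[^)] reads the PCI address up to the closing parenthesis. */
    if (sscanf(p, "NVRM: Xid (PCI:%31[^)]): %d", out->pci_bdf, &out->xid) != 2)
        return 0;

    return 1;
}
```

Applied to the example line `[...] NVRM: Xid (PCI:0000:34:00): 45, Ch 00000010`, the function fills in the address `0000:34:00` and the Xid value 45. The other `NVRM:` lines (GPU UUID and board serial number) are rejected.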

References

GPU Debug Guidelines
Fabric Manager User Guide
