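While doing a routine inspection of a RAC environment today, I noticed the word "Error" in the logs on node 2. Reading through the details line by line, it turned out to be caused by the server overheating. This happened in the middle of the night with hardly any workload, and a hardware check showed no alerts at all. Sigh, a certain brand's quality control really is getting worse and worse. Logging it here for the record.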
Mar 21 02:53:21 hydb1 kernel: mce: [Hardware Error]: Machine check events logged
Mar 21 02:53:21 hydb1 mcelog[2943]: Hardware event. This is not a software error.
Mar 21 02:53:21 hydb1 mcelog[2943]: MCE 0
Mar 21 02:53:21 hydb1 mcelog[2943]: CPU 48 THERMAL EVENT TSC 370b9f6088f9408
Mar 21 02:53:21 hydb1 mcelog[2943]: TIME 1681843802 Wed Mar 21 02:50:02 2023
Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 48 heated above trip temperature. Throttling enabled.
Mar 21 02:53:21 hydb1 mcelog[2943]: Please check your system cooling. Performance will be impacted
Mar 21 02:53:21 hydb1 mcelog[2943]: STATUS 880003c3 MCGSTATUS 0
Mar 21 02:53:21 hydb1 mcelog[2943]: MCGCAP f000814 APICID 21 SOCKETID 1
Mar 21 02:53:21 hydb1 mcelog[2943]: CPUID Vendor Intel Family 6 Model 85
Mar 21 02:53:21 hydb1 mcelog[2943]: Hardware event. This is not a software error.
Mar 21 02:53:21 hydb1 mcelog[2943]: MCE 1
Mar 21 02:53:21 hydb1 mcelog[2943]: CPU 16 THERMAL EVENT TSC 370b9f6088fb95a
Mar 21 02:53:21 hydb1 mcelog[2943]: TIME 1681843802 Wed Mar 21 02:50:02 2023
Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 16 heated above trip temperature. Throttling enabled.
Mar 21 02:53:21 hydb1 mcelog[2943]: Please check your system cooling. Performance will be impacted
Mar 21 02:53:21 hydb1 mcelog[2943]: STATUS 880003c3 MCGSTATUS 0
Mar 21 02:53:21 hydb1 mcelog[2943]: MCGCAP f000814 APICID 20 SOCKETID 1
Mar 21 02:53:21 hydb1 mcelog[2943]: CPUID Vendor Intel Family 6 Model 85
Mar 21 02:53:21 hydb1 mcelog[2943]: Hardware event. This is not a software error.
Mar 21 02:53:21 hydb1 mcelog[2943]: MCE 2
Mar 21 02:53:21 hydb1 mcelog[2943]: CPU 48 THERMAL EVENT TSC 370b9f608b51c98
Mar 21 02:53:21 hydb1 mcelog[2943]: TIME 1681843802 Wed Mar 21 02:50:02 2023
Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 48 below trip temperature. Throttling disabled
Mar 21 02:53:21 hydb1 mcelog[2943]: STATUS 88030282 MCGSTATUS 0
Mar 21 02:53:21 hydb1 mcelog[2943]: MCGCAP f000814 APICID 21 SOCKETID 1
Mar 21 02:53:21 hydb1 mcelog[2943]: CPUID Vendor Intel Family 6 Model 85
Mar 21 02:53:21 hydb1 mcelog[2943]: Hardware event. This is not a software error.
Mar 21 02:53:21 hydb1 mcelog[2943]: MCE 3
Mar 21 02:53:21 hydb1 mcelog[2943]: CPU 16 THERMAL EVENT TSC 370b9f608b64df0
Mar 21 02:53:21 hydb1 mcelog[2943]: TIME 1681843802 Wed Mar 21 02:50:02 2023
Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 16 below trip temperature. Throttling disabled
Mar 21 02:53:21 hydb1 mcelog[2943]: STATUS 88030282 MCGSTATUS 0
Mar 21 02:53:21 hydb1 mcelog[2943]: MCGCAP f000814 APICID 20 SOCKETID 1
Mar 21 02:53:21 hydb1 mcelog[2943]: CPUID Vendor Intel Family 6 Model 85
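
Reading these by eye gets tedious once there are more than a handful of events, so here is a rough sketch of how the thermal messages above could be tallied per CPU. It only assumes the two message formats seen in this log ("heated above trip temperature" and "below trip temperature"); the function name `summarize_thermal_events` and the `SAMPLE` list are made up for illustration and are not part of mcelog.

```python
import re
from collections import Counter

# Matches the two mcelog thermal messages that appear in the log above.
# Assumption: other mcelog versions may word these messages differently.
THERMAL_RE = re.compile(
    r"mcelog\[\d+\]: Processor (\d+) (heated above|below) trip temperature"
)


def summarize_thermal_events(lines):
    """Count throttle-enabled and throttle-disabled messages per CPU."""
    throttled = Counter()   # "heated above trip temperature"
    recovered = Counter()   # "below trip temperature"
    for line in lines:
        match = THERMAL_RE.search(line)
        if match is None:
            continue  # not a thermal message, skip it
        cpu = int(match.group(1))
        if match.group(2) == "heated above":
            throttled[cpu] += 1
        else:
            recovered[cpu] += 1
    return throttled, recovered


# The four thermal lines copied from the log above (hypothetical test input).
SAMPLE = [
    "Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 48 heated above trip temperature. Throttling enabled.",
    "Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 16 heated above trip temperature. Throttling enabled.",
    "Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 48 below trip temperature. Throttling disabled",
    "Mar 21 02:53:21 hydb1 mcelog[2943]: Processor 16 below trip temperature. Throttling disabled",
]

if __name__ == "__main__":
    throttled, recovered = summarize_thermal_events(SAMPLE)
    for cpu in sorted(throttled):
        print(f"CPU {cpu}: throttled {throttled[cpu]}x, recovered {recovered[cpu]}x")
```

To run it against a real system, pass the lines of the syslog file instead of `SAMPLE`; where that file lives depends on the distribution, so the path is left out here. A CPU whose throttled count is higher than its recovered count would be worth a closer look, since it may still be running throttled.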