1 Preface
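For each person there is only one true duty: to find oneself, and then to hold to it for a lifetime, wholeheartedly and without rest. Every other path is incomplete: an escape, a cowardly return to the ideals of the crowd, drifting with the current, a fear of one's own inner self. (Hermann Hesse, Demian)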
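When a system misbehaves, the logs show abnormal messages, or some processes run slowly, a hardware fault often has to be ruled out, so it helps to monitor hardware information and check it for anomalies.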
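During boot the kernel probes the hardware, and the results of these checks are also written to the dmesg buffer (the kernel ring buffer). On Linux, the dmesg buffer records:
- hardware detection messages from system startup
- messages from device drivers
- system warnings and errors
The contents of the dmesg buffer can be viewed with the dmesg command or with journalctl -k.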
View the last 10 lines, which show recent system events and operations:
┌──[root@liruilongs.github.io]-[~]
└─$dmesg | tail -n 10
[56429.310740] br0: port 3(vnet4) entered blocking state
[56429.310741] br0: port 3(vnet4) entered forwarding state
[56431.360035] privbr0: port 3(vnet3) entered learning state
[56433.408995] privbr0: port 3(vnet3) entered forwarding state
[56433.409013] privbr0: topology change detected, propagating
[56440.853859] kvm [45569]: vcpu0, guest rIP: 0xffffffff9e060e38 disabled perfctr wrmsr: 0xc2 data 0xffff
[59043.415922] device-mapper: uevent: version 1.0.3
[59043.416104] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[59176.644265] kvm [45401]: vcpu0, guest rIP: 0xffffffffa0260e38 disabled perfctr wrmsr: 0xc2 data 0xffff
[59463.089835] bash (2579): drop_caches: 3
dmesg -T converts the timestamps into a human-readable form:
┌──[root@liruilongs.github.io]-[~]
└─$dmesg -T | tail -n 10
[Sun Sep 17 02:19:18 2023] br0: port 3(vnet4) entered blocking state
[Sun Sep 17 02:19:18 2023] br0: port 3(vnet4) entered forwarding state
[Sun Sep 17 02:19:20 2023] privbr0: port 3(vnet3) entered learning state
[Sun Sep 17 02:19:22 2023] privbr0: port 3(vnet3) entered forwarding state
[Sun Sep 17 02:19:22 2023] privbr0: topology change detected, propagating
[Sun Sep 17 02:19:29 2023] kvm [45569]: vcpu0, guest rIP: 0xffffffff9e060e38 disabled perfctr wrmsr: 0xc2 data 0xffff
[Sun Sep 17 03:02:52 2023] device-mapper: uevent: version 1.0.3
[Sun Sep 17 03:02:52 2023] device-mapper: ioctl: 4.39.0-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[Sun Sep 17 03:05:05 2023] kvm [45401]: vcpu0, guest rIP: 0xffffffffa0260e38 disabled perfctr wrmsr: 0xc2 data 0xffff
[Sun Sep 17 03:09:52 2023] bash (2579): drop_caches: 3
View the first 10 lines, which show messages from the Linux kernel boot process:
┌──[root@liruilongs.github.io]-[~]
└─$dmesg -T | head -n 10
[Sat Sep 16 10:38:49 2023] Linux version 4.18.0-193.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Fri Mar 27 14:35:58 UTC 2020
[Sat Sep 16 10:38:49 2023] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.el8.x86_64 root=UUID=893bf4a5-f929-4a4f-9bb3-f1694d8ad757 ro resume=UUID=56504db0-34ca-458f-970b-1591a6af18bb rhgb quiet rd.shell=0
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[Sat Sep 16 10:38:49 2023] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[Sat Sep 16 10:38:49 2023] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
The same kernel messages can be viewed with journalctl -k:
┌──[root@liruilongs.github.io]-[~]
└─$ journalctl -k
-- Logs begin at 五 2023-11-10 10:32:56 CST, end at 五 2023-11-10 10:36:16 CST. --
11月 10 10:32:56 vms81.liruilongs.github.io kernel: Initializing cgroup subsys cpuset
11月 10 10:32:56 vms81.liruilongs.github.io kernel: Initializing cgroup subsys cpu
11月 10 10:32:56 vms81.liruilongs.github.io kernel: Initializing cgroup subsys cpuacct
11月 10 10:32:56 vms81.liruilongs.github.io kernel: Linux version 3.10.0-1160.76.1.el7.x86_64 (mockbuild@kbuilder.bsys.c
11月 10 10:32:56 vms81.liruilongs.github.io kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-1160.76.1.el7.x86_64 r
......
In day-to-day maintenance, dmesg is often combined with grep to locate problems quickly:
┌──[root@liruilongs.github.io]-[~]
└─$ dmesg -T | grep -i error
[五 11月 10 10:32:57 2023] BERT: Boot Error Record Table support is disabled. Enable it by using bert_enable as kernel parameter.
┌──[root@liruilongs.github.io]-[~]
└─$ dmesg -T | grep -i warn
[五 11月 10 10:32:54 2023] Warning: Intel Processor - this hardware has not undergone upstream testing. Please consult http://wiki.centos.org/FAQ for more information
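The two grep calls above can be folded into one small filter. This is a minimal sketch, assuming kernel log lines arrive on standard input; the function name kernel_issues is hypothetical and not part of dmesg or journalctl.

```shell
# kernel_issues: hypothetical helper (not part of any tool).
# Reads kernel log lines on stdin and prints those that mention
# "error" or "warn" (case-insensitive), each prefixed with its
# input line number.
kernel_issues() {
    grep -inE 'error|warn'
}

# Typical use (reading the kernel log may require root):
#   dmesg -T | kernel_issues
#   journalctl -k --no-pager | kernel_issues
```

Text matching is crude: as the BERT line above shows, a message can contain the word "Error" without reporting a fault. util-linux dmesg also offers a --level option (for example dmesg --level=err,warn) that filters by log priority instead of by text.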
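2 Viewing Hardware Information
Modern systems usually have several CPUs, each CPU has multiple cores, and each core may support hyper-threading and sit behind several levels of shared cache.
The lscpu command shows information about the system's CPUs.
Intel CPU information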
-
┌──[root@liruilongs.github.io]-[~]
-
└─$lscpu
-
Architecture: x86_64
-
CPU op-mode(s): 32-bit, 64-bit
-
Byte Order: Little Endian
-
CPU(s): 8
-
On-line CPU(s) list: 0-7
-
Thread(s) per core: 1
-
Core(s) per socket: 4
-
Socket(s): 2
-
NUMA node(s): 1
-
Vendor ID: GenuineIntel
-
CPU family: 6
-
Model: 140
-
Model name: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
-
Stepping: 1
-
CPU MHz: 2419.226
-
BogoMIPS: 4838.45
-
Virtualization: VT-x
-
Hypervisor vendor: VMware
-
Virtualization type: full
-
L1d cache: 48K
-
L1i cache: 32K
-
L2 cache: 1280K
-
L3 cache: 8192K
-
NUMA node0 CPU(s): 0-7
-
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b md_clear flush_l1d arch_capabilities
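A brief explanation of the output:
The architecture is x86_64 (64-bit), and the CPU supports both 32-bit and 64-bit operating modes. The byte order is little endian. The system has 8 logical CPUs: 2 sockets with 4 cores each and 1 thread per core. There is 1 NUMA node.
CPU details:
- Vendor ID: GenuineIntel
- CPU family: 6
- Model: 140
- Model name: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
- Stepping: 1
- CPU frequency: 2419.226 MHz
- BogoMIPS: 4838.45
- Virtualization support: VT-x
- Hypervisor vendor: VMware
- Virtualization type: full
CPU cache sizes:
- L1d cache: 48K
- L1i cache: 32K
- L2 cache: 1280K
- L3 cache: 8192K
The Flags line lists the CPU features, including the floating-point unit (fpu), virtualization extensions (vmx), the AES instruction set (aes) and the AVX instruction set (avx). The ht flag only means the processor reports multi-threading capability, not that it is in use: this virtual machine sees 1 thread per core.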
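The relationship between sockets, cores and threads can be checked with a little arithmetic. The sketch below assumes lscpu output with English field names on standard input (for example from LC_ALL=C lscpu); logical_cpus is a hypothetical helper name.

```shell
# logical_cpus: hypothetical helper. Reads lscpu output on stdin and
# prints sockets * cores-per-socket * threads-per-core, which should
# equal the "CPU(s)" line when all CPUs are online.
logical_cpus() {
    awk -F: '
        /^Thread\(s\) per core/ { threads = $2 + 0 }
        /^Core\(s\) per socket/ { cores   = $2 + 0 }
        /^Socket\(s\)/          { sockets = $2 + 0 }
        END { print sockets * cores * threads }
    '
}

# Typical use:
#   LC_ALL=C lscpu | logical_cpus
```

For the Intel output above this gives 2 x 4 x 1 = 8, matching the CPU(s) line.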
CPU information on a physical server
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$ lscpu
-
架构: x86_64
-
CPU 运行模式: 32-bit, 64-bit
-
Address sizes: 46 bits physical, 48 bits virtual
-
字节序: Little Endian
-
CPU: 32
-
在线 CPU 列表: 0-31
-
厂商 ID: GenuineIntel
-
型号名称: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
-
CPU 系列: 6
-
型号: 45
-
每个核的线程数: 2
-
每个座的核数: 8
-
座: 2
-
步进: 7
-
CPU 最大 MHz: 3300.0000
-
CPU 最小 MHz: 1200.0000
-
BogoMIPS: 5187.49
-
标记: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fx
-
sr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_go
-
od nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
-
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx
-
lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida
-
arat pln pts md_clear flush_l1d
-
Virtualization features:
-
虚拟化: VT-x
-
Caches (sum of all):
-
L1d: 512 KiB (16 instances)
-
L1i: 512 KiB (16 instances)
-
L2: 4 MiB (16 instances)
-
L3: 40 MiB (2 instances)
-
NUMA:
-
NUMA 节点: 2
-
NUMA 节点0 CPU: 0-7,16-23
-
NUMA 节点1 CPU: 8-15,24-31
-
Vulnerabilities:
-
Itlb multihit: KVM: Mitigation: VMX disabled
-
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
-
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
-
Meltdown: Mitigation; PTI
-
Mmio stale data: Unknown: No mitigations
-
Retbleed: Not affected
-
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
-
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
-
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS
-
Not affected
-
Srbds: Not affected
-
Tsx async abort: Not affected
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$
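Key points:
- CPU: Intel Xeon E5-2670 (Sandy Bridge-EP microarchitecture), two sockets with 8 cores each
- Multithreading: 2 threads per core, 32 logical CPUs in total
- Caches: lscpu reports totals across all instances, so each core has 32 KiB of L1d, 32 KiB of L1i and 256 KiB of L2, and each socket has a 20 MiB L3 shared by its cores (40 MiB in total)
- NUMA: two NUMA nodes; node 0 holds CPUs 0-7 and 16-23, node 1 holds CPUs 8-15 and 24-31
- Virtualization: Intel VT-x is supported
- Performance: BogoMIPS is 5187.49 (a rough kernel calibration value, not a benchmark)
- Features: SSE, AVX, virtualization extensions and others
- Vulnerabilities: mitigations are active for Meltdown, Spectre and others, although Mmio stale data is reported as "Unknown: No mitigations"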
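Newer lscpu versions print cache sizes as a sum over all instances, so the per-core figure has to be derived by division. A minimal sketch, assuming lines in the "L1d: 512 KiB (16 instances)" form shown in the server output above; cache_per_instance is a hypothetical helper name.

```shell
# cache_per_instance: hypothetical helper. Reads lscpu output on stdin
# and, for every cache line of the form
#   "<name>: <total> KiB|MiB (<n> instances)",
# prints the size of a single instance in KiB.
cache_per_instance() {
    awk '
        /instances?\)/ {
            name = $1; sub(/:$/, "", name)       # "L1d:" -> "L1d"
            total = $2; unit = $3
            count = $4; sub(/^\(/, "", count)    # "(16"  -> "16"
            if (unit == "MiB") { total = total * 1024 }
            printf "%s %d KiB\n", name, total / count
        }
    '
}

# Typical use:
#   LC_ALL=C lscpu | cache_per_instance
```

For the Xeon E5-2670 server this yields 32 KiB of L1d per core, 256 KiB of L2 per core and 20480 KiB (20 MiB) of L3 per socket.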
AMD CPU information
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ lscpu
-
Architecture: x86_64
-
CPU op-mode(s): 32-bit, 64-bit
-
Byte Order: Little Endian
-
CPU(s): 4
-
On-line CPU(s) list: 0-3
-
Thread(s) per core: 1
-
Core(s) per socket: 2
-
座: 2
-
NUMA 节点: 1
-
厂商 ID: AuthenticAMD
-
CPU 系列: 23
-
型号: 17
-
型号名称: AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx
-
步进: 0
-
CPU MHz: 2195.781
-
BogoMIPS: 4391.56
-
超管理器厂商: VMware
-
虚拟化类型: 完全
-
L1d 缓存: 32K
-
L1i 缓存: 64K
-
L2 缓存: 512K
-
L3 缓存: 4096K
-
NUMA 节点0 CPU: 0-3
-
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl tsc_reliable nonstop_tsc extd_apicid eagerfpu pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec arat overflow_recov succor
-
┌──[root@liruilongs.github.io]-[~]
-
└─$
dmidecode shows motherboard and firmware information read from the SMBIOS (DMI) tables:
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$ dmidecode | head -n 10
-
# dmidecode 3.3
-
Getting SMBIOS data from sysfs.
-
SMBIOS 2.8 present.
-
188 structures occupying 5969 bytes.
-
Table at 0xBFBD8000.
-
Handle 0x0000, DMI type 0, 24 bytes
-
BIOS Information
-
Vendor: HP
-
Version: P75
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$
lsusb lists USB devices; the -vv option shows detailed information:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$lsusb
-
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
-
Bus 003 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
-
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
-
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
-
Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
-
Bus 002 Device 002: ID 0e0f:0008 VMware, Inc.
-
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
The lspci command lists the devices attached to the PCI bus, such as network adapters, graphics cards, sound cards and storage controllers. The -vv option shows detailed information:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$lspci -vv
-
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
-
Subsystem: VMware Virtual Machine Chipset
-
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-
Latency: 0
-
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01) (prog-if 00 [Normal decode])
-
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-
Latency: 0
-
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
-
I/O behind bridge: None
-
Memory behind bridge: None
-
Prefetchable memory behind bridge: None
-
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B+
-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
-
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
-
Subsystem: VMware Virtual Machine Chipset
-
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-
Latency: 0
-
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 8a [ISA Compatibility mode controller, supports both channels switched to PCI native mode, supports bus mastering])
-
Subsystem: VMware Virtual Machine Chipset
-
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
-
Latency: 64
-
Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
-
Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
-
Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
-
Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
-
Region 4: I/O ports at 1060 [size=16]
-
Kernel driver in use: ata_piix
-
Kernel modules: ata_piix, ata_generic
-
...............
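hwloc is an open-source package that provides command-line and graphical tools for collecting and displaying hardware information. It helps users understand the hardware topology of a system, including processors, caches, memory, PCI devices and network devices.
lstopo is the main hwloc command-line tool for displaying the hardware topology. It draws a topology diagram showing the hierarchy of processors, caches, memory and other devices. Without a graphical environment, lstopo-no-graphics provides text output on the command line: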
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ yum -y install hwloc
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ lstopo-no-graphics
-
Machine (4226MB)
-
L3 L#0 (4096KB)
-
Package L#0
-
L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
-
L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
-
Package L#1
-
L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
-
L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
-
HostBridge L#0
-
PCI 8086:7111
-
PCI 15ad:0405
-
PCI 1000:0030
-
Block(Disk) L#0 "sda"
-
PCIBridge
-
PCI 8086:100f
-
Net L#1 "ens32"
-
PCI 15ad:07e0
The output shows the following components:
- Machine: 4226MB of memory in total.
- L3 cache (L3 L#0): 4096KB.
- Package: two processor packages (Package L#0 and Package L#1).
- L2 cache (L2 L#0 to L2 L#3): 512KB each, one per core.
- L1 data cache (L1d L#0 to L1d L#3): 32KB each.
- L1 instruction cache (L1i L#0 to L1i L#3): 64KB each.
- Core (Core L#0 to Core L#3): four cores in total, two per package.
- Processing unit (PU L#0 to PU L#3): one per core.
Attached to the machine, outside the processor packages, are the I/O devices:
- Host bridge (HostBridge L#0): the root of the PCI hierarchy that connects the other devices.
- Network device (Net): the interface named "ens32".
- Block device (Block): the disk named "sda".
- PCI devices, shown by vendor:device ID: 8086:7111, 15ad:0405, 1000:0030, 8086:100f and 15ad:07e0.
The -v option shows more detail:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ lstopo-no-graphics -v
-
Machine (P#0 local=4327132KB total=4327132KB DMIProductName="VMware Virtual Platform" DMIProductVersion=None DMIProductSerial="VMware-56 4d 75 ae f9 4e 9f ad-ba a0 f4 a3 26 c9 6f ae" DMIProductUUID=AE754D56-4EF9-AD9F-BAA0-F4A326C96FAE DMIBoardVendor="Intel Corporation" DMIBoardName="440BX Desktop Reference Platform" DMIBoardVersion=None DMIBoardSerial=None DMIBoardAssetTag= DMIChassisVendor="No Enclosure" DMIChassisType=1 DMIChassisVersion=N/A DMIChassisSerial=None DMIChassisAssetTag="No Asset Tag" DMIBIOSVendor="Phoenix Technologies LTD" DMIBIOSVersion=6.00 DMIBIOSDate=07/22/2020 DMISysVendor="VMware, Inc." Backend=Linux LinuxCgroup=/ OSName=Linux OSRelease=3.10.0-693.el7.x86_64 OSVersion="#1 SMP Tue Aug 22 21:09:27 UTC 2017" HostName=liruilongs.github.io Architecture=x86_64 hwlocVersion=1.11.8 ProcessName=lstopo-no-graphics)
-
L3Cache L#0 (size=4096KB linesize=64 ways=16 Inclusive=0)
-
Package L#0 (P#0 CPUVendor=AuthenticAMD CPUFamilyNumber=23 CPUModelNumber=17 CPUModel="AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx" CPUStepping=0)
-
L2Cache L#0 (size=512KB linesize=64 ways=8 Inclusive=0)
-
L1dCache L#0 (size=32KB linesize=64 ways=16 Inclusive=0)
-
L1iCache L#0 (size=64KB linesize=64 ways=4 Inclusive=0)
-
Core L#0 (P#0)
-
PU L#0 (P#0)
-
L2Cache L#1 (size=512KB linesize=64 ways=8 Inclusive=0)
-
L1dCache L#1 (size=32KB linesize=64 ways=16 Inclusive=0)
-
L1iCache L#1 (size=64KB linesize=64 ways=4 Inclusive=0)
-
Core L#1 (P#1)
-
PU L#1 (P#1)
-
...................................
-
┌──[root@liruilongs.github.io]-[~]
-
└─$
lstopo-no-graphics can also write the topology in other formats:
- lstopo-no-graphics --output-format txt: text (ASCII-art) output.
- lstopo-no-graphics --output-format xml: XML output.
- lstopo-no-graphics --output-format fig: FIG output.
[Figure: lstopo output]
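The lstopo output above shows the topology of this system:
- It is a dual-socket server, with an AMD EPYC 7002 series CPU installed in each socket.
- Each CPU has multiple cores, and each core has several levels of cache (L1, L2, L3).
- The cores are connected by a high-speed interconnect and together form a NUMA node.
- Each NUMA node has a fixed amount of memory, 64GB here.
- The system has two NUMA nodes and 128GB of memory in total.
- Several OpenCL accelerator cards are installed, spread across the two NUMA nodes (CoProc: accelerator card, here an AMD Radeon OpenCL compute card).
- There are also two network cards for the external network (OpenFabrics: an InfiniBand or RoCE network interface).
- The disk is an 894GB Serial ATA drive.
Some of the fields:
- Machine: the server as a whole; total memory is reported as 126GB
- Package: a CPU socket; there are two here
- NUMANode: a NUMA node; each CPU socket corresponds to one NUMA node
- L3: L3 cache, 20MB per CPU
- PCI: a PCI/PCIe device
- L2: L2 cache, 256KB per core
- OpenFabrics: an InfiniBand or RoCE network interface
- CoProc: an accelerator card, here an AMD Radeon OpenCL compute card
- L1d/L1i: L1 data and instruction caches, 32KB each per core
- Core: a physical CPU core
- PU: a processing unit, that is, a hardware thread (logical CPU) within a physical core
- Block: a block device such as a disk
- Net: a network interface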
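The lshw command lists detailed information about the machine's hardware, including memory, firmware, motherboard, CPU and bus speed.
lshw groups hardware components into classes such as system, memory and network.
The -short option lists each device together with its class: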
-
[root@workstation ~]# lshw -short
-
H/W path Device Class Description
-
========================================================
-
system KVM
-
/0 bus Motherboard
-
/0/0 memory 96KiB BIOS
-
/0/400 processor QEMU Virtual CPU version 2.5+
-
/0/401 processor QEMU Virtual CPU version 2.5+
-
/0/1000 memory 2GiB System Memory
-
/0/1000/0 memory 2GiB DIMM RAM
-
/0/100 bridge 82G33/G31/P35/P31 Express DRAM Controller
-
/0/100/1 bridge QEMU PCIe Root port
-
/0/100/1/0 network Virtio network device
-
/0/100/1/0/0 eth0 network Ethernet interface
-
/0/100/1.1 bridge QEMU PCIe Root port
-
/0/100/1.1/0 bus QEMU XHCI Host Controller
-
/0/100/1.1/0/0 usb1 bus xHCI Host Controller
-
/0/100/1.1/0/1 usb2 bus xHCI Host Controller
-
/0/100/1.2 bridge QEMU PCIe Root port
-
/0/100/1.2/0 communication Virtio console
-
/0/100/1.2/0/0 generic Virtual I/O device
-
/0/100/1.3 bridge QEMU PCIe Root port
-
/0/100/1.3/0 storage Virtio block device
-
/0/100/1.3/0/0 /dev/vda disk 10GB Virtual I/O device
-
/0/100/1.3/0/0/1 volume 10238MiB Linux filesystem partition
-
/0/100/1.4 bridge QEMU PCIe Root port
-
/0/100/1.4/0 generic Virtio memory balloon
-
/0/100/1.4/0/0 generic Virtual I/O device
-
/0/100/1.5 bridge QEMU PCIe Root port
-
/0/100/2 display Virtio GPU
-
/0/100/2/0 generic Virtual I/O device
-
/0/100/1f bridge 82801IB (ICH9) LPC Interface Controller
-
/0/100/1f.2 storage 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
-
/0/100/1f.3 bus 82801I (ICH9 Family) SMBus Controller
-
/0/1 system PnP device PNP0b00
-
/0/2 input PnP device PNP0303
-
/0/3 input PnP device PNP0f13
-
/0/4 communication PnP device PNP0501
-
/1 virbr0 network Ethernet interface
-
/2 virbr0-nic network Ethernet interface
View devices of a specific class:
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$ lshw -c network
-
*-network:0
-
description: Ethernet interface
-
product: I350 Gigabit Network Connection
-
vendor: Intel Corporation
-
physical id: 0
-
bus info: pci@0000:02:00.0
-
logical name: eno1
-
version: 01
-
serial: 9c:b6:54:b5:e2:64
-
size: 100Mbit/s #size is the currently negotiated link speed.
-
capacity: 1Gbit/s #capacity is the maximum speed the NIC supports.
-
width: 32 bits
-
clock: 33MHz
-
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
-
configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.19.0-32-generic duplex=full firmware=1.61, 0x80000cd5, 1.949.0 ip=10.255.0.101 latency=0 link=yes multicast=yes port=twisted pair speed=100Mbit/s
-
resources: irq:16 memory:efd00000-efdfffff ioport:5000(size=32) memory:efcf0000-efcf3fff memory:efa00000-efa7ffff memory:3cbfdfe0000-3cbfdffffff memory:3cbfdfc0000-3cbfdfdffff
-
*-network:1
-
description: Ethernet interface
-
product: I350 Gigabit Network Connection
-
vendor: Intel Corporation
-
physical id: 0.1
-
bus info: pci@0000:02:00.1
-
logical name: eno2
-
version: 01
-
serial: 9c:b6:54:b5:e2:65
-
capacity: 1Gbit/s
-
width: 32 bits
-
clock: 33MHz
-
capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
-
configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.19.0-32-generic firmware=1.61, 0x80000cd5, 1.949.0 latency=0 link=no multicast=yes port=twisted pair
-
resources: irq:17 memory:efb00000-efbfffff ioport:5020(size=32) memory:efaf0000-efaf3fff memory:efc00000-efc7ffff memory:3cbfdfa0000-3cbfdfbffff memory:3cbfdf80000-3cbfdf9ffff
-
*-network
-
description: Network controller
-
product: MT27520 Family [ConnectX-3 Pro]
-
vendor: Mellanox Technologies
-
physical id: 0
-
bus info: pci@0000:23:00.0
-
version: 00
-
width: 64 bits
-
clock: 33MHz
-
capabilities: pm vpd msix pciexpress bus_master cap_list rom
-
configuration: driver=mlx4_core latency=0
-
resources: iomemory:3e30-3e2f irq:46 memory:f6c00000-f6cfffff memory:3e3fc000000-3e3fdffffff memory:3e3dc000000-3e3fbffffff
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$
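The lshw output shows the network hardware of this system.
There are two Intel I350 Gigabit Ethernet interfaces of the same model:
- eno1 at PCI address 0000:02:00.0
- eno2 at PCI address 0000:02:00.1
There is also one Mellanox ConnectX-3 Pro InfiniBand card:
- at PCI address 0000:23:00.0
- using the mlx4_core driver
Main points:
- The Intel I350 is a common 1Gb Ethernet adapter; eno1 has negotiated only 100Mbit/s, and eno2 has no link.
- The Mellanox ConnectX-3 Pro is an InfiniBand card used in high-performance computing clusters.
- In total there is one InfiniBand card and two Ethernet interfaces.
- The Ethernet interfaces use the igb driver, while the Mellanox card uses its own mlx4_core driver.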
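Comparing size with capacity is a quick way to spot a link that negotiated below what the adapter supports, as eno1 did here. A minimal sketch, assuming lshw -c network output in the layout shown above on standard input; slow_links is a hypothetical helper name.

```shell
# slow_links: hypothetical helper. Reads `lshw -c network` output on
# stdin and prints every interface whose negotiated speed ("size")
# differs from its maximum speed ("capacity"). Interfaces without a
# logical name or without a size (for example, link down) are skipped.
slow_links() {
    awk '
        function report() {
            if (name != "" && size != "" && cap != "" && size != cap)
                printf "%s: link %s, capacity %s\n", name, size, cap
            name = ""; size = ""; cap = ""
        }
        /^ *\*-network/  { report() }      # a new device section starts
        /logical name:/  { name = $3 }
        /^ *size:/       { size = $2 }
        /^ *capacity:/   { cap  = $2 }
        END { report() }
    '
}

# Typical use (root is needed for complete lshw output):
#   lshw -c network | slow_links
```

A mismatch usually points to the cable, the switch port or the autonegotiation settings rather than to the adapter itself.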
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$ lshw -c disk
-
*-disk
-
description: SCSI Disk
-
product: LOGICAL VOLUME
-
vendor: HP
-
physical id: 1.0.0
-
bus info: scsi@1:1.0.0
-
logical name: /dev/sda
-
version: 3.22
-
serial: PBKTU0ARH3U005
-
size: 894GiB (960GB)
-
capabilities: 15000rpm gpt-1.00 partitioned partitioned:gpt
-
configuration: ansiversion=5 guid=525c7981-1a75-4223-a48b-cac0e40e44c2 logicalsectorsize=512 sectorsize=512
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$
The hardware information can also be written directly to an HTML file:
[root@workstation ~]# lshw -html > hw.html
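3 Reporting Hardware Errors
rasdaemon captures and processes the hardware error events generated by the kernel. It reads them from the kernel tracing interface under /sys/kernel/debug/tracing/ and reports them through syslog/journald. To use rasdaemon, install the package and start the service: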
-
[root@workstation ~]# yum -y install rasdaemon
-
[root@workstation ~]# systemctl enable --now rasdaemon.service
The tracing files can be seen in that directory:
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$ ls /sys/kernel/debug/tracing/
-
available_events dynamic_events hwlat_detector printk_formats set_event_pid snapshot trace_clock tracing_max_latency
-
available_filter_functions dyn_ftrace_total_info instances README set_ftrace_filter stack_max_size trace_marker tracing_on
-
available_tracers enabled_functions kprobe_events saved_cmdlines set_ftrace_notrace stack_trace trace_marker_raw tracing_thresh
-
buffer_percent error_log kprobe_profile saved_cmdlines_size set_ftrace_notrace_pid stack_trace_filter trace_options uprobe_events
-
buffer_size_kb events max_graph_depth saved_tgids set_ftrace_pid synthetic_events trace_pipe uprobe_profile
-
buffer_total_size_kb free_buffer options set_event set_graph_function timestamp_mode trace_stat
-
current_tracer function_profile_enabled per_cpu set_event_notrace_pid set_graph_notrace trace tracing_cpumask
-
┌──[root@hp-ProLiant-SL270s-Gen8-SE]-[~]
-
└─$
The first run may fail with the errors below; follow the hint in the message and run rasdaemon --record:
-
[root@workstation ~]# ras-mc-ctl --summary
-
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1130.
-
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1131.
-
[root@workstation ~]# ras-mc-ctl --errors
-
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1208.
-
ras-mc-ctl: Error: mc_event table missing from /var/lib/rasdaemon/ras-mc_event.db. Run 'rasdaemon --record'.
-
[root@workstation ~]# rasdaemon --record
ras-mc-ctl --errors lists the errors recorded on the system in detail, if there are any. It covers the following types:
- Memory errors
- PCIe AER errors (PCIe Advanced Error Reporting)
- Extlog errors (Extended Log)
- MCE errors (Machine Check Exception)
-
[root@workstation ~]# ras-mc-ctl --errors
-
No Memory errors.
-
No PCIe AER errors.
-
No Extlog errors.
-
No MCE errors.
ras-mc-ctl --summary prints a summary of the recorded errors: a total for each error type rather than the details of each individual error. It is useful for a quick overview of system health.
-
[root@workstation ~]# ras-mc-ctl --summary
-
No Memory errors.
-
No PCIe AER errors.
-
No Extlog errors.
-
No MCE errors.
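For a periodic health check, the summary can be reduced to a yes/no answer. The sketch below assumes that every clean category is reported as a line beginning with "No ", as in the output above; the layout of a report that does contain errors is not shown in this article, so any other non-empty line is simply treated as suspicious. ras_has_errors is a hypothetical helper name.

```shell
# ras_has_errors: hypothetical helper. Reads `ras-mc-ctl --summary`
# output on stdin and succeeds (exit status 0) if any non-empty line
# does NOT start with "No ", i.e. something other than the
# "No ... errors." lines was reported.
ras_has_errors() {
    grep -v '^[[:space:]]*$' | grep -qv '^No '
}

# Typical use:
#   if ras-mc-ctl --summary | ras_has_errors; then
#       echo "hardware errors recorded, check ras-mc-ctl --errors"
#   fi
```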
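4 Viewing Resources in Virtual and Cloud Environments
KVM (Kernel-based Virtual Machine) is a virtualization technology implemented as a loadable kernel module; KVM runs in kernel space.
- libvirt is the management toolkit for virtual machines and their devices. Install it with apt install libvirt-daemon-system.
- The virsh command uses the libvirt library and API to access virtual machines. Install it with apt install libvirt-clients.
QEMU is an emulator that presents virtual devices to the guest operating system, including:
- network PCI controller (virtio-net-pci)
- storage controller (virtio-scsi-pci)
- storage PCI block controller (virtio-blk)
- memory balloon PCI controller (virtio-balloon-pci)
- random number generator (virtio-rng-pci)
- USB controller (ich9-usb-uhci3 or pci-ohci)
- network device (e1000 or virtio)
- video device (cirrus-vga or qxl-vga)
QEMU can be combined with KVM to provide a complete virtualization solution. KVM provides the hardware virtualization support and handles the low-level virtualization work in the kernel, while QEMU acts as the user-space virtual machine monitor and emulates the guest's hardware devices.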
List all virtual machines:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$virsh list --all
-
Id Name State
-
----------------------------------------------------
-
1 classroom running
-
2 workstation running
-
3 bastion running
-
- servera shut off
-
- serverb shut off
-
- serverc shut off
-
- serverd shut off
View information about a virtual machine:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$virsh dominfo workstation
-
Id: 2
-
Name: workstation
-
UUID: 3f09a13c-94ad-4d97-8f76-17e9a81ae61f
-
OS Type: hvm
-
State: running
-
CPU(s): 2
-
CPU time: 1015.4s
-
Max memory: 2097152 KiB
-
Used memory: 2097152 KiB
-
Persistent: yes
-
Autostart: disable
-
Managed save: no
-
Security model: selinux
-
Security DOI: 0
-
Security label: system_u:system_r:svirt_t:s0:c249,c656 (enforcing)
View the virtual machine's CPU configuration:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$cat /etc/libvirt/qemu/workstation.xml | grep cpu
-
<vcpu placement='static'>2</vcpu>
View the CPU time the virtual machine has consumed on each host CPU:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$virsh cpu-stats workstation
-
CPU0:
-
cpu_time 165.460132723 seconds
-
vcpu_time 128.590700314 seconds
-
CPU1:
-
cpu_time 134.540034318 seconds
-
vcpu_time 98.868638312 seconds
-
CPU2:
-
cpu_time 81.308894671 seconds
-
vcpu_time 49.496840393 seconds
-
CPU3:
-
cpu_time 95.084166725 seconds
-
vcpu_time 57.692533169 seconds
-
CPU4:
-
cpu_time 127.711824011 seconds
-
vcpu_time 93.085340637 seconds
-
CPU5:
-
cpu_time 208.676280988 seconds
-
vcpu_time 172.028112759 seconds
-
CPU6:
-
cpu_time 68.894659228 seconds
-
vcpu_time 38.683257218 seconds
-
CPU7:
-
cpu_time 139.665723281 seconds
-
vcpu_time 103.899829328 seconds
-
Total:
-
cpu_time 1021.341715945 seconds
-
user_time 1.360000000 seconds
-
system_time 144.920000000 seconds
Get the vCPU information of the virtual machine named "workstation":
-
┌──[root@liruilongs.github.io]-[~]
-
└─$virsh vcpuinfo workstation
-
VCPU: 0
-
CPU: 5
-
State: running
-
CPU time: 573.3s
-
CPU Affinity: yyyyyyyy
-
VCPU: 1
-
CPU: 7
-
State: running
-
CPU time: 169.2s
-
CPU Affinity: yyyyyyyy
-
┌──[root@liruilongs.github.io]-[~]
-
└─$
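The vcpuinfo output can be condensed into a one-line-per-vCPU placement map. A minimal sketch, assuming the "Key: value" layout shown above on standard input; vcpu_map is a hypothetical helper name.

```shell
# vcpu_map: hypothetical helper. Reads `virsh vcpuinfo <domain>` output
# on stdin and prints which host CPU each vCPU is currently running on.
# Fields are split on ":" followed by optional spaces, so "CPU time" and
# "CPU Affinity" do not match the exact key "CPU".
vcpu_map() {
    awk -F': *' '
        $1 == "VCPU" { vcpu = $2 }
        $1 == "CPU"  { printf "vCPU %s -> host CPU %s\n", vcpu, $2 }
    '
}

# Typical use:
#   virsh vcpuinfo workstation | vcpu_map
```

With an affinity of yyyyyyyy the vCPUs are not pinned, so the host CPU numbers can change between runs.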
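Memory ballooning (memballoon)
Memory ballooning is a virtualization technique for adjusting a virtual machine's memory allocation dynamically. It lets the host reclaim memory from running virtual machines and reassign it, which improves resource utilization.
Memory ballooning works as follows:
- The virtio-balloon driver is installed and loaded in the guest. It is a virtual device driver that manages memory by communicating with the host side (usually QEMU).
- Once the guest is running, the virtio-balloon driver reports the guest's current memory usage to the host.
- If other virtual machines on the host need more memory, the host asks the virtio-balloon driver to release part of the guest's memory.
- The driver responds by handing some memory pages back to the host, which reduces the memory available to the guest.
- The host can then assign the reclaimed memory to other virtual machines, so the memory is reused.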
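On a libvirt-managed host, the balloon target is normally changed with virsh setmem, whose size argument defaults to KiB. The sketch below is a dry run under that assumption: balloon_cmd is a hypothetical helper that only prints the command for a target given in MiB and does not execute it.

```shell
# balloon_cmd: hypothetical dry-run helper. Prints the virsh command
# that would set the balloon target of a running guest.
#   $1 = domain name, $2 = target size in MiB
# virsh setmem takes the size in KiB by default, hence the conversion.
balloon_cmd() {
    domain=$1
    target_mib=$2
    target_kib=$((target_mib * 1024))
    echo "virsh setmem $domain $target_kib --live"
}

# Example: shrink the guest from the dominfo output above to 1 GiB.
#   balloon_cmd workstation 1024
# The target cannot exceed the guest's "Max memory" value.
```

After applying such a change, virsh dommemstat on the same domain shows the resulting balloon size in its "actual" field.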
Check whether the Linux kernel provides the virtio_balloon module:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ lsmod | grep virtio_balloon
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ modprobe virtio_balloon
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ lsmod | grep virtio_balloon
-
virtio_balloon 18015 0
-
virtio_ring 22991 1 virtio_balloon
-
virtio 14959 1 virtio_balloon
-
┌──[root@liruilongs.github.io]-[~]
-
└─$ modinfo virtio-balloon
-
filename: /lib/modules/3.10.0-1160.76.1.el7.x86_64/kernel/drivers/virtio/virtio_balloon.ko.xz
-
license: GPL
-
description: Virtio balloon driver
-
retpoline: Y
-
rhelversion: 7.9
-
srcversion: 52EDF3EAD03F14A066CA3BC
-
alias: virtio:d00000005v*
-
depends: virtio,virtio_ring
-
intree: Y
-
vermagic: 3.10.0-1160.76.1.el7.x86_64 SMP mod_unload modversions
-
signer: CentOS Linux kernel signing key
-
sig_key: C6:93:65:52:C5:A1:E9:97:0B:A2:4C:98:1A:C4:51:A6:BC:11:09:B9
-
sig_hashalgo: sha256
-
parm: oom_pages:pages to free on OOM (int)
-
┌──[root@liruilongs.github.io]-[~]
-
└─$
kvm_stat is a Python script that reads the kernel's KVM counters and displays virtual machine statistics. Press Ctrl+C or q to quit, and press x to show details of kvm_exit events:
-
┌──[root@liruilongs.github.io]-[~]
-
└─$kvm_stat
-
kvm statistics - summary
-
Event Total %Total CurAvg/s
-
kvm_entry 8663 17.3 377
-
kvm_exit 8663 17.3 377
-
kvm_apic 6442 12.8 285
-
kvm_msr 6377 12.7 283
-
kvm_hv_timer_state 6295 12.5 282
-
kvm_pv_tlb_flush 2069 4.1 90
-
kvm_inj_virq 1960 3.9 85
-
kvm_apic_accept_irq 1960 3.9 85
-
kvm_eoi 1960 3.9 85
-
kvm_pv_eoi 1943 3.9 85
-
kvm_vcpu_wakeup 1870 3.7 80
-
kvm_fpu 402 0.8 20
-
kvm_halt_poll_ns 311 0.6 12
-
kvm_msi_set_irq 232 0.5 11
-
kvm_emulate_insn 212 0.4 10
-
vcpu_match_mmio 201 0.4 10
-
kvm_userspace_exit 201 0.4 10
-
kvm_mmio 201 0.4 10
-
kvm_apic_ipi 65 0.1 1
-
kvm_ack_irq 10 0.0 0
-
kvm_page_fault 83 0.2 0
-
kvm_cpuid 39 0.1 0
-
kvm_fast_mmio 11 0.0 0
-
kvm_ple_window_update 2 0.0 0
-
kvm_pvclock_update 1 0.0 0
-
Total 50173 2198