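Machines in our cluster have recently been going down for no apparent reason. When it happens the machine stops responding and even the remote management card is unreachable, so the only option is to ask the IDC for a hard reboot. We eventually reached the vendor, who told us to disable C-states in the BIOS. I don't know yet whether that fixes the problem, so first let's look at what C-states actually do: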

From a page on Intel's site:

CPU C-states are core power states requested by the Operating System Directed Power Management (OSPM) infrastructure that define the degree to which the processor is "sleeping". C0 indicates a normal operation. All other C-states (C1-Cn) describe states where the processor clock is inactive (cannot execute instructions) and different parts of the processor are powered down. Deeper C-states have longer exit latencies (the time to transition back to C0) but save more power. The processor can override an OSPM request and automatically demote a specific C-state request. For example, the OSPM may request C6 but the processor may actually use C3.

The main levels are:

C0 – Active: CPU is on. C0 is the operating state.

C1 – Auto Halt: core clock is off. C1 is a state where the processor is not executing instructions, but can return to an executing state essentially instantaneously. Some processors also support an Enhanced C1 state (C1E) for lower power consumption.

C2 – Stop Clock: core and bus clocks are off. C2 is a state where the processor maintains all software-visible state, but may take longer to wake up.

C3 – Deep Sleep: clock generator is off. C3 is a state where the processor does not need to keep its cache coherent, but maintains other states. Some processors have variations on the C3 state (Deep Sleep, Deeper Sleep, etc.) that differ in how long it takes to wake the processor.

C4 – Deeper Sleep: reduced VCC

DC4 – Deeper C4 Sleep: further reduced VCC

In short, C-states exist to save power on servers that are not CPU-bound: under certain conditions the processor drops into a sleep state.
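To see which C-states a Linux box actually exposes, and how long each one takes to exit, one option is to read the kernel's cpuidle entries under sysfs. The sketch below is only a rough illustration, not something from the vendor or from Intel. It assumes the kernel provides `/sys/devices/system/cpu/cpu0/cpuidle/stateN/` with the files `name`, `latency`, `usage`, `time`, and (on newer kernels) `disable`. The function name `read_cstates` and the dictionary keys are made up for this example.

```python
import os
import re

# Assumed default location of the cpuidle entries for CPU 0.
DEFAULT_CPUIDLE_DIR = "/sys/devices/system/cpu/cpu0/cpuidle"


def _read_text(path, default=""):
    """Read a small sysfs-style file, returning `default` if it is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default


def _read_int(path, default=0):
    """Read a file holding one integer, returning `default` on any problem."""
    try:
        return int(_read_text(path, str(default)))
    except ValueError:
        return default


def read_cstates(cpuidle_dir=DEFAULT_CPUIDLE_DIR):
    """Return one dict per idle state, ordered by state index (shallow to deep)."""
    if not os.path.isdir(cpuidle_dir):
        return []  # cpuidle not available (or C-states fully disabled)
    entries = [e for e in os.listdir(cpuidle_dir) if re.fullmatch(r"state\d+", e)]
    entries.sort(key=lambda e: int(e[len("state"):]))  # numeric, not lexical
    states = []
    for entry in entries:
        base = os.path.join(cpuidle_dir, entry)
        states.append({
            "index": int(entry[len("state"):]),
            "name": _read_text(os.path.join(base, "name"), "?"),
            "latency_us": _read_int(os.path.join(base, "latency")),  # exit latency
            "usage": _read_int(os.path.join(base, "usage")),         # times entered
            "time_us": _read_int(os.path.join(base, "time")),        # total residency
            "disabled": _read_int(os.path.join(base, "disable")) != 0,
        })
    return states


if __name__ == "__main__":
    for s in read_cstates():
        print("{index:>2} {name:<8} latency={latency_us}us usage={usage} "
              "time={time_us}us disabled={disabled}".format(**s))
```

If deep C-states were really switched off in the BIOS, I would expect the list to be short (or the directory to be absent), but that depends on the kernel and idle driver in use, so treat the output as a hint and not as proof.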

A quick search shows there really are bugs in this area:

http://en.community.dell.com/support-forums/servers/f/956/t/19433716.aspx

Also related to power saving, there is another bug on CentOS 6.2 where power_saving causes high load:

https://lkml.org/lkml/2012/6/13/458

http://en.community.dell.com/support-forums/servers/f/1466/p/19456558/20387384.aspx

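This reminds me of an article I read the day before yesterday about a memory fault that flipped a bit and ended up crashing a database. There really are pitfalls everywhere.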