[Knowing the What but Not the Why - 27] Throttling after resume from suspend

Previously, Len asked me about a fix for the Clock Modulation change seen on some platforms. He recommended restoring the Clock Modulation register in the specific driver, rather than restoring the value regardless of the resume context. Here is his original email:

+mjg59, who may be seeing this issue on a skylake laptop

Chen-yu,

Great debugging, but I think there is a more general fix possible than
this DMI quirk.

I agree that in this example, a grantley server, it seems the BIOS is
erroneously
returning a bogus value of MSR_IA32_THERM_CONTROL on resume from S3.

But another scenario is also possible.  Consider a laptop that is resuming HOT
and the BIOS correctly enables throttling.  If this code were invoked, it would
restore the COLD setting.

Instead, it seems to me that the ACPI processor driver should upon .resume
check if throttling should be enabled or not, and proceed accordingly.
That would always do the "right thing", and would not need a DMI list.
Does that make sense?

thanks,
Len Brown, Intel Open Source Technology Center

I agree: letting the related drivers customize their restore process
would be more robust,
and this way we can take care of not only the boot CPU but also the nonboot CPUs.
I think we can add something like acpi_processor_reevaluate_tstate in the resume
hook.

So apparently I need to find a way to re-evaluate the Clock Modulation, and this is what processor_throttling.c should achieve in this scenario.

Let's first look at the ACPI 6 spec and figure out what throttling is.

On page 473, section 8.4.5 Processor Throttling Controls:

ACPI defines two processor throttling (T state) control interfaces. These are:
• The Processor Register Block’s (P_BLK’s) P_CNT register.
• The combined _PTC, _TSS, and _TPC objects in the processor’s object list.

The second mechanism is used only when all of _PTC, _TSS and _TPC exist in the processor's object list; otherwise the kernel falls back to P_CNT throttling.

Let's first talk about P_CNT throttling.

Throttling only takes effect in the C0 state:

While the processor is in the C0 power state, it executes instructions. While in the C0 power state,
OSPM can generate a policy to run the processor at less than maximum performance. The clock
throttling mechanism provides OSPM with the functionality to perform this task in addition to
thermal control. The mechanism allows OSPM to program a value into a register that reduces the
processor’s performance to a percentage of maximum performance.

The FADT contains the duty offset and duty width values. The duty offset value determines the
offset within the P_CNT register of the duty value. The duty width value determines the number of
bits used by the duty value (which determines the granularity of the throttling logic). The
performance of the processor by the clock logic can be expressed with the following equation:

% Performance = (dutysetting / 2^dutywidth) * 100%
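To make the duty-cycle relation concrete, here is a minimal C sketch, assuming the spec's formula %Performance = dutysetting / 2^dutywidth * 100%. The helper name duty_performance_percent is made up for illustration; it is not a kernel or ACPICA function, and integer division rounds the result down.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): percentage of maximum
 * performance for a given duty setting, assuming
 * %Performance = dutysetting / 2^dutywidth * 100%.
 * duty_width is the number of bits in the duty value (FADT DUTY_WIDTH).
 */
unsigned int duty_performance_percent(uint32_t duty_setting,
                                      unsigned int duty_width)
{
    uint32_t steps;

    assert(duty_width < 24);        /* keeps duty_setting * 100 in range */
    steps = 1u << duty_width;       /* 2^dutywidth */
    assert(duty_setting <= steps);

    return (unsigned int)((duty_setting * 100u) / steps);
}
```

For example, with a 3-bit duty field there are 8 steps, so a duty setting of 4 corresponds to 50% of maximum performance.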

After introducing the fundamental principle, let's check how the Linux kernel implements it.

During the ACPI processor driver's probe phase, the throttling info is also initialized:

static struct device_driver acpi_processor_driver = {
    .name = "processor",
    .bus = &cpu_subsys,
    .acpi_match_table = processor_device_ids,
    .probe = acpi_processor_start,
    .remove = acpi_processor_stop,
};

During probe, acpi_processor_get_throttling_info() picks the back end:

int acpi_processor_get_throttling_info(struct acpi_processor *pr)
{
    ...
    /*
     * Evaluate _PTC, _TSS and _TPC
     * They must all be present or none of them can be used.
     */
    if (acpi_processor_get_throttling_control(pr) ||
        acpi_processor_get_throttling_states(pr) ||
        acpi_processor_get_platform_limit(pr)) {
        /* At least one object is missing: fall back to FADT/P_CNT. */
        pr->throttling.acpi_processor_get_throttling =
            &acpi_processor_get_throttling_fadt;
        pr->throttling.acpi_processor_set_throttling =
            &acpi_processor_set_throttling_fadt;
        if (acpi_processor_get_fadt_info(pr))
            return 0;
    } else {
        pr->throttling.acpi_processor_get_throttling =
            &acpi_processor_get_throttling_ptc;
        pr->throttling.acpi_processor_set_throttling =
            &acpi_processor_set_throttling_ptc;
    }
    ...
}
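The selection rule is easier to see in isolation. Below is a small user-space sketch of it, not kernel code: the back ends are stand-in functions that only return a tag, and every name in it (throttling_ops, select_throttling_ops, and so on) is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative tags for the two back ends; not kernel identifiers. */
enum backend { BACKEND_FADT, BACKEND_PTC };

struct throttling_ops {
    enum backend (*get)(void);
    enum backend (*set)(void);
};

enum backend get_fadt(void) { return BACKEND_FADT; }
enum backend set_fadt(void) { return BACKEND_FADT; }
enum backend get_ptc(void)  { return BACKEND_PTC; }
enum backend set_ptc(void)  { return BACKEND_PTC; }

/*
 * Model of the rule above: the _PTC/_TSS/_TPC path is used only when
 * all three objects are present; otherwise fall back to FADT (P_CNT).
 */
void select_throttling_ops(struct throttling_ops *ops,
                           bool has_ptc, bool has_tss, bool has_tpc)
{
    assert(ops != NULL);

    if (has_ptc && has_tss && has_tpc) {
        ops->get = get_ptc;
        ops->set = set_ptc;
    } else {
        ops->get = get_fadt;
        ops->set = set_fadt;
    }
}
```

Storing the pair of function pointers once at probe time means later callers do not need to re-check which objects exist.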

On our mac pro 12, _PTC, _TSS and _TPC are not all present, so the FADT path is taken and the state table is built like this:

pr->throttling.state_count = 1 << acpi_gbl_FADT.duty_width;
step = (1000 / pr->throttling.state_count);

for (i = 0; i < pr->throttling.state_count; i++) {
    pr->throttling.states[i].performance = 1000 - step * i;
    pr->throttling.states[i].power = 1000 - step * i;
}

As described in the spec text above, duty_width is the bit width of the throttling value, so there are state_count = 1 << duty_width possible values. The maximum performance is implicitly set to 1000, so the step size is 1000 / state_count: state 0 has performance 1000, state 1 has 1000 - step, and so on.
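As a worked example of that table, here is a standalone sketch that fills the states the same way. The struct t_state, MAX_T_STATES and build_fadt_states names are my own for this example and the capacity of 16 is an assumption, not something taken from the kernel excerpt.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_T_STATES 16   /* assumed capacity for this example only */

struct t_state {
    unsigned int performance;   /* in units of 1/1000 of maximum */
    unsigned int power;
};

/*
 * Fill states[] with a linear table: state 0 is full speed (1000),
 * each further state drops by 1000 / state_count.
 * Returns state_count, or 0 if the table does not fit in capacity.
 */
unsigned int build_fadt_states(struct t_state *states, size_t capacity,
                               unsigned int duty_width)
{
    unsigned int state_count, step, i;

    assert(states != NULL);
    assert(duty_width < 16);

    state_count = 1u << duty_width;
    if (state_count > capacity)
        return 0;

    step = 1000 / state_count;
    for (i = 0; i < state_count; i++) {
        states[i].performance = 1000 - step * i;
        states[i].power = 1000 - step * i;
    }
    return state_count;
}
```

With duty_width = 3 this gives 8 states with a step of 125, so the performance values run 1000, 875, 750, and so on down to 125 for state 7.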

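Later, when we read or set the actual throttling value, we use acpi_processor_get_throttling_fadt and acpi_processor_set_throttling_fadt. The read side:

duty_mask = pr->throttling.state_count - 1;
duty_mask <<= pr->throttling.duty_offset;

value = inl(pr->throttling.address);

/* Bit 4 set means throttling is currently enabled. */
if (value & 0x10) {
    duty_value = value & duty_mask;
    duty_value >>= pr->throttling.duty_offset;
    if (duty_value)
        state = pr->throttling.state_count - duty_value;
}
pr->throttling.state = state;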

The above reads the duty value. In short, it extracts the bits from duty_offset up to duty_offset + duty_width, calculates the corresponding state, and stores it in the current CPU's throttling state variable. That is to say, state is state_count minus the actual duty_value, and it is only computed when bit 4 (throttling enabled) is set.
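The decoding step can be tried outside the kernel with a plain integer in place of the register. This is a sketch under my own assumptions: decode_throttling_state and THROTTLE_ENABLE_BIT are names invented here, and the register value is passed in directly rather than read with inl().

```c
#include <assert.h>
#include <stdint.h>

#define THROTTLE_ENABLE_BIT 0x10u   /* bit 4: throttling enabled */

/*
 * Decode a throttling state from a P_CNT-style register value.
 * Returns 0 (not throttled) when the enable bit is clear or the
 * duty field is 0; otherwise state_count - duty_value.
 */
unsigned int decode_throttling_state(uint32_t value,
                                     unsigned int duty_offset,
                                     unsigned int duty_width)
{
    uint32_t state_count, duty_mask, duty_value;

    assert(duty_width < 32);
    assert(duty_offset + duty_width <= 32);

    state_count = 1u << duty_width;
    duty_mask = (state_count - 1) << duty_offset;

    if (!(value & THROTTLE_ENABLE_BIT))
        return 0;

    duty_value = (value & duty_mask) >> duty_offset;
    if (!duty_value)
        return 0;

    return state_count - duty_value;
}
```

With duty_offset = 1 and duty_width = 3, a register value of 0x1C has the enable bit set and a duty value of 6, which decodes to state 2.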

The same holds for acpi_processor_set_throttling_fadt:

static int acpi_processor_set_throttling_fadt(struct acpi_processor *pr,
                                              int state, bool force)
{
    ...
    if (state) {
        duty_value = pr->throttling.state_count - state;
        duty_value <<= pr->throttling.duty_offset;

        /* Used to clear all duty_value bits */
        duty_mask = pr->throttling.state_count - 1;
        duty_mask <<= acpi_gbl_FADT.duty_offset;
        duty_mask = ~duty_mask;
    }
    ...
    if (state) {
        value &= duty_mask;
        ...
        value |= duty_value;
        ...
        value |= 0x00000010;
        ...
    }

    pr->throttling.state = state;
    ...
}
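Here duty_value is the value to be written into the register when a non-zero state is requested. duty_mask is an inverted mask: the bits of the duty field (duty_offset up to duty_offset + duty_width) are 0 and all other bits are 1, so ANDing with it clears the old duty value. The new duty value and the enable bit (bit 4) are then ORed in, and the result is written to throttling.address.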
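The write side can be sketched the same way, again on a plain integer instead of an I/O port. encode_throttling_state and THROTTLE_ENABLE_BIT are hypothetical names for this example; the real function also writes the register in several steps, which this model collapses into one computed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THROTTLE_ENABLE_BIT 0x10u   /* bit 4: throttling enabled */

/*
 * Compute the register value for a requested throttling state.
 * State 0 only clears the enable bit; a non-zero state clears the
 * old duty field, ORs in the new duty value and sets the enable bit.
 * Returns 0 on success, -1 if state is out of range.
 */
int encode_throttling_state(uint32_t *reg, unsigned int state,
                            unsigned int duty_offset,
                            unsigned int duty_width)
{
    uint32_t state_count, duty_mask, duty_value, value;

    assert(reg != NULL);
    assert(duty_width < 32);
    assert(duty_offset + duty_width <= 32);

    state_count = 1u << duty_width;
    if (state >= state_count)
        return -1;

    value = *reg & ~THROTTLE_ENABLE_BIT;    /* disable before changing */
    if (state) {
        duty_value = (state_count - state) << duty_offset;
        duty_mask = ~((state_count - 1) << duty_offset);

        value &= duty_mask;                 /* clear old duty bits */
        value |= duty_value;                /* set new duty bits */
        value |= THROTTLE_ENABLE_BIT;       /* enable throttling */
    }
    *reg = value;
    return 0;
}
```

Starting from 0x12 with duty_offset = 1 and duty_width = 3, requesting state 2 yields 0x1C; requesting state 0 afterwards only clears bit 4 and leaves 0x0C.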

Now let's look at the other throttling mechanism. If _PTC, _TSS and _TPC are all provided, we use acpi_processor_get_throttling_ptc and acpi_processor_set_throttling_ptc.

For ACPI_ADR_SPACE_FIXED_HARDWARE, the control register is MSR_IA32_THERM_CONTROL. acpi_processor_get_throttling_ptc first reads the value from MSR_IA32_THERM_CONTROL and then compares it against the control values in the CPU's _TSS array. If there is a match, the index in the _TSS array is the state, and it is stored in pr->throttling.state; this is what the throttling state means. Likewise, acpi_processor_set_throttling_ptc takes _TSS[state].control and writes it to MSR_IA32_THERM_CONTROL. For _TSS based throttling, the number of states is:

pr->throttling.state_count = tss->package.count;
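The _TSS lookup in both directions boils down to a table search. Here is a reduced sketch with a one-field stand-in for a _TSS entry; struct tss_entry and the two function names are invented for this example, and the control values in the usage note are made up rather than taken from real firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for one _TSS entry; only the control value is modeled. */
struct tss_entry {
    uint64_t control;
};

/* Register value -> state: index of the entry whose control matches, or -1. */
int tss_state_from_control(const struct tss_entry *tss, size_t count,
                           uint64_t value)
{
    size_t i;

    assert(tss != NULL);
    for (i = 0; i < count; i++) {
        if (tss[i].control == value)
            return (int)i;
    }
    return -1;
}

/* State -> register value: 0 on success, -1 if state is out of range. */
int tss_control_from_state(const struct tss_entry *tss, size_t count,
                           size_t state, uint64_t *control)
{
    assert(tss != NULL && control != NULL);
    if (state >= count)
        return -1;
    *control = tss[state].control;
    return 0;
}
```

Given a table with control values {0x00, 0x1E, 0x1C, 0x1A}, reading 0x1C from the register maps to state 2, and setting state 2 writes 0x1C back.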

OK, so when is the throttling mechanism triggered? We mainly rely on acpi_processor_set_throttling, an API exposed to callers, which sets the throttling state according to its parameter. For example, to change the throttling state of CPU3, this function uses work_on_cpu(3, acpi_processor_throttling_fn), which finally calls acpi_processor_set_throttling_fadt or acpi_processor_set_throttling_ptc depending on whether the full set of _PTC, _TSS and _TPC is present.

So what we care about most is who invokes acpi_processor_set_throttling, and how they want to set the throttling state. There are three possible places:

1. acpi_processor_tstate_has_changed

2. processor_set_cur_state

3. acpi_processor_reevaluate_tstate

acpi_processor_tstate_has_changed is invoked when the processor receives the ACPI_PROCESSOR_NOTIFY_THROTTLING (0x82) notification, and also when a CPU comes online.

processor_set_cur_state is invoked when the processor is registered as a cooling device and the requested cooling state goes beyond the maximum state that frequency scaling alone can provide (which means that even running at the lowest frequency cannot be guaranteed to cool the system down), so we additionally throttle the processor to bring the temperature down.

As for acpi_processor_reevaluate_tstate, it is invoked on CPU_ONLINE.

So in our case, we want to re-evaluate the T-state when the system resumes from suspend to memory. Resuming from suspend to memory brings the nonboot CPUs online again, so each nonboot CPU has a chance to reset its throttling state, and thus its Clock Modulation register, to zero. But two problems remain:

1. The boot CPU does not get a chance to clear Clock Modulation.

2. If the system does not have all of _PTC, _TSS and _TPC, it has no chance to clear Clock Modulation, because it can only use duty_width and duty_offset to set throttling.

So the solution would be to add resume hooks to processor_driver and explicitly clear Clock Modulation regardless of the system's throttling mode.
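To show the intent of such a hook, here is a tiny user-space model, not a kernel patch. It assumes each CPU is reduced to a struct with its current T-state, and that the hook walks every CPU, boot CPU included, and clears any non-zero state. struct model_cpu and clear_throttling_on_resume are names I made up for this sketch; the real hook would go through the per-processor get/set throttling callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one CPU's throttling bookkeeping. */
struct model_cpu {
    unsigned int state;         /* current T-state; 0 means not throttled */
    int throttling_supported;   /* non-zero if throttling is usable */
};

/*
 * Model of a resume hook: visit every CPU (index 0 is the boot CPU)
 * and reset any non-zero throttling state to 0.
 * Returns how many CPUs had to be cleared.
 */
size_t clear_throttling_on_resume(struct model_cpu *cpus, size_t count)
{
    size_t i, cleared = 0;

    assert(cpus != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (!cpus[i].throttling_supported)
            continue;
        if (cpus[i].state) {
            cpus[i].state = 0;
            cleared++;
        }
    }
    return cleared;
}
```

The point of the model is only that the loop does not skip index 0, which is exactly the case the CPU online path misses.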
