How Low Power mode controls CPU cores
Low Power mode can have significant effects on performance, power and energy use in the CPU cores of those Apple silicon chips that support it. In previous tests on the M4 Pro, this mode was found to impose a ceiling on P core frequency, even when just a single thread was running. When power use exceeded a threshold, a more complex response resembling that seen in ‘thermal throttling’ was evoked: P core frequencies were progressively reduced until total power use fell to a threshold, and power was then maintained at that level at a significant cost to performance. This article explores how that works.
Methods
These follow the procedures I’ve already described (links at end). For these tests I ran 8, 9 and 10 NEON threads at high QoS on the M4 Pro in a Mac mini 2024, and collected data using the cpu_power sampler in powermetrics. For an additional 10-thread test, the thermal sampler was added to monitor thermal pressure. The sample period requested was 10 ms, but the periods actually delivered were about 24 ms for the 8- and 9-thread tests, and 29 ms for the 10-thread tests.
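As a rough illustration of that data collection, the following Python sketch assembles a powermetrics command line. It assumes the `--samplers`, `-i` (sample period in ms) and `-n` (sample count) options as I understand them from the tool's man page, so check `man powermetrics` before relying on it. The helper name and the sample count of 100 are hypothetical, and aren't taken from the tests described here.

```python
import shlex

def powermetrics_command(samplers, interval_ms=10, sample_count=None):
    """Build an argv list for powermetrics, which has to run as root.

    The helper name and defaults are illustrative only.
    """
    argv = ["sudo", "powermetrics",
            "--samplers", ",".join(samplers),  # comma-separated sampler names
            "-i", str(interval_ms)]            # requested sample period, in ms
    if sample_count is not None:
        argv += ["-n", str(sample_count)]      # stop after this many samples
    return argv

# Requested period of 10 ms, with the thermal sampler added as in the extra 10-thread test.
cmd = powermetrics_command(["cpu_power", "thermal"], interval_ms=10, sample_count=100)
print(shlex.join(cmd))
```

As noted above, the period requested isn't necessarily the period delivered, so any analysis should use the elapsed time reported in each sample rather than the value passed to `-i`.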
The M4 Pro chip has two P clusters of 5 cores each. Running 8-10 threads fills one cluster, and takes 3, 4 or all 5 cores in the second cluster to 100% active residency. With 8 or 9 test threads running, one other core in that second cluster runs powermetrics and other overhead at 50-60% active residency. With all 10 P cores running test threads at 100% active residency, that overhead is moved to the E cluster instead, where it contributes little to total power consumption. Power use during each of these three tests far exceeds the threshold observed in Low Power mode, so evokes frequency regulation behaviour and significant reductions in test performance.
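The core loading described above can be worked through with a few lines of Python. This is only a sketch of the arithmetic, assuming that one P cluster is filled before the other; the function name is hypothetical, and it says nothing about how macOS actually schedules threads.

```python
CLUSTER_SIZE = 5   # P cores per cluster on the M4 Pro
P_CLUSTERS = 2     # number of P clusters on the M4 Pro

def place_threads(threads):
    """Return (fully loaded cores per P cluster, cluster type running overhead)."""
    remaining = threads
    loaded = []
    for _ in range(P_CLUSTERS):
        n = min(remaining, CLUSTER_SIZE)   # fill this cluster before the next
        loaded.append(n)
        remaining -= n
    free_p_cores = P_CLUSTERS * CLUSTER_SIZE - sum(loaded)
    # Overhead such as powermetrics stays on a spare P core if there is one,
    # otherwise it is pushed to the E cluster.
    overhead = "P" if free_p_cores > 0 else "E"
    return loaded, overhead

for threads in (8, 9, 10):
    print(threads, place_threads(threads))
```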
Frequency and active residency
The following three graphs show P cluster frequencies and total active residencies from the start of each test. In each case, there was a sharp rise in frequencies as the P cores were loaded with test threads, to the Low Power ceiling of 3,624 MHz. As tests progressed, cluster frequencies were steadily reduced over the following 0.2 seconds or so, until they reached a lower frequency of around 3,000 MHz, and then started to oscillate or hunt within a narrow band.
Linear regressions are fitted for the period of frequency reduction, shown in blue, with their equations given in the key.
Frequency reduction from the peak of 3,624 MHz became steeper with increasing number of threads, and the stable lower frequency fell towards 2,800 MHz.
Frequency and power
On the next set of three graphs, the same frequency data are shown below total CPU power. These show high peaks of power when frequency reached the ceiling of 3,624 MHz, with that peak proportional to the number of test threads. When frequency was reduced, that was accompanied by a marked reduction in power to reach a stable level, from which there was a slow fall with oscillations matching those seen in frequency.
Linear regressions are fitted for the period of power reduction, shown in blue, with their equations given in the key.
Power reduction from the peak became steeper with increasing numbers of threads, but the stable lower power was similar, at about 14 W in each case.
These suggest a relationship between frequency and power, at least when power exceeds 12 W. The following three graphs show linear regressions on those data points for each of the tests. Equations are again given in the key.
Similarities and differences
These three tests showed great qualitative similarity, with some significant quantitative differences. They are summarised in the following table.
Peak frequency was almost identical across the three tests, at the Low Power ceiling of 3,624 MHz, except in the 10-thread test, where it was slightly lower. This is consistent with core frequency being limited to that ceiling throughout testing in Low Power mode. Stable or plateau power use was also almost identical, at just over 14 W.
However, because of the different numbers of threads running in each test, stable power use was attained using different frequencies, from as high as 3,096 MHz for 8 threads down to as low as 2,733 MHz with 10 threads. This is consistent with frequency control being used to limit power use.
Because higher numbers of threads resulted in higher peaks in power, the rate of fall of frequency increased with increasing number of threads. With 8 threads, frequency fell at a rate of about 2,300 MHz/s, while 10 threads saw a rate of fall of more than 6,000 MHz/s. With 8 threads in particular, that reduction in frequency appeared less linear, and higher-order polynomial fits had much lower residuals, suggesting that frequency control wasn’t linear. More test runs would be needed to establish that.
Changes in power were closely linked to changes in frequency, and the ratio of their rates of fall appeared surprisingly constant, at 12-13 mW/MHz. That was also reflected in similarities in the linear regressions for the power-frequency relationships, in the intercepts and gradients at the foot of the table.
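The arithmetic behind that ratio can be sketched in Python as follows. The rates of fall in frequency and the 12-13 mW/MHz range come from the text above; the rates of fall in power that it prints are merely implied by those figures, and aren't measured values. Both function names are hypothetical.

```python
def power_per_mhz(power_fall_w_per_s, freq_fall_mhz_per_s):
    """Ratio of the rates of fall of power and frequency, in mW/MHz."""
    return 1000.0 * power_fall_w_per_s / freq_fall_mhz_per_s

def implied_power_fall(freq_fall_mhz_per_s, ratio_mw_per_mhz):
    """Rate of fall of power in W/s implied by a frequency slope and a ratio."""
    return freq_fall_mhz_per_s * ratio_mw_per_mhz / 1000.0

# About 2,300 MHz/s with 8 threads, and more than 6,000 MHz/s with 10 threads.
for freq_fall in (2300, 6000):
    low = implied_power_fall(freq_fall, 12)
    high = implied_power_fall(freq_fall, 13)
    print(f"{freq_fall} MHz/s implies {low:.1f} to {high:.1f} W/s")
```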
At no time during the additional test with 10 threads was thermal pressure in the chip reported as anything other than “Nominal”, which I take to mean that no elevated thermal pressure was detected at any time.
Frequency, power and thermal control
As threads are being loaded, P cluster frequencies are rapidly increased to the Low Power ceiling of 3,624 MHz, and power use rises above a set limit of about 14 W. That will also have increased CPU core temperature but, as temperature isn’t available in powermetrics samplers, it’s not possible to determine whether power or temperature is being controlled.
Following that peak, cluster frequency is reduced slowly until power used reaches a set point of just over 14 W after about 0.2 seconds. The rate of reduction of frequency varies according to the reduction in power required from peak to set point. Once power reaches that set point, small cyclical changes are made to frequency so that both frequency and power oscillate stably about that set point.
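To make that behaviour concrete, here is a toy Python simulation of a set-point controller of that general kind. It is emphatically not Apple's algorithm, which isn't documented here. It assumes a linear power model of 12.5 mW/MHz (the midpoint of the ratio observed above), a fixed ramp of 0.2 seconds, a hypothetical hunting step of 24 MHz, and peak powers of 20 W and 25 W that are invented for illustration.

```python
CEILING_MHZ = 3624.0   # Low Power frequency ceiling
SET_POINT_W = 14.0     # approximate power set point
RAMP_S = 0.2           # time taken to reach the set point
MW_PER_MHZ = 12.5      # assumed linear power model (hypothetical)

def simulate(peak_power_w, dt_s=0.025, duration_s=0.4, hunt_mhz=24.0):
    """Return a list of (time s, frequency MHz, power W) samples."""
    excess_w = max(peak_power_w - SET_POINT_W, 0.0)
    # The ramp rate is scaled by the power to be shed, so the ramp lasts RAMP_S.
    ramp_step = (excess_w * 1000.0 / MW_PER_MHZ) / RAMP_S * dt_s
    freq = CEILING_MHZ
    samples = []
    for i in range(round(duration_s / dt_s) + 1):
        power = peak_power_w - (CEILING_MHZ - freq) * MW_PER_MHZ / 1000.0
        samples.append((i * dt_s, freq, power))
        if power > SET_POINT_W:
            # MHz still to shed; ramp while far from the set point, hunt when near.
            needed = (power - SET_POINT_W) * 1000.0 / MW_PER_MHZ
            freq -= max(min(ramp_step, needed), hunt_mhz)
        else:
            # At or below the set point: step back up, never above the ceiling.
            freq = min(freq + hunt_mhz, CEILING_MHZ)
    return samples

for peak in (20.0, 25.0):
    run = simulate(peak)
    print(peak, [round(f) for _, f, _ in run])
```

In this sketch a higher peak gives a steeper fall to a lower plateau frequency, while power settles close to the same set point, which is the pattern described above. A peak that never exceeds the set point leaves frequency at the ceiling.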
This slow reduction of frequency lessens the effect of power limiting on short bursts of high CPU demand, and may be intended to minimise the performance impact of Low Power mode while avoiding an increase in fan speed, one of the objectives of Low Power mode. This appears to be pre-emptive control designed to result in the best compromise between low power use and performance.
Summary
Low Power mode sets a maximum frequency of 3,624 MHz in M4 Pro CPU cores.
It also sets a total CPU core power threshold of about 14 W.
When power exceeds that threshold, cluster frequencies are reduced over a period of about 0.2 seconds to reduce power use to that level.
That slow reduction in frequency allows short bursts of high CPU core demand to complete with limited delay due to frequency restriction.
More sustained periods of high demand are significantly slowed, though.
The aim of this pre-emptive strategy is to limit heat production to avoid spinning up cooling fans, without degrading CPU core performance across the board.
The strategy also limits energy use, so extending battery endurance.
References
Power Modes and Apple silicon CPUs
Last Week on My Mac: Power throttle
Inside M4 chips: CPU power, energy and mystery
Inside M4 chips: Matrix processing and Power Modes
Power Modes and Apple Silicon GPUs
Evaluating M3 Pro CPU cores: 1 General performance
Explainer
Residency is the percentage of time a core is in a specific state. Idle residency is thus the percentage of time that core is idle and not processing instructions. Active residency is the percentage of time it isn’t idle, but is actively processing instructions. Down residency is the percentage of time the core is shut down. All these are independent of the core’s frequency or clock speed.
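Those definitions reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and summing active residencies across the cores in a cluster to give a total is an assumption about how the totals in the graphs are formed, so a 5-core cluster could reach 500%.

```python
def residency_percent(time_in_state_ms, sample_ms):
    """Percentage of a sample period that a core spent in one state."""
    return 100.0 * time_in_state_ms / sample_ms

def active_from_idle(idle_percent):
    """Active residency is the time the core isn't idle."""
    return 100.0 - idle_percent

def total_active_residency(per_core_active_percent):
    """Sum of active residencies over the cores in a cluster."""
    return sum(per_core_active_percent)

# A core active for 12 ms of a 24 ms sample has 50% active residency,
# whatever frequency it was running at.
print(residency_percent(12, 24))
# Three cores running test threads, one running overhead at 55%, one idle.
print(total_active_residency([100, 100, 100, 55, 0]))
```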