Files
linux/drivers/cpufreq/cppc_cpufreq.c
Linus Torvalds d7c8087a9c Merge tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "Once again, cpufreq is the most active development area, mostly
  because of the new feature additions and documentation updates in the
  amd-pstate driver, but there are also changes in the cpufreq core
  related to boost support and other assorted updates elsewhere.

  Next up are power capping changes due to the major cleanup of the
  Intel RAPL driver.

  On the cpuidle front, a new C-states table for Intel Panther Lake is
  added to the intel_idle driver, the stopped tick handling in the menu
  and teo governors is updated, and there are a couple of cleanups.

  Apart from the above, support for Tegra114 is added to devfreq and
  there are assorted cleanups of that code, there are also two updates
  of the operating performance points (OPP) library, two minor updates
  related to hibernation, and cpupower utility man pages updates and
  cleanups.

  Specifics:

   - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa)

   - Update cpufreq-dt-platdev blocklist (Faruque Ansari)

   - Minor updates to driver and dt-bindings for Tegra (Thierry Reding,
     Rosen Penev)

   - Add MAINTAINERS entry for CPPC driver (Viresh Kumar)

   - Add support for new features: CPPC performance priority, Dynamic
     EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham
     Shenoy, Mario Limonciello)

   - Fix sysfs files being present when HW missing and broken/outdated
     documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy)

   - Pass the policy to cpufreq_driver->adjust_perf() to avoid using
     cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate
     which leads to a scheduling-while-atomic bug (K Prateek Nayak)

   - Clean up dead code in Kconfig for cpufreq (Julian Braha)

   - Remove max_freq_req update for pre-existing cpufreq policy and add
     a boost_freq_req QoS request to save the boost constraint instead
     of overwriting the last scaling_max_freq constraint (Pierre
     Gondois)

   - Embed cpufreq QoS freq_req objects in cpufreq policy so they all
     are allocated in one go along with the policy to simplify lifetime
     rules and avoid error handling issues (Viresh Kumar)

   - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq
     scaling driver (Henry Tseng)

   - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead
     of cpumask_weight() because the former is more efficient (Yury
     Norov)

   - Use sysfs_emit() in sysfs show functions for cpufreq governor
     attributes (Thorsten Blum)

   - Update intel_pstate to stop returning an error when "off" is
     written to its status sysfs attribute while the driver is already
     off (Fabio De Francesco)

   - Include current frequency in the debug message printed by
     __cpufreq_driver_target() (Pengjie Zhang)

   - Refine stopped tick handling in the menu cpuidle governor and
     rearrange stopped tick handling in the teo cpuidle governor (Rafael
     Wysocki)

   - Add Panther Lake C-states table to the intel_idle driver (Artem
     Bityutskiy)

   - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha)

   - Simplify cpuidle_register_device() with guard() (Huisong Li)

   - Use performance level if available to distinguish between rates in
     OPP debugfs (Manivannan Sadhasivam)

   - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar)

   - Return -ENODATA if the snapshot image is not loaded (Alberto
     Garcia)

   - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric
     Biggers)

   - Clean up and rearrange the intel_rapl power capping driver to make
     the respective interface drivers (TPMI, MSR, and MMOI) hold their
     own settings and primitives and consolidate PL4 and PMU support
     flags into rapl_defaults (Kuppuswamy Sathyanarayanan)

   - Correct kernel-doc function parameter names in the power capping
     core code (Randy Dunlap)

   - Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko)

   - Use _visible attribute to replace create/remove_sysfs_files() in
     devfreq (Pengjie Zhang)

   - Add Tegra114 support to activity monitor device in tegra30-devfreq
     as a preparation to upcoming EMC controller support (Svyatoslav
     Ryhel)

   - Fix mistakes in cpupower man pages, add the boost and epp options
     to the cpupower-frequency-info man page, and add the perf-bias
     option to the cpupower-info man page (Roberto Ricci)

   - Remove unnecessary extern declarations from getopt.h in arguments
     parsing functions in cpufreq-set, cpuidle-info, cpuidle-set,
     cpupower-info, and cpupower-set utilities (Kaushlendra Kumar)"

* tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
  cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP
  cpupower: remove extern declarations in cmd functions
  cpuidle: Simplify cpuidle_register_device() with guard()
  PM / devfreq: tegra30-devfreq: add support for Tegra114
  PM / devfreq: use _visible attribute to replace create/remove_sysfs_files()
  PM / devfreq: Remove unneeded casting for HZ_PER_KHZ
  MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
  cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
  cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
  cpufreq/amd-pstate-ut: Add a unit test for raw EPP
  cpufreq/amd-pstate: Add support for raw EPP writes
  cpufreq/amd-pstate: Add support for platform profile class
  cpufreq/amd-pstate: add kernel command line to override dynamic epp
  cpufreq/amd-pstate: Add dynamic energy performance preference
  Documentation: amd-pstate: fix dead links in the reference section
  cpufreq/amd-pstate: Cache the max frequency in cpudata
  Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}
  Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
  Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
  amd-pstate-ut: Add a testcase to validate the visibility of driver attributes
  ...
2026-04-13 19:47:52 -07:00

1052 lines
27 KiB
C

// SPDX-License-Identifier: GPL-2.0-only
/*
* CPPC (Collaborative Processor Performance Control) driver for
* interfacing with the CPUfreq layer and governors. See
* cppc_acpi.c for CPPC specific methods.
*
* (C) Copyright 2014, 2015 Linaro Ltd.
* Author: Ashwin Chaugule <ashwin.chaugule@linaro.org>
*/
#define pr_fmt(fmt) "CPPC Cpufreq:" fmt
#include <linux/arch_topology.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/irq_work.h>
#include <linux/kthread.h>
#include <linux/time.h>
#include <linux/vmalloc.h>
#include <uapi/linux/sched/types.h>
#include <linux/unaligned.h>
#include <acpi/cppc_acpi.h>
static struct cpufreq_driver cppc_cpufreq_driver;
#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
static enum {
FIE_UNSET = -1,
FIE_ENABLED,
FIE_DISABLED
} fie_disabled = FIE_UNSET;
module_param(fie_disabled, int, 0444);
MODULE_PARM_DESC(fie_disabled, "Disable Frequency Invariance Engine (FIE)");
/* Frequency invariance support */
struct cppc_freq_invariance {
int cpu;
struct irq_work irq_work;
struct kthread_work work;
struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
struct cppc_cpudata *cpu_data;
};
static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
static struct kthread_worker *kworker_fie;
static int cppc_perf_from_fbctrs(u64 reference_perf,
struct cppc_perf_fb_ctrs *fb_ctrs_t0,
struct cppc_perf_fb_ctrs *fb_ctrs_t1);
/**
* __cppc_scale_freq_tick - CPPC arch_freq_scale updater for frequency invariance
* @cppc_fi: per-cpu CPPC FIE data.
*
* The CPPC driver registers itself with the topology core to provide its own
* implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
* gets called by the scheduler on every tick.
*
* Note that the arch specific counters have higher priority than CPPC counters,
* if available, though the CPPC driver doesn't need to have any special
* handling for that.
*/
static void __cppc_scale_freq_tick(struct cppc_freq_invariance *cppc_fi)
{
struct cppc_perf_fb_ctrs fb_ctrs = {0};
struct cppc_cpudata *cpu_data;
unsigned long local_freq_scale;
u64 perf, ref_perf;
cpu_data = cppc_fi->cpu_data;
if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
pr_warn("%s: failed to read perf counters\n", __func__);
return;
}
ref_perf = cpu_data->perf_caps.reference_perf;
perf = cppc_perf_from_fbctrs(ref_perf,
&cppc_fi->prev_perf_fb_ctrs, &fb_ctrs);
if (!perf)
return;
cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
perf <<= SCHED_CAPACITY_SHIFT;
local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
/* This can happen due to counter's overflow */
if (unlikely(local_freq_scale > 1024))
local_freq_scale = 1024;
per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
}
static void cppc_scale_freq_tick(void)
{
__cppc_scale_freq_tick(&per_cpu(cppc_freq_inv, smp_processor_id()));
}
static struct scale_freq_data cppc_sftd = {
.source = SCALE_FREQ_SOURCE_CPPC,
.set_freq_scale = cppc_scale_freq_tick,
};
static void cppc_scale_freq_workfn(struct kthread_work *work)
{
struct cppc_freq_invariance *cppc_fi;
cppc_fi = container_of(work, struct cppc_freq_invariance, work);
__cppc_scale_freq_tick(cppc_fi);
}
static void cppc_irq_work(struct irq_work *irq_work)
{
struct cppc_freq_invariance *cppc_fi;
cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
kthread_queue_work(kworker_fie, &cppc_fi->work);
}
/*
* Reading perf counters may sleep if the CPC regs are in PCC. Thus, we
* schedule an irq work in scale_freq_tick (since we reach here from hard-irq
* context), which then schedules a normal work item cppc_scale_freq_workfn()
* that updates the per_cpu arch_freq_scale variable based on the counter
* updates since the last tick.
*/
static void cppc_scale_freq_tick_pcc(void)
{
struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
/*
* cppc_get_perf_ctrs() can potentially sleep, call that from the right
* context.
*/
irq_work_queue(&cppc_fi->irq_work);
}
static struct scale_freq_data cppc_sftd_pcc = {
.source = SCALE_FREQ_SOURCE_CPPC,
.set_freq_scale = cppc_scale_freq_tick_pcc,
};
static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
{
struct scale_freq_data *sftd = &cppc_sftd;
struct cppc_freq_invariance *cppc_fi;
int cpu, ret;
if (fie_disabled)
return;
for_each_cpu(cpu, policy->cpus) {
cppc_fi = &per_cpu(cppc_freq_inv, cpu);
cppc_fi->cpu = cpu;
cppc_fi->cpu_data = policy->driver_data;
if (cppc_perf_ctrs_in_pcc_cpu(cpu)) {
kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
sftd = &cppc_sftd_pcc;
}
ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
/*
* Don't abort as the CPU was offline while the driver was
* getting registered.
*/
if (ret && cpu_online(cpu)) {
pr_debug("%s: failed to read perf counters for cpu:%d: %d\n",
__func__, cpu, ret);
return;
}
}
/* Register for freq-invariance */
topology_set_scale_freq_source(sftd, policy->cpus);
}
/*
* We free all the resources on policy's removal and not on CPU removal as the
* irq-work are per-cpu and the hotplug core takes care of flushing the pending
* irq-works (hint: smpcfd_dying_cpu()) on CPU hotplug. Even if the kthread-work
* fires on another CPU after the concerned CPU is removed, it won't harm.
*
* We just need to make sure to remove them all on policy->exit().
*/
static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
{
struct cppc_freq_invariance *cppc_fi;
int cpu;
if (fie_disabled)
return;
/* policy->cpus will be empty here, use related_cpus instead */
topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
for_each_cpu(cpu, policy->related_cpus) {
if (!cppc_perf_ctrs_in_pcc_cpu(cpu))
continue;
cppc_fi = &per_cpu(cppc_freq_inv, cpu);
irq_work_sync(&cppc_fi->irq_work);
kthread_cancel_work_sync(&cppc_fi->work);
}
}
static void cppc_fie_kworker_init(void)
{
struct sched_attr attr = {
.size = sizeof(struct sched_attr),
.sched_policy = SCHED_DEADLINE,
.sched_nice = 0,
.sched_priority = 0,
/*
* Fake (unused) bandwidth; workaround to "fix"
* priority inheritance.
*/
.sched_runtime = NSEC_PER_MSEC,
.sched_deadline = 10 * NSEC_PER_MSEC,
.sched_period = 10 * NSEC_PER_MSEC,
};
int ret;
kworker_fie = kthread_run_worker(0, "cppc_fie");
if (IS_ERR(kworker_fie)) {
pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
PTR_ERR(kworker_fie));
fie_disabled = FIE_DISABLED;
kworker_fie = NULL;
return;
}
ret = sched_setattr_nocheck(kworker_fie->task, &attr);
if (ret) {
pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
ret);
kthread_destroy_worker(kworker_fie);
fie_disabled = FIE_DISABLED;
kworker_fie = NULL;
}
}
static void __init cppc_freq_invariance_init(void)
{
bool perf_ctrs_in_pcc = cppc_perf_ctrs_in_pcc();
if (fie_disabled == FIE_UNSET) {
if (perf_ctrs_in_pcc) {
pr_info("FIE not enabled on systems with registers in PCC\n");
fie_disabled = FIE_DISABLED;
} else {
fie_disabled = FIE_ENABLED;
}
}
if (fie_disabled || !perf_ctrs_in_pcc)
return;
cppc_fie_kworker_init();
}
static void cppc_freq_invariance_exit(void)
{
if (kworker_fie)
kthread_destroy_worker(kworker_fie);
}
#else
static inline void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
{
}
static inline void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
{
}
static inline void cppc_freq_invariance_init(void)
{
}
static inline void cppc_freq_invariance_exit(void)
{
}
#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
static void cppc_cpufreq_update_perf_limits(struct cppc_cpudata *cpu_data,
struct cpufreq_policy *policy)
{
struct cppc_perf_caps *caps = &cpu_data->perf_caps;
u32 min_perf, max_perf;
min_perf = cppc_khz_to_perf(caps, policy->min);
max_perf = cppc_khz_to_perf(caps, policy->max);
cpu_data->perf_ctrls.min_perf =
clamp_t(u32, min_perf, caps->lowest_perf, caps->highest_perf);
cpu_data->perf_ctrls.max_perf =
clamp_t(u32, max_perf, caps->lowest_perf, caps->highest_perf);
}
static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
unsigned int cpu = policy->cpu;
struct cpufreq_freqs freqs;
int ret = 0;
cpu_data->perf_ctrls.desired_perf =
cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
cppc_cpufreq_update_perf_limits(cpu_data, policy);
freqs.old = policy->cur;
freqs.new = target_freq;
cpufreq_freq_transition_begin(policy, &freqs);
ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
cpufreq_freq_transition_end(policy, &freqs, ret != 0);
if (ret)
pr_debug("Failed to set target on CPU:%d. ret:%d\n",
cpu, ret);
return ret;
}
static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
unsigned int cpu = policy->cpu;
u32 desired_perf;
int ret;
desired_perf = cppc_khz_to_perf(&cpu_data->perf_caps, target_freq);
cpu_data->perf_ctrls.desired_perf = desired_perf;
cppc_cpufreq_update_perf_limits(cpu_data, policy);
ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
if (ret) {
pr_debug("Failed to set target on CPU:%d. ret:%d\n",
cpu, ret);
return 0;
}
return target_freq;
}
static int cppc_verify_policy(struct cpufreq_policy_data *policy)
{
cpufreq_verify_within_cpu_limits(policy);
return 0;
}
static unsigned int __cppc_cpufreq_get_transition_delay_us(unsigned int cpu)
{
int transition_latency_ns = cppc_get_transition_latency(cpu);
if (transition_latency_ns < 0)
return CPUFREQ_DEFAULT_TRANSITION_LATENCY_NS / NSEC_PER_USEC;
return transition_latency_ns / NSEC_PER_USEC;
}
/*
* The PCC subspace describes the rate at which platform can accept commands
* on the shared PCC channel (including READs which do not count towards freq
* transition requests), so ideally we need to use the PCC values as a fallback
* if we don't have a platform specific transition_delay_us
*/
#ifdef CONFIG_ARM64
#include <asm/cputype.h>
static unsigned int cppc_cpufreq_get_transition_delay_us(unsigned int cpu)
{
unsigned long implementor = read_cpuid_implementor();
unsigned long part_num = read_cpuid_part_number();
switch (implementor) {
case ARM_CPU_IMP_QCOM:
switch (part_num) {
case QCOM_CPU_PART_FALKOR_V1:
case QCOM_CPU_PART_FALKOR:
return 10000;
}
}
return __cppc_cpufreq_get_transition_delay_us(cpu);
}
#else
static unsigned int cppc_cpufreq_get_transition_delay_us(unsigned int cpu)
{
return __cppc_cpufreq_get_transition_delay_us(cpu);
}
#endif
#if defined(CONFIG_ARM64) && defined(CONFIG_ENERGY_MODEL)
static DEFINE_PER_CPU(unsigned int, efficiency_class);
/* Create an artificial performance state every CPPC_EM_CAP_STEP capacity unit. */
#define CPPC_EM_CAP_STEP (20)
/* Increase the cost value by CPPC_EM_COST_STEP every performance state. */
#define CPPC_EM_COST_STEP (1)
/* Add a cost gap correspnding to the energy of 4 CPUs. */
#define CPPC_EM_COST_GAP (4 * SCHED_CAPACITY_SCALE * CPPC_EM_COST_STEP \
/ CPPC_EM_CAP_STEP)
static unsigned int get_perf_level_count(struct cpufreq_policy *policy)
{
struct cppc_perf_caps *perf_caps;
unsigned int min_cap, max_cap;
struct cppc_cpudata *cpu_data;
int cpu = policy->cpu;
cpu_data = policy->driver_data;
perf_caps = &cpu_data->perf_caps;
max_cap = arch_scale_cpu_capacity(cpu);
min_cap = div_u64((u64)max_cap * perf_caps->lowest_perf,
perf_caps->highest_perf);
if ((min_cap == 0) || (max_cap < min_cap))
return 0;
return 1 + max_cap / CPPC_EM_CAP_STEP - min_cap / CPPC_EM_CAP_STEP;
}
/*
* The cost is defined as:
* cost = power * max_frequency / frequency
*/
static inline unsigned long compute_cost(int cpu, int step)
{
return CPPC_EM_COST_GAP * per_cpu(efficiency_class, cpu) +
step * CPPC_EM_COST_STEP;
}
static int cppc_get_cpu_power(struct device *cpu_dev,
unsigned long *power, unsigned long *KHz)
{
unsigned long perf_step, perf_prev, perf, perf_check;
unsigned int min_step, max_step, step, step_check;
unsigned long prev_freq = *KHz;
unsigned int min_cap, max_cap;
struct cpufreq_policy *policy;
struct cppc_perf_caps *perf_caps;
struct cppc_cpudata *cpu_data;
policy = cpufreq_cpu_get_raw(cpu_dev->id);
if (!policy)
return -EINVAL;
cpu_data = policy->driver_data;
perf_caps = &cpu_data->perf_caps;
max_cap = arch_scale_cpu_capacity(cpu_dev->id);
min_cap = div_u64((u64)max_cap * perf_caps->lowest_perf,
perf_caps->highest_perf);
perf_step = div_u64((u64)CPPC_EM_CAP_STEP * perf_caps->highest_perf,
max_cap);
min_step = min_cap / CPPC_EM_CAP_STEP;
max_step = max_cap / CPPC_EM_CAP_STEP;
perf_prev = cppc_khz_to_perf(perf_caps, *KHz);
step = perf_prev / perf_step;
if (step > max_step)
return -EINVAL;
if (min_step == max_step) {
step = max_step;
perf = perf_caps->highest_perf;
} else if (step < min_step) {
step = min_step;
perf = perf_caps->lowest_perf;
} else {
step++;
if (step == max_step)
perf = perf_caps->highest_perf;
else
perf = step * perf_step;
}
*KHz = cppc_perf_to_khz(perf_caps, perf);
perf_check = cppc_khz_to_perf(perf_caps, *KHz);
step_check = perf_check / perf_step;
/*
* To avoid bad integer approximation, check that new frequency value
* increased and that the new frequency will be converted to the
* desired step value.
*/
while ((*KHz == prev_freq) || (step_check != step)) {
perf++;
*KHz = cppc_perf_to_khz(perf_caps, perf);
perf_check = cppc_khz_to_perf(perf_caps, *KHz);
step_check = perf_check / perf_step;
}
/*
* With an artificial EM, only the cost value is used. Still the power
* is populated such as 0 < power < EM_MAX_POWER. This allows to add
* more sense to the artificial performance states.
*/
*power = compute_cost(cpu_dev->id, step);
return 0;
}
static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
unsigned long *cost)
{
unsigned long perf_step, perf_prev;
struct cppc_perf_caps *perf_caps;
struct cpufreq_policy *policy;
struct cppc_cpudata *cpu_data;
unsigned int max_cap;
int step;
policy = cpufreq_cpu_get_raw(cpu_dev->id);
if (!policy)
return -EINVAL;
cpu_data = policy->driver_data;
perf_caps = &cpu_data->perf_caps;
max_cap = arch_scale_cpu_capacity(cpu_dev->id);
perf_prev = cppc_khz_to_perf(perf_caps, KHz);
perf_step = CPPC_EM_CAP_STEP * perf_caps->highest_perf / max_cap;
step = perf_prev / perf_step;
*cost = compute_cost(cpu_dev->id, step);
return 0;
}
static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data;
struct em_data_callback em_cb =
EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
cpu_data = policy->driver_data;
em_dev_register_perf_domain(get_cpu_device(policy->cpu),
get_perf_level_count(policy), &em_cb,
cpu_data->shared_cpu_map, 0);
}
static void populate_efficiency_class(void)
{
struct acpi_madt_generic_interrupt *gicc;
DECLARE_BITMAP(used_classes, 256) = {};
int class, cpu, index;
for_each_possible_cpu(cpu) {
gicc = acpi_cpu_get_madt_gicc(cpu);
class = gicc->efficiency_class;
bitmap_set(used_classes, class, 1);
}
if (bitmap_weight(used_classes, 256) <= 1) {
pr_debug("Efficiency classes are all equal (=%d). "
"No EM registered", class);
return;
}
/*
* Squeeze efficiency class values on [0:#efficiency_class-1].
* Values are per spec in [0:255].
*/
index = 0;
for_each_set_bit(class, used_classes, 256) {
for_each_possible_cpu(cpu) {
gicc = acpi_cpu_get_madt_gicc(cpu);
if (gicc->efficiency_class == class)
per_cpu(efficiency_class, cpu) = index;
}
index++;
}
cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em;
}
#else
static void populate_efficiency_class(void)
{
}
#endif
static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
{
struct cppc_cpudata *cpu_data;
int ret;
cpu_data = kzalloc_obj(struct cppc_cpudata);
if (!cpu_data)
goto out;
if (!zalloc_cpumask_var(&cpu_data->shared_cpu_map, GFP_KERNEL))
goto free_cpu;
ret = acpi_get_psd_map(cpu, cpu_data);
if (ret) {
pr_debug("Err parsing CPU%d PSD data: ret:%d\n", cpu, ret);
goto free_mask;
}
ret = cppc_get_perf_caps(cpu, &cpu_data->perf_caps);
if (ret) {
pr_debug("Err reading CPU%d perf caps: ret:%d\n", cpu, ret);
goto free_mask;
}
ret = cppc_get_perf(cpu, &cpu_data->perf_ctrls);
if (ret) {
pr_debug("Err reading CPU%d perf ctrls: ret:%d\n", cpu, ret);
goto free_mask;
}
return cpu_data;
free_mask:
free_cpumask_var(cpu_data->shared_cpu_map);
free_cpu:
kfree(cpu_data);
out:
return NULL;
}
static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
free_cpumask_var(cpu_data->shared_cpu_map);
kfree(cpu_data);
policy->driver_data = NULL;
}
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
unsigned int cpu = policy->cpu;
struct cppc_cpudata *cpu_data;
struct cppc_perf_caps *caps;
int ret;
cpu_data = cppc_cpufreq_get_cpu_data(cpu);
if (!cpu_data) {
pr_err("Error in acquiring _CPC/_PSD data for CPU%d.\n", cpu);
return -ENODEV;
}
caps = &cpu_data->perf_caps;
policy->driver_data = cpu_data;
/*
* Set min to lowest nonlinear perf to avoid any efficiency penalty (see
* Section 8.4.7.1.1.5 of ACPI 6.1 spec)
*/
policy->min = cppc_perf_to_khz(caps, caps->lowest_nonlinear_perf);
policy->max = cppc_perf_to_khz(caps, policy->boost_enabled ?
caps->highest_perf : caps->nominal_perf);
/*
* Set cpuinfo.min_freq to Lowest to make the full range of performance
* available if userspace wants to use any perf between lowest & lowest
* nonlinear perf
*/
policy->cpuinfo.min_freq = cppc_perf_to_khz(caps, caps->lowest_perf);
policy->cpuinfo.max_freq = policy->max;
policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu);
policy->shared_type = cpu_data->shared_type;
switch (policy->shared_type) {
case CPUFREQ_SHARED_TYPE_HW:
case CPUFREQ_SHARED_TYPE_NONE:
/* Nothing to be done - we'll have a policy for each CPU */
break;
case CPUFREQ_SHARED_TYPE_ANY:
/*
* All CPUs in the domain will share a policy and all cpufreq
* operations will use a single cppc_cpudata structure stored
* in policy->driver_data.
*/
cpumask_copy(policy->cpus, cpu_data->shared_cpu_map);
break;
default:
pr_debug("Unsupported CPU co-ord type: %d\n",
policy->shared_type);
ret = -EFAULT;
goto out;
}
policy->fast_switch_possible = cppc_allow_fast_switch();
policy->dvfs_possible_from_any_cpu = true;
/*
* If 'highest_perf' is greater than 'nominal_perf', we assume CPU Boost
* is supported.
*/
if (caps->highest_perf > caps->nominal_perf)
policy->boost_supported = true;
/* Set policy->cur to max now. The governors will adjust later. */
policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
if (ret) {
pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->highest_perf, cpu, ret);
goto out;
}
cppc_cpufreq_cpu_fie_init(policy);
return 0;
out:
cppc_cpufreq_put_cpu_data(policy);
return ret;
}
static void cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
struct cppc_perf_caps *caps = &cpu_data->perf_caps;
unsigned int cpu = policy->cpu;
int ret;
cppc_cpufreq_cpu_fie_exit(policy);
cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;
ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
if (ret)
pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
caps->lowest_perf, cpu, ret);
cppc_cpufreq_put_cpu_data(policy);
}
static inline u64 get_delta(u64 t1, u64 t0)
{
if (t1 > t0 || t0 > ~(u32)0)
return t1 - t0;
return (u32)t1 - (u32)t0;
}
static int cppc_perf_from_fbctrs(u64 reference_perf,
struct cppc_perf_fb_ctrs *fb_ctrs_t0,
struct cppc_perf_fb_ctrs *fb_ctrs_t1)
{
u64 delta_reference, delta_delivered;
delta_reference = get_delta(fb_ctrs_t1->reference,
fb_ctrs_t0->reference);
delta_delivered = get_delta(fb_ctrs_t1->delivered,
fb_ctrs_t0->delivered);
/*
* Avoid divide-by zero and unchanged feedback counters.
* Leave it for callers to handle.
*/
if (!delta_reference || !delta_delivered)
return 0;
return (reference_perf * delta_delivered) / delta_reference;
}
static int cppc_get_perf_ctrs_sample(int cpu,
struct cppc_perf_fb_ctrs *fb_ctrs_t0,
struct cppc_perf_fb_ctrs *fb_ctrs_t1)
{
int ret;
ret = cppc_get_perf_ctrs(cpu, fb_ctrs_t0);
if (ret)
return ret;
udelay(2); /* 2usec delay between sampling */
return cppc_get_perf_ctrs(cpu, fb_ctrs_t1);
}
static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
{
struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
struct cppc_cpudata *cpu_data;
u64 delivered_perf, reference_perf;
int ret;
if (!policy)
return 0;
cpu_data = policy->driver_data;
ret = cppc_get_perf_ctrs_sample(cpu, &fb_ctrs_t0, &fb_ctrs_t1);
if (ret) {
if (ret == -EFAULT)
/* Any of the associated CPPC regs is 0. */
goto out_invalid_counters;
else
return 0;
}
reference_perf = cpu_data->perf_caps.reference_perf;
delivered_perf = cppc_perf_from_fbctrs(reference_perf,
&fb_ctrs_t0, &fb_ctrs_t1);
if (!delivered_perf)
goto out_invalid_counters;
return cppc_perf_to_khz(&cpu_data->perf_caps, delivered_perf);
out_invalid_counters:
/*
* Feedback counters could be unchanged or 0 when a cpu enters a
* low-power idle state, e.g. clock-gated or power-gated.
* Use desired perf for reflecting frequency. Get the latest register
* value first as some platforms may update the actual delivered perf
* there; if failed, resort to the cached desired perf.
*/
if (cppc_get_desired_perf(cpu, &delivered_perf))
delivered_perf = cpu_data->perf_ctrls.desired_perf;
return cppc_perf_to_khz(&cpu_data->perf_caps, delivered_perf);
}
static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
struct cppc_perf_caps *caps = &cpu_data->perf_caps;
if (state)
policy->cpuinfo.max_freq = cppc_perf_to_khz(caps, caps->highest_perf);
else
policy->cpuinfo.max_freq = cppc_perf_to_khz(caps, caps->nominal_perf);
return 0;
}
static ssize_t show_freqdomain_cpus(struct cpufreq_policy *policy, char *buf)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
return cpufreq_show_cpus(cpu_data->shared_cpu_map, buf);
}
static ssize_t show_auto_select(struct cpufreq_policy *policy, char *buf)
{
bool val;
int ret;
ret = cppc_get_auto_sel(policy->cpu, &val);
/* show "<unsupported>" when this register is not supported by cpc */
if (ret == -EOPNOTSUPP)
return sysfs_emit(buf, "<unsupported>\n");
if (ret)
return ret;
return sysfs_emit(buf, "%d\n", val);
}
static ssize_t store_auto_select(struct cpufreq_policy *policy,
const char *buf, size_t count)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
bool val;
int ret;
ret = kstrtobool(buf, &val);
if (ret)
return ret;
ret = cppc_set_auto_sel(policy->cpu, val);
if (ret)
return ret;
cpu_data->perf_ctrls.auto_sel = val;
if (val) {
u32 old_min_perf = cpu_data->perf_ctrls.min_perf;
u32 old_max_perf = cpu_data->perf_ctrls.max_perf;
/*
* When enabling autonomous selection, program MIN_PERF and
* MAX_PERF from current policy limits so that the platform
* uses the correct performance bounds immediately.
*/
cppc_cpufreq_update_perf_limits(cpu_data, policy);
ret = cppc_set_perf(policy->cpu, &cpu_data->perf_ctrls);
if (ret) {
cpu_data->perf_ctrls.min_perf = old_min_perf;
cpu_data->perf_ctrls.max_perf = old_max_perf;
cppc_set_auto_sel(policy->cpu, false);
cpu_data->perf_ctrls.auto_sel = false;
return ret;
}
}
return count;
}
static ssize_t cppc_cpufreq_sysfs_show_u64(unsigned int cpu,
int (*get_func)(int, u64 *),
char *buf)
{
u64 val;
int ret = get_func((int)cpu, &val);
if (ret == -EOPNOTSUPP)
return sysfs_emit(buf, "<unsupported>\n");
if (ret)
return ret;
return sysfs_emit(buf, "%llu\n", val);
}
static ssize_t cppc_cpufreq_sysfs_store_u64(unsigned int cpu,
int (*set_func)(int, u64),
const char *buf, size_t count)
{
u64 val;
int ret;
ret = kstrtou64(buf, 0, &val);
if (ret)
return ret;
ret = set_func((int)cpu, val);
return ret ? ret : count;
}
#define CPPC_CPUFREQ_ATTR_RW_U64(_name, _get_func, _set_func) \
static ssize_t show_##_name(struct cpufreq_policy *policy, char *buf) \
{ \
return cppc_cpufreq_sysfs_show_u64(policy->cpu, _get_func, buf);\
} \
static ssize_t store_##_name(struct cpufreq_policy *policy, \
const char *buf, size_t count) \
{ \
return cppc_cpufreq_sysfs_store_u64(policy->cpu, _set_func, \
buf, count); \
}
CPPC_CPUFREQ_ATTR_RW_U64(auto_act_window, cppc_get_auto_act_window,
cppc_set_auto_act_window)
static ssize_t
show_energy_performance_preference_val(struct cpufreq_policy *policy, char *buf)
{
return cppc_cpufreq_sysfs_show_u64(policy->cpu, cppc_get_epp_perf, buf);
}
static ssize_t
store_energy_performance_preference_val(struct cpufreq_policy *policy,
const char *buf, size_t count)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
u64 val;
int ret;
ret = kstrtou64(buf, 0, &val);
if (ret)
return ret;
ret = cppc_set_epp(policy->cpu, val);
if (ret)
return ret;
cpu_data->perf_ctrls.energy_perf = val;
return count;
}
CPPC_CPUFREQ_ATTR_RW_U64(perf_limited, cppc_get_perf_limited,
cppc_set_perf_limited)
cpufreq_freq_attr_ro(freqdomain_cpus);
cpufreq_freq_attr_rw(auto_select);
cpufreq_freq_attr_rw(auto_act_window);
cpufreq_freq_attr_rw(energy_performance_preference_val);
cpufreq_freq_attr_rw(perf_limited);
static struct freq_attr *cppc_cpufreq_attr[] = {
&freqdomain_cpus,
&auto_select,
&auto_act_window,
&energy_performance_preference_val,
&perf_limited,
NULL,
};
static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
.get = cppc_cpufreq_get_rate,
.fast_switch = cppc_cpufreq_fast_switch,
.init = cppc_cpufreq_cpu_init,
.exit = cppc_cpufreq_cpu_exit,
.set_boost = cppc_cpufreq_set_boost,
.attr = cppc_cpufreq_attr,
.name = "cppc_cpufreq",
};
static int __init cppc_cpufreq_init(void)
{
int ret;
if (!acpi_cpc_valid())
return -ENODEV;
cppc_freq_invariance_init();
populate_efficiency_class();
ret = cpufreq_register_driver(&cppc_cpufreq_driver);
if (ret)
cppc_freq_invariance_exit();
return ret;
}
static void __exit cppc_cpufreq_exit(void)
{
cpufreq_unregister_driver(&cppc_cpufreq_driver);
cppc_freq_invariance_exit();
}
module_exit(cppc_cpufreq_exit);
MODULE_AUTHOR("Ashwin Chaugule");
MODULE_DESCRIPTION("CPUFreq driver based on the ACPI CPPC v5.0+ spec");
MODULE_LICENSE("GPL");
late_initcall(cppc_cpufreq_init);
static const struct acpi_device_id cppc_acpi_ids[] __used = {
{ACPI_PROCESSOR_DEVICE_HID, },
{}
};
MODULE_DEVICE_TABLE(acpi, cppc_acpi_ids);