Commit Graph

949 Commits

Author SHA1 Message Date
Rafael J. Wysocki
d557640e4c sched: idle: Make skipping governor callbacks more consistent
If the cpuidle governor .select() callback is skipped because there
is only one idle state in the cpuidle driver, the .reflect() callback
should be skipped as well, at least for consistency (if not for
correctness), so do it.

Fixes: e5c9ffc6ae ("cpuidle: Skip governor when only one idle state is available")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/12857700.O9o76ZdvQC@rafael.j.wysocki
2026-03-10 16:03:02 +01:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
c3c1e98533 Merge tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These are mostly fixes on top of the power management updates merged
  recently in cpuidle governors, in the Intel RAPL power capping driver
  and in the wake IRQ management code:

   - Fix the handling of package-scope MSRs in the intel_rapl power
     capping driver when called from the PMU subsystem and make it add
     all package CPUs to the PMU cpumask to allow tools to read RAPL
     events from any CPU in the package (Kuppuswamy Satharayananyan)

   - Rework the invalid version check in the intel_rapl_tpmi power
     capping driver to account for the fact that on partitioned systems,
     multiple TPMI instances may exist per package, but RAPL registers
     are only valid on one instance (Kuppuswamy Satharayananyan)

   - Describe the new intel_idle.table command line option in the
     admin-guide intel_idle documentation (Artem Bityutskiy)

   - Fix a crash in the ladder cpuidle governor on systems with only one
     (polling) idle state available by making the cpuidle core bypass
     the governor in those cases and adjust the other existing governors
     to that change (Aboorva Devarajan, Christian Loehle)

   - Update kerneldoc comments for wake IRQ management functions that
     have not been matching the code (Wang Jiayue)"

* tag 'pm-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: menu: Remove single state handling
  cpuidle: teo: Remove single state handling
  cpuidle: haltpoll: Remove single state handling
  cpuidle: Skip governor when only one idle state is available
  powercap: intel_rapl_tpmi: Remove FW_BUG from invalid version check
  PM: sleep: wakeirq: Update outdated documentation comments
  Documentation: PM: Document intel_idle.table command line option
  powercap: intel_rapl: Expose all package CPUs in PMU cpumask
  powercap: intel_rapl: Remove incorrect CPU check in PMU context
2026-02-18 14:11:47 -08:00
Christian Loehle
93983a9f3b cpuidle: menu: Remove single state handling
cpuidle systems where the governor has no choice because there's only
a single idle state are now handled by cpuidle core and bypass the
governor, so remove the related handling.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Rebase on top of the cpuidle changes merged recently ]
Link: https://patch.msgid.link/20260216185005.1131593-5-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-17 15:49:53 +01:00
Christian Loehle
825d5d3479 cpuidle: teo: Remove single state handling
cpuidle systems where the governor has no choice because there's only
a single idle state are now handled by cpuidle core and bypass the
governor, so remove the related handling.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260216185005.1131593-4-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-17 15:49:53 +01:00
Aboorva Devarajan
9b9c0ff095 cpuidle: haltpoll: Remove single state handling
cpuidle systems where the governor has no choice because there's only
a single idle state are now handled by cpuidle core and bypass the
governor, so remove the related handling.

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Extended the change to drop a redundant local variable ]
Link: https://patch.msgid.link/20260216185005.1131593-3-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-17 15:49:24 +01:00
Aboorva Devarajan
e5c9ffc6ae cpuidle: Skip governor when only one idle state is available
On certain platforms (PowerNV systems without a power-mgt DT node),
cpuidle may register only a single idle state. In cases where that
single state is a polling state (state 0), the ladder governor may
incorrectly treat state 1 as the first usable state and pass an
out-of-bounds index. This can lead to a NULL enter callback being
invoked, ultimately resulting in a system crash.

[   13.342636] cpuidle-powernv : Only Snooze is available
[   13.351854] Faulting instruction address: 0x00000000
[   13.376489] NIP [0000000000000000] 0x0
[   13.378351] LR  [c000000001e01974] cpuidle_enter_state+0x2c4/0x668

Fix this by adding a bail-out in cpuidle_select() that returns state 0
directly when state_count <= 1, bypassing the governor and keeping the
tick running.

Fixes: dc2251bf98 ("cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260216185005.1131593-2-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-17 15:49:02 +01:00
Linus Torvalds
1c2b4a4c2b Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Don't try to enable Extended Tags on VFs since that bit is Reserved
     and causes misleading log messages (Håkon Bugge)

   - Initialize Endpoint Read Completion Boundary to match Root Port,
     regardless of ACPI _HPX (Håkon Bugge)

   - Apply _HPX PCIe Setting Record only to AER configuration, and only
     when OS owns PCIe hotplug but not AER, to avoid clobbering Extended
     Tag and Relaxed Ordering settings (Håkon Bugge)

  Resource management:

   - Move CardBus code to setup-cardbus.c and only build it when
     CONFIG_CARDBUS is set (Ilpo Järvinen)

   - Fix bridge window alignment with optional resources, where
     additional alignment requirement was previously lost (Ilpo
     Järvinen)

   - Stop over-estimating bridge window size since they are now assigned
     without any gaps between them (Ilpo Järvinen)

   - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening
     for nested bridges and endpoints (Ilpo Järvinen)

   - Add pbus_mem_size_optional() to handle sizes of optional resources
     (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)

   - Don't claim disabled bridge windows to avoid spurious claim
     failures (Ilpo Järvinen)

  Driver binding:

   - Fix device reference leak in pcie_port_remove_service() (Uwe
     Kleine-König)

   - Move pcie_port_bus_match() and pcie_port_bus_type to PCIe-specific
     portdrv.c (Uwe Kleine-König)

   - Convert portdrv to use pcie_port_bus_type.probe() and .remove()
     callbacks so .probe() and .remove() can eventually be removed from
     struct device_driver (Uwe Kleine-König)

  Error handling:

   - Clear stale errors on reporting agents upon probe so they don't
     look like recent errors (Lukas Wunner)

   - Add generic RAS tracepoint for hotplug events (Shuai Xue)

   - Add RAS tracepoint for link speed changes (Shuai Xue)

  Power management:

   - Avoid redundant delay on transition from D3hot to D3cold if the
     device was already in D3hot (Brian Norris)

   - Prevent runtime suspend until devices are fully initialized to
     avoid saving incompletely configured device state (Brian Norris)

  Power control:

   - Add power_on/off callbacks with generic signature to pwrseq,
     tc9563, and slot drivers so they can be used by pwrctrl core
     (Manivannan Sadhasivam)

   - Add PCIe M.2 connector support to the slot pwrctrl driver
     (Manivannan Sadhasivam)

   - Switch to pwrctrl interfaces to create, destroy, and power on/off
     devices, calling them from host controller drivers instead of the
     PCI core (Manivannan Sadhasivam)

   - Drop qcom .assert_perst() callbacks since this is now done by the
     controller driver instead of the pwrctrl driver (Manivannan
     Sadhasivam)

  Virtualization:

   - Remove an incorrect unlock in pci_slot_trylock() error handling
     (Jinhui Guo)

   - Lock the bridge device for slot reset (Keith Busch)

   - Enable ACS after IOMMU configuration on OF platforms so ACS is
     enabled an all devices; previously the first device enumerated
     (typically a Root Port) didn't have ACS enabled (Manivannan
     Sadhasivam)

   - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to
     work around hardware erratum; previously ACS SV was only
     temporarily disabled, which worked for enumeration but not after
     reset (Manivannan Sadhasivam)

  Peer-to-peer DMA:

   - Release per-CPU pgmap ref when vm_insert_page() fails to avoid hang
     when removing the PCI device (Hou Tao)

   - Remove incorrect p2pmem_alloc_mmap() warning about page refcount
     (Hou Tao)

  Endpoint framework:

   - Add configfs sub-groups synchronously to avoid NULL pointer
     dereference when racing with removal (Liu Song)

   - Fix swapped parameters in pci_{primary/secondary}_epc_epf_unlink()
     functions (Manikanta Maddireddy)

  ASPEED PCIe controller driver:

   - Add ASPEED Root Complex DT binding and driver (Jacky Chou)

  Freescale i.MX6 PCIe controller driver:

   - Add DT binding and driver support for an optional external refclock
     in addition to the refclock from the internal PLL (Richard Zhu)

   - Fix CLKREQ# control so host asserts it during enumeration and
     Endpoints can use it afterwards to exit the L1.2 link state
     (Richard Zhu)

  NVIDIA Tegra PCIe controller driver:

   - Export irq_domain_free_irqs() to allow PCI/MSI drivers that tear
     down MSI domains to be built as modules (Aaron Kling)

   - Allow pci-tegra to be built as a module (Aaron Kling)

  NVIDIA Tegra194 PCIe controller driver:

   - Relax Kconfig so tegra194 can be built for platforms beyond
     Tegra194 (Vidya Sagar)

  Qualcomm PCIe controller driver:

   - Merge SC8180x DT binding into SM8150 (Krzysztof Kozlowski)

   - Move SDX55, SDM845, QCS404, IPQ5018, IPQ6018, IPQ8074 Gen3,
     IPQ8074, IPQ4019, IPQ9574, APQ8064, MSM8996, APQ8084 to dedicated
     schema (Krzysztof Kozlowski)

   - Add DT binding and driver support for SA8255p Endpoint being
     configured by firmware (Mrinmay Sarkar)

   - Parse PERST# from all PCIe bridge nodes for future platforms that
     will have PERST# in Switch Downstream Ports as well as in Root
     Ports (Manivannan Sadhasivam)

  Renesas RZ/G3S PCIe controller driver:

   - Use pci_generic_config_write() since the writability provided by
     the custom wrapper is unnecessary (Claudiu Beznea)

  SOPHGO PCIe controller driver:

   - Disable ASPM L0s and L1 on Sophgo 2044 PCIe Root Ports (Inochi
     Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a
     pointer to the preceding Capability, to allow removal of
     Capabilities that are advertised but not fully implemented (Qiang
     Yu)

   - Remove MSI and MSI-X Capabilities in platforms that can't support
     them, so the PCI core automatically falls back to INTx (Qiang Yu)

   - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status
     for drivers that support this (Shawn Lin)

   - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if
     link is not up to avoid an unnecessary timeout (Manivannan
     Sadhasivam)

   - Revert dw-rockchip, qcom, and DWC core changes that used link-up
     IRQs to trigger enumeration instead of waiting for link to be up
     because the PCI core doesn't allocate bus number space for
     hierarchies that might be attached (Niklas Cassel)

   - Make endpoint iATU entry for MSI permanent instead of programming
     it dynamically, which is slow and racy with respect to other
     concurrent traffic, e.g., eDMA (Koichiro Den)

   - Use iMSI-RX MSI target address when possible to fix endpoints using
     32-bit MSI (Shawn Lin)

   - Allow DWC host controller driver probe to continue if device is not
     found or found but inactive; only fail when there's an error with
     the link (Manivannan Sadhasivam)

   - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers
     are not accessible after PME_Turn_Off, simply wait 10ms instead of
     polling for L2/L3 Ready (Richard Zhu)

   - Use multiple iATU entries to map large bridge windows and DMA
     ranges when necessary instead of failing (Samuel Holland)

   - Add EPC dynamic_inbound_mapping feature bit for Endpoint
     Controllers that can update BAR inbound address translation without
     requiring EPF driver to clear/reset the BAR first, and advertise it
     for DWC-based Endpoints (Koichiro Den)

   - Add EPC subrange_mapping feature bit for Endpoint Controllers that
     can map multiple independent inbound regions in a single BAR,
     implement subrange mapping, advertise it for DWC-based Endpoints,
     and add Endpoint selftests for it (Koichiro Den)

   - Make resizable BARs work for Endpoint multi-PF configurations;
     previously it only worked for PF 0 (Aksh Garg)

   - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings,
     and Address Match Mode (Aksh Garg)

   - Set up iATU when ECAM is enabled; previously IO and MEM outbound
     windows weren't programmed, and ECAM-related iATU entries weren't
     restored after suspend/resume, so config accesses failed (Krishna
     Chaitanya Chundru)

  Miscellaneous:

   - Use system_percpu_wq and WQ_PERCPU to explicitly request per-CPU
     work so WQ_UNBOUND can eventually be removed (Marco Crivellari)"

* tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (176 commits)
  PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
  PCI: Disable ACS SV for IDT 0x8090 switch
  PCI: Disable ACS SV for IDT 0x80b5 switch
  PCI: Cache ACS Capabilities register
  PCI: Enable ACS after configuring IOMMU for OF platforms
  PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404]
  PCI: Add ACS quirk for Qualcomm Hamoa & Glymur
  PCI: Use device_lock_assert() to verify device lock is held
  PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held
  PCI: Fix pci_slot_lock () device locking
  PCI: Fix pci_slot_trylock() error handling
  PCI: Mark Nvidia GB10 to avoid bus reset
  PCI: Mark ASM1164 SATA controller to avoid bus reset
  PCI: host-generic: Avoid reporting incorrect 'missing reg property' error
  PCI/PME: Replace RMW of Root Status register with direct write
  PCI/AER: Clear stale errors on reporting agents upon probe
  PCI: Don't claim disabled bridge windows
  PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()
  PCI: dwc: Fix missing iATU setup when ECAM is enabled
  PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup()
  ...
2026-02-11 17:20:38 -08:00
Linus Torvalds
bdbddf72a2 Merge tag 'soc-drivers-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
 "There are are a number of to firmware drivers, in particular the TEE
  subsystem:

   - a bus callback for TEE firmware that device drivers can register to

   - sysfs support for tee firmware information

   - minor updates to platform specific TEE drivers for AMD, NXP,
     Qualcomm and the generic optee driver

   - ARM SCMI firmware refactoring to improve the protocol discover
     among other fixes and cleanups

   - ARM FF-A firmware interoperability improvements

  The reset controller and memory controller subsystems gain support for
  additional hardware platforms from Mediatek, Renesas, NXP, Canaan and
  SpacemiT.

  Most of the other changes are for random drivers/soc code. Among a
  number of cleanups and newly added hardware support, including:

   - Mediatek MT8196 DVFS power management and mailbox support

   - Qualcomm SCM firmware and MDT loader refactoring, as part of the
     new Glymur platform support.

   - NXP i.MX9 System Manager firmware support for accessing the syslog

   - Minor updates for TI, Renesas, Samsung, Apple, Marvell and AMD
     SoCs"

* tag 'soc-drivers-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (171 commits)
  bus: fsl-mc: fix an error handling in fsl_mc_device_add()
  reset: spacemit: Add SpacemiT K3 reset driver
  reset: spacemit: Extract common K1 reset code
  reset: Create subdirectory for SpacemiT drivers
  dt-bindings: soc: spacemit: Add K3 reset support and IDs
  reset: canaan: k230: drop OF dependency and enable by default
  reset: rzg2l-usbphy-ctrl: Add suspend/resume support
  reset: rzg2l-usbphy-ctrl: Propagate the return value of regmap_field_update_bits()
  reset: gpio: check the return value of gpiod_set_value_cansleep()
  reset: imx8mp-audiomix: Support i.MX8ULP SIM LPAV
  reset: imx8mp-audiomix: Extend the driver usage
  reset: imx8mp-audiomix: Switch to using regmap API
  reset: imx8mp-audiomix: Drop unneeded macros
  soc: fsl: qe: qe_ports_ic: Consolidate chained IRQ handler install/remove
  soc: mediatek: mtk-cmdq: Add mminfra_offset adjustment for DRAM addresses
  soc: mediatek: mtk-cmdq: Extend cmdq_pkt_write API for SoCs without subsys ID
  soc: mediatek: mtk-cmdq: Add pa_base parsing for hardware without subsys ID support
  soc: mediatek: mtk-cmdq: Add cmdq_get_mbox_priv() in cmdq_pkt_create()
  mailbox: mtk-cmdq: Add driver data to support for MT8196
  mailbox: mtk-cmdq: Add mminfra_offset configuration for DRAM transaction
  ...
2026-02-10 20:45:30 -08:00
Rafael J. Wysocki
a971f984b8 cpuidle: governors: teo: Refine intercepts-based idle state lookup
There are cases in which decisions made by the teo governor are
arguably overly conservative.

For instance, suppose that there are 4 idle states and the values of
the intercepts metric for the first 3 of them are 400, 250, and 251,
respectively.  If the total sum computed in teo_update() is 1000, the
governor will select idle state 1 (provided that all idle states are
enabled and the scheduler tick has not been stopped) although arguably
idle state 0 would be a better choice because the likelihood of getting
an idle duration below the target residency of idle state 1 is greater
than the likelihood of getting an idle duration between the target
residency of idle state 1 and the target residency of idle state 2.

To address this, refine the candidate idle state lookup based on
intercepts to start at the state with the maximum intercepts metric,
below the deepest enabled one, to avoid the cases in which the search
may stop before reaching that state.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Fixed typo "intercetps" in new comments (3 places) ]
Link: https://patch.msgid.link/2417298.ElGaqSPkdT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-30 20:15:52 +01:00
Rafael J. Wysocki
f36de72673 cpuidle: governors: teo: Adjust the classification of wakeup events
If differences between target residency values of adjacent idle states
of a given CPU are relatively large, the corresponding idle state bins
used by the teo governors are large either and the rule by which hits
are distinguished from intercepts is inaccurate.

Namely, by that rule, a wakeup event is classified as a hit if the
sleep length (the time till the closest timer other than the tick)
and the measured idle duration, adjusted for the entered idle state
exit latency, fall into the same idle state bin.  However, if that bin
is large enough, the actual difference between the sleep length and
the measured idle duration may be significant.  It may in fact be
significantly greater than the analogous difference for an event where
the sleep length and the measured idle duration fall into different
bins.

For this reason, amend the rule in question with a check that will only
allow a wakeup event to be counted as a hit if the sleep length is less
than the "raw" measured idle duration (which means that the wakeup
appears to have occurred after the anticipated timer event).  Otherwise,
the event will be counted as an intercept.

Also update the documentation part explaining the difference between
"hits" and "intercepts" to take the above change into account.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5093379.31r3eYUQgx@rafael.j.wysocki
2026-01-30 20:15:43 +01:00
Rafael J. Wysocki
475ca3470b cpuidle: governors: teo: Refine tick_intercepts vs total events check
Use 2/3 as the proportion coefficient in the check comparing
cpu_data->tick_intercepts with cpu_data->total because it is close
enough to the current one (5/8) and it allows of more straightforward
interpretation (on average, intercepts within the tick period length
are twice as frequent as other events).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10793374.nUPlyArG6x@rafael.j.wysocki
2026-01-23 21:50:38 +01:00
Rafael J. Wysocki
60836533b4 cpuidle: governors: teo: Avoid fake intercepts produced by tick
Tick wakeups can lead to fake intercepts that may skew idle state
selection towards shallow states, so it is better to avoid counting
them as intercepts.

For this purpose, add a check causing teo_update() to only count
tick wakeups as intercepts if intercepts within the tick period
range are at least twice as frequent as any other events.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3404606.44csPzL39Z@rafael.j.wysocki
2026-01-23 21:50:38 +01:00
Rafael J. Wysocki
4bd2221f23 cpuidle: governors: teo: Avoid selecting states with zero-size bins
If the last two enabled idle states have the same target residency which
is at least equal to TICK_NSEC, teo may select the next-to-last one even
though the size of that state's bin is 0, which is confusing.

Prevent that from happening by adding a target residency check to the
relevant code path.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Fixed a typo in the changelog ]
Link: https://patch.msgid.link/3033265.e9J7NaK4W3@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-23 21:49:54 +01:00
Rafael J. Wysocki
80606f4eb8 cpuidle: governors: menu: Always check timers with tick stopped
After commit 5484e31bbb ("cpuidle: menu: Skip tick_nohz_get_sleep_length()
call in some cases"), if the return value of get_typical_interval()
multiplied by NSEC_PER_USEC is not greater than RESIDENCY_THRESHOLD_NS,
the menu governor will skip computing the time till the closest timer.
If that happens when the tick has been stopped already, the selected
idle state may be too deep due to the subsequent check comparing
predicted_ns with TICK_NSEC and causing its value to be replaced with
the expected time till the closest timer, which is KTIME_MAX in that
case.  That will cause the deepest enabled idle state to be selected,
but the time till the closest timer very well may be shorter than the
target residency of that state, in which case a shallower state should
be used.

Address this by making menu_select() always compute the time till the
closest timer when the tick has been stopped.

Also move the predicted_ns check mentioned above into the branch in
which the time till the closest timer is determined because it only
needs to be done in that case.

Fixes: 5484e31bbb ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5959091.DvuYhMxLoT@rafael.j.wysocki
2026-01-23 21:22:42 +01:00
Michal Simek
1f58ad77a8 cpuidle: zynq: Switch Michal Simek's email to new one
@xilinx.com is still working but better to switch to new amd.com after
AMD/Xilinx acquisition.

Signed-off-by: Michal Simek <michal.simek@amd.com>
Link: https://lore.kernel.org/r/ebfbf945d90b0efff3ce0dc17fb7f1f0db5b6628.1765787278.git.michal.simek@amd.com
2026-01-12 10:21:30 +01:00
Breno Leitao
fcbd7897b8 cpuidle: menu: Remove incorrect unlikely() annotation
The unlikely() annotation on the early-return condition in menu_select()
is incorrect on systems with only one idle state (e.g., ARM64 servers
with a single ACPI LPI state). Branch profiling shows 100% misprediction
on such systems since drv->state_count <= 1 is always true.

On platforms where only state0 is available, this path is the common
case, not an unlikely edge case. Remove the misleading annotation to
let the branch predictor learn the actual behavior.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260105-annotated_idle-v1-1-10ddf0771b58@debian.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-09 21:52:54 +01:00
Aaron Kling
eefff3d9f6 cpuidle: tegra: Export tegra_cpuidle_pcie_irqs_in_use()
Export tegra_cpuidle_pcie_irqs_in_use() to allow pci-tegra to be built as a
module.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250731-pci-tegra-module-v7-2-cad4b088b8fb@gmail.com
2025-12-26 10:57:17 -06:00
Linus Torvalds
208eed95fc Merge tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
 "This is the first half of the driver changes:

   - A treewide interface change to the "syscore" operations for power
     management, as a preparation for future Tegra specific changes

   - Reset controller updates with added drivers for LAN969x, eic770 and
     RZ/G3S SoCs

   - Protection of system controller registers on Renesas and Google
     SoCs, to prevent trivially triggering a system crash from e.g.
     debugfs access

   - soc_device identification updates on Nvidia, Exynos and Mediatek

   - debugfs support in the ST STM32 firewall driver

   - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI

   - Cleanups for memory controller support on Nvidia and Renesas"

* tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits)
  memory: tegra186-emc: Fix missing put_bpmp
  Documentation: reset: Remove reset_controller_add_lookup()
  reset: fix BIT macro reference
  reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe
  reset: th1520: Support reset controllers in more subsystems
  reset: th1520: Prepare for supporting multiple controllers
  dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys
  dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets
  reset: remove legacy reset lookup code
  clk: davinci: psc: drop unused reset lookup
  reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC
  reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY
  dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support
  reset: eswin: Add eic7700 reset driver
  dt-bindings: reset: eswin: Documentation for eic7700 SoC
  reset: sparx5: add LAN969x support
  dt-bindings: reset: microchip: Add LAN969x support
  soc: rockchip: grf: Add select correct PWM implementation on RK3368
  soc/tegra: pmc: Add USB wake events for Tegra234
  amba: tegra-ahb: Fix device leak on SMMU enable
  ...
2025-12-05 17:29:04 -08:00
Linus Torvalds
6044a1ee9d Merge tag 'devicetree-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "DT bindings:

   - Convert lattice,ice40-fpga-mgr, apm,xgene-storm-dma,
     brcm,sr-thermal, amazon,al-thermal, brcm,ocotp, mt8173-mdp, Actions
     Owl SPS, Marvell AP80x System Controller, Marvell CP110 System
     Controller, cznic,moxtet, and apm,xgene-slimpro-mbox to DT schema
     format

   - Add i.MX95 fsl,irqsteer, MT8365 Mali Bifrost GPU, Anvo ANV32C81W
     EEPROM, and Microchip pic64gx PLIC

   - Add missing LGE, AMD Seattle, and APM X-Gene SoC platform
     compatibles

   - Updates to brcm,bcm2836-l1-intc, brcm,bcm2835-hvs, and bcm2711-hdmi
     bindings to fix warnings on BCM2712 platforms

   - Drop obsolete db8500-thermal.txt

   - Treewide clean-up of extra blank lines and inconsistent quoting

   - Ensure all .dtbo targets are applied to a base .dtb

   - Speed up dt_binding_check by skipping running validation on empty
     examples

  DT core:

   - Add of_machine_device_match() and of_machine_get_match_data()
     helpers and convert users treewide

   - Fix bounds checking of address properties in FDT code. Rework the
     code to have a single implementation of the bounds checks.

   - Rework of_irq_init() to ignore any implicit interrupt-parent (i.e.
     in a parent node) on nodes without an interrupt. This matches the
     spec description and fixes some RISC-V platforms.

   - Avoid a spurious message on overlay removal

   - Skip DT kunit tests on RISCV+ACPI"

* tag 'devicetree-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
  dt-bindings: kbuild: Skip validating empty examples
  dt-bindings: interrupt-controller: brcm,bcm2836-l1-intc: Drop interrupt-controller requirement
  dt-bindings: display: Fix brcm,bcm2835-hvs bindings for BCM2712
  dt-bindings: display: bcm2711-hdmi: Add interrupt details for BCM2712
  of: Skip devicetree kunit tests when RISCV+ACPI doesn't populate root node
  soc: tegra: Simplify with of_machine_device_match()
  soc: qcom: ubwc: Simplify with of_machine_get_match_data()
  powercap: dtpm: Simplify with of_machine_get_match_data()
  platform: surface: Simplify with of_machine_get_match_data()
  irqchip/atmel-aic: Simplify with of_machine_get_match_data()
  firmware: qcom: scm: Simplify with of_machine_device_match()
  cpuidle: big_little: Simplify with of_machine_device_match()
  cpufreq: sun50i: Simplify with of_machine_device_match()
  cpufreq: mediatek: Simplify with of_machine_get_match_data()
  cpufreq: dt-platdev: Simplify with of_machine_get_match_data()
  of: Add wrappers to match root node with OF device ID tables
  dt-bindings: eeprom: at25: Add Anvo ANV32C81W
  of/reserved_mem: Simplify the logic of __reserved_mem_alloc_size()
  of/reserved_mem: Simplify the logic of fdt_scan_reserved_mem_reg_nodes()
  of/reserved_mem: Simplify the logic of __reserved_mem_reserve_reg()
  ...
2025-12-04 15:50:37 -08:00
Linus Torvalds
52206f82d9 Merge tag 'pmdomain-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain updates from Ulf Hansson:
 "pmdomain core:
   - Allow power-off for out-of-band wakeup-capable devices
   - Drop the redundant call to dev_pm_domain_detach() for the amba bus
   - Extend the genpd governor for CPUs to account for IPIs

  pmdomain providers:
   - bcm: Add support for BCM2712
   - mediatek: Add support for MFlexGraphics power domains
   - mediatek: Add support for MT8196 power domains
   - qcom: Add RPMh power domain support for Kaanapali
   - rockchip: Add support for RV1126B

  pmdomain consumers:
   - usb: dwc3: Enable out of band wakeup for i.MX95
   - usb: chipidea: Enable out of band wakeup for i.MX95"

* tag 'pmdomain-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (26 commits)
  pmdomain: Extend the genpd governor for CPUs to account for IPIs
  smp: Introduce a helper function to check for pending IPIs
  pmdomain: mediatek: convert from clk round_rate() to determine_rate()
  amba: bus: Drop dev_pm_domain_detach() call
  pmdomain: bcm: bcm2835-power: Prepare to support BCM2712
  pmdomain: mediatek: mtk-mfg: select MAILBOX in Kconfig
  pmdomain: mediatek: Add support for MFlexGraphics
  pmdomain: mediatek: Fix build-errors
  cpuidle: psci: Replace deprecated strcpy in psci_idle_init_cpu
  pmdomain: rockchip: Add support for RV1126B
  pmdomain: mediatek: Add support for MT8196 HFRPSYS power domains
  pmdomain: mediatek: Add support for MT8196 SCPSYS power domains
  pmdomain: mediatek: Add support for secure HWCCF infra power on
  pmdomain: mediatek: Add support for Hardware Voter power domains
  pmdomain: qcom: rpmhpd: Add RPMh power domain support for Kaanapali
  usb: dwc3: imx8mp: Set out of band wakeup for i.MX95
  usb: chipidea: ci_hdrc_imx: Set out of band wakeup for i.MX95
  usb: chipidea: core: detach power domain for ci_hdrc platform device
  pmdomain: core: Allow power-off for out-of-band wakeup-capable devices
  PM: wakeup: Add out-of-band system wakeup support for devices
  ...
2025-12-04 13:50:39 -08:00
Rafael J. Wysocki
7cede21e9f Merge branches 'pm-qos' and 'pm-tools'
Merge PM QoS updates and a cpupower utility update for 6.19-rc1:

 - Introduce and document a QoS limit on CPU exit latency during wakeup
   from suspend-to-idle (Ulf Hansson)

 - Add support for building libcpupower statically (Zuo An)

* pm-qos:
  Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
  cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
  sched: idle: Respect the CPU system wakeup QoS limit for s2idle
  pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
  pmdomain: Respect the CPU system wakeup QoS limit for s2idle
  PM: QoS: Introduce a CPU system wakeup QoS limit

* pm-tools:
  tools/power/cpupower: Support building libcpupower statically
2025-11-28 16:50:45 +01:00
Rafael J. Wysocki
bf7ae1773e Merge branches 'pm-cpuidle' and 'pm-powercap'
Merge cpuidle and power capping updates for 6.19-rc1:

 - Use residency threshold in polling state override decisions in the
   menu cpuidle governor (Aboorva Devarajan)

 - Add sanity check for exit latency and target residency in the cpufreq
   core (Rafael Wysocki)

 - Use this_cpu_ptr() where possible in the teo governor (Christian
   Loehle)

 - Rework the handling of tick wakeups in the teo cpuidle governor to
   increase the likelihood of stopping the scheduler tick in the cases
   when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

 - Fix a reverse condition in the teo cpuidle governor and drop a
   misguided target residency check from it (Rafael Wysocki)

 - Clean up muliple minor defects in the teo cpuidle governor (Rafael
   Wysocki)

 - Update header inclusion to make it follow the Include What You Use
   principle (Andy Shevchenko)

 - Enable MSR-based RAPL PMU support in the intel_rapl power capping
   driver and arrange for using it on the Panther Lake and Wildcat Lake
   processors (Kuppuswamy Sathyanarayanan)

 - Add support for Nova Lake and Wildcat Lake processors to the
   intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
   Pandruvada)

* pm-cpuidle:
  cpuidle: Warn instead of bailing out if target residency check fails
  cpuidle: Update header inclusion
  cpuidle: governors: teo: Add missing space to the description
  cpuidle: governors: teo: Simplify intercepts-based state lookup
  cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
  cpuidle: governors: teo: Rework the handling of tick wakeups
  cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
  cpuidle: governors: teo: Use s64 consistently in teo_update()
  cpuidle: governors: teo: Drop redundant function parameter
  cpuidle: governors: teo: Drop misguided target residency check
  cpuidle: teo: Use this_cpu_ptr() where possible
  cpuidle: Add sanity check for exit latency and target residency
  cpuidle: menu: Use residency threshold in polling state override decisions

* pm-powercap:
  powercap: intel_rapl: Enable MSR-based RAPL PMU support
  powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
  powercap: intel_rapl: Add support for Nova Lake processors
  powercap: intel_rapl: Add support for Wildcat Lake platform
2025-11-28 16:29:41 +01:00
Krzysztof Kozlowski
4b94d21fac cpuidle: big_little: Simplify with of_machine_device_match()
Replace open-coded getting root OF node and matching against it with
new of_machine_device_match() helper.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patch.msgid.link/20251112-b4-of-match-matchine-data-v2-5-d46b72003fd6@linaro.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-11-26 19:42:40 -06:00
Rafael J. Wysocki
4bf944f3fc cpuidle: Warn instead of bailing out if target residency check fails
It turns out that the change in commit 76934e495c ("cpuidle: Add
sanity check for exit latency and target residency") goes too far
because there are systems in the field on which the check introduced
by that commit does not pass.

For this reason, change __cpuidle_driver_init() return type back to void
and make it print a warning when the check mentioned above does not
pass.

Fixes: 76934e495c ("cpuidle: Add sanity check for exit latency and target residency")
Reported-by: Val Packett <val@packett.cool>
Closes: https://lore.kernel.org/linux-pm/20251121010756.6687-1-val@packett.cool/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2808566.mvXUDI8C0e@rafael.j.wysocki
2025-11-25 19:06:38 +01:00
Andy Shevchenko
6d96ceff9a cpuidle: Update header inclusion
While cleaning up some headers, I got a build error on this file:

drivers/cpuidle/poll_state.c:52:2: error: call to undeclared library function 'snprintf' with type 'int (char *restrict, unsigned long, const char *restrict, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]

Update header inclusions to follow IWYU (Include What You Use)
principle.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124205752.1328701-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:04:29 +01:00
Ulf Hansson
2b8d594742 cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
The CPU system wakeup QoS limit must be respected for the regular cpuidle
state selection. Therefore, let's extend the common governor helper
cpuidle_governor_latency_req(), to take the constraint into account.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-6-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Ulf Hansson
99b42445f4 sched: idle: Respect the CPU system wakeup QoS limit for s2idle
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle, let's start to take into account the QoS limit.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Rafael J. Wysocki
15bfdadd61 cpuidle: governors: teo: Add missing space to the description
There is a missing space in the governor description comment, so add it.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5059034.31r3eYUQgx@rafael.j.wysocki
2025-11-24 20:43:32 +01:00
Rafael J. Wysocki
d834e68a0e cpuidle: governors: teo: Simplify intercepts-based state lookup
Simplify the loop looking up a candidate idle state in the case when an
intercept is likely to occur by adding a search for the state index limit
if the tick is stopped before it.

First, call tick_nohz_tick_stopped() just once and if it returns true,
look for the shallowest state index below the current candidate one with
target residency at least equal to the tick period length.

Next, simply look for a state that is not shallower than the one found
in the previous step and satisfies the intercepts majority condition (if
there are no such states, the shallowest state that is not shallower
than the one found in the previous step becomes the new candidate).

Since teo_state_ok() has no callers any more after the above changes,
drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog clarification and code comment edit ]
Link: https://patch.msgid.link/2418792.ElGaqSPkdT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 16:32:48 +01:00
Rafael J. Wysocki
50db438231 cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
The condition deciding whether or not to increase cpu_data->tick_intercepts
in teo_update() is reverse, so fix it.

Fixes: d619b5cc67 ("cpuidle: teo: Simplify counting events used for tick management")
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 0796ddf4a7: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 8f3f01082d: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: b54df61c74: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.14+ <stable@vger.kernel.org> 6.14+: 083654ded5: cpuidle: governors: teo: Rework the handling of tick wakeups
Cc: 6.14+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5085160.31r3eYUQgx@rafael.j.wysocki
2025-11-20 14:54:08 +01:00
Rafael J. Wysocki
083654ded5 cpuidle: governors: teo: Rework the handling of tick wakeups
If the wakeup pattern is clearly dominated by tick wakeups, count those
wakeups as hits on the deepest available idle state to increase the
likelihood of stopping the tick, especially on systems where there are
only 2 usable idle states and the tick can only be stopped when the
deeper state is selected.

This change is expected to reduce power on some systems where state 0 is
selected relatively often even though they are almost idle.  Without it,
the governor may end up selecting the shallowest idle state all the time
even if the system is almost completely idle due all tick wakeups being
counted as hits on that state and preventing the tick from being stopped
at all.

Fixes: 4b20b07ce7 ("cpuidle: teo: Don't count non-existent intercepts")
Reported-by: Reka Norman <rekanorman@chromium.org>
Closes: https://lore.kernel.org/linux-pm/CAEmPcwsNMNnNXuxgvHTQ93Mx-q3Oz9U57THQsU_qdcCx1m4w5g@mail.gmail.com/
Tested-by: Reka Norman <rekanorman@chromium.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 92ce5c07b7: cpuidle: teo: Reorder candidate state index checks
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ea185406d1: cpuidle: teo: Combine candidate state index checks against 0
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b9a6af26bd: cpuidle: teo: Drop local variable prev_intercept_idx
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: e24f8a55de: cpuidle: teo: Clarify two code comments
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: d619b5cc67: cpuidle: teo: Simplify counting events used for tick management
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 13ed5c4a6d: cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ddcfa79646: cpuidle: teo: Simplify handling of total events count
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 65e18e6544: cpuidle: teo: Replace time_span_ns with a flag
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 0796ddf4a7: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 8f3f01082d: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b54df61c74: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Rebase on commit 0796ddf4a7, changelog update ]
Link: https://patch.msgid.link/6228387.lOV4Wx5bFT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 14:49:57 +01:00
Thorsten Blum
e938ef83a0 cpuidle: psci: Replace deprecated strcpy in psci_idle_init_cpu
strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-19 18:06:50 +01:00
Rafael J. Wysocki
b54df61c74 cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
If a given governor metric falls below a certain value (8 for
DECAY_SHIFT equal to 3), it will not decay any more due to the
simplistic decay implementation.  This may in some cases lead to
subtle inconsistencies in the governor behavior, so change the
decay implementation to take it into account and set the metric
at hand to 0 in that case.

Suggested-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2819353.mvXUDI8C0e@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki
8f3f01082d cpuidle: governors: teo: Use s64 consistently in teo_update()
Two local variables in teo_update() are defined as u64, but their
values are then compared with s64 values, so it is more consistent
to use s64 as their data type.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3026616.e9J7NaK4W3@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki
17673f64a0 cpuidle: governors: teo: Drop redundant function parameter
The last no_poll parameter of teo_find_shallower_state() is always
false, so drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2253109.irdbgypaU6@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki
a03b201180 cpuidle: governors: teo: Drop misguided target residency check
When the target residency of the current candidate idle state is
greater than the expected time till the closest timer (the sleep
length), it does not matter whether or not the tick has already been
stopped or if it is going to be stopped.  The closest timer will
trigger anyway at its due time, so if an idle state with target
residency above the sleep length is selected, energy will be wasted
and there may be excess latency.

Of course, if the closest timer were canceled before it could trigger,
a deeper idle state would be more suitable, but this is not expected
to happen (generally speaking, hrtimers are not expected to be
canceled as a rule).

Accordingly, the teo_state_ok() check done in that case causes energy to
be wasted more often than it allows any energy to be saved (if it allows
any energy to be saved at all), so drop it and let the governor use the
teo_find_shallower_state() return value as the new candidate idle state
index.

Fixes: 21d28cd2fa ("cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5955081.DvuYhMxLoT@rafael.j.wysocki
2025-11-14 15:19:39 +01:00
Thierry Reding
a97fbc3ee3 syscore: Pass context data to callbacks
Several drivers can benefit from registering per-instance data along
with the syscore operations. To achieve this, move the modifiable fields
out of the syscore_ops structure and into a separate struct syscore that
can be registered with the framework. Add a void * driver data field for
drivers to store contextual data that will be passed to the syscore ops.

Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-14 10:01:52 +01:00
Christian Loehle
0796ddf4a7 cpuidle: teo: Use this_cpu_ptr() where possible
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, #16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, #32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, #16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, #32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:02:07 +01:00
Rafael J. Wysocki
76934e495c cpuidle: Add sanity check for exit latency and target residency
Make __cpuidle_driver_init() fail if the exit latency of one of the
driver's idle states is less than its target residency which would
break cpuidle assumptions.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog fix ]
Link: https://patch.msgid.link/12779486.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:00:00 +01:00
Linus Torvalds
225a97d6d4 Merge tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:

 - A fix to disable KASAN checks while walking a non-current task's
   stackframe (following x86)

 - A fix for a kvrealloc()-related memory leak in
   module_frob_arch_sections()

 - Two replacements of strcpy() with strscpy()

 - A change to use the RISC-V .insn assembler directive when possible to
   assemble instructions from hex opcodes

 - Some low-impact fixes in the ptdump code and kprobes test code

* tag 'riscv-for-linus-6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  cpuidle: riscv-sbi: Replace deprecated strcpy in sbi_cpuidle_init_cpu
  riscv: KGDB: Replace deprecated strcpy in kgdb_arch_handle_qxfer_pkt
  riscv: asm: use .insn for making custom instructions
  riscv: tests: Make RISCV_KPROBES_KUNIT tristate
  riscv: tests: Rename kprobes_test_riscv to kprobes_riscv
  riscv: Fix memory leak in module_frob_arch_sections()
  riscv: ptdump: use seq_puts() in pt_dump_seq_puts() macro
  riscv: stacktrace: Disable KASAN checks for non-current tasks
2025-11-06 15:44:18 -08:00
Thorsten Blum
2e44856783 cpuidle: riscv-sbi: Replace deprecated strcpy in sbi_cpuidle_init_cpu
strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20251021135155.1409-2-thorsten.blum@linux.dev
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-10-27 23:38:13 -06:00
Aboorva Devarajan
07d8157012 cpuidle: menu: Use residency threshold in polling state override decisions
On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
the predicted idle duration is less than the target residency of CEDE
state can hurt performance. In such cases, the entry/exit overhead of
CEDE outweighs the power savings, leading to unnecessary state
transitions and higher latency.

Menu governor currently contains a special-case rule that prioritizes
the first non-polling state over polling, even when its target residency
is much longer than the predicted idle duration. On PowerPC/pseries,
where the gap between the polling state (Snooze) and the first non-polling
state (CEDE) is large, this behavior causes performance regressions.

Refine that special case by adding an extra requirement: the first
non-polling state can only be chosen if its target residency is below
the defined RESIDENCY_THRESHOLD_NS. If this condition is not satisfied,
polling is allowed instead, avoiding suboptimal non-polling state
entries.

This change is limited to the single special-case rule for the first
non-polling state. The general non-polling state selection logic in the
menu governor remains unchanged.

Performance improvement observed with pgbench on PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric                    | Baseline   | Patched    | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
| Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
+---------------------------+------------+------------+------------+

CPUIdle state usage:
+--------------+--------------+-------------+
| Metric       | Baseline     | Patched     |
+--------------+--------------+-------------+
| Total usage  | 12,735,820   | 13,918,442  |
| Above usage  | 11,401,520   | 1,598,210   |
| Below usage  | 20,145       | 702,395     |
+--------------+--------------+-------------+

Above/Total and Below/Total usage percentages:
+------------------------+-----------+---------+
| Metric                 | Baseline  | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total)  | 89.56%    | 11.49%  |
| Below % (Below/Total)  | 0.16%     | 5.05%   |
| Total cpuidle miss (%) | 89.72%    | 16.54%  |
+------------------------+-----------+---------+

The results indicate that restricting CEDE selection to cases where
its residency matches the predicted idle time reduces mispredictions,
lowers unnecessary state transitions, and improves overall throughput.

Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
[ rjw: Changelog edits, rebase ]
Link: https://patch.msgid.link/20251006013954.17972-1-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 14:53:46 +01:00
Rafael J. Wysocki
db86f55bf8 cpuidle: governors: menu: Select polling state in some more cases
A throughput regression of 11% introduced by commit 779b1a1cb1 ("cpuidle:
governors: menu: Avoid selecting states with too much latency") has been
reported and it is related to the case when the menu governor checks if
selecting a proper idle state instead of a polling one makes sense.

In particular, it is questionable to do so if the exit latency of the
idle state in question exceeds the predicted idle duration, so add a
check for that, which is sufficient to make the reported regression go
away, and update the related code comment accordingly.

Fixes: 779b1a1cb1 ("cpuidle: governors: menu: Avoid selecting states with too much latency")
Closes: https://lore.kernel.org/linux-pm/004501dc43c9$ec8aa930$c59ffb90$@telus.net/
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/12786727.O9o76ZdvQC@rafael.j.wysocki
2025-10-27 14:41:27 +01:00
Rafael J. Wysocki
10fad40122 Revert "cpuidle: menu: Avoid discarding useful information"
It is reported that commit 85975daeaa ("cpuidle: menu: Avoid discarding
useful information") led to a performance regression on Intel Jasper Lake
systems because it reduced the time spent by CPUs in idle state C7 which
is correlated to the maximum frequency the CPUs can get to because of an
average running power limit [1].

Before that commit, get_typical_interval() would have returned UINT_MAX
whenever it had been unable to make a high-confidence prediction which
had led to selecting the deepest available idle state too often and
both power and performance had been inadequate as a result of that on
some systems.  However, this had not been a problem on systems with
relatively aggressive average running power limits, like the Jasper Lake
systems in question, because on those systems it was compensated by the
ability to run CPUs faster.

It was addressed by causing get_typical_interval() to return a number
based on the recent idle duration information available to it even if it
could not make a high-confidence prediction, but that clearly did not
take the possible correlation between idle power and available CPU
capacity into account.

For this reason, revert most of the changes made by commit 85975daeaa,
except for one cosmetic cleanup, and add a comment explaining the
rationale for returning UINT_MAX from get_typical_interval() when it
is unable to make a high-confidence prediction.

Fixes: 85975daeaa ("cpuidle: menu: Avoid discarding useful information")
Closes: https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/ [1]
Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3663603.iIbC2pHGDl@rafael.j.wysocki
2025-10-20 21:27:16 +02:00
Rafael J. Wysocki
7b1b796117 cpuidle: Fail cpuidle device registration if there is one already
Refuse to register a cpuidle device if the given CPU has a cpuidle
device already and print a message regarding it.

Without this, an attempt to register a new cpuidle device without
unregistering the existing one leads to the removal of the existing
cpuidle device without removing its sysfs interface.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-20 13:08:54 +02:00
Vivek Yadav
8dba0fd9ce cpuidle: sysfs: Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf()
The ->show() callbacks in sysfs should use sysfs_emit() or
sysfs_emit_at() when formatting values for user space.

These helpers are the recommended way to ensure correct buffer
handling and consistency across the kernel.

See Documentation/filesystems/sysfs.rst for details.

No functional change intended.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Vivek Yadav <vivekyadav1207731111@gmail.com>
Link: https://patch.msgid.link/20250919165657.233349-1-vivekyadav1207731111@gmail.com
[ rjw: Minor subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-20 13:06:37 +02:00
Johan Hovold
a2d100c47f cpuidle: qcom-spm: drop unnecessary initialisations
Drop the unnecessary initialisations of the platform device and driver
data pointers which are assigned on first use when registering the
cpuidle device during probe.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-10 12:49:52 +02:00
Johan Hovold
cdc06f9126 cpuidle: qcom-spm: fix device and OF node leaks at probe
Make sure to drop the reference to the saw device taken by
of_find_device_by_node() after retrieving its driver data during
probe().

Also drop the reference to the CPU node sooner to avoid leaking it in
case there is no saw node or device.

Fixes: 60f3692b5f ("cpuidle: qcom_spm: Detach state machine from main SPM handling")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-10 12:49:52 +02:00