Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0
pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
Switch the driver to using the proper sysfs_emit("%*pbl") where
appropriate.
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Pull char/misc/IIO driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystem changes for 7.0-rc1. Lots of little things in here,
including:
- Loads of iio driver changes and updates and additions
- gpib driver updates
- interconnect driver updates
- i3c driver updates
- hwtracing (coresight and intel) driver updates
- deletion of the obsolete mwave driver
- binder driver updates (rust and c versions)
- mhi driver updates (causing a merge conflict, see below)
- mei driver updates
- fsi driver updates
- eeprom driver updates
- lots of other small char and misc driver updates and cleanups
All of these have been in linux-next for a while, with no reported
issues"
* tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (297 commits)
mux: mmio: fix regmap leak on probe failure
rust_binder: return p from rust_binder_transaction_target_node()
drivers: android: binder: Update ARef imports from sync::aref
rust_binder: fix needless borrow in context.rs
iio: magn: mmc5633: Fix Kconfig for combination of I3C as module and driver builtin
iio: sca3000: Fix a resource leak in sca3000_probe()
iio: proximity: rfd77402: Add interrupt handling support
iio: proximity: rfd77402: Document device private data structure
iio: proximity: rfd77402: Use devm-managed mutex initialization
iio: proximity: rfd77402: Use kernel helper for result polling
iio: proximity: rfd77402: Align polling timeout with datasheet
iio: cros_ec: Allow enabling/disabling calibration mode
iio: frequency: ad9523: correct kernel-doc bad line warning
iio: buffer: buffer_impl.h: fix kernel-doc warnings
iio: gyro: itg3200: Fix unchecked return value in read_raw
MAINTAINERS: add entry for ADE9000 driver
iio: accel: sca3000: remove unused last_timestamp field
iio: accel: adxl372: remove unused int2_bitmask field
iio: adc: ad7766: Use iio_trigger_generic_data_rdy_poll()
iio: magnetometer: Remove IRQF_ONESHOT
...
Merge updates related to runtime PM for 6.20-rc1/7.0-rc1:
- Make several drivers discard pm_runtime_put() return value in
preparation for converting that function to a void one (Rafael
Wysocki)
* pm-runtime:
drm: Discard pm_runtime_put() return value
genirq/chip: Change irq_chip_pm_put() return type to void
scsi: ufs: core: Discard pm_runtime_put() return values
platform/chrome: cros_hps_i2c: Discard pm_runtime_put() return value
coresight: Discard pm_runtime_put() return values
hwspinlock: omap: Discard pm_runtime_put() return value
watchdog: rzv2h_wdt: Discard pm_runtime_put() return value
watchdog: rz: Discard pm_runtime_put() return values
media: ccs: Discard pm_runtime_put() return value
drm/imagination: Discard pm_runtime_put() return value
USB: core: Discard pm_runtime_put() return value
When trying to run perf and sysfs mode simultaneously, the WARN_ON()
in tmc_etr_enable_hw() is triggered sometimes:
WARNING: CPU: 42 PID: 3911571 at drivers/hwtracing/coresight/coresight-tmc-etr.c:1060 tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc]
[..snip..]
Call trace:
tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] (P)
tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] (L)
tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc]
coresight_enable_path+0x1c8/0x218 [coresight]
coresight_enable_sysfs+0xa4/0x228 [coresight]
enable_source_store+0x58/0xa8 [coresight]
dev_attr_store+0x20/0x40
sysfs_kf_write+0x4c/0x68
kernfs_fop_write_iter+0x120/0x1b8
vfs_write+0x2c8/0x388
ksys_write+0x74/0x108
__arm64_sys_write+0x24/0x38
el0_svc_common.constprop.0+0x64/0x148
do_el0_svc+0x24/0x38
el0_svc+0x3c/0x130
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x1ac/0x1b0
---[ end trace 0000000000000000 ]---
Since the enablement of sysfs mode is separeted into two critical regions,
one for sysfs buffer allocation and another for hardware enablement, it's
possible to race with the perf mode. Fix this by double check whether
the perf mode's been used before enabling the hardware in sysfs mode.
mode:
[sysfs mode] [perf mode]
tmc_etr_get_sysfs_buffer()
spin_lock(&drvdata->spinlock)
[sysfs buffer allocation]
spin_unlock(&drvdata->spinlock)
spin_lock(&drvdata->spinlock)
tmc_etr_enable_hw()
drvdata->etr_buf = etr_perf->etr_buf
spin_unlock(&drvdata->spinlock)
spin_lock(&drvdata->spinlock)
tmc_etr_enable_hw()
WARN_ON(drvdata->etr_buf) // WARN sicne etr_buf initialized at
the perf side
spin_unlock(&drvdata->spinlock)
With this fix, we retain the check for CS_MODE_PERF in get_etr_sysfs_buf.
This ensures we verify whether the perf mode's already running before we
actually allocate the buffer. Then we can save the time of
allocating/freeing the sysfs buffer if race with the perf mode.
Fixes: 296b01fd10 ("coresight: Refactor out buffer allocation function for ETR")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Junhao He <hejunhao3@h-partners.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260121101543.2017014-3-wangyushan12@huawei.com
This patch adds platform driver support for the CoreSight Interconnect
TNOC, Interconnect TNOC is a CoreSight link that forwards trace data
from a subsystem to the Aggregator TNOC. Compared to Aggregator TNOC,
it does not have aggregation and ATID functionality.
Key changes:
- Add platform driver `coresight-itnoc` with device tree match support.
- Refactor probe logic into a common `_tnoc_probe()` function.
- Conditionally initialize ATID only for AMBA-based TNOC blocks.
Signed-off-by: Yuanfang Zhang <yuanfang.zhang@oss.qualcomm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251203-itnoc-v5-2-5b97c63f2268@oss.qualcomm.com
When changes [1] and [2] have been applied to the driver etm4x, the
same modifications have been also collapsed in [3] and applied in
one shot to the driver etm3x.
While doing this, the driver etm3x has not been aligned to etm4x on
the use of non cpuslocked version of cpuhp callback setup APIs.
The current code triggers two run-time warnings when the kernel is
compiled with CONFIG_PROVE_LOCKING=y.
Use non cpuslocked version of cpuhp callback setup APIs in driver
etm3x, aligning it to the driver etm4x.
[1] commit 2d1a8bfb61 ("coresight: etm4x: Fix etm4_count race by
moving cpuhp callbacks to init")
[2] commit 22a550a306 ("coresight: etm4x: Allow etm4x to be built
as a module")
[3] commit 97fe626ce6 ("coresight: etm3x: Allow etm3x to be built
as a module")
Fixes: 97fe626ce6 ("coresight: etm3x: Allow etm3x to be built as a module")
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260108152427.357379-1-antonio.borneo@foss.st.com
Failing a debugfs write due to pm_runtime_put() returning a negative
value is not particularly useful.
Returning an error code from pm_runtime_put() merely means that it has
not queued up a work item to check whether or not the device can be
suspended and there are many perfectly valid situations in which that
can happen, like after writing "on" to the devices' runtime PM "control"
attribute in sysfs for one example. It also happens when the kernel
has been configured with CONFIG_PM unset, in which case
debug_disable_func() in the coresight driver will always return an
error.
For this reason, update debug_disable_func() to simply discard the
return value of pm_runtime_put(), change its return type to void, and
propagate that change to debug_func_knob_write().
This will facilitate a planned change of the pm_runtime_put() return
type to void in the future.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://patch.msgid.link/2058657.yKVeVyVuyW@rafael.j.wysocki
'timestamp' is currently 1 bit wide for on/off. To enable setting
different intervals, extend it to 4 bits wide. Keep the old bit position
for backward compatibility ("deprecated_timestamp") but don't publish in
the format/ folder. It will be removed from the documentation and can be
removed completely after enough time has passed.
ETM3x doesn't support different intervals, so validate that the value is
either 0 or 1.
Tools that read the bit positions from the format/ folder will continue
to work as before, setting either 0 or 1 for off/on. Tools that
incorrectly didn't do this and set the ETM_OPT_TS bit directly will also
continue to work because that old bit is still checked.
This avoids adding a second timestamp attribute for setting the
interval. This would be awkward to use because tools would have to be
updated to ensure that the timestamps are always enabled when an
interval is set, and the driver would have to validate that both options
are provided together. All this does is implement the semantics of a
single enum but spread over multiple fields.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251128-james-cs-syncfreq-v8-12-4d319764cc58@linaro.org
Timestamps are currently emitted at the maximum rate possible, which is
much too frequent for most use cases. In the next commit, the timestamp
field will be widened to take a value, so set the interval using the
value now. Granular control is not required, so save space in the config
by interpreting it as 2 ^ timestamp. And then 4 bits (0 - 15) will be
enough to set the interval to be larger than the existing SYNC timestamp
interval.
No sysfs mode support is needed for this attribute because counter
generated timestamps are only configured for Perf mode.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251128-james-cs-syncfreq-v8-11-4d319764cc58@linaro.org
Currently we're programming attr->config directly into ETMCR after some
validation. This obscures which fields are being used, and also makes it
impossible to move fields around or use other configN fields in the
future.
Improve it by only reading the fields that are valid and then setting
the appropriate ETMCR bits based on each one.
The ETMCR_CTXID_SIZE part can be removed as it was never a valid option
because it's not in ETM3X_SUPPORTED_OPTIONS.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251128-james-cs-syncfreq-v8-6-4d319764cc58@linaro.org
Remove some of the magic numbers and try to clarify some of the
documentation so it's clearer how this sets up the timestamp interval.
Return errors directly instead of jumping to out and returning ret,
nothing needs to be cleaned up at the end and it only obscures the flow
and return value.
Add utilities for programming resource selectors that do compile time
checks for constants or WARN_ONs for non-constant values. FIELD_PREP
includes compile time checks so we only need to add an additional
BUILD_BUG_ON for resource == 0 in pair mode.
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251128-james-cs-syncfreq-v8-3-4d319764cc58@linaro.org
Pull char/misc/IIO driver updates from Greg KH:
"Here is the big set of char/misc/iio driver updates for 6.19-rc1. Lots
of stuff in here including:
- lots of IIO driver updates, cleanups, and additions
- large interconnect driver changes as they get converted over to a
dynamic system of ids
- coresight driver updates
- mwave driver updates
- binder driver updates and changes
- comedi driver fixes now that the fuzzers are being set loose on
them
- nvmem driver updates
- new uio driver addition
- lots of other small char/misc driver updates, full details in the
shortlog
All of these have been in linux-next for a while now"
* tag 'char-misc-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (304 commits)
char: applicom: fix NULL pointer dereference in ac_ioctl
hangcheck-timer: fix coding style spacing
hangcheck-timer: Replace %Ld with %lld
hangcheck-timer: replace printk(KERN_CRIT) with pr_crit
uio: Add SVA support for PCI devices via uio_pci_generic_sva.c
dt-bindings: slimbus: fix warning from example
intel_th: Fix error handling in intel_th_output_open
misc: rp1: Fix an error handling path in rp1_probe()
char: xillybus: add WQ_UNBOUND to alloc_workqueue users
misc: bh1770glc: use pm_runtime_resume_and_get() in power_state_store
misc: cb710: Fix a NULL vs IS_ERR() check in probe()
mux: mmio: Add suspend and resume support
virt: acrn: split acrn_mmio_dev_res out of acrn_mmiodev
greybus: gb-beagleplay: Fix timeout handling in bootloader functions
greybus: add WQ_PERCPU to alloc_workqueue users
char/mwave: drop typedefs
char/mwave: drop printk wrapper
char/mwave: remove printk tracing
char/mwave: remove unneeded fops
char/mwave: remove MWAVE_FUTZ_WITH_OTHER_DEVICES ifdeffery
...
intel_th_output_open() calls bus_find_device_by_devt() which
internally increments the device reference count via get_device(), but
this reference is not properly released in several error paths. When
device driver is unavailable, file operations cannot be obtained, or
the driver's open method fails, the function returns without calling
put_device(), leading to a permanent device reference count leak. This
prevents the device from being properly released and could cause
resource exhaustion over time.
Found by code review.
Cc: stable <stable@kernel.org>
Fixes: 39f4034693 ("intel_th: Add driver infrastructure for Intel(R) Trace Hub devices")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Link: https://patch.msgid.link/20251112091723.35963-1-make24@iscas.ac.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suzuki writes:
coresight: Updates for Linux v6.19
The changes for Linux v6.19 include :
- Support for static TPDM
- Fixes to TMC-ETR with CATU where buffer wasn't available to CATU in perf mode
- Clean ups to the component operations to accept coresight_path
- Fixes to the ETM4x/ETM3x driver
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.19' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: etm4x: Remove the state_needs_restore flag
coresight: etm4x: Remove the redundant DSB
coresight: etm4x: Properly control filter in CPU idle with FEAT_TRF
coresight: etm4x: Add context synchronization before enabling trace
coresight: etm4x: Correct polling IDLE bit
coresight: etm3x: Always set tracer's device mode on target CPU
coresight: etm4x: Always set tracer's device mode on target CPU
coresight: Change device mode to atomic type
coresight: change the sink_ops to accept coresight_path
coresight: change helper_ops to accept coresight_path
coresight: tmc: add the handle of the event to the path
coresight: tpdm: remove redundant check for drvdata
coresight: tpdm: add static tpdm support
dt-bindings: arm: document the static TPDM compatible
coresight: ETR: Fix ETR buffer use-after-free issue
When the restore flow is invoked, it means no error occurred during the
save phase. Otherwise, if any errors happened while saving the context,
the function would return an error and abort the suspend sequence.
Therefore, the state_needs_restore flag is unnecessary. The save and
restore functions are changed to check two conditions:
1) The global flag pm_save_enable is SELF_HOSTED mode;
2) The device is in active mode (non DISABLED).
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251111-arm_coresight_power_management_fix-v6-8-f55553b6c8b3@arm.com
As recommended in section 4.3.7 "Synchronization when using the
memory-mapped interface" of ARM IHI0064H.b:
When using the memory-mapped interface to program the trace unit, the
trace analyzer must ensure that writes have completed, to ensure that
the trace unit is fully programmed and either enabled or disabled.
To ensure writes have completed, the trace analyzer can do ...
If the memory marked is as Device-nGnRE or stronger, read back the
value of any register in the trace unit. This relies on peripheral
coherence order defined in the Arm architecture.
Polling TRCSTATR ensures the previous write has completed. Therefore,
removes the redundant DSB barrier in the enabling flow.
Update the comment in the disable flow for consistency.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251111-arm_coresight_power_management_fix-v6-7-f55553b6c8b3@arm.com
If a CPU supports FEAT_TRF, as described in the section K5.5 "Context
switching", Arm ARM (ARM DDI 0487 L.a), it defines a flow to prohibit
program-flow trace, execute a TSB CSYNC instruction for flushing,
followed by clearing TRCPRGCTLR.EN bit.
To restore the state, the reverse sequence is required.
This differs from the procedure described in the section 3.4.1 "The
procedure when powering down the PE" of ARM IHI0064H.b, which involves
the OS Lock to prevent external debugger accesses and implicitly
disables trace.
To be compatible with different ETM versions, explicitly control trace
unit using etm4_disable_trace_unit() and etm4_enable_trace_unit()
during CPU idle to comply with FEAT_TRF.
As a result, the save states for TRFCR_ELx and trcprgctlr are redundant,
remove them.
Fixes: f188b5e76a ("coresight: etm4x: Save/restore state across CPU low power states")
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251111-arm_coresight_power_management_fix-v6-6-f55553b6c8b3@arm.com
According to the software usage PKLXF in Arm ARM (ARM DDI 0487 L.a), a
Context synchronization event is required before enabling the trace
unit.
An ISB is added to meet this requirement, particularly for guarding the
operations in the flow:
etm4x_allow_trace()
`> kvm_tracing_set_el1_configuration()
`> write_sysreg_s(trfcr_while_in_guest, SYS_TRFCR_EL12)
Improved the barrier comments to provide more accurate information.
Fixes: 1ab3bb9df5 ("coresight: etm4x: Add necessary synchronization for sysreg access")
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Reviewed-by: Yeoreun Yun <yeoreum.yun@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251111-arm_coresight_power_management_fix-v6-5-f55553b6c8b3@arm.com
When enabling a tracer via SysFS interface, the device mode may be set
by any CPU - not necessarily the target CPU. This can lead to race
condition in SMP, and may result in incorrect mode values being read.
Consider the following example, where CPU0 attempts to enable the tracer
on CPU1 (the target CPU):
CPU0 CPU1
etm4_enable()
` coresight_take_mode(SYSFS)
` etm4_enable_sysfs()
` smp_call_function_single() ----> etm4_enable_hw_smp_call()
/
/ CPU idle:
/ etm4_cpu_save()
/ ` coresight_get_mode()
Failed to enable h/w / ^^^
` coresight_set_mode(DISABLED) <-' Read the intermediate SYSFS mode
In this case, CPU0 initiates the operation by taking the SYSFS mode to
avoid conflicts with the Perf mode. It then sends an IPI to CPU1 to
configure the tracer registers. If any error occurs during this process,
CPU0 rolls back by setting the mode to DISABLED.
However, if CPU1 enters an idle state during this time, it might read
the intermediate SYSFS mode. As a result, the CPU PM flow could wrongly
save and restore tracer context that is actually disabled.
To resolve the issue, this commit moves the device mode setting logic on
the target CPU. This ensures that the device mode is only modified by
the target CPU, eliminating race condition between mode writes and reads
across CPUs.
An additional change introduces the etm4_disable_sysfs_smp_call()
function for SMP calls, which disables the tracer and explicitly set the
mode to DISABLED during SysFS operations. Rename
etm4_disable_hw_smp_call() to etm4_disable_sysfs_smp_call() for naming
consistency.
The flow is updated with this change:
CPU0 CPU1
etm4_enable()
` etm4_enable_sysfs()
` smp_call_function_single() ----> etm4_enable_hw_smp_call()
` coresight_take_mode(SYSFS)
Failed, set back to DISABLED
` coresight_set_mode(DISABLED)
CPU idle:
etm4_cpu_save()
` coresight_get_mode()
^^^
Read out the DISABLED mode
Fixes: c38a9ec2b2 ("coresight: etm4x: moving etm_drvdata::enable to atomic field")
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: mike Leach <mike.leach@linaro.org>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20251111-arm_coresight_power_management_fix-v6-2-f55553b6c8b3@arm.com
The handle is essential for retrieving the AUX_EVENT of each CPU and is
required in perf mode. It has been added to the coresight_path so that
dependent devices can access it from the path when needed.
The existing bug can be reproduced with:
perf record -e cs_etm//k -C 0-9 dd if=/dev/zero of=/dev/null
Showing an oops as follows:
Unable to handle kernel paging request at virtual address 000f6e84934ed19e
Call trace:
tmc_etr_get_buffer+0x30/0x80 [coresight_tmc] (P)
catu_enable_hw+0xbc/0x3d0 [coresight_catu]
catu_enable+0x70/0xe0 [coresight_catu]
coresight_enable_path+0xb0/0x258 [coresight]
Fixes: 080ee83cc3 ("Coresight: Change functions to accept the coresight_path")
Signed-off-by: Carl Worth <carl@os.amperecomputing.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Co-developed-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Jie Gan <jie.gan@oss.qualcomm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250925-fix_helper_data-v2-1-edd8a07c1646@oss.qualcomm.com