Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Support HW queue leasing, allowing containers to be granted access
to HW queues for zero-copy operations and AF_XDP
- Number of code moves to help the compiler with inlining. Avoid
output arguments for returning drop reason where possible
- Rework drop handling within qdiscs to include more metadata about
the reason and dropping qdisc in the tracepoints
- Remove the rtnl_lock use from IP Multicast Routing
- Pack size information into the Rx Flow Steering table pointer
itself. This allows making the table itself a flat array of u32s,
thus making the table allocation size a power of two
- Report TCP delayed ack timer information via socket diag
- Add ip_local_port_step_width sysctl to allow distributing the
randomly selected ports more evenly throughout the allowed space
- Add support for per-route tunsrc in IPv6 segment routing
- Start work of switching sockopt handling to iov_iter
- Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid
buffer size drifting up
- Support MSG_EOR in MPTCP
- Add stp_mode attribute to the bridge driver for STP mode selection.
This addresses concerns about call_usermodehelper() usage
- Remove UDP-Lite support (as announced in 2023)
- Remove support for building IPv6 as a module. Remove the now
unnecessary function calling indirection
Cross-tree stuff:
- Move Michael MIC code from generic crypto into wireless, it's
considered insecure but some WiFi networks still need it
Netfilter:
- Switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
Florian W reports this gets us ~13% higher packet rate
- Convert IPVS's global __ip_vs_mutex to per-net service_mutex and
switch the service tables to be per-net. Convert some code that
walks the service lists to use RCU instead of the service_mutex
- Add more opinionated input validation to lower security exposure
- Make IPVS hash tables to be per-netns and resizable
Wireless:
- Finished assoc frame encryption/EPPKE/802.1X-over-auth
- Radar detection improvements
- Add 6 GHz incumbent signal detection APIs
- Multi-link support for FILS, probe response templates and client
probing
- New APIs and mac80211 support for NAN (Neighbor Aware Networking,
aka Wi-Fi Aware) so less work must be in firmware
Driver API:
- Add numerical ID for devlink instances (to avoid having to create
fake bus/device pairs just to have an ID). Support shared devlink
instances which span multiple PFs
- Add standard counters for reporting pause storm events (implement
in mlx5 and fbnic)
- Add configuration API for completion writeback buffering (implement
in mana)
- Support driver-initiated change of RSS context sizes
- Support DPLL monitoring input frequency (implement in zl3073x)
- Support per-port resources in devlink (implement in mlx5)
Misc:
- Expand the YAML spec for Netfilter
Drivers
- Software:
- macvlan: support multicast rx for bridge ports with shared
source MAC address
- team: decouple receive and transmit enablement for IEEE 802.3ad
LACP "independent control"
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- support high order pages in zero-copy mode (for payload
coalescing)
- support multiple packets in a page (for systems with 64kB
pages)
- Broadcom 25-400GE (bnxt):
- implement XDP RSS hash metadata extraction
- add software fallback for UDP GSO, lowering the IOMMU cost
- Broadcom 800GE (bnge):
- add link status and configuration handling
- add various HW and SW statistics
- Marvell/Cavium:
- NPC HW block support for cn20k
- Huawei (hinic3):
- add mailbox / control queue
- add rx VLAN offload
- add driver info and link management
- Ethernet NICs:
- Marvell/Aquantia:
- support reading SFP module info on some AQC100 cards
- Realtek PCI (r8169):
- add support for RTL8125cp
- Realtek USB (r8152):
- support for the RTL8157 5Gbit chip
- add 2500baseT EEE status/configuration support
- Ethernet NICs embedded and off-the-shelf IP:
- Synopsys (stmmac):
- cleanup and reorganize SerDes handling and PCS support
- cleanup descriptor handling and per-platform data
- cleanup and consolidate MDIO defines and handling
- shrink driver memory use for internal structures
- improve Tx IRQ coalescing
- improve TCP segmentation handling
- add support for Spacemit K3
- Cadence (macb):
- support PHYs that have inband autoneg disabled with GEM
- support IEEE 802.3az EEE
- rework usrio capabilities and handling
- AMD (xgbe):
- improve power management for S0i3
- improve TX resilience for link-down handling
- Virtual:
- Google cloud vNIC:
- support larger ring sizes in DQO-QPL mode
- improve HW-GRO handling
- support UDP GSO for DQO format
- PCIe NTB:
- support queue count configuration
- Ethernet PHYs:
- automatically disable PHY autonomous EEE if MAC is in charge
- Broadcom:
- add BCM84891/BCM84892 support
- Micrel:
- support for LAN9645X internal PHY
- Realtek:
- add RTL8224 pair order support
- support PHY LEDs on RTL8211F-VD
- support spread spectrum clocking (SSC)
- Maxlinear:
- add PHY-level statistics via ethtool
- Ethernet switches:
- Maxlinear (mxl862xx):
- support for bridge offloading
- support for VLANs
- support driver statistics
- Bluetooth:
- large number of fixes and new device IDs
- Mediatek:
- support MT6639 (MT7927)
- support MT7902 SDIO
- WiFi:
- Intel (iwlwifi):
- UNII-9 and continuing UHR work
- MediaTek (mt76):
- mt7996/mt7925 MLO fixes/improvements
- mt7996 NPU support (HW eth/wifi traffic offload)
- Qualcomm (ath12k):
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
- support IPQ5424
- Realtek:
- add USB RX aggregation to improve performance
- add USB TX flow control by tracking in-flight URBs
- Cellular:
- IPA v5.2 support"
* tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits)
net: pse-pd: fix kernel-doc function name for pse_control_find_by_id()
wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit
wireguard: allowedips: remove redundant space
tools: ynl: add sample for wireguard
wireguard: allowedips: Use kfree_rcu() instead of call_rcu()
MAINTAINERS: Add netkit selftest files
selftests/net: Add additional test coverage in nk_qlease
selftests/net: Split netdevsim tests from HW tests in nk_qlease
tools/ynl: Make YnlFamily closeable as a context manager
net: airoha: Add missing PPE configurations in airoha_ppe_hw_init()
net: airoha: Fix VIP configuration for AN7583 SoC
net: caif: clear client service pointer on teardown
net: strparser: fix skb_head leak in strp_abort_strp()
net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()
selftests/bpf: add test for xdp_master_redirect with bond not up
net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration
sctp: disable BH before calling udp_tunnel_xmit_skb()
sctp: fix missing encap_port propagation for GSO fragments
net: airoha: Rely on net_device pointer in ETS callbacks
...
Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for huge pfnmaps
- Cleanups to use masked user access
- Rework pnv_ioda_pick_m64_pe() to use better bitmap API
- Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
- Backup region offset update to eflcorehdr
- Fixes for wii/ps3 platform
- Implement JIT support for private stack in powerpc
- Implement JIT support for fsession in powerpc64 trampoline
- Add support for instruction array and indirect jump in powerpc
- Misc selftest fixes and cleanups
Thanks to Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal,
Andrew Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni,
Christophe Leroy (CS GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar
Chaurasiya (IBM), Nam Cao, Nilay Shroff, Pavithra Prakash, Randy Dunlap,
Ritesh Harjani (IBM), Shrikanth Hegde, Sourabh Jain, Vaibhav Jain,
Venkat Rao Bagalkote, and Yury Norov (NVIDIA)
* tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (47 commits)
mailmap: Add entry for Andrew Donnellan
powerpc32/bpf: fix loading fsession func metadata using PPC_LI32
selftest/bpf: Enable gotox tests for powerpc64
powerpc64/bpf: Add support for indirect jump
selftest/bpf: Enable instruction array test for powerpc
powerpc/bpf: Add support for instruction array
powerpc32/bpf: Add fsession support
powerpc64/bpf: Implement fsession support
selftests/bpf: Enable private stack tests for powerpc64
powerpc64/bpf: Implement JIT support for private stack
powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe()
powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe()
powerpc/net: Inline checksum wrappers and convert to scoped user access
powerpc/sstep: Convert to scoped user access
powerpc/align: Convert emulate_spe() to scoped user access
powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access
powerpc/futex: Use masked user access
powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
cpuidle: powerpc: avoid double clear when breaking snooze
powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings
...
Pull vdso updates from Thomas Gleixner:
- Make the handling of compat functions consistent and more robust
- Rework the underlying data store so that it is dynamically allocated,
which allows the conversion of the last holdout SPARC64 to the
generic VDSO implementation
- Rework the SPARC64 VDSO to utilize the generic implementation
- Mop up the left overs of the non-generic VDSO support in the core
code
- Expand the VDSO selftest and make them more robust
- Allow time namespaces to be enabled independently of the generic VDSO
support, which was not possible before due to SPARC64 not using it
- Various cleanups and improvements in the related code
* tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
timens: Use task_lock guard in timens_get*()
timens: Use mutex guard in proc_timens_set_offset()
timens: Simplify some calls to put_time_ns()
timens: Add a __free() wrapper for put_time_ns()
timens: Remove dependency on the vDSO
vdso/timens: Move functions to new file
selftests: vDSO: vdso_test_correctness: Add a test for time()
selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c
selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks
selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks
Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers"
random: vDSO: Remove ifdeffery
random: vDSO: Trim vDSO includes
vdso/datapage: Trim down unnecessary includes
vdso/datapage: Remove inclusion of gettimeofday.h
vdso/helpers: Explicitly include vdso/processor.h
vdso/gettimeofday: Add explicit includes
random: vDSO: Add explicit includes
MIPS: vdso: Explicitly include asm/vdso/vdso.h
...
Pull Kbuild/Kconfig updates from Nicolas Schier:
"Kbuild:
- reject unexpected values for LLVM=
- uapi: remove usage of toolchain headers
- switch from '-fms-extensions' to '-fms-anonymous-structs' when
available (currently: clang >= 23.0.0)
- reduce the number of compiler-generated suffixes for clang thin-lto
build
- reduce output spam ("GEN Makefile") when building out of tree
- improve portability for testing headers
- also test UAPI headers against C++ compilers
- drop build ID architecture allow-list in vdso_install
- only run checksyscalls when necessary
- update the debug information notes in reproducible-builds.rst
- expand inlining hints with -fdiagnostics-show-inlining-chain
Kconfig:
- forbid multiple entries with the same symbol in a choice
- error out on duplicated kconfig inclusion"
* tag 'kbuild-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (35 commits)
kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain
kconfig: forbid multiple entries with the same symbol in a choice
Documentation: kbuild: Update the debug information notes in reproducible-builds.rst
checksyscalls: move instance functionality into generic code
checksyscalls: only run when necessary
checksyscalls: fail on all intermediate errors
checksyscalls: move path to reference table to a variable
kbuild: vdso_install: drop build ID architecture allow-list
kbuild: vdso_install: gracefully handle images without build ID
kbuild: vdso_install: hide readelf warnings
kbuild: vdso_install: split out the readelf invocation
kbuild: uapi: also test UAPI headers against C++ compilers
kbuild: uapi: provide a C++ compatible dummy definition of NULL
kbuild: uapi: handle UML in architecture-specific exclusion lists
kbuild: uapi: move all include path flags together
kbuild: uapi: move some compiler arguments out of the command definition
check-uapi: use dummy libc includes
check-uapi: honor ${CROSS_COMPILE} setting
check-uapi: link into shared objects
kbuild: reduce output spam when building out of tree
...
Pull bitmap updates from Yury Norov:
- new API: bitmap_weight_from() and bitmap_weighted_xor() (Yury)
- drop unused __find_nth_andnot_bit() (Yury)
- new tests and test improvements (Andy, Akinobu, Yury)
- fixes for count_zeroes API (Yury)
- cleanup bitmap_print_to_pagebuf() mess (Yury)
- documentation updates (Andy, Kai, Kit).
* tag 'bitmap-for-v7.1' of https://github.com/norov/linux: (24 commits)
bitops: Update kernel-doc for sign_extendXX()
powerpc/xive: simplify xive_spapr_debug_show()
thermal: intel: switch cpumask_get() to using cpumask_print_to_pagebuf()
coresight: don't use bitmap_print_to_pagebuf()
lib/prime_numbers: drop temporary buffer in dump_primes()
drm/xe: switch xe_pagefault_queue_init() to using bitmap_weighted_or()
ice: use bitmap_empty() in ice_vf_has_no_qs_ena
ice: use bitmap_weighted_xor() in ice_find_free_recp_res_idx()
bitmap: introduce bitmap_weighted_xor()
bitmap: add test_zero_nbits()
bitmap: exclude nbits == 0 cases from bitmap test
bitmap: test bitmap_weight() for more
asm-generic/bitops: Fix a comment typo in instrumented-atomic.h
bitops: fix kernel-doc parameter name for parity8()
lib: count_zeros: unify count_{leading,trailing}_zeros()
lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
lib: crypto: fix comments for count_leading_zeros()
x86/topology: use bitmap_weight_from()
bitmap: add bitmap_weight_from()
lib/find_bit_benchmark: avoid clearing randomly filled bitmap in test_find_first_bit()
...
Pull gpio updates from Bartosz Golaszewski:
"For this merge window we have two new drivers: support for
GPIO-signalled ACPI events on Intel platforms and a generic
GPIO-over-pinctrl driver using the ARM SCMI protocol for
controlling pins.
Several things have been reworked in GPIO core: we unduplicated GPIO
hog handling, reduced the number of SRCU locks and dereferences,
improved support for software-node-based lookup and removed more
legacy code after converting remaining users to modern alternatives.
There's also a number of driver reworks and refactoring, documentation
updates, some bug-fixes and new tests.
GPIO core:
- defer probe on software node lookups when the remote software node
exists but has not been registered as a firmware node yet
- unify GPIO hog handling by moving code duplicated in OF and ACPI
modules into GPIO core and allow setting up hogs with software
nodes
- allow matching GPIO controllers by secondary firmware node if
matching by primary does not succeed
- demote deferral warnings to debug level as they are quite normal
when using software nodes which don't support fw_devlink yet
- disable the legacy GPIO character device uAPI v1 supprt in Kconfig
by default
- rework several core functions in preparation for the upcoming
Revocable helper library for protecting resources against sudden
removal, this reduces the number of SRCU dereferences in GPIO core
- simplify file descriptor logic in GPIO character device code by
using FD_PREPARE()
- introduce a header defining symbols used by both GPIO consumers and
providers to avoid having to include provider-specific headers from
drivers which only consume GPIOs
- replace snprintf() with strscpy() where formatting is not required
New drivers:
- add the gpio-by-pinctrl generic driver using the ARM SCMI protocol
to control GPIOs (along with SCMI changes pulled from the pinctrl
tree)
- add a driver providing support for handling of platform events via
GPIO-signalled ACPI events (used on Intel Nova Lake and later
platforms)
Driver changes:
- extend the gpio-kempld driver with support for more recent models,
interrupts and setting/getting multiple values at once
- improve interrupt handling in gpio-brcmstb
- add support for multi-SoC systems in gpio-tegra186
- make sure we return correct values from the .get() callbacks in
several GPIO drivers by normalizing any values other than 0, 1 or
negative error numbers
- use flexible arrays in several drivers to reduce the number of
required memory allocations
- simplify synchronous waiting for virtual drivers to probe and
remove the dedicated, a bit overengineered helper library
dev-sync-probe
- remove unneeded Kconfig dependencies on OF_GPIO in several drivers
and subsystems
- convert the two remaining users of of_get_named_gpio() to using
GPIO descriptors and remove the (no longer used) function along
with the header that declares it
- add missing includes in gpio-mmio
- shrink and simplify code in gpio-max732x by using guard(mutex)
- remove duplicated code handling the 'ngpios' property from
gpio-ts4800, it's already handled in GPIO core
- use correct variable type in gpio-aspeed
- add support for a new model in gpio-realtek-otto
- allow to specify the active-low setting of simulated hogs over the
configfs interface (in addition to existing devicetree support) in
gpio-sim
Bug fixes:
- clear the OF_POPULATED flag on hog nodes in GPIO chip remove path
on OF systems
- fix resource leaks in error path in gpiochip_add_data_with_key()
- drop redundant device reference in gpio-mpsse
Tests:
- add selftests for use-after-free cases in GPIO character device
code
DT bindings:
- add a DT binding document for SCMI based, gpio-over-pinctrl devices
- fix interrupt description in microchip,mpfs-gpio
- add new compatible for gpio-realtek-otto
- describe the resets of the mpfs-gpio controller
- fix maintainer's email in gpio-delay bindings
- remove the binding document for cavium,thunder-8890 as the
corresponding device is bound over PCI and not firmware nodes
Documentation:
- update the recommended way of converting legacy boards to using
software nodes for GPIO description
- describe GPIO line value semantics
- misc updates to kerneldocs
Misc:
- convert OMAP1 ams-delta board to using GPIO hogs described with
software nodes"
* tag 'gpio-updates-for-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (79 commits)
gpio: swnode: defer probe on references to unregistered software nodes
dt-bindings: gpio: cavium,thunder-8890: Remove DT binding
Documentation: gpio: update the preferred method for using software node lookup
gpio: gpio-by-pinctrl: s/used to do/is used to do/
gpio: aspeed: fix unsigned long int declaration
gpio: rockchip: convert to dynamic GPIO base allocation
gpio: remove dev-sync-probe
gpio: virtuser: stop using dev-sync-probe
gpio: aggregator: stop using dev-sync-probe
gpio: sim: stop using dev-sync-probe
gpio: Add Intel Nova Lake ACPI GPIO events driver
gpiolib: Make deferral warnings debug messages
gpiolib: fix hogs with multiple lines
gpio: fix up CONFIG_OF dependencies
gpio: gpio-by-pinctrl: add pinctrl based generic GPIO driver
gpio: dt-bindings: Add GPIO on top of generic pin control
firmware: arm_scmi: Allow PINCTRL_REQUEST to return EOPNOTSUPP
pinctrl: scmi: ignore PIN_CONFIG_PERSIST_STATE
pinctrl: scmi: Delete PIN_CONFIG_OUTPUT_IMPEDANCE_OHMS support
pinctrl: scmi: Add SCMI_PIN_INPUT_VALUE
...
Pull hardening updates from Kees Cook:
- randomize_kstack: Improve implementation across arches (Ryan Roberts)
- lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
- refcount: Remove unused __signed_wrap function annotations
* tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
refcount: Remove unused __signed_wrap function annotations
randomize_kstack: Unify random source across arches
randomize_kstack: Maintain kstack_offset per task
Pull crypto library updates from Eric Biggers:
- Migrate more hash algorithms from the traditional crypto subsystem to
lib/crypto/
Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
the implementations, improves performance, enables further
simplifications in calling code, and solves various other issues:
- AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
- Support these algorithms in lib/crypto/ using the AES library
and the existing arm64 assembly code
- Reimplement the traditional crypto API's "cmac(aes)",
"xcbc(aes)", and "cbcmac(aes)" on top of the library
- Convert mac80211 to use the AES-CMAC library. Note: several
other subsystems can use it too and will be converted later
- Drop the broken, nonstandard, and likely unused support for
"xcbc(aes)" with key lengths other than 128 bits
- Enable optimizations by default
- GHASH
- Migrate the standalone GHASH code into lib/crypto/
- Integrate the GHASH code more closely with the very similar
POLYVAL code, and improve the generic GHASH implementation to
resist cache-timing attacks and use much less memory
- Reimplement the AES-GCM library and the "gcm" crypto_aead
template on top of the GHASH library. Remove "ghash" from the
crypto_shash API, as it's no longer needed
- Enable optimizations by default
- SM3
- Migrate the kernel's existing SM3 code into lib/crypto/, and
reimplement the traditional crypto API's "sm3" on top of it
- I don't recommend using SM3, but this cleanup is worthwhile
to organize the code the same way as other algorithms
- Testing improvements:
- Add a KUnit test suite for each of the new library APIs
- Migrate the existing ChaCha20Poly1305 test to KUnit
- Make the KUnit all_tests.config enable all crypto library tests
- Move the test kconfig options to the Runtime Testing menu
- Other updates to arch-optimized crypto code:
- Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
- Remove some MD5 implementations that are no longer worth keeping
- Drop big endian and voluntary preemption support from the arm64
code, as those configurations are no longer supported on arm64
- Make jitterentropy and samples/tsm-mr use the crypto library APIs
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
lib/crypto: arm64: Assume a little-endian kernel
arm64: fpsimd: Remove obsolete cond_yield macro
lib/crypto: arm64/sha3: Remove obsolete chunking logic
lib/crypto: arm64/sha512: Remove obsolete chunking logic
lib/crypto: arm64/sha256: Remove obsolete chunking logic
lib/crypto: arm64/sha1: Remove obsolete chunking logic
lib/crypto: arm64/poly1305: Remove obsolete chunking logic
lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
lib/crypto: arm64/chacha: Remove obsolete chunking logic
lib/crypto: arm64/aes: Remove obsolete chunking logic
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
lib/crypto: sparc: Drop optimized MD5 code
lib/crypto: mips: Drop optimized MD5 code
lib: Move crypto library tests to Runtime Testing menu
crypto: sm3 - Remove 'struct sm3_state'
crypto: sm3 - Remove the original "sm3_block_generic()"
crypto: sm3 - Remove sm3_base.h
...
This series cleans up some of the special user copy functions naming and
semantics. In particular, get rid of the (very traditional) double
underscore names and behavior: the whole "optimize away the range check"
model has been largely excised from the other user accessors because
it's so subtle and can be unsafe, but also because it's just not a
relevant optimization any more.
To do that, a couple of drivers that misused the "user" copies as kernel
copies in order to get non-temporal stores had to be fixed up, but that
kind of code should never have been allowed anyway.
The x86-only "nocache" version was also renamed to more accurately
reflect what it actually does.
This was all done because I looked at this code due to a report by Jann
Horn, and I just couldn't stand the inconsistent naming, the horrible
semantics, and the random misuse of these functions. This code should
probably be cleaned up further, but it's at least slightly closer to
normal semantics.
I had a more intrusive series that went even further in trying to
normalize the semantics, but that ended up hitting so many other
inconsistencies between different architectures in this area (eg
'size_t' vs 'unsigned long' vs 'int' as size arguments, and various
iovec check differences that Vasily Gorbik pointed out) that I ended up
with this more limited version that fixed the worst of the issues.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/all/CAHk-=wgg1QVWNWG-UCFo1hx0zqrPnB3qhPzUTrWNft+MtXQXig@mail.gmail.com/
* nocache-cleanup:
x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcache
x86: rename and clean up __copy_from_user_inatomic_nocache()
x86-64: rename misleadingly named '__copy_user_nocache()' function
Johannes Berg says:
====================
Final updates, notably:
- crypto: move Michael MIC code into wireless (only)
- mac80211:
- multi-link 4-addr support
- NAN data support (but no drivers yet)
- ath10k: DT quirk to make it work on some devices
- ath12k: IPQ5424 support
- rtw89: USB improvements for performance
* tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (124 commits)
wifi: cfg80211: Explicitly include <linux/export.h> in michael-mic.c
wifi: ath10k: Add device-tree quirk to skip host cap QMI requests
dt-bindings: wireless: ath10k: Add quirk to skip host cap QMI requests
crypto: Remove michael_mic from crypto_shash API
wifi: ipw2x00: Use michael_mic() from cfg80211
wifi: ath12k: Use michael_mic() from cfg80211
wifi: ath11k: Use michael_mic() from cfg80211
wifi: mac80211, cfg80211: Export michael_mic() and move it to cfg80211
wifi: ipw2x00: Rename michael_mic() to libipw_michael_mic()
wifi: libertas_tf: refactor endpoint lookup
wifi: libertas: refactor endpoint lookup
wifi: at76c50x: refactor endpoint lookup
wifi: ath12k: Enable IPQ5424 WiFi device support
wifi: ath12k: Add CE remap hardware parameters for IPQ5424
wifi: ath12k: add ath12k_hw_regs for IPQ5424
wifi: ath12k: add ath12k_hw_version_map entry for IPQ5424
wifi: ath12k: Add ath12k_hw_params for IPQ5424
dt-bindings: net: wireless: add ath12k wifi device IPQ5424
wifi: ath10k: fix station lookup failure during disconnect
wifi: ath12k: Create symlink for each radio in a wiphy
...
====================
Link: https://patch.msgid.link/20260410064703.735099-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function creates temporary buffer to convert xibm->bitmap to a
human-readable list before passing it to seq_printf. Drop it and print
the list by seq_printf() directly with the "%*pbl" specifier.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> for powerpc patch
Signed-off-by: Yury Norov <ynorov@nvidia.com>
Remove the "michael_mic" crypto_shash algorithm, since it's no longer
used. Its only users were wireless drivers, which have now been
converted to use the michael_mic() function instead.
It makes sense that no other users ever appeared: Michael MIC is an
insecure algorithm that is specific to WPA TKIP, which itself was an
interim security solution to replace the broken WEP standard.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://patch.msgid.link/20260408030651.80336-7-ebiggers@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for a new instruction
BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0
which does an indirect jump to a location stored in Rx. The
register Rx should have type PTR_TO_INSN. This new type ensures
that the Rx register contains a value (or a range of values)
loaded from a correct jump table – map of type instruction array.
Support indirect jump to all registers in powerpc64 JIT using
the ctr register. Move Rx content to ctr register, then invoke
bctr instruction to branch to address stored in ctr register.
Skip save and restore of TOC as the jump is always within the
program context.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-4-adubey@linux.ibm.com
On loading the BPF program, the verifier might adjust/omit some
instructions. The adjusted instruction offset is accounted in the
map containing original instruction -> xlated mapping. This patch
add ppc64 JIT support to additionally build the xlated->jitted
mapping for every instruction present in instruction array. This
change is needed to enable support for indirect jumps, added in a
subsequent patch.
Invoke bpf_prog_update_insn_ptrs() with offset pair of xlated_offset
and jited_offset. The offset mapping is already available, which is
being used for bpf_prog_fill_jited_linfo() and can be directly used
for bpf_prog_update_insn_ptrs() as well.
Additional details present at:
commit b4ce5923e7 ("bpf, x86: add new map type: instructions array")
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-2-adubey@linux.ibm.com
Implement JIT support for fsession in powerpc64 trampoline.
The trampoline stack now accommodate session cookies and
function metadata in place of function argument. fentry/fexit
programs consume corresponding function metadata. This mirrors
existing x86 behavior and enable session cookies on powerpc64.
# ./test_progs -t fsession
#135/1 fsession_test/fsession_test:OK
#135/2 fsession_test/fsession_reattach:OK
#135/3 fsession_test/fsession_cookie:OK
#135 fsession_test:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401141043.41513-1-adubey@linux.ibm.com
Provision the private stack as a per-CPU allocation during
bpf_int_jit_compile(). Align the stack to 16 bytes and place guard
regions at both ends to detect runtime stack overflow and underflow.
Round the private stack size up to the nearest 16-byte boundary.
Make each guard region 16 bytes to preserve the required overall
16-byte alignment. When private stack is set, skip bpf stack size
accounting in kernel stack.
There is no stack pointer in powerpc. Stack referencing during JIT
is done using frame pointer. Frame pointer calculation goes like:
BPF frame pointer = Priv stack allocation start address +
Overflow guard +
Actual stack size defined by verifier
Memory layout:
High Addr +--------------------------------------------------+
| |
| 16 bytes Underflow guard (0xEB9F12345678eb9fULL) |
| |
BPF FP -> +--------------------------------------------------+
| |
| Private stack - determined by verifier |
| 16-bytes aligned |
| |
+--------------------------------------------------+
| |
Lower Addr | 16 byte Overflow guard (0xEB9F12345678eb9fULL) |
| |
Priv stack alloc ->+--------------------------------------------------+
start
Update BPF_REG_FP to point to the calculated offset within the
allocated private stack buffer. Now, BPF stack usage reference
in the allocated private stack.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401103215.104438-1-adubey@linux.ibm.com
Commit 861574d51b ("powerpc/uaccess: Implement masked user access")
provides optimised user access by avoiding the cost of access_ok().
Convert csum_and_copy_to_user() and csum_and_copy_from_user() to
scoped user access to benefit from masked user access.
csum_and_copy_to_user() and csum_and_copy_from_user() are only
called respectively by csum_and_copy_to_iter() and
csum_and_copy_from_iter_full() and they are only called twice.
Those functions used to be large but they were first reduced by
commit c693cc4676 ("saner calling conventions for
csum_and_copy_..._user()") then commit 70d65cd555 ("ppc: propagate
the calling conventions change down to csum_partial_copy_generic()").
With the additional size reduction provided by conversion to scoped
user access they are not worth being kept out of line.
$ ./scripts/bloat-o-meter vmlinux.0 vmlinux.1
add/remove: 0/2 grow/shrink: 2/0 up/down: 136/-176 (-40)
Function old new delta
csum_and_copy_to_iter 2416 2488 +72
csum_and_copy_from_iter_full 2272 2336 +64
csum_and_copy_to_user 88 - -88
csum_and_copy_from_user 88 - -88
Total: Before=11514471, After=11514431, chg -0.00%
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f44e1b2760dbed35b237040001a91bc8304b726b.1773137098.git.chleroy@kernel.org
Commit 861574d51b ("powerpc/uaccess: Implement masked user access")
provides optimised user access by avoiding the cost of access_ok().
Convert gpr32_set_common_user() to scoped user access to benefit
from masked user access.
Scoped user access also make the code simpler.
Also changes label from Efault to efault to avoid checkpatch
complaining about CamelCase.
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/2409643daab08b4bc07004c2b88f42085d1ef45a.1773136838.git.chleroy@kernel.org
Fix enum and function return value kernel-doc warnings:
Warning: spu.c:36 Excess enum value '%spe_type_logical' description in 'spe_type'
Warning: spu.c:78 Excess enum value '%spe_ex_state_unexecutable' description in 'spe_ex_state'
Warning: spu.c:78 Excess enum value '%spe_ex_state_executable' description in 'spe_ex_state'
Warning: spu.c:78 Excess enum value '%spe_ex_state_executed' description in 'spe_ex_state'
Warning: spu.c:190 No description found for return value of 'setup_areas'
Fixes: de91a53429 ("[POWERPC] ps3: add spu support")
Fixes: b47027795a ("powerpc/ps3: Fix ioremap of spu shadow regs")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260225055328.249204-1-rdunlap@infradead.org
Remove empty comment line at the beginning of a kernel-doc function
block. Add a "Return:" section for this function.
These changes prevent 2 kernel-doc warnings:
Warning: ../arch/powerpc/kernel/kgdb.c:103 Cannot find identifier on line:
*
Warning: kgdb.c:113 No description found for return value of 'kgdb_skipexception'
Fixes: 949616cf2d ("powerpc/kgdb: Bail out of KGDB when we've been triggered")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260225055314.247966-1-rdunlap@infradead.org
Fix some kernel-doc warnings in ps3.h:
- add @dev to struct ps3_dma_region
- don't mark a function as "struct"
- add Returns: description for one function
- add a short description for ps3_system_bus_set_drvdata()
- correct an enum @name
- move intervening "struct ps3_system_bus_device;" from between
kernel-doc for ps3_dma_region_init() and the function declaration
to eliminate these warnings:
Warning: arch/powerpc/include/asm/ps3.h:96 struct member 'dev' not
described in 'ps3_dma_region'
Warning: arch/powerpc/include/asm/ps3.h:118 struct ps3_system_bus_device;
error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:166 int
ps3_mmio_region_init(struct ps3_system_bus_device *dev, struct
ps3_mmio_region *r, unsigned long bus_addr, unsigned long len, enum
ps3_mmio_page_size page_size); error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:167 No description found for
return value of 'ps3_mmio_region_init'
Warning: arch/powerpc/include/asm/ps3.h:407 missing initial short
description on line:
* ps3_system_bus_set_drvdata -
Warning: arch/powerpc/include/asm/ps3.h:473 Enum value
'PS3_LPM_TB_TYPE_INTERNAL' not described in enum 'ps3_lpm_tb_type'
Warning: arch/powerpc/include/asm/ps3.h:473 Excess enum value
'@PS3_LPM_RIGHTS_USE_TB' description in 'ps3_lpm_tb_type'
This leaves struct members in several structs and function parameters in
one function still undescribed.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251129183636.1893634-1-rdunlap@infradead.org
Adjust the name of the drive slot LED node to comply with the schema in
Documentation/devicetree/bindings/leds/leds-gpio.yaml.
arch/powerpc/boot/dts/wii.dtb: gpio-leds: 'drive-slot' does not match
any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311-wii-schema-v1-3-1563ac4aefa8@posteo.net
Move CONFIG_GAMECUBE and CONFIG_WII directly below other embedded6xx
boards, and above options such as TSI108_BRIDGE. This has two
advantages for the GC/Wii options:
- They won't be moved around by USBGECKO_UDBG appearing or disappearing
- They will be intendented in menuconfig/nconfig, to make it clear they
are part of the embedded6xx platforms
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260303-gcwii-kconfig-v1-1-636b288e7270@posteo.net
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260119063507.940782-1-nichen@iscas.ac.cn
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260119061232.889236-1-nichen@iscas.ac.cn
The driver currently sets the handler data and the chained handler in
two separate steps. This creates a theoretical race window where an
interrupt could fire after the handler is set but before the data is
assigned, leading to a NULL pointer dereference.
Replace the two calls with irq_set_chained_handler_and_data() to set
both the handler and its data atomically under the irq_desc->lock.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260119060450.889119-1-nichen@iscas.ac.cn
This reverts commit a9dadc1c51.
The commit message states:
When called from xive_irq_startup(), the size of the cpumask can be
larger than nr_cpu_ids. This can result in a WARN_ON.
[...]
This happens because we're being called with our affinity mask set to
irq_default_affinity. That in turn was populated using
cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids
worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a
mask which has > nr_cpu_ids bits set.
In modern kernel, cpumask_weight() can't return > nr_cpu_ids.
In inline case, cpumask_setall() explicitly clears all bits above
nr_cpu_ids, see commit 63355b9884 ("cpumask: be more careful with
'cpumask_setall()'"). So, despite that cpumask_weight() is passed
with small_cpumask_bits, which is NR_CPUS in this case, it can't
count over the nr_cpu_ids.
In outline case, cpumask_setall() may set bits beyond the limit up to
the next byte alignment, but in this case small_cpumask_bits is wired
to nr_cpu_ids, thus making overcounting impossible.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
Tested-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260319033647.881246-2-ynorov@nvidia.com
When elfcorehdr is prepared for kdump, the program header representing
the first 64 KB of memory is expected to have its offset point to the
backup region. This is required because purgatory copies the first 64 KB
of the crashed kernel memory to this backup region following a kernel
crash. This allows the capture kernel to use the first 64 KB of memory
to place the exception vectors and other required data.
When elfcorehdr is recreated due to memory hotplug, the offset of
the program header representing the first 64 KB is not updated.
As a result, the capture kernel exports the first 64 KB at offset
0, even though the data actually resides in the backup region.
Fix this by calling sync_backup_region_phdr() to update the program
header offset in the elfcorehdr created during memory hotplug.
sync_backup_region_phdr() works for images loaded via the
kexec_file_load syscall. However, it does not work for kexec_load,
because image->arch.backup_start is not initialized in that case.
So introduce machine_kexec_post_load() to process the elfcorehdr
prepared by kexec-tools and initialize image->arch.backup_start for
kdump images loaded via kexec_load syscall.
Rename update_backup_region_phdr() to sync_backup_region_phdr() and
extend it to synchronize the backup region offset between the kdump
image and the ELF core header. The helper now supports updating either
the kdump image from the ELF program header or updating the ELF program
header from the kdump image, avoiding code duplication.
Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when
CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that
kimage->arch.backup_start is available with the kexec_load system call.
This patch depends on the patch titled
"powerpc/crash: fix backup region offset update to elfcorehdr".
Fixes: 849599b702 ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260312083051.1935737-3-sourabhjain@linux.ibm.com
update_backup_region_phdr() in file_load_64.c iterates over all the
program headers in the kdump kernel’s elfcorehdr and updates the
p_offset of the program header whose physical address starts at 0.
However, the loop logic is incorrect because the program header pointer
is not updated during iteration. Since elfcorehdr typically contains
PT_NOTE entries first, the PT_LOAD program header with physical address
0 is never reached. As a result, its p_offset is not updated to point to
the backup region.
Because of this behavior, the capture kernel exports the first 64 KB of
the crashed kernel’s memory at offset 0, even though that memory
actually lives in the backup region. When a crash happens, purgatory
copies the first 64 KB of the crashed kernel’s memory into the backup
region so the capture kernel can safely use it.
This has not caused problems so far because the first 64 KB is usually
identical in both the crashed and capture kernels. However, this is
just an assumption and is not guaranteed to always hold true.
Fix update_backup_region_phdr() to correctly update the p_offset of the
program header with a starting physical address of 0 by correcting the
logic used to iterate over the program headers.
Fixes: cb350c1f1f ("powerpc/kexec_file: Prepare elfcore header for crashing kernel")
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260312083051.1935737-2-sourabhjain@linux.ibm.com
This finishes the work on these odd functions that were only implemented
by a handful of architectures.
The 'flushcache' function was only used from the iterator code, and
let's make it do the same thing that the nontemporal version does:
remove the two underscores and add the user address checking.
Yes, yes, the user address checking is also done at iovec import time,
but we have long since walked away from the old double-underscore thing
where we try to avoid address checking overhead at access time, and
these functions shouldn't be so special and old-fashioned.
The arm64 version already did the address check, in fact, so there it's
just a matter of renaming it. For powerpc and x86-64 we now do the
proper user access boilerplate.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kmemleak reports the following memory leak:
Unreferenced object 0xc0000002a7fbc640 (size 64):
comm "kworker/8:1", pid 540, jiffies 4294937872
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 09 04 00 04 00 00 ................
00 00 a7 81 00 00 0a c0 00 00 08 04 00 04 00 00 ................
backtrace (crc 177d48f6):
__kmalloc_cache_noprof+0x520/0x730
xive_irq_alloc_data.constprop.0+0x40/0xe0
xive_irq_domain_alloc+0xd0/0x1b0
irq_domain_alloc_irqs_parent+0x44/0x6c
pseries_irq_domain_alloc+0x1cc/0x354
irq_domain_alloc_irqs_parent+0x44/0x6c
msi_domain_alloc+0xb0/0x220
irq_domain_alloc_irqs_locked+0x138/0x4d0
__irq_domain_alloc_irqs+0x8c/0xfc
__msi_domain_alloc_irqs+0x214/0x4d8
msi_domain_alloc_irqs_all_locked+0x70/0xf8
pci_msi_setup_msi_irqs+0x60/0x78
__pci_enable_msix_range+0x54c/0x98c
pci_alloc_irq_vectors_affinity+0x16c/0x1d4
nvme_pci_enable+0xac/0x9c0 [nvme]
nvme_probe+0x340/0x764 [nvme]
This occurs when allocating MSI-X vectors for an NVMe device. During
allocation the XIVE code creates a struct xive_irq_data and stores it
in irq_data->chip_data.
When the MSI-X irqdomain is later freed, xive_irq_free_data() is
responsible for retrieving this structure and freeing it. However,
after commit cc0cc23bab ("powerpc/xive: Untangle xive from child
interrupt controller drivers"), xive_irq_free_data() retrieves the
chip_data using irq_get_chip_data(), which looks up the data through
the child domain.
This is incorrect because the XIVE-specific irq data is associated with
the XIVE (parent) domain. As a result the lookup fails and the allocated
struct xive_irq_data is never freed, leading to the kmemleak report
shown above.
Fix this by retrieving the irq_data from the correct domain using
irq_domain_get_irq_data() and then accessing the chip_data via
irq_data_get_irq_chip_data().
Cc: stable@vger.kernel.org
Fixes: cc0cc23bab ("powerpc/xive: Untangle xive from child interrupt controller drivers")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311134336.326996-1-nilay@linux.ibm.com
This uses _RPAGE_SW2 bit for the PMD and PUDs similar to PTEs.
This also adds support for {pte,pmd,pud}_pgprot helpers needed for
follow_pfnmap APIs.
This allows us to extend the PFN mappings, e.g. PCI MMIO bars where
it can grow as large as 8GB or even bigger, to map at PMD / PUD level.
VFIO PCI core driver already supports fault handling at PMD / PUD level
for more efficient BAR mappings.
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/6fca726574236f556dd4e1e259692e82a4c29e85.1773058761.git.ritesh.list@gmail.com
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().
There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.
So let's use a common, architecture-agnostic source of entropy; a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:
- We can remove choose_random_kstack_offset(); That was only there to
try to make the timestamp counter value a bit harder to influence
from user space [*].
- The architecture code is simplified. All it has to do now is call
add_random_kstack_offset() in the syscall path.
- The strength of the randomness can be reasoned about independently
of the architecture.
- Arches previously using get_random_u16() now have much faster
syscall paths, see below results.
[*] Additionally, this gets rid of some redundant work on s390 and x86.
Before this patch, those architectures called
choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(),
which is also called for exception returns to userspace which were *not*
syscalls (e.g. regular interrupts). Getting rid of
choose_random_kstack_offset() avoids a small amount of redundant work
for the non-syscall cases.
In some configurations, add_random_kstack_offset() will now call
instrumentable code, so for a couple of arches, I have moved the call a
bit later to the first point where instrumentation is allowed. This
doesn't impact the efficacy of the mechanism.
There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.
Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.
Performance data are provided below. The baseline is v6.18 with rndstack
on for each respective arch. (I)/(R) indicate statistically significant
improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal).
x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):
+-----------------+--------------+---------------+---------------+
| Benchmark | Result Class | per-cpu-prng | per-cpu-prng |
| | | arm64 (metal) | x86_64 (VM) |
+=================+==============+===============+===============+
| syscall/getpid | mean (ns) | (I) -9.50% | (I) -17.65% |
| | p99 (ns) | (I) -59.24% | (I) -24.41% |
| | p99.9 (ns) | (I) -59.52% | (I) -28.52% |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns) | (I) -9.52% | (I) -19.24% |
| | p99 (ns) | (I) -59.25% | (I) -25.03% |
| | p99.9 (ns) | (I) -59.50% | (I) -28.17% |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns) | (I) -10.31% | (I) -18.56% |
| | p99 (ns) | (I) -60.79% | (I) -20.06% |
| | p99.9 (ns) | (I) -61.04% | (I) -25.04% |
+-----------------+--------------+---------------+---------------+
I tested an earlier version of this change on x86 bare metal and it
showed a smaller but still significant improvement. The bare metal
system wasn't available this time around so testing was done in a VM
instance. I'm guessing the cost of rdtsc is higher for VMs.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
Remove the "p8_ghash" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized for POWER8. It also greatly
reduces the amount of powerpc-specific glue code that is needed, and it
fixes the issue where this optimized GHASH code was disabled by default.
Note that previously the C code defined the POWER8 GHASH key format as
"u128 htable[16]", despite the assembly code only using four entries.
Fix the C code to use the correct key format. To fulfill the library
API contract, also make the key preparation work in all contexts.
Note that the POWER8 assembly code takes the accumulator in GHASH
format, but it actually byte-reflects it to get it into POLYVAL format.
The library already works with POLYVAL natively. For now, just wire up
this existing code by converting it to/from GHASH format in C code.
This should be cleaned up to eliminate the unnecessary conversion later.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
The "small" argument to do_sys_ftruncate indicates if > 32-bit size
should be reject, but all the arch-specific compat ftruncate64
implementations get this wrong. Merge do_sys_ftruncate and
ksys_ftruncate, replace the integer as boolean small flag with a
descriptive one about LFS semantics, and use it correctly in the
architecture-specific ftruncate64 implementations.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 3dd681d944 ("arm64: 32-bit (compat) applications support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260323070205.2939118-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>