linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Linus Torvalds	01f492e181	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Arm: - Add support for tracing in the standalone EL2 hypervisor code, which should help both debugging and performance analysis. This uses the new infrastructure for 'remote' trace buffers that can be exposed by non-kernel entities such as firmware, and which came through the tracing tree - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting point for supporting the new GIC architecture in KVM - Finally add support for pKVM protected guests, where pages are unmapped from the host as they are faulted into the guest and can be shared back from the guest using pKVM hypercalls. Protected guests are created using a new machine type identifier. As the elusive guestmem has not yet delivered on its promises, anonymous memory is also supported This is only a first step towards full isolation from the host; for example, the CPU register state and DMA accesses are not yet isolated. Because this does not really yet bring fully what it promises, it is hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is created. Caveat emptor - Rework the dreaded user_mem_abort() function to make it more maintainable, reducing the amount of state being exposed to the various helpers and rendering a substantial amount of state immutable - Expand the Stage-2 page table dumper to support NV shadow page tables on a per-VM basis - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow - Fix both SPE and TRBE in non-VHE configurations so that they do not generate spurious, out of context table walks that ultimately lead to very bad HW lockups - A small set of patches fixing the Stage-2 MMU freeing in error cases - Tighten-up accepted SMC immediate value to be only #0 for host SMCCC calls - The usual cleanups and other selftest churn LoongArch: - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel() - Add DMSINTC irqchip in kernel support RISC-V: - Fix steal time shared memory alignment checks - Fix vector context allocation leak - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi() - Fix double-free of sdata in kvm_pmu_clear_snapshot_area() - Fix integer overflow in kvm_pmu_validate_counter_mask() - Fix shift-out-of-bounds in make_xfence_request() - Fix lost write protection on huge pages during dirty logging - Split huge pages during fault handling for dirty logging - Skip CSR restore if VCPU is reloaded on the same core - Implement kvm_arch_has_default_irqchip() for KVM selftests - Factored-out ISA checks into separate sources - Added hideleg to struct kvm_vcpu_config - Factored-out VCPU config into separate sources - Support configuration of per-VM HGATP mode from KVM user space s390: - Support for ESA (31-bit) guests inside nested hypervisors - Remove restriction on memslot alignment, which is not needed anymore with the new gmap code - Fix LPSW/E to update the bear (which of course is the breaking event address register) x86: - Shut up various UBSAN warnings on reading module parameter before they were initialized - Don't zero-allocate page tables that are used for splitting hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and thus write all bytes - As an optimization, bail early when trying to unsync 4KiB mappings if the target gfn can just be mapped with a 2MiB hugepage x86 generic: - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM would dereference stack pointer after an exit to userspace - Clean up and comment the emulated MMIO code to try to make it easier to maintain (not necessarily "easy", but "easier") - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not all of VMX and SVM enabling) as it is needed for trusted I/O - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions - Immediately fail the build if a required #define is missing in one of KVM's headers that is included multiple times - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected exception, mostly to prevent syzkaller from abusing the uAPI to trigger WARNs, but also because it can help prevent userspace from unintentionally crashing the VM - Exempt SMM from CPUID faulting on Intel, as per the spec - Misc hardening and cleanup changes x86 (AMD): - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple vCPUs have to-be-injected IRQs - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would overwrite state when enabling virtualization on multiple CPUs in parallel. This should not be a problem because OSVW should usually be the same for all CPUs - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a "too large" size based purely on user input - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as doing so for an SNP guest will crash the host due to an RMP violation page fault - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are required to hold kvm->lock, and enforce it by lockdep. Fix various bugs where sev_guest() was not ensured to be stable for the whole duration of a function or ioctl - Convert a pile of kvm->lock SEV code to guard() - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6 as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2 rather than CR2, for example). Only set CR2 and DR6 when consumption of the payload is imminent, but on the other hand force delivery of the payload in all paths where userspace retrieves CR2 or DR6 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead of vmcb02->save.cr2. The value is out of sync after a save/restore or after a #PF is injected into L2 - Fix a class of nSVM bugs where some fields written by the CPU are not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not up-to-date when saved by KVM_GET_NESTED_STATE - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after save+restore - Add a variety of missing nSVM consistency checks - Fix several bugs where KVM failed to correctly update VMCB fields on nested #VMEXIT - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for SVM-related instructions - Add support for save+restore of virtualized LBRs (on SVM) - Refactor various helpers and macros to improve clarity and (hopefully) make the code easier to maintain - Aggressively sanitize fields when copying from vmcb12, to guard against unintentionally allowing L1 to utilize yet-to-be-defined features - Fix several bugs where KVM botched rAX legality checks when emulating SVM instructions. There are remaining issues in that KVM doesn't handle size prefix overrides for 64-bit guests - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's architectural but sketchy behavior of generating #GP for "unsupported" addresses) - Cache all used vmcb12 fields to further harden against TOCTOU bugs x86 (Intel): - Drop obsolete branch hint prefixes from the VMX instruction macros - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a register input when appropriate - Code cleanups guest_memfd: - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support reclaim, the memory is unevictable, and there is no storage to write back to LoongArch selftests: - Add KVM PMU test cases s390 selftests: - Enable more memory selftests x86 selftests: - Add support for Hygon CPUs in KVM selftests - Fix a bug in the MSR test where it would get false failures on AMD/Hygon CPUs with exactly one of RDPID or RDTSCP - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a bug where the kernel would attempt to collapse guest_memfd folios against KVM's will" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits) KVM: x86: use inlines instead of macros for is_sev_*guest x86/virt: Treat SVM as unsupported when running as an SEV+ guest KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper KVM: SEV: use mutex guard in snp_handle_guest_req() KVM: SEV: use mutex guard in sev_mem_enc_unregister_region() KVM: SEV: use mutex guard in sev_mem_enc_ioctl() KVM: SEV: use mutex guard in snp_launch_update() KVM: SEV: Assert that kvm->lock is held when querying SEV+ support KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe" KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y KVM: SEV: WARN on unhandled VM type when initializing VM KVM: LoongArch: selftests: Add PMU overflow interrupt test KVM: LoongArch: selftests: Add basic PMU event counting test KVM: LoongArch: selftests: Add cpucfg read/write helpers LoongArch: KVM: Add DMSINTC inject msi to vCPU LoongArch: KVM: Add DMSINTC device support LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch ...	2026-04-17 07:18:03 -07:00
Linus Torvalds	c43267e679	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The biggest changes are MPAM enablement in drivers/resctrl and new PMU support under drivers/perf. On the core side, FEAT_LSUI lets futex atomic operations with EL0 permissions, avoiding PAN toggling. The rest is mostly TLB invalidation refactoring, further generic entry work, sysreg updates and a few fixes. Core features: - Add support for FEAT_LSUI, allowing futex atomic operations without toggling Privileged Access Never (PAN) - Further refactor the arm64 exception handling code towards the generic entry infrastructure - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis through it Memory management: - Refactor the arm64 TLB invalidation API and implementation for better control over barrier placement and level-hinted invalidation - Enable batched TLB flushes during memory hot-unplug - Fix rodata=full block mapping support for realm guests (when BBML2_NOABORT is available) Perf and PMU: - Add support for a whole bunch of system PMUs featured in NVIDIA's Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers for CPU/C2C memory latency PMUs) - Clean up iomem resource handling in the Arm CMN driver - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon} MPAM (Memory Partitioning And Monitoring): - Add architecture context-switch and hiding of the feature from KVM - Add interface to allow MPAM to be exposed to user-space using resctrl - Add errata workaround for some existing platforms - Add documentation for using MPAM and what shape of platforms can use resctrl Miscellaneous: - Check DAIF (and PMR, where relevant) at task-switch time - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode (only relevant to asynchronous or asymmetric tag check modes) - Remove a duplicate allocation in the kexec code - Remove redundant save/restore of SCS SP on entry to/from EL0 - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap descriptions - Add kselftest coverage for cmpbr_sigill() - Update sysreg definitions" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (109 commits) arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd ACPI: AGDI: fix missing newline in error message arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() ...	2026-04-14 16:48:56 -07:00
Catalin Marinas	480a9e57cc	Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message	2026-04-10 14:22:24 +01:00
Marc Zyngier	7e629348df	KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE As we are missing ID_AA64PFR2_EL1.GCIE from the kernel feature set, userspace cannot write ID_AA64PFR2_EL1 with GCIE set, even if we are on a GICv5 host. Add the required field description. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://patch.msgid.link/20260401170017.369529-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-05 08:15:35 +01:00
Ben Horgan	37fe0f984d	arm64: mpam: Initialise and context switch the MPAMSM_EL1 register The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores generated by a shared SMCU. Disable the traps so the kernel can use it and set it to the same configuration as the per-EL cpu MPAM configuration. If an SMCU is not shared with other cpus then it is implementation defined whether the configuration from MPAMSM_EL1 is used or that from the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D, configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting configuration is the same regardless. The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is not currently specified in Arm Architectural Reference Manual but the architect has confirmed that it is intended to be the same as that for the cpu configuration in the MPAMy_ELx registers. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:29:02 +00:00
James Morse	87b78a5d70	arm64: mpam: Re-initialise MPAM regs when CPU comes online Now that the MPAM system registers are expected to have values that change, reprogram them based on the previous value when a CPU is brought online. Previously MPAM's 'default PARTID' of 0 was always used for MPAM in kernel-space as this is the PARTID that hardware guarantees to reset. Because there are a limited number of PARTID, this value is exposed to user-space, meaning resctrl changes to the resctrl default group would also affect kernel threads. Instead, use the task's PARTID value for kernel work on behalf of user-space too. The default of 0 is kept for both user-space and kernel-space when MPAM is not enabled. Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Tested-by: Jesse Chick <jessechick@os.amperecomputing.com> Reviewed-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: James Morse <james.morse@arm.com>	2026-03-27 15:28:25 +00:00
Yeoreum Yun	7181f718cb	arm64: cpufeature: Add FEAT_LSUI Since Armv9.6, FEAT_LSUI introduces atomic instructions that allow privileged code to access user memory without clearing the PSTATE.PAN bit. Add CPU feature detection for FEAT_LSUI. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> [catalin.marinas@arm.com: Remove commit log references to SW_PAN] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-03-26 18:19:07 +00:00
James Clark	15ed3fa23c	arm64: cpufeature: Use pmuv3_implemented() function Other places that are doing this version comparison are already using pmuv3_implemented(), so might as well use it here too for consistency. Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-03-24 12:33:49 +00:00
James Clark	d1dcc20bcc	arm64: cpufeature: Make PMUVer and PerfMon unsigned On the host, this change doesn't make a difference because the fields are defined as FTR_EXACT. However, KVM allows userspace to set these fields for a guest and overrides the type to be FTR_LOWER_SAFE. And while KVM used to do an unsigned comparison to validate that the new value is lower than what the hardware provides, since the linked commit it uses the generic sanitization framework which does a signed comparison. Fix it by defining these fields as unsigned. In theory, without this fix, userspace could set a higher PMU version than the hardware supports by providing any value with the top bit set. Fixes: `c118cead07` ("KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-03-24 12:33:49 +00:00
Marc Zyngier	a79f7b4aeb	KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug Hotplugging a CPU off and back on fails with pKVM, as we try to probe for ICH_VTR_EL2.TDS. In a non-VHE setup, this is achieved by using an EL2 stub helper. However, the stubs are out of reach once pKVM has deprivileged the kernel. The CPU never boots. Since pKVM doesn't allow late onlining of CPUs, we can detect that protected mode is enforced early on, and return the current state of the capability. Fixes: `2a28810cbb` ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping") Reported-by: Vincent Donnefort <vdonnefort@google.com> Tested-by: Vincent Donnefort <vdonnefort@google.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260310085433.3936742-1-maz@kernel.org Cc: stable@vger.kernel.org	2026-03-10 09:48:45 +00:00
Linus Torvalds	cb5573868e	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit `bf2c3138ae` ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of ancient sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) must flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...	2026-02-13 11:31:15 -08:00
Linus Torvalds	45bf4bc87c	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's a little less than normal, probably due to LPC & Christmas/New Year meaning that a few series weren't quite ready or reviewed in time. It's still useful across the board, despite the only real feature being support for the LS64 feature enabling 64-byte atomic accesses to endpoints that support it. ACPI: - Add interrupt signalling support to the AGDI handler - Add Catalin and myself to the arm64 ACPI MAINTAINERS entry CPU features: - Drop Kconfig options for PAN and LSE (these are detected at runtime) - Add support for 64-byte single-copy atomic instructions (LS64/LS64V) - Reduce MTE overhead when executing in the kernel on Ampere CPUs - Ensure POR_EL0 value exposed via ptrace is up-to-date - Fix error handling on GCS allocation failure CPU frequency: - Add CPU hotplug support to the FIE setup in the AMU driver Entry code: - Minor optimisations and cleanups to the syscall entry path - Preparatory rework for moving to the generic syscall entry code Hardware errata: - Work around Spectre-BHB on TSV110 processors - Work around broken CMO propagation on some systems with the SI-L1 interconnect Miscellaneous: - Disable branch profiling for arch/arm64/ to avoid issues with noinstr - Minor fixes and cleanups (kexec + ubsan, WARN_ONCE() instead of WARN_ON(), reduction of boolean expression) - Fix custom __READ_ONCE() implementation for LTO builds when operating on non-atomic types Perf and PMUs: - Support for CMN-600AE - Be stricter about supported hardware in the CMN driver - Support for DSU-110 and DSU-120 - Support for the cycles event in the DSU driver (alongside the dedicated cycles counter) - Use IRQF_NO_THREAD instead of IRQF_ONESHOT in the cxlpmu driver - Use !bitmap_empty() as a faster alternative to bitmap_weight() - Fix SPE error handling when failing to resume profiling Selftests: - Add support for the FORCE_TARGETS option to the arm64 kselftests - Avoid nolibc-specific my_syscall() function - Add basic test for the LS64 HWCAP - Extend fp-pidbench to cover additional workload patterns" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (43 commits) perf/arm-cmn: Reject unsupported hardware configurations perf: arm_spe: Properly set hw.state on failures arm64/gcs: Fix error handling in arch_set_shadow_stack_status() arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y arm64: poe: fix stale POR_EL0 values for ptrace kselftest/arm64: Raise default number of loops in fp-pidbench kselftest/arm64: Add a no-SVE loop after SVE in fp-pidbench perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD arm64: mte: Set TCMA1 whenever MTE is present in the kernel arm64/ptrace: Return early for ptrace_report_syscall_entry() error arm64/ptrace: Split report_syscall() arm64: Remove unused _TIF_WORK_MASK kselftest/arm64: Add missing file in .gitignore arm64: errata: Workaround for SI L1 downstream coherency issue kselftest/arm64: Add HWCAP test for FEAT_LS64 arm64: Add support for FEAT_{LS64, LS64_V} KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1 KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B ...	2026-02-09 20:28:45 -08:00
Marc Zyngier	1c880ea3ef	Merge branch kvm-arm64/gicv3-tdir-fixes into kvmarm-master/next * kvm-arm64/gicv3-tdir-fixes: : . : Address two trapping-related issues when running legacy (i.e. GICv3) : guests on GICv5 hosts, courtesy of Sascha Bischoff. : . KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-02-05 09:17:04 +00:00
Frederic Weisbecker	79c0abaf06	sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN When none of the allowed CPUs of a task are online, it gets migrated to the fallback cpumask which is all the non nohz_full CPUs. However just like nohz_full CPUs, domain isolated CPUs don't want to be disturbed by tasks that have lost their CPU affinities. And since nohz_full rely on domain isolation to work correctly, the housekeeping mask of domain isolated CPUs should always be a subset of the housekeeping mask of nohz_full CPUs (there can be CPUs that are domain isolated but not nohz_full, OTOH there shouldn't be nohz_full CPUs that are not domain isolated): HK_TYPE_DOMAIN & HK_TYPE_KERNEL_NOISE == HK_TYPE_DOMAIN Therefore use HK_TYPE_DOMAIN as the appropriate fallback target for tasks. Note that cpuset isolated partitions are not supported on those systems and may result in undefined behaviour. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org	2026-02-03 15:23:35 +01:00
Sascha Bischoff	28e505d817	KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts The original order of checks in the ICH_HCR_EL2_TDIR test returned with false early in the case where the native GICv3 CPUIF was not present. The result was that on GICv5 hosts with legacy support - which do not have the GICv3 CPUIF - the test always returned false. Reshuffle the checks such that support for GICv5 legacy is checked prior to checking for the native GICv3 CPUIF. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Fixes: `2a28810cbb` ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping") Link: https://patch.msgid.link/20251208152724.3637157-4-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-01-27 14:06:26 +00:00
Yicong Yang	58ce78667a	arm64: Add support for FEAT_{LS64, LS64_V} Armv8.7 introduces single-copy atomic 64-byte loads and stores instructions and its variants named under FEAT_{LS64, LS64_V}. These features are identified by ID_AA64ISAR1_EL1.LS64 and the use of such instructions in userspace (EL0) can be trapped. As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later patch will be exported to userspace, FEAT_LS64_V will be enabled only in kernel. In order to support the use of corresponding instructions in userspace: - Make ID_AA64ISAR1_EL1.LS64 visbile to userspace - Add identifying and enabling in the cpufeature list - Expose these support of these features to userspace through HWCAP3 and cpuinfo ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for special memory (device memory) so requires support by the CPU, system and target memory location (device that support these instructions). The HWCAP3_LS64, implies the support of CPU and system (since no identification method from system, so SoC vendors should advertise support in the CPU if system also support them). Otherwise for ld64b/st64b the atomicity may not be guaranteed or a DABT will be generated, so users (probably userspace driver developer) should make sure the target memory (device) also have the support. For st64bv 0xffffffffffffffff will be returned as status result for unsupported memory so user should check it. Document the restrictions along with HWCAP3_LS64. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 13:24:49 +00:00
Marc Zyngier	018a231b02	arm64: Unconditionally enable PAN support FEAT_PAN has been around since ARMv8.1 (over 11 years ago), has no compiler dependency (we have our own accessors), and is a great security benefit. Drop CONFIG_ARM64_PAN, and make the support unconditionnal. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 10:16:42 +00:00
Marc Zyngier	6191b25d8b	arm64: Unconditionally enable LSE support LSE atomics have been in the architecture since ARMv8.1 (released in 2014), and are hopefully supported by all modern toolchains. Drop the optional nature of LSE support in the kernel, and always compile the support in, as this really is very little code. LL/SC still is the default, and the switch to LSE is done dynamically. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 10:16:41 +00:00
Linus Torvalds	51d90a15fe	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...	2025-12-05 17:01:20 -08:00
Linus Torvalds	44fc84337b	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/rectrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory managemennt: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set__tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...	2025-12-02 17:03:55 -08:00
Paolo Bonzini	f58e70cc31	Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests	2025-12-02 18:36:26 +01:00
Oliver Upton	3eef0c83c3	Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next * kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-12-01 00:47:41 -08:00
Catalin Marinas	52c4d1d624	Merge branch 'for-next/sysreg' into for-next/core * for-next/sysreg: : arm64 sysreg updates/cleanups arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64/sysreg: Add ICH_VMCR_EL2 arm64/sysreg: Move generation of RES0/RES1/UNKN to function arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor arm64/sysreg: Fix checks for incomplete sysreg definitions arm64/sysreg: Replace TCR_EL1 field macros	2025-11-28 15:47:53 +00:00
Marc Zyngier	64d67e7add	KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE Suzuki notices that making the ICH_HCR_EL2_TDIR capability a system one isn't a very good idea, should we end-up with CPUs that have asymmetric TDIR support (somehow unlikely, but you never know what level of stupidity vendors are up to). For this hypothetical setup, making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option. This is actually consistent with what we already do with GICv5 legacy interface, so flip the capability over. Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: `2a28810cbb` ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping") Link: https://lore.kernel.org/r/5df713d4-8b79-4456-8fd1-707ca89a61b6@arm.com Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://msgid.link/20251125160144.1086511-1-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-25 22:23:01 -08:00
Marc Zyngier	2a28810cbb	KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping A long time ago, an unsuspecting architect forgot to add a trap bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but what's a bit of spec between friends? Thankfully, this was fixed in a later revision, and ARM "deprecates" the lack of trapping ability. Unfortuantely, a few (billion) CPUs went out with that defect, anything ARMv8.0 from ARM, give or take. And on these CPUs, you can't trap DIR on its own, full stop. As the next best thing, we can trap everything in the common group, which is a tad expensive, but hey ho, that's what you get. You can otherwise recycle the HW in the neaby bin. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:29:11 -08:00
Oliver Upton	dc31124379	arm64: Detect FEAT_XNX Detect the feature in anticipation of using it in KVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-24 14:24:44 -08:00
Anshuman Khandual	b0a3f0e894	arm64/sysreg: Replace TCR_EL1 field macros This just replaces all used TCR_EL1 field macros with tools sysreg variant based fields and subsequently drops them from the header (pgtable-hwdef.h), although while retaining the ones used for KVM (represented via the sysreg tools format). Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-13 15:58:30 +00:00
mrigendrachaubey	96ac403ea2	arm64: Fix typos and spelling errors in comments This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-11-12 17:06:21 +00:00
shechenglong	7f16357378	arm64: proton-pack: Fix hard lockup due to print in scheduler context Relocate the printk() calls from spectre_v4_mitigations_off() and spectre_v2_mitigations_off() into setup_system_capabilities() function, preventing hard lockups caused by printk calls in scheduler context: \| _raw_spin_lock_nested+168 \| ttwu_queue+180 （rq_lock(rq, &rf); 2nd acquiring the rq->__lock） \| try_to_wake_up+548 \| wake_up_process+32 \| __up+88 \| up+100 \| __up_console_sem+96 \| console_unlock+696 \| vprintk_emit+428 \| vprintk_default+64 \| vprintk_func+220 \| printk+104 \| spectre_v4_enable_task_mitigation+344 \| __switch_to+100 \| __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock) \| schedule_idle+48 \| do_idle+388 \| cpu_startup_entry+44 \| secondary_start_kernel+352 Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: shechenglong <shechenglong@xfusion.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-11-07 14:49:12 +00:00
Linus Torvalds	971199ad2a	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Preserve old 'tt_core' UAPI for Hisilicon L3C PMU driver - Ensure linear alias of kprobes instruction page is not writable - Fix kernel stack unwinding from BPF - Fix build warnings from the Fujitsu uncore PMU documentation - Fix hang with deferred 'struct page' initialisation and MTE - Consolidate KPTI page-table re-writing code * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not flag the zero page as PG_mte_tagged docs: perf: Fujitsu: Fix htmldocs build warnings and errors arm64: mm: Move KPTI helpers to mmu.c tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64 arm64: kprobes: call set_memory_rox() for kprobe page drivers/perf: hisi: Add tt_core_deprecated for compatibility	2025-10-07 08:59:25 -07:00
Linus Torvalds	f3826aa996	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit `a6ad54137a` ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...	2025-10-04 08:52:16 -07:00
Catalin Marinas	f620d66af3	arm64: mte: Do not flag the zero page as PG_mte_tagged Commit `68d54ceeec` ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") attempted to fix ptrace() reading of tags from the zero page by marking it as PG_mte_tagged during cpu_enable_mte(). The same commit also changed the ptrace() tag access permission check to the VM_MTE vma flag while turning the page flag test into a WARN_ON_ONCE(). Attempting to set the PG_mte_tagged flag early with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled may either hang (after commit `d77e59a8fc` "arm64: mte: Lock a page for MTE tag initialisation") or have the flags cleared later during page_alloc_init_late(). In addition, pages_identical() -> memcmp_pages() will reject any comparison with the zero page as it is marked as tagged. Partially revert the above commit to avoid setting PG_mte_tagged on the zero page. Update the __access_remote_tags() warning on untagged pages to ignore the zero page since it is known to have the tags initialised. Note that all user mapping of the zero page are marked as pte_special(). The arm64 set_pte_at() will not call mte_sync_tags() on such pages, so PG_mte_tagged will remain cleared. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: `68d54ceeec` ("arm64: mte: Allow PTRACE_PEEKMTETAGS access to the zero page") Reported-by: Gergely Kovacs <Gergely.Kovacs2@arm.com> Cc: stable@vger.kernel.org # 5.10.x Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <lance.yang@linux.dev> Acked-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: David Hildenbrand <david@redhat.com> Tested-by: Lance Yang <lance.yang@linux.dev> Signed-off-by: Will Deacon <will@kernel.org>	2025-10-03 15:27:34 +01:00
Linus Torvalds	8804d970fa	Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...	2025-10-02 18:18:33 -07:00
Linus Torvalds	feafee2845	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's good stuff across the board, including some nice mm improvements for CPUs with the 'noabort' BBML2 feature and a clever patch to allow ptdump to play nicely with block mappings in the vmalloc area. Confidential computing: - Add support for accepting secrets from firmware (e.g. ACPI CCEL) and mapping them with appropriate attributes. CPU features: - Advertise atomic floating-point instructions to userspace - Extend Spectre workarounds to cover additional Arm CPU variants - Extend list of CPUs that support break-before-make level 2 and guarantee not to generate TLB conflict aborts for changes of mapping granularity (BBML2_NOABORT) - Add GCS support to our uprobes implementation. Documentation: - Remove bogus SME documentation concerning register state when entering/exiting streaming mode. Entry code: - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY) - Micro-optimise syscall entry path with a compiler branch hint. Memory management: - Enable huge mappings in vmalloc space even when kernel page-table dumping is enabled - Tidy up the types used in our early MMU setup code - Rework rodata= for closer parity with the behaviour on x86 - For CPUs implementing BBML2_NOABORT, utilise block mappings in the linear map even when rodata= applies to virtual aliases - Don't re-allocate the virtual region between '_text' and '_stext', as doing so confused tools parsing /proc/vmcore. Miscellaneous: - Clean-up Kconfig menuconfig text for architecture features - Avoid redundant bitmap_empty() during determination of supported SME vector lengths - Re-enable warnings when building the 32-bit vDSO object - Avoid breaking our eggs at the wrong end. Perf and PMUs: - Support for v3 of the Hisilicon L3C PMU - Support for Hisilicon's MN and NoC PMUs - Support for Fujitsu's Uncore PMU - Support for SPE's extended event filtering feature - Preparatory work to enable data source filtering in SPE - Support for multiple lanes in the DWC PCIe PMU - Support for i.MX94 in the IMX DDR PMU driver - MAINTAINERS update (Thank you, Yicong) - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets). Selftests: - Add basic LSFE check to the existing hwcaps test - Support nolibc in GCS tests - Extend SVE ptrace test to pass unsupported regsets and invalid vector lengths - Minor cleanups (typos, cosmetic changes). System registers: - Fix ID_PFR1_EL1 definition - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1 - Sync TCR_EL1 definition with the latest Arm ARM (L.b) - Be stricter about the input fed into our AWK sysreg generator script - Typo fixes and removal of redundant definitions. ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code. CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: cpufeature: Remove duplicate asm/mmu.h header arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN perf/dwc_pcie: Fix use of uninitialized variable arm/syscalls: mark syscall invocation as likely in invoke_syscall Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU Documentation: hisi-pmu: Fix of minor format error drivers/perf: hisi: Add support for L3C PMU v3 drivers/perf: hisi: Refactor the event configuration of L3C PMU drivers/perf: hisi: Extend the field of tt_core drivers/perf: hisi: Extract the event filter check of L3C PMU drivers/perf: hisi: Simplify the probe process of each L3C PMU version drivers/perf: hisi: Export hisi_uncore_pmu_isr() drivers/perf: hisi: Relax the event ID check in the framework perf: Fujitsu: Add the Uncore PMU driver arm64: map [_text, _stext) virtual address range non-executable+read-only arm64/sysreg: Update TCR_EL1 register arm64: Enable vmalloc-huge with ptdump arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list arm64: errata: Apply workarounds for Neoverse-V3AE arm64: cputype: Add Neoverse-V3AE definitions ...	2025-09-29 18:48:39 -07:00
Kevin Brodsky	200b0d2508	arm64: mm: Move KPTI helpers to mmu.c create_kpti_ng_temp_pgd() is currently defined (as an alias) in mmu.c without matching declaration in a header; instead cpufeature.c makes its own declaration. This is clearly not pretty, and as commit `ceca927c86` ("arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature") showed, it also makes it very easy for the prototypes to go out of sync. All this would be much simpler if kpti_install_ng_mappings() and associated functions lived in mmu.c, where they logically belong. This is what this patch does: - Move kpti_install_ng_mappings() and associated functions from cpufeature.c to mmu.c, add a declaration to <asm/mmu.h> - Remove create_kpti_ng_temp_pgd() and just call __create_pgd_mapping_locked() directly instead - Mark all these functions __init - Move __initdata after kpti_ng_temp_alloc (as suggested by checkpatch) Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> [will: Fix conflicts with init_idmap_kpti_bbml2_flag()] Signed-off-by: Will Deacon <will@kernel.org>	2025-09-25 14:47:19 +01:00
Jiapeng Chong	ea0b39168d	arm64: cpufeature: Remove duplicate asm/mmu.h header ./arch/arm64/kernel/cpufeature.c: asm/mmu.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-24 16:44:44 +01:00
Will Deacon	77dfca70ba	Merge branch 'for-next/mm' into for-next/core * for-next/mm: arm64: map [_text, _stext) virtual address range non-executable+read-only arm64: Enable vmalloc-huge with ptdump arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs arm64: mm: support large block mapping when rodata=full arm64: Enable permission change on arm64 kernel block mappings arm64/Kconfig: Remove CONFIG_RODATA_FULL_DEFAULT_ENABLED arm64: mm: Rework the 'rodata=' options arm64: mm: Represent physical memory with phys_addr_t and resource_size_t arm64: mm: Make map_fdt() return mapped pointer arm64: mm: Cast start/end markers to char *, not u64	2025-09-24 16:34:34 +01:00
Will Deacon	3d751c56c9	Merge branch 'for-next/cpufeature' into for-next/core * for-next/cpufeature: arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list arm64: errata: Apply workarounds for Neoverse-V3AE arm64: cputype: Add Neoverse-V3AE definitions arm64: cpufeature: add AmpereOne to BBML2 allow list arm64: cpufeature: Add Olympus MIDR to BBML2 allow list arm64: cputype: Add NVIDIA Olympus definitions arm64: cputype: Remove duplicate Cortex-X1C definitions arm64: errata: Expand speculative SSBS workaround for Cortex-A720AE arm64: cputype: Add Cortex-A720AE definitions arm64/hwcap: Add hwcap for FEAT_LSFE	2025-09-24 16:33:53 +01:00
Ryan Roberts	8fca3852e3	arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list Neoverse-V3AE advertises support for BBML2 and is known to not raise conflict aborts. So add it to the BBML2_NOABORT allow list. However, just like Neoverse-V3, Neoverse-V3AE r0p0 and r0p1 suffer from erratum #3053180, for which the workaround is to always observe break-before-make requirements for affected revisions. Therefore only add to the allow list from r0p2 onwards. For more details see Software Developer Errata Notice (SDEN) document: Neoverse V3AE (MP172) SDEN v9.0, erratum 3053180 https://developer.arm.com/documentation/SDEN-2615521/9-0/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-22 11:27:14 +01:00
Yeoreum Yun	31d8edb535	kasan/hw-tags: introduce kasan.write_only option Patch series "introduce kasan.write_only option in hw-tags", v8. Hardware tag based KASAN is implemented using the Memory Tagging Extension (MTE) feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI (Top Byte Ignore) feature and allows software to access a 4-bit allocation tag for each 16-byte granule in the physical address space. A logical tag is derived from bits 59-56 of the virtual address used for the memory access. A CPU with MTE enabled will compare the logical tag against the allocation tag and potentially raise an tag check fault on mismatch, subject to system registers configuration. Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raise of tag check fault on store operation only. Using this feature (FEAT_MTE_STORE_ONLY), introduce KASAN write-only mode which restricts KASAN check write (store) operation only. This mode omits KASAN check for read (fetch/load) operation. Therefore, it might be used not only debugging purpose but also in normal environment. This patch (of 2): Since Armv8.9, FEATURE_MTE_STORE_ONLY feature is introduced to restrict raise of tag check fault on store operation only. Introduce KASAN write only mode based on this feature. KASAN write only mode restricts KASAN checks operation for write only and omits the checks for fetch/read operations when accessing memory. So it might be used not only debugging enviroment but also normal enviroment to check memory safty. This features can be controlled with "kasan.write_only" arguments. When "kasan.write_only=on", KASAN checks write operation only otherwise KASAN checks all operations. This changes the MTE_STORE_ONLY feature as BOOT_CPU_FEATURE like ARM64_MTE_ASYMM so that makes it initialise in kasan_init_hw_tags() with other function together. Link: https://lkml.kernel.org/r/20250916222755.466009-1-yeoreum.yun@arm.com Link: https://lkml.kernel.org/r/20250916222755.466009-2-yeoreum.yun@arm.com Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Breno Leitao <leitao@debian.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io> Cc: James Morse <james.morse@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: levi.yun <yeoreum.yun@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:22:10 -07:00
Ryan Roberts	3df6979d22	arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs The kernel linear mapping is painted in very early stage of system boot. The cpufeature has not been finalized yet at this point. So the linear mapping is determined by the capability of boot CPU only. If the boot CPU supports BBML2, large block mappings will be used for linear mapping. But the secondary CPUs may not support BBML2, so repaint the linear mapping if large block mapping is used and the secondary CPUs don't support BBML2 once cpufeature is finalized on all CPUs. If the boot CPU doesn't support BBML2 or the secondary CPUs have the same BBML2 capability with the boot CPU, repainting the linear mapping is not needed. Repainting is implemented by the boot CPU, which we know supports BBML2, so it is safe for the live mapping size to change for this CPU. The linear map region is walked using the pagewalk API and any discovered large leaf mappings are split to pte mappings using the existing helper functions. Since the repainting is performed inside of a stop_machine(), we must use GFP_ATOMIC to allocate the extra intermediate pgtables. But since we are still early in boot, it is expected that there is plenty of memory available so we will never need to sleep for reclaim, and so GFP_ATOMIC is acceptable here. The secondary CPUs are all put into a waiting area with the idmap in TTBR0 and reserved map in TTBR1 while this is performed since they cannot be allowed to observe any size changes on the live mappings. Some of this infrastructure is reused from the kpti case. Specifically we share the same flag (was __idmap_kpti_flag, now idmap_kpti_bbml2_flag) since it means we don't have to reserve any extra pgtable memory to idmap the extra flag. Co-developed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-19 19:13:54 +01:00
Yang Shi	a166563e7e	arm64: mm: support large block mapping when rodata=full When rodata=full is specified, kernel linear mapping has to be mapped at PTE level since large page table can't be split due to break-before-make rule on ARM64. This resulted in a couple of problems: - performance degradation - more TLB pressure - memory waste for kernel page table With FEAT_BBM level 2 support, splitting large block page table to smaller ones doesn't need to make the page table entry invalid anymore. This allows kernel split large block mapping on the fly. Add kernel page table split support and use large block mapping by default when FEAT_BBM level 2 is supported for rodata=full. When changing permissions for kernel linear mapping, the page table will be split to smaller size. The machine without FEAT_BBM level 2 will fallback to have kernel linear mapping PTE-mapped when rodata=full. With this we saw significant performance boost with some benchmarks and much less memory consumption on my AmpereOne machine (192 cores, 1P) with 256GB memory. * Memory use after boot Before: MemTotal: 258988984 kB MemFree: 254821700 kB After: MemTotal: 259505132 kB MemFree: 255410264 kB Around 500MB more memory are free to use. The larger the machine, the more memory saved. * Memcached We saw performance degradation when running Memcached benchmark with rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure. With this patchset we saw ops/sec is increased by around 3.5%, P99 latency is reduced by around 9.6%. The gain mainly came from reduced kernel TLB misses. The kernel TLB MPKI is reduced by 28.5%. The benchmark data is now on par with rodata=on too. * Disk encryption (dm-crypt) benchmark Ran fio benchmark with the below command on a 128G ramdisk (ext4) with disk encryption (by dm-crypt). fio --directory=/data --random_generator=lfsr --norandommap \ --randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1 \ --ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1 \ --group_reporting --thread --name=iops-test-job --eta-newline=1 \ --size 100G The IOPS is increased by 90% - 150% (the variance is high, but the worst number of good case is around 90% more than the best number of bad case). The bandwidth is increased and the avg clat is reduced proportionally. * Sequential file read Read 100G file sequentially on XFS (xfs_io read with page cache populated). The bandwidth is increased by 150%. Co-developed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:36:37 +01:00
Yang Shi	13efe932d2	arm64: cpufeature: add AmpereOne to BBML2 allow list AmpereOne supports BBML2 without conflict abort, add to the allow list. Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 21:26:50 +01:00
Shanker Donthineni	cc80537caa	arm64: cpufeature: Add Olympus MIDR to BBML2 allow list The NVIDIA Olympus core supports BBML2 without conflict abort. Add its MIDR to the allow list to enable FEAT_BBM. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-18 20:21:37 +01:00
Sascha Bischoff	7847f51189	arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capability Implement the GCIE_LEGACY capability as a system feature to be able to check for support from KVM. The type is explicitly ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which means that the capability is enabled early if all boot CPUs support it. Additionally, if this capability is enabled during boot, it prevents late onlining of CPUs that lack it, thereby avoiding potential mismatched configurations which would break KVM. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-17 17:41:02 +01:00
Mark Brown	220928e52c	arm64/hwcap: Add hwcap for FEAT_LSFE FEAT_LSFE (Large System Float Extension), providing atomic floating point memory operations, is optional from v9.5. This feature adds no new architectural stare and we have no immediate use for it in the kernel so simply provide a hwcap for it to support discovery by userspace. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2025-09-16 19:55:11 +01:00
Linus Torvalds	c8bc81a52d	Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - CFI failure due to kpti_ng_pgd_alloc() signature mismatch - Underallocation bug in the SVE ptrace kselftest * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: kselftest/arm64: Don't open code SVE_PT_SIZE() in fp-ptrace arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature	2025-08-30 10:43:53 -07:00
Kees Cook	ceca927c86	arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature Seen during KPTI initialization: CFI failure at create_kpti_ng_temp_pgd+0x124/0xce8 (target: kpti_ng_pgd_alloc+0x0/0x14; expected type: 0xd61b88b6) The call site is alloc_init_pud() at arch/arm64/mm/mmu.c: pud_phys = pgtable_alloc(TABLE_PUD); alloc_init_pud() has the prototype: static void alloc_init_pud(p4d_t p4dp, unsigned long addr, unsigned long end, phys_addr_t phys, pgprot_t prot, phys_addr_t (pgtable_alloc)(enum pgtable_type), int flags) where the pgtable_alloc() prototype is declared. The target (kpti_ng_pgd_alloc) is used in arch/arm64/kernel/cpufeature.c: create_kpti_ng_temp_pgd(kpti_ng_temp_pgd, __pa(alloc), KPTI_NG_TEMP_VA, PAGE_SIZE, PAGE_KERNEL, kpti_ng_pgd_alloc, 0); which is an alias for __create_pgd_mapping_locked() with prototype: extern __alias(__create_pgd_mapping_locked) void create_kpti_ng_temp_pgd(pgd_t pgdir, phys_addr_t phys, unsigned long virt, phys_addr_t size, pgprot_t prot, phys_addr_t (pgtable_alloc)(enum pgtable_type), int flags); __create_pgd_mapping_locked() passes the function pointer down: __create_pgd_mapping_locked() -> alloc_init_p4d() -> alloc_init_pud() But the target function (kpti_ng_pgd_alloc) has the wrong signature: static phys_addr_t __init kpti_ng_pgd_alloc(int shift); The "int" should be "enum pgtable_type". To make "enum pgtable_type" available to cpufeature.c, move enum pgtable_type definition from arch/arm64/mm/mmu.c to arch/arm64/include/asm/mmu.h. Adjust kpti_ng_pgd_alloc to use "enum pgtable_type" instead of "int". The function behavior remains identical (parameter is unused). Fixes: `c64f46ee13` ("arm64: mm: use enum to identify pgtable level instead of *_SHIFT") Cc: <stable@vger.kernel.org> # 6.16.x Signed-off-by: Kees Cook <kees@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250829190721.it.373-kees@kernel.org Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-08-30 11:30:06 +01:00
Paolo Bonzini	42a0305ab1	Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes	2025-08-29 12:57:31 -04:00
Marc Zyngier	8049164653	arm64: Add capability denoting FEAT_RASv1p1 Detecting FEAT_RASv1p1 is rather complicated, as there are two ways for the architecture to advertise the same thing (always a delight...). Add a capability that will advertise this in a synthetic way to the rest of the kernel. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-08-21 16:28:46 -07:00

1 2 3 4 5 ...

706 Commits