linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Paolo Bonzini	6b80203187	Merge tag 'kvm-s390-next-7.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - ESA nesting support - 4k memslots - LPSW/E fix	2026-04-13 19:01:15 +02:00
Marc Zyngier	83a3980750	Merge branch kvm-arm64/pkvm-protected-guest into kvmarm-master/next * kvm-arm64/pkvm-protected-guest: (41 commits) : . : pKVM support for protected guests, implementing the very long : awaited support for anonymous memory, as the elusive guestmem : has failed to deliver on its promises despite a multi-year : effort. Patches courtesy of Will Deacon. From the initial cover : letter: : : "[...] this patch series implements support for protected guest : memory with pKVM, where pages are unmapped from the host as they are : faulted into the guest and can be shared back from the guest using pKVM : hypercalls. Protected guests are created using a new machine type : identifier and can be booted to a shell using the kvmtool patches : available at [2], which finally means that we are able to test the pVM : logic in pKVM. Since this is an incremental step towards full isolation : from the host (for example, the CPU register state and DMA accesses are : not yet isolated), creating a pVM requires a developer Kconfig option to : be enabled in addition to booting with 'kvm-arm.mode=protected' and : results in a kernel taint." : . KVM: arm64: Don't hold 'vm_table_lock' across guest page reclaim KVM: arm64: Allow get_pkvm_hyp_vm() to take a reference to a dying VM KVM: arm64: Prevent teardown finalisation of referenced 'hyp_vm' drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL KVM: arm64: Rename PKVM_PAGE_STATE_MASK KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim KVM: arm64: Register 'selftest_vm' in the VM table KVM: arm64: Extend pKVM page ownership selftests to cover guest donation KVM: arm64: Add some initial documentation for pKVM KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler KVM: arm64: Introduce hypercall to force reclaim of a protected page KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 KVM: arm64: Change 'pkvm_handle_t' to u16 KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() ... Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-04-08 12:25:39 +01:00
Hendrik Brueckner	4aebd7d5c7	KVM: s390: Add KVM capability for ESA mode guests Now that all the bits are properly addressed, provide a mechanism for testing ESA mode guests in nested configurations. Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> [farman@us.ibm.com: Updated commit message] Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>	2026-04-02 15:37:02 +02:00
Will Deacon	287c6981f1	KVM: arm64: Add some initial documentation for pKVM Add some initial documentation for pKVM to help people understand what is supported, the limitations of protected VMs when compared to non-protected VMs and also what is left to do. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20260330144841.26181-33-will@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-30 16:58:09 +01:00
Sascha Bischoff	d51c978b7d	KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI GICv5 systems will likely not support the full set of PPIs. The presence of any virtual PPI is tied to the presence of the physical PPI. Therefore, the available PPIs will be limited by the physical host. Userspace cannot drive any PPIs that are not implemented. Moreover, it is not desirable to expose all PPIs to the guest in the first place, even if they are supported in hardware. Some devices, such as the arch timer, are implemented in KVM, and hence those PPIs shouldn't be driven by userspace, either. Provided a new UAPI: KVM_DEV_ARM_VGIC_GRP_CTRL => KVM_DEV_ARM_VGIC_USERPSPACE_PPIs This allows userspace to query which PPIs it is able to drive via KVM_IRQ_LINE. Additionally, introduce a check in kvm_vm_ioctl_irq_line() to reject any PPIs not in the userspace mask. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-40-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:29 +00:00
Sascha Bischoff	eb3c4d2c9a	Documentation: KVM: Introduce documentation for VGICv5 Now that it is possible to create a VGICv5 device, provide initial documentation for it. At this stage, there is little to document. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-39-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:29 +00:00
Sascha Bischoff	7c31c06e2d	KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5 Make it mandatory to use the architected PPI when running a GICv5 guest. Attempts to set anything other than the architected PPI (23) are rejected. Additionally, KVM_ARM_VCPU_PMU_V3_INIT is relaxed to no longer require KVM_ARM_VCPU_PMU_V3_IRQ to be called for GICv5-based guests. In this case, the architectued PPI is automatically used. Documentation is bumped accordingly. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-33-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:29 +00:00
Sascha Bischoff	b88d05a893	KVM: arm64: gic-v5: Support GICv5 interrupts with KVM_IRQ_LINE Interrupts under GICv5 look quite different to those from older Arm GICs. Specifically, the type is encoded in the top bits of the interrupt ID. Extend KVM_IRQ_LINE to cope with GICv5 PPIs and SPIs. The requires subtly changing the KVM_IRQ_LINE API for GICv5 guests. For older Arm GICs, PPIs had to be in the range of 16-31, and SPIs had to be 32-1019, but this no longer holds true for GICv5. Instead, for a GICv5 guest support PPIs in the range of 0-127, and SPIs in the range 0-65535. The documentation is updated accordingly. The SPI range doesn't cover the full SPI range that a GICv5 system can potentially cope with (GICv5 provides up to 24-bits of SPI ID space, and we only have 16 bits to work with in KVM_IRQ_LINE). However, 65k SPIs is more than would be reasonably expected on systems for years to come. In order to use vgic_is_v5(), the kvm/arm_vgic.h header is added to kvm/arm.c. Note: As the GICv5 KVM implementation currently doesn't support injecting SPIs attempts to do so will fail. This restriction will by lifted as the GICv5 KVM support evolves. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-28-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2026-03-19 18:21:28 +00:00
Paolo Bonzini	dca01b0a26	Documentation: kvm: fix formatting of the quirks table A recently added quirk does not fit in the left column of the table, so it all has to be reformatted and realigned. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-03-11 19:16:52 +01:00
Jim Mattson	e2ffe85b6d	KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM Add KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM to allow L1 to set FREEZE_IN_SMM in vmcs12's GUEST_IA32_DEBUGCTL field, as permitted prior to commit `6b1dd26544` ("KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest"). Enable the quirk by default for backwards compatibility (like all quirks); userspace can disable it via KVM_CAP_DISABLE_QUIRKS2 for consistency with the constraints on WRMSR(IA32_DEBUGCTL). Note that the quirk only bypasses the consistency check. The vmcs02 bit is still owned by the host, and PMCs are not frozen during virtualized SMM. In particular, if a host administrator decides that PMCs should not be frozen during physical SMM, then L1 has no say in the matter. Fixes: `095686e6fc` ("KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter") Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20260205231537.1278753-1-jmattson@google.com [sean: tag for stable@, clean-up and fix goofs in the comment and docs] Signed-off-by: Sean Christopherson <seanjc@google.com> [Rename quirk. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-03-11 18:41:11 +01:00
Paolo Bonzini	94fe3e6515	Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.	2026-03-11 18:01:55 +01:00
Sean Christopherson	f8211e95df	Documentation: KVM: Formalizing taking vcpu->mutex outside of kvm->slots_lock Explicitly document the ordering of vcpu->mutex being taken outside of kvm->slots_lock. While somewhat unintuitive since vCPUs conceptually have narrower scope than VMs, the scope of the owning object (vCPU versus VM) doesn't automatically carry over to the lock. In this case, vcpu->mutex has far broader scope than kvm->slots_lock. As Paolo put it, it's a "don't worry about multiple ioctls at the same time" mutex that's intended to be taken at the outer edges of KVM. More importantly, arm64 and x86 have gained flows that take kvm->slots_lock inside of vcpu->mutex. x86's kvm_inhibit_apic_access_page() is particularly nasty, as slots_lock is taken quite deep within KVM_RUN, i.e. simply swapping the ordering isn't an option. Commit to the vcpu->mutex => kvm->slots_lock ordering, as vcpu->mutex really is intended to be a "top-level" lock, whereas kvm->slots_lock is "just" a helper lock. Opportunistically document that vcpu->mutex is also taken outside of slots_arch_lock, e.g. when allocating shadow roots on x86 (which is the entire reason slots_arch_lock exists, as shadow roots must be allocated while holding kvm->srcu) kvm_mmu_new_pgd() \| -> kvm_mmu_reload() \| -> kvm_mmu_load() \| -> mmu_alloc_shadow_roots() \| -> mmu_first_shadow_root_alloc() but also when manipulating memslots in vCPU context, e.g. when inhibiting the APIC-access page via the aforementioned kvm_inhibit_apic_access_page() kvm_inhibit_apic_access_page() \| -> __x86_set_memory_region() \| -> kvm_set_internal_memslot() \| -> kvm_set_memory_region() \| -> kvm_set_memslot() Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260302170239.596810-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-03-02 09:52:09 -08:00
Paolo Bonzini	70295a479d	KVM: always define KVM_CAP_SYNC_MMU KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2026-02-28 15:31:35 +01:00
Linus Torvalds	cb5573868e	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit `bf2c3138ae` ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of ancient sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) must flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...	2026-02-13 11:31:15 -08:00
Paolo Bonzini	b1195183ed	Merge tag 'kvm-s390-next-7.0-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - gmap rewrite: completely new memory management for kvm/s390 - vSIE improvement - maintainership change for s390 vfio-pci - small quality of life improvement for protected guests	2026-02-11 18:52:27 +01:00
Paolo Bonzini	9123c5f956	Merge tag 'kvm-x86-gmem-6.20' of https://github.com/kvm-x86/linux into HEAD KVM guest_memfd changes for 6.20 - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling, and instead rely on SNP (the only user of the tracking) to do its own tracking via the RMP. - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for a non-existent usecase (and because in-place conversion simply can't support unaligned sources). - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock.	2026-02-11 12:45:12 -05:00
Linus Torvalds	72c395024d	Merge tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux Pull documentation updates from Jonathan Corbet: "A slightly calmer cycle for docs this time around, though there is still a fair amount going on, including: - Some signs of life on the long-moribund Japanese translation - Documentation on policies around the use of generative tools for patch submissions, and a separate document intended for consumption by generative tools - The completion of the move of the documentation tools to tools/docs. For now we're leaving a /scripts/kernel-doc symlink behind to avoid breaking scripts - Ongoing build-system work includes the incorporation of documentation in Python code, better support for documenting variables, and lots of improvements and fixes - Automatic linking of man-page references -- cat(1), for example -- to the online pages in the HTML build ...and the usual array of typo fixes and such" * tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (107 commits) doc: development-process: add notice on testing tools: sphinx-build-wrapper: improve its help message docs: sphinx-build-wrapper: allow -v override -q docs: kdoc: Fix pdfdocs build for tools docs: ja_JP: process: translate 'Obtain a current source tree' docs: fix 're-use' -> 'reuse' in documentation docs: ioctl-number: fix a typo in ioctl-number.rst docs: filesystems: ensure proc pid substitutable is complete docs: automarkup.py: Skip common English words as C identifiers Documentation: use a source-read extension for the index link boilerplate docs: parse_features: make documentation more consistent docs: add parse_features module documentation docs: jobserver: do some documentation improvements docs: add jobserver module documentation docs: kabi: helpers: add documentation for each "enum" value docs: kabi: helpers: add helper for debug bits 7 and 8 docs: kabi: system_symbols: end docstring phrases with a dot docs: python: abi_regex: do some improvements at documentation docs: python: abi_parser: do some improvements at documentation docs: add kabi modules documentation ...	2026-02-09 20:53:18 -08:00
Paolo Bonzini	9e03b7caf4	Merge tag 'kvm-x86-misc-6.20' of https://github.com/kvm-x86/linux into HEAD KVM x86 misc changes for 6.20 - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN. - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs that were advertised as supported to userspace when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled. - Fix a bug where KVM would attempt to read protect guest state (CR3) when configuring an async #PF entry. - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Explicitly allow the few exports that are intended for external usage. - Ignore -EBUSY when checking nested events after a vCPU exits blocking as the WARN is user-triggerable, and because exiting to userspace on -EBUSY does more harm than good in pretty much every situation. - Throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, as playing whack-a-mole with syzkaller turned out to be an unwinnable game. - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace. - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2. - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration. - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs). - Minor cleanups.	2026-02-09 18:53:47 +01:00
Paolo Bonzini	4215ee0d7b	Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD KVM SVM changes for 6.20 - Drop a user-triggerable WARN on nested_svm_load_cr3() failure. - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) must flush the RAP. - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model. - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig. - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0. - Treat exit_code as an unsigned 64-bit value through all of KVM. - Add support for fetching SNP certificates from userspace. - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2. - Misc fixes and cleanups.	2026-02-09 18:51:37 +01:00
Claudio Imbrenda	0ee4ddc164	KVM: s390: Storage key manipulation IOCTL Add a new IOCTL to allow userspace to manipulate storage keys directly. This will make it easier to write selftests related to storage keys. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>	2026-02-04 17:00:10 +01:00
Khushit Shah	6517dfbcc9	KVM: x86: Add x2APIC "features" to control EOI broadcast suppression Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated by userspace), which KVM completely mishandles. When x2APIC support was first added, KVM incorrectly advertised and "enabled" Suppress EOI Broadcast, without fully supporting the I/O APIC side of the equation, i.e. without adding directed EOI to KVM's in-kernel I/O APIC. That flaw was carried over to split IRQCHIP support, i.e. KVM advertised support for Suppress EOI Broadcasts irrespective of whether or not the userspace I/O APIC implementation supported directed EOIs. Even worse, KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without support for directed EOI came to rely on the "spurious" broadcasts. KVM "fixed" the in-kernel I/O APIC implementation by completely disabling support for Suppress EOI Broadcasts in commit `0bcc3fb95b` ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but didn't do anything to remedy userspace I/O APIC implementations. KVM's bogus handling of Suppress EOI Broadcast is problematic when the guest relies on interrupts being masked in the I/O APIC until well after the initial local APIC EOI. E.g. Windows with Credential Guard enabled handles interrupts in the following order: 1. Interrupt for L2 arrives. 2. L1 APIC EOIs the interrupt. 3. L1 resumes L2 and injects the interrupt. 4. L2 EOIs after servicing. 5. L1 performs the I/O APIC EOI. Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt storm, e.g. if the IRQ line is still asserted and userspace reacts to the EOI by re-injecting the IRQ, because the guest doesn't de-assert the line until step #4, and doesn't expect the interrupt to be re-enabled until step #5. Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way of knowing if the userspace I/O APIC supports directed EOIs, i.e. suppressing EOI broadcasts would result in interrupts being stuck masked in the userspace I/O APIC due to step #5 being ignored by userspace. And fully disabling support for Suppress EOI Broadcast is also undesirable, as picking up the fix would require a guest reboot, and more importantly would change the virtual CPU model exposed to the guest without any buy-in from userspace. Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to explicitly enable or disable support for Suppress EOI Broadcasts. This gives userspace control over the virtual CPU model exposed to the guest, as KVM should never have enabled support for Suppress EOI Broadcast without userspace opt-in. Not setting either flag will result in legacy quirky behavior for backward compatibility. Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear that commit `c806a6ad35` ("KVM: x86: call irq notifiers with directed EOI") was entirely correct, i.e. it may have simply papered over the lack of Directed EOI emulation in the I/O APIC. Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI is to support Directed EOI (KVM's name) irrespective of guest CPU vendor. Fixes: `7543a635aa` ("KVM: x86: Add KVM exit for IOAPIC EOIs") Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com Cc: stable@vger.kernel.org Suggested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com> Link: https://patch.msgid.link/20260123125657.3384063-1-khushit.shah@nutanix.com [sean: clean up minor formatting goofs and fix a comment typo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-01-30 13:28:35 -08:00
Jani Nikula	a592a36e49	Documentation: use a source-read extension for the index link boilerplate The root document usually has a special :ref:`genindex` link to the generated index. This is also the case for Documentation/index.rst. The other index.rst files deeper in the directory hierarchy usually don't. For SPHINXDIRS builds, the root document isn't Documentation/index.rst, but some other index.rst in the hierarchy. Currently they have a ".. only::" block to add the index link when doing SPHINXDIRS html builds. This is obviously very tedious and repetitive. The link is also added to all index.rst files in the hierarchy for SPHINXDIRS builds, not just the root document. Put the boilerplate in a sphinx-includes/subproject-index.rst file, and include it at the end of the root document for subproject builds in an ad-hoc source-read extension defined in conf.py. For now, keep having the boilerplate in translations, because this approach currently doesn't cover translated index link headers. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Tested-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> [jc: did s/doctree/kern_doc_dir/ ] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260123143149.2024303-1-jani.nikula@intel.com>	2026-01-23 11:59:34 -07:00
Michael Roth	20c3c4108d	KVM: SEV: Add KVM_SEV_SNP_ENABLE_REQ_CERTS command Introduce a new command for KVM_MEMORY_ENCRYPT_OP ioctl that can be used to enable fetching of endorsement key certificates from userspace via the new KVM_EXIT_SNP_REQ_CERTS exit type. Also introduce a new KVM_X86_SEV_SNP_REQ_CERTS KVM device attribute so that userspace can query whether the kernel supports the new command/exit. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://patch.msgid.link/20260109231732.1160759-3-michael.roth@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-01-23 09:14:16 -08:00
Michael Roth	fa9893fadb	KVM: Introduce KVM_EXIT_SNP_REQ_CERTS for SNP certificate-fetching For SEV-SNP, the host can optionally provide a certificate table to the guest when it issues an attestation request to firmware (see GHCB 2.0 specification regarding "SNP Extended Guest Requests"). This certificate table can then be used to verify the endorsement key used by firmware to sign the attestation report. While it is possible for guests to obtain the certificates through other means, handling it via the host provides more flexibility in being able to keep the certificate data in sync with the endorsement key throughout host-side operations that might resulting in the endorsement key changing. In the case of KVM, userspace will be responsible for fetching the certificate table and keeping it in sync with any modifications to the endorsement key by other userspace management tools. Define a new KVM_EXIT_SNP_REQ_CERTS event where userspace is provided with the GPA of the buffer the guest has provided as part of the attestation request so that userspace can write the certificate data into it while relying on filesystem-based locking to keep the certificates up-to-date relative to the endorsement keys installed/utilized by firmware at the time the certificates are fetched. [Melody: Update the documentation scheme about how file locking is expected to happen.] Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Dionna Glaze <dionnaglaze@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Melody Wang <huibo.wang@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://patch.msgid.link/20260109231732.1160759-2-michael.roth@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-01-23 09:14:15 -08:00
Marc Zyngier	902eebac8f	KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B Add a bit of documentation for KVM_EXIT_ARM_LDST64B so that userspace knows what to expect. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>	2026-01-22 13:24:49 +00:00
Michael Roth	189fd1b059	KVM: TDX: Document alignment requirements for KVM_TDX_INIT_MEM_REGION Since it was never possible to use a non-PAGE_SIZE-aligned @source_addr, go ahead and document this as a requirement. This is in preparation for enforcing page-aligned @source_addr for all architectures in guest_memfd. Reviewed-by: Vishal Annapurve <vannapurve@google.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://patch.msgid.link/20260108214622.1084057-6-michael.roth@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-01-15 12:31:17 -08:00
Michael Roth	dcbcc2323c	KVM: SEV: Document/enforce page-alignment for KVM_SEV_SNP_LAUNCH_UPDATE In the past, KVM_SEV_SNP_LAUNCH_UPDATE accepted a non-page-aligned 'uaddr' parameter to copy data from, but continuing to support this with new functionality like in-place conversion and hugepages in the pipeline has proven to be more trouble than it is worth, since there are no known users that have been identified who use a non-page-aligned 'uaddr' parameter. Rather than locking guest_memfd into continuing to support this, go ahead and document page-alignment as a requirement and begin enforcing this in the handling function. Reviewed-by: Vishal Annapurve <vannapurve@google.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Link: https://patch.msgid.link/20260108214622.1084057-5-michael.roth@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2026-01-15 12:31:16 -08:00
Linus Torvalds	feb06d2690	Merge tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...	2025-12-09 06:10:17 +09:00
Paolo Bonzini	e0c26d47de	Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - SCA rework - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups	2025-12-02 18:58:47 +01:00
Paolo Bonzini	f58e70cc31	Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests	2025-12-02 18:36:26 +01:00
Paolo Bonzini	63a9b0bc65	Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.19 - SBI MPXY support for KVM guest - New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores	2025-12-02 18:35:25 +01:00
Paolo Bonzini	679fcce002	Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEAD KVM SVM changes for 6.19: - Fix a few missing "VMCB dirty" bugs. - Fix the worst of KVM's lack of EFER.LMSLE emulation. - Add AVIC support for addressing 4k vCPUs in x2AVIC mode. - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions. - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT. - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3. - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM.	2025-11-26 09:48:39 +01:00
Paolo Bonzini	9aca52b552	Merge tag 'kvm-x86-generic-6.19' of https://github.com/kvm-x86/linux into HEAD KVM generic changes for 6.19: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup. - Fix a goof in the dirty ring documentation.	2025-11-26 09:22:45 +01:00
Dong Yang	df60cb2e67	KVM: riscv: Support enabling dirty log gradually in small chunks There is already support of enabling dirty log gradually in small chunks for x86 in commit `3c9bd4006b` ("KVM: x86: enable dirty log gradually in small chunks") and `c862626` ("KVM: arm64: Support enabling dirty log gradually in small chunks"). This adds support for riscv. x86 and arm64 writes protect both huge pages and normal pages now, so riscv protect also protects both huge pages and normal pages. On a nested virtualization setup (RISC-V KVM running inside a QEMU VM on an [Intel® Core™ i5-12500H] host), I did some tests with a 2G Linux VM using different backing page sizes. The time taken for memory_global_dirty_log_start in the L2 QEMU is listed below: Page Size Before After Optimization 4K 4490.23ms 31.94ms 2M 48.97ms 45.46ms 1G 28.40ms 30.93ms Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Signed-off-by: Dong Yang <dayss1224@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251103062825.9084-1-dayss1224@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>	2025-11-24 09:55:36 +05:30
Janosch Frank	8e8678e740	KVM: s390: Add capability that forwards operation exceptions Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation exceptions to user space. This also includes the 0x0000 instructions managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants to emulate instructions which do not (yet) have an opcode. While we're at it refine the documentation for KVM_CAP_S390_USER_INSTR0. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>	2025-11-21 10:26:03 +01:00
Roman Kisel	92c7053b44	Documentation: hyperv: Confidential VMBus Define what the confidential VMBus is and describe what advantages it offers on the capable hardware. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2025-11-15 06:18:14 +00:00
Yosry Ahmed	9f4ce48788	KVM: x86: Document a virtualization gap for GIF on AMD CPUs According to the APM Volume #2, Section 15.17, Table 15-10 (24593—Rev. 3.42—March 2024), When "GIF==0", an "Debug exception or trap, due to breakpoint register match" should be "Ignored and discarded". KVM lacks any handling of this. Even when vGIF is enabled and vGIF==0, the CPU does not ignore #DBs and relies on the VMM to do so. Handling this is possible, but the complexity is unjustified given the rarity of using HW breakpoints when GIF==0 (e.g. near VMRUN). KVM would need to intercept the #DB, temporarily disable the breakpoint, singe-step over the instruction (probably reusing NMI singe-stepping), and re-enable the breakpoint. Instead, document this as an erratum. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251030223757.2950309-1-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-13 13:15:15 -08:00
Jiaqi Yan	4debb5e895	Documentation: kvm: new UAPI for handling SEA Document the new userspace-visible features and APIs for handling synchronous external abort (SEA) - KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature. - KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA and what userspace gets while taking the SEA. Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com [ oliver: make documentation concise, remove implementation detail ] Signed-off-by: Oliver Upton <oupton@kernel.org>	2025-11-12 01:52:37 -08:00
Janosch Frank	182a258b5e	Documentation: kvm: Fix ordering 7.43 has been assigned twice, make KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 7.44. Fixes: `f55ce5a6cd` ("KVM: arm64: Expose new KVM cap for cacheable PFNMAP") Reviewed-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>	2025-10-20 12:44:04 +00:00
Paolo Bonzini	4361f5aa8b	Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit `3ccbf6f470` ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory.	2025-10-18 10:25:43 +02:00
Leonardo Bras	04fd067b77	KVM: Fix VM exit code for full dirty ring in API documentation While reading the documentation, I saw a exit code I could not grep for, to figure out it has a slightly different name. Fix that name in documentation so it points to the right exit code. Signed-off-by: Leonardo Bras <leo.bras@arm.com> Link: https://lore.kernel.org/r/20251014152802.13563-1-leo.bras@arm.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-14 15:19:05 -07:00
Sascha Bischoff	164ecbf73c	Documentation: KVM: Update GICv3 docs for GICv5 hosts GICv5 hosts optionally include FEAT_GCIE_LEGACY, which allows them to execute GICv3-based VMs on GICv5 hardware. Update the GICv3 documentation to reflect this now that GICv3 guests are supports on compatible GICv5 hosts. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:40:58 +01:00
Oliver Upton	cc4309324d	KVM: arm64: Document vCPU event ioctls as requiring init'ed vCPU KVM rejects calls to KVM_{GET,SET}_VCPU_EVENTS for an uninitialized vCPU as of commit cc96679f3c03 ("KVM: arm64: Prevent access to vCPU events before init"). Update the corresponding API documentation. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Sean Christopherson	fe2bf6234e	KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set Add a guest_memfd flag to allow userspace to state that the underlying memory should be configured to be initialized as shared, and reject user page faults if the guest_memfd instance's memory isn't shared. Because KVM doesn't yet support in-place private<=>shared conversions, all guest_memfd memory effectively follows the initial state. Alternatively, KVM could deduce the initial state based on MMAP, which for all intents and purposes is what KVM currently does. However, implicitly deriving the default state based on MMAP will result in a messy ABI when support for in-place conversions is added. For x86 CoCo VMs, which don't yet support MMAP, memory is currently private by default (otherwise the memory would be unusable). If MMAP implies memory is shared by default, then the default state for CoCo VMs will vary based on MMAP, and from userspace's perspective, will change when in-place conversion support is added. I.e. to maintain guest<=>host ABI, userspace would need to immediately convert all memory from shared=>private, which is both ugly and inefficient. The inefficiency could be avoided by adding a flag to state that memory is _private_ by default, irrespective of MMAP, but that would lead to an equally messy and hard to document ABI. Bite the bullet and immediately add a flag to control the default state so that the effective behavior is explicit and straightforward. Fixes: `3d3a04fad2` ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:23 -07:00
Sean Christopherson	d2042d8f96	KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't require a new capability, and so that developers aren't tempted to bundle multiple flags into a single capability. Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit value, but that limitation can be easily circumvented by adding e.g. KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more than 32 flags. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:22 -07:00
Linus Torvalds	256e341706	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull x86 kvm updates from Paolo Bonzini: "Generic: - Rework almost all of KVM's exports to expose symbols only to KVM's x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko x86: - Rework almost all of KVM x86's exports to expose symbols only to KVM's vendor modules, i.e. to kvm-{amd,intel}.ko - Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). It is worth noting that while SHSTK and IBT can be enabled separately in CPUID, it is not really possible to virtualize them separately. Therefore, Intel processors will really allow both SHSTK and IBT under the hood if either is made visible in the guest's CPUID. The alternative would be to intercept XSAVES/XRSTORS, which is not feasible for performance reasons - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts when completing userspace I/O. KVM has already committed to allowing L2 to to perform I/O at that point - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios) - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI vmexits needed to faithfully emulate an I/O APIC for such guests - Many cleanups and minor fixes - Recover possible NX huge pages within the TDP MMU under read lock to reduce guest jitter when restoring NX huge pages - Return -EAGAIN during prefault if userspace concurrently deletes/moves the relevant memslot, to fix an issue where prefaulting could deadlock with the memslot update x86 (AMD): - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is supported - Require a minimum GHCB version of 2 when starting SEV-SNP guests via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate errors instead of latent guest failures - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents unauthorized CPU accesses from reading the ciphertext of SNP guest private memory, e.g. to attempt an offline attack. This feature splits the shared SEV-ES/SEV-SNP ASID space into separate ranges for SEV-ES and SEV-SNP guests, therefore a new module parameter is needed to control the number of ASIDs that can be used for VMs with CipherText Hiding vs. how many can be used to run SEV-ES guests - Add support for Secure TSC for SEV-SNP guests, which prevents the untrusted host from tampering with the guest's TSC frequency, while still allowing the the VMM to configure the guest's TSC frequency prior to launch - Validate the XCR0 provided by the guest (via the GHCB) to avoid bugs resulting from bogus XCR0 values - Save an SEV guest's policy if and only if LAUNCH_START fully succeeds to avoid leaving behind stale state (thankfully not consumed in KVM) - Explicitly reject non-positive effective lengths during SNP's LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with them - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the host's desired TSC_AUX, to fix a bug where KVM was keeping a different vCPU's TSC_AUX in the host MSR until return to userspace KVM (Intel): - Preparation for FRED support - Don't retry in TDX's anti-zero-step mitigation if the target memslot is invalid, i.e. is being deleted or moved, to fix a deadlock scenario similar to the aforementioned prefaulting case - Misc bugfixes and minor cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: x86: Export KVM-internal symbols for sub-modules only KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c KVM: Export KVM-internal symbols for sub-modules only KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent KVM: VMX: Make CR4.CET a guest owned bit KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported KVM: selftests: Add coverage for KVM-defined registers in MSRs test KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test KVM: selftests: Extend MSRs test to validate vCPUs without supported features KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test KVM: selftests: Add an MSR test to exercise guest/host and read/write KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors KVM: x86: Define Control Protection Exception (#CP) vector KVM: x86: Add human friendly formatting for #XM, and #VE KVM: SVM: Enable shadow stack virtualization for SVM KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid KVM: SVM: Pass through shadow stack MSRs as appropriate KVM: SVM: Update dump_vmcb with shadow stack save area additions KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 ...	2025-10-06 12:37:34 -07:00
Linus Torvalds	f3826aa996	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit `a6ad54137a` ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...	2025-10-04 08:52:16 -07:00
Paolo Bonzini	12abeb81c8	Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD KVM x86 CET virtualization support for 6.18 Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect Branch Tracking (IBT), that can be utilized by software to help provide Control-flow integrity (CFI). SHSTK defends against backward-edge attacks (a.k.a. Return-oriented programming (ROP)), while IBT defends against forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)). Attackers commonly use ROP and COP/JOP methodologies to redirect the control- flow to unauthorized targets in order to execute small snippets of code, a.k.a. gadgets, of the attackers choice. By chaining together several gadgets, an attacker can perform arbitrary operations and circumvent the system's defenses. SHSTK defends against backward-edge attacks, which execute gadgets by modifying the stack to branch to the attacker's target via RET, by providing a second stack that is used exclusively to track control transfer operations. The shadow stack is separate from the data/normal stack, and can be enabled independently in user and kernel mode. When SHSTK is is enabled, CALL instructions push the return address on both the data and shadow stack. RET then pops the return address from both stacks and compares the addresses. If the return addresses from the two stacks do not match, the CPU generates a Control Protection (#CP) exception. IBT defends against backward-edge attacks, which branch to gadgets by executing indirect CALL and JMP instructions with attacker controlled register or memory state, by requiring the target of indirect branches to start with a special marker instruction, ENDBRANCH. If an indirect branch is executed and the next instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH behaves as a NOP if IBT is disabled or unsupported. From a virtualization perspective, CET presents several problems. While SHSTK and IBT have two layers of enabling, a global control in the form of a CR4 bit, and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS. Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable option due to complexity, and outright disallowing use of XSTATE to context switch SHSTK/IBT state would render the features unusable to most guests. To limit the overall complexity without sacrificing performance or usability, simply ignore the potential virtualization hole, but ensure that all paths in KVM treat SHSTK/IBT as usable by the guest if the feature is supported in hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow userspace to advertise one of SHSTK or IBT if both are supported in hardware, even though doing so would allow a misbehaving guest to use the unadvertised feature. Fully emulating SHSTK and IBT would also require significant complexity, e.g. to track and update branch state for IBT, and shadow stack state for SHSTK. Given that emulating large swaths of the guest code stream isn't necessary on modern CPUs, punt on emulating instructions that meaningful impact or consume SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation of such instructions so that KVM's emulator can't be abused to circumvent CET. Disable support for SHSTK and IBT if KVM is configured such that emulation of arbitrary guest instructions may be required, specifically if Unrestricted Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR. Lastly disable SHSTK support if shadow paging is enabled, as the protections for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so that they can't be directly modified by software), i.e. would require non-trivial support in the Shadow MMU. Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT in SVM requires KVM modifications.	2025-09-30 13:37:14 -04:00
Paolo Bonzini	d05ca6b793	Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD KVM x86 changes for 6.18 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM. - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs. - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs. - Clean up KVM's vector hashing code for delivering lowest priority IRQs. - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs. - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs. - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios). - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support. - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests. - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs. - Misc cleanups and minor fixes.	2025-09-30 13:36:41 -04:00
Yang Weijiang	9d6812d415	KVM: x86: Enable guest SSP read/write interface with new uAPIs Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace save and restore the guest's Shadow Stack Pointer (SSP). On both Intel and AMD, SSP is a hardware register that can only be accessed by software via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware to context switch SSP at entry/exit). As a result, SSP doesn't fit in any of KVM's existing interfaces for saving/restoring state. Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP MSRs. Use a translation layer to hide the KVM-internal MSR index so that the arbitrary index doesn't become ABI, e.g. so that KVM can rework its implementation as needed, so long as the ONE_REG ABI is maintained. Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack support to avoid running afoul of ignore_msrs, which unfortunately applies to host-initiated accesses (which is a discussion for another day). I.e. ensure consistent behavior for KVM-defined registers irrespective of ignore_msrs. Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Tested-by: Mathias Krause <minipli@grsecurity.net> Tested-by: John Allen <john.allen@amd.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-14-seanjc@google.com Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:10:33 -07:00

1 2 3 4 5 ...

733 Commits