Commit Graph

57 Commits

Author SHA1 Message Date
Paolo Bonzini
b1195183ed Merge tag 'kvm-s390-next-7.0-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- gmap rewrite: completely new memory management for kvm/s390
- vSIE improvement
- maintainership change for s390 vfio-pci
- small quality of life improvement for protected guests
2026-02-11 18:52:27 +01:00
Paolo Bonzini
1b13885edf Merge tag 'kvm-x86-apic-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM x86 APIC-ish changes for 6.20

 - Fix a benign bug where KVM could use the wrong memslots (ignored SMM) when
   creating a vCPU-specific mapping of guest memory.

 - Clean up KVM's handling of marking mapped vCPU pages dirty.

 - Drop a pile of *ancient* sanity checks hidden behind in KVM's unused
   ASSERT() macro, most of which could be trivially triggered by the guest
   and/or user, and all of which were useless.

 - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it
   more obvious what the weird parameter is used for, and to allow burying the
   RTC shenanigans behind CONFIG_KVM_IOAPIC=y.

 - Bury all of ioapic.h and KVM_IRQCHIP_KERNEL behind CONFIG_KVM_IOAPIC=y.

 - Add a regression test for recent APICv update fixes.

 - Rework KVM's handling of VMCS updates while L2 is active to temporarily
   switch to vmcs01 instead of deferring the update until the next nested
   VM-Exit.  The deferred updates approach directly contributed to several
   bugs, was proving to be a maintenance burden due to the difficulty in
   auditing the correctness of deferred updates, and was polluting
   "struct nested_vmx" with a growing pile of booleans.

 - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv()
   to consolidate the updates, and to co-locate SVI updates with the updates
   for KVM's own cache of ISR information.

 - Drop a dead function declaration.
2026-02-11 12:45:32 -05:00
Paolo Bonzini
4215ee0d7b Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.20

 - Drop a user-triggerable WARN on nested_svm_load_cr3() failure.

 - Add support for virtualizing ERAPS.  Note, correct virtualization of ERAPS
   relies on an upcoming, publicly announced change in the APM to reduce the
   set of conditions where hardware (i.e. KVM) *must* flush the RAP.

 - Ignore nSVM intercepts for instructions that are not supported according to
   L1's virtual CPU model.

 - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
   for EPT Misconfig.

 - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
   and allow userspace to restore nested state with GIF=0.

 - Treat exit_code as an unsigned 64-bit value through all of KVM.

 - Add support for fetching SNP certificates from userspace.

 - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
   or VMSAVE on behalf of L2.

 - Misc fixes and cleanups.
2026-02-09 18:51:37 +01:00
Paolo Bonzini
a0c468eda4 Merge tag 'kvm-x86-selftests-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.20

 - Add a regression test for TPR<=>CR8 synchronization and IRQ masking.

 - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
   and extend x86's infrastructure to support EPT and NPT (for L2 guests).

 - Extend several nested VMX tests to also cover nested SVM.

 - Add a selftest for nested VMLOAD/VMSAVE.

 - Rework the nested dirty log test, originally added as a regression test for
   PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
   and to hopefully make the test easier to understand and maintain.
2026-02-09 18:38:54 +01:00
Paolo Bonzini
5490063269 Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 7.0

- Add support for FEAT_IDST, allowing ID registers that are not
  implemented to be reported as a normal trap rather than as an UNDEF
  exception.

- Add sanitisation of the VTCR_EL2 register, fixing a number of
  UXN/PXN/XN bugs in the process.

- Full handling of RESx bits, instead of only RES0, and resulting in
  SCTLR_EL2 being added to the list of sanitised registers.

- More pKVM fixes for features that are not supposed to be exposed to
  guests.

- Make sure that MTE being disabled on the pKVM host doesn't give it
  the ability to attack the hypervisor.

- Allow pKVM's host stage-2 mappings to use the Force Write Back
  version of the memory attributes by using the "pass-through'
  encoding.

- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
  guest.

- Preliminary work for guest GICv5 support.

- A bunch of debugfs fixes, removing pointless custom iterators stored
  in guest data structures.

- A small set of FPSIMD cleanups.

- Selftest fixes addressing the incorrect alignment of page
  allocation.

- Other assorted low-impact fixes and spelling fixes.
2026-02-09 18:18:19 +01:00
Paolo Bonzini
c14f646638 Merge tag 'loongarch-kvm-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.20

1. Add more CPUCFG mask bits.
2. Improve feature detection.
3. Add FPU/LBT delay load support.
4. Set default return value in KVM IO bus ops.
5. Add paravirt preempt feature support.
6. Add KVM steal time test case for tools/selftests.
2026-02-09 18:17:01 +01:00
Bibo Mao
2d94a3f708 KVM: LoongArch: selftests: Add steal time test case
LoongArch KVM supports steal time accounting now, here add steal time
test case on LoongArch.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-06 09:28:01 +08:00
Claudio Imbrenda
52940a34a8 KVM: s390: selftests: Add selftest for the KVM_S390_KEYOP ioctl
This test allows to test the various storage key handling functions.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-04 17:00:10 +01:00
Zhiquan Li
e396a74222 KVM: selftests: Add -U_FORTIFY_SOURCE to avoid some unpredictable test failures
Some distributions (such as Ubuntu) configure GCC so that
_FORTIFY_SOURCE is automatically enabled at -O1 or above.  This results
in some fortified version of definitions of standard library functions
are included.  While linker resolves the symbols, the fortified versions
might override the definitions in lib/string_override.c and reference to
those PLT entries in GLIBC.  This is not a problem for the code in host,
but it is a disaster for the guest code.  E.g., if build and run
x86/nested_emulation_test on Ubuntu 24.04 will encounter a L1 #PF due to
memset() reference to __memset_chk@plt.

The option -fno-builtin-memset is not helpful here, because those
fortified versions are not built-in but some definitions which are
included by header, they are for different intentions.

In order to eliminate the unpredictable behaviors may vary depending on
the linker and platform, add the "-U_FORTIFY_SOURCE" into CFLAGS to
prevent from introducing the fortified definitions.

Signed-off-by: Zhiquan Li <zhiquan_li@163.com>
Link: https://patch.msgid.link/20260122053551.548229-1-zhiquan_li@163.com
Fixes: 6b6f71484b ("KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use")
Cc: stable@vger.kernel.org
[sean: tag for stable]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-23 08:38:31 -08:00
Marc Zyngier
b638a9d0f8 KVM: arm64: selftests: Add a test for FEAT_IDST
Add a very basic test checking that FEAT_IDST actually works for
the {GMID,SMIDR,CSSIDR2}_EL1 registers.

Link: https://patch.msgid.link/20260108173233.2911955-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 11:58:57 +00:00
Yosry Ahmed
55058e3215 KVM: selftests: Add a selftests for nested VMLOAD/VMSAVE
Add a test for VMLOAD/VMSAVE in an L2 guest. The test verifies that L1
intercepts for VMSAVE/VMLOAD always work regardless of
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK.

Then, more interestingly, it makes sure that when L1 does not intercept
VMLOAD/VMSAVE, they work as intended in L2. When
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is enabled by L1, VMSAVE/VMLOAD from
L2 should interpret the GPA as an L2 GPA and translate it through the
NPT. When VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is disabled by L1,
VMSAVE/VMLOAD from L2 should interpret the GPA as an L1 GPA.

To test this, put two VMCBs (0 and 1) in L1's physical address space,
and have a single L2 GPA where:
- L2 VMCB GPA == L1 VMCB(0) GPA
- L2 VMCB GPA maps to L1 VMCB(1) via the NPT in L1.

This setup allows detecting how the GPA is interpreted based on which L1
VMCB is actually accessed.

In both cases, L2 sets KERNEL_GS_BASE (one of the fields handled by
VMSAVE/VMLOAD), and executes VMSAVE to write its value to the VMCB. The
test userspace code then checks that the write was made to the correct
VMCB (based on whether VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK is set by L1),
and writes a new value to that VMCB. L2 then executes VMLOAD to load the
new value and makes sure it's reflected correctly in KERNERL_GS_BASE.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260110004821.3411245-4-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-14 14:09:10 -08:00
Sean Christopherson
c3a9a27c79 KVM: selftests: Add a test to verify APICv updates (while L2 is active)
Add a test to verify KVM correctly handles a variety of edge cases related
to APICv updates, and in particular updates that are triggered while L2 is
actively running.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20260109034532.1012993-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13 17:35:31 -08:00
Yosry Ahmed
ca2eccb953 KVM: selftests: Extend vmx_set_nested_state_test to cover SVM
Add test cases for the validation checks in svm_set_nested_state(), and
allow the test to run with SVM as well as VMX. The SVM test also makes
sure that KVM_SET_NESTED_STATE accepts GIF being set or cleared if
EFER.SVME is cleared, verifying a recently fixed bug where GIF was
incorrectly expected to always be set when EFER.SVME is cleared.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251121204803.991707-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:54:19 -08:00
Yosry Ahmed
6794d916f8 KVM: selftests: Extend vmx_dirty_log_test to cover SVM
Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1
code, doing some renaming (e.g. EPT -> TDP), and having setup code for
both SVM and VMX in test_dirty_log().

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 12:02:16 -08:00
Maciej S. Szmigiero
0b28194c4c KVM: selftests: Test TPR / CR8 sync and interrupt masking
Add a few extra TPR / CR8 tests to x86's xapic_state_test to see if:
  * TPR is 0 on reset,
  * TPR, PPR and CR8 are equal inside the guest,
  * TPR and CR8 read equal by the host after a VMExit
  * TPR borderline values set by the host correctly mask interrupts in the
    guest.

These hopefully will catch the most obvious cases of improper TPR sync or
interrupt masking.

Do these tests both in x2APIC and xAPIC modes.
The x2APIC mode uses SELF_IPI register to trigger interrupts to give it a
bit of exercise too.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
[sean: put code in separate test]
Link: https://patch.msgid.link/20251205224937.428122-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 10:50:50 -08:00
Paolo Bonzini
e0c26d47de Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- SCA rework
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
2025-12-02 18:58:47 +01:00
Paolo Bonzini
f58e70cc31 Merge tag 'kvmarm-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19

 - Support for userspace handling of synchronous external aborts (SEAs),
   allowing the VMM to potentially handle the abort in a non-fatal
   manner.

 - Large rework of the VGIC's list register handling with the goal of
   supporting more active/pending IRQs than available list registers in
   hardware. In addition, the VGIC now supports EOImode==1 style
   deactivations for IRQs which may occur on a separate vCPU than the
   one that acked the IRQ.

 - Support for FEAT_XNX (user / privileged execute permissions) and
   FEAT_HAF (hardware update to the Access Flag) in the software page
   table walkers and shadow MMU.

 - Allow page table destruction to reschedule, fixing long need_resched
   latencies observed when destroying a large VM.

 - Minor fixes to KVM and selftests
2025-12-02 18:36:26 +01:00
Paolo Bonzini
8040280405 Merge tag 'loongarch-kvm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19

1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
2025-12-02 18:34:22 +01:00
Oliver Upton
3eef0c83c3 Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/next
* kvm-arm64/nv-xnx-haf: (22 commits)
  : Support for FEAT_XNX and FEAT_HAF in nested
  :
  : Add support for a couple of MMU-related features that weren't
  : implemented by KVM's software page table walk:
  :
  :  - FEAT_XNX: Allows the hypervisor to describe execute permissions
  :    separately for EL0 and EL1
  :
  :  - FEAT_HAF: Hardware update of the Access Flag, which in the context of
  :    nested means software walkers must also set the Access Flag.
  :
  : The series also adds some basic support for testing KVM's emulation of
  : the AT instruction, including the implementation detail that AT sets the
  : Access Flag in KVM.
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
  ...

Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:47:41 -08:00
Oliver Upton
66f1888583 KVM: arm64: selftests: Add test for AT emulation
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01 00:44:02 -08:00
Bibo Mao
df41742343 KVM: LoongArch: selftests: Add timer interrupt test case
Add timer test case based on common arch_timer code, timer interrupt
with one-shot and period mode is tested.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-11-28 14:49:44 +08:00
Janosch Frank
8e8678e740 KVM: s390: Add capability that forwards operation exceptions
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation
exceptions to user space. This also includes the 0x0000 instructions
managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants
to emulate instructions which do not (yet) have an opcode.

While we're at it refine the documentation for
KVM_CAP_S390_USER_INSTR0.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-21 10:26:03 +01:00
Jim Mattson
6a8818de21 KVM: selftests: Add a VMX test for LA57 nested state
Add a selftest that verifies KVM's ability to save and restore
nested state when the L1 guest is using 5-level paging and the L2
guest is using 4-level paging. Specifically, canonicality tests of
the VMCS12 host-state fields should accept 57-bit virtual addresses.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-5-jmattson@google.com
[sean: rename to vmx_nested_la57_state_test to prep nested_<test> namespace]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:21:52 -08:00
Yosry Ahmed
3c40777f0e KVM: selftests: Extend vmx_tsc_adjust_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-8-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:56 -08:00
Yosry Ahmed
4d256d00e4 KVM: selftests: Move nested invalid CR3 check to its own test
vmx_tsc_adjust_test currently verifies that a nested VMLAUNCH fails with
an invalid CR3. This is irrelevant to TSC scaling, move it to a
standalone test.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-6-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
e6bcdd2122 KVM: selftests: Extend vmx_nested_tsc_scaling_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-5-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:54 -08:00
Yosry Ahmed
0a9eb2afa1 KVM: selftests: Extend vmx_close_while_nested_test to cover SVM
Add SVM L1 code to run the nested guest, and allow the test to run with
SVM as well as VMX.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-4-yosry.ahmed@linux.dev
[sean: rename to "nested_close_kvm_test" to provide nested_* sorting]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20 16:19:53 -08:00
Jiaqi Yan
feee9ef7ac KVM: selftests: Test for KVM_EXIT_ARM_SEA
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
   has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
   guest and KVM injects expected SEA into the VCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before guest can consume
  injected UER, and making test unable to trigger SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12 01:27:16 -08:00
Paolo Bonzini
12abeb81c8 Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD
KVM x86 CET virtualization support for 6.18

Add support for virtualizing Control-flow Enforcement Technology (CET) on
Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks).

CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect
Branch Tracking (IBT), that can be utilized by software to help provide
Control-flow integrity (CFI).  SHSTK defends against backward-edge attacks
(a.k.a. Return-oriented programming (ROP)), while IBT defends against
forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)).

Attackers commonly use ROP and COP/JOP methodologies to redirect the control-
flow to unauthorized targets in order to execute small snippets of code,
a.k.a. gadgets, of the attackers choice.  By chaining together several gadgets,
an attacker can perform arbitrary operations and circumvent the system's
defenses.

SHSTK defends against backward-edge attacks, which execute gadgets by modifying
the stack to branch to the attacker's target via RET, by providing a second
stack that is used exclusively to track control transfer operations.  The
shadow stack is separate from the data/normal stack, and can be enabled
independently in user and kernel mode.

When SHSTK is is enabled, CALL instructions push the return address on both the
data and shadow stack. RET then pops the return address from both stacks and
compares the addresses.  If the return addresses from the two stacks do not
match, the CPU generates a Control Protection (#CP) exception.

IBT defends against backward-edge attacks, which branch to gadgets by executing
indirect CALL and JMP instructions with attacker controlled register or memory
state, by requiring the target of indirect branches to start with a special
marker instruction, ENDBRANCH.  If an indirect branch is executed and the next
instruction is not an ENDBRANCH, the CPU generates a #CP.  Note, ENDBRANCH
behaves as a NOP if IBT is disabled or unsupported.

From a virtualization perspective, CET presents several problems.  While SHSTK
and IBT have two layers of enabling, a global control in the form of a CR4 bit,
and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET
respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS.
Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable
option due to complexity, and outright disallowing use of XSTATE to context
switch SHSTK/IBT state would render the features unusable to most guests.

To limit the overall complexity without sacrificing performance or usability,
simply ignore the potential virtualization hole, but ensure that all paths in
KVM treat SHSTK/IBT as usable by the guest if the feature is supported in
hardware, and the guest has access to at least one of SHSTK or IBT.  I.e. allow
userspace to advertise one of SHSTK or IBT if both are supported in hardware,
even though doing so would allow a misbehaving guest to use the unadvertised
feature.

Fully emulating SHSTK and IBT would also require significant complexity, e.g.
to track and update branch state for IBT, and shadow stack state for SHSTK.
Given that emulating large swaths of the guest code stream isn't necessary on
modern CPUs, punt on emulating instructions that meaningful impact or consume
SHSTK or IBT.  However, instead of doing nothing, explicitly reject emulation
of such instructions so that KVM's emulator can't be abused to circumvent CET.
Disable support for SHSTK and IBT if KVM is configured such that emulation of
arbitrary guest instructions may be required, specifically if Unrestricted
Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that
is smaller than host.MAXPHYADDR.

Lastly disable SHSTK support if shadow paging is enabled, as the protections
for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so
that they can't be directly modified by software), i.e. would require
non-trivial support in the Shadow MMU.

Note, AMD CPUs currently only support SHSTK.  Explicitly disable IBT support
so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT
in SVM requires KVM modifications.
2025-09-30 13:37:14 -04:00
Paolo Bonzini
924ccf1d09 Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.18

- Added SBI FWFT extension for Guest/VM with misaligned
  delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
  access_tracking_perf_test, dirty_log_perf_test,
  memslot_modification_stress_test, memslot_perf_test,
  mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30 13:23:36 -04:00
Paolo Bonzini
924ebaefce Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.18

- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
  allowing more registers to be used as part of the message payload.

- Change the way pKVM allocates its VM handles, making sure that the
  privileged hypervisor is never tricked into using uninitialised
  data.

- Speed up MMIO range registration by avoiding unnecessary RCU
  synchronisation, which results in VMs starting much quicker.

- Add the dump of the instruction stream when panic-ing in the EL2
  payload, just like the rest of the kernel has always done. This will
  hopefully help debugging non-VHE setups.

- Add 52bit PA support to the stage-1 page-table walker, and make use
  of it to populate the fault level reported to the guest on failing
  to translate a stage-1 walk.

- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
  feature parity for guests, irrespective of the host platform.

- Fix some really ugly architecture problems when dealing with debug
  in a nested VM. This has some bad performance impacts, but is at
  least correct.

- Add enough infrastructure to be able to disable EL2 features and
  give effective values to the EL2 control registers. This then allows
  a bunch of features to be turned off, which helps cross-host
  migration.

- Large rework of the selftest infrastructure to allow most tests to
  transparently run at EL2. This is the first step towards enabling
  NV testing.

- Various fixes and improvements all over the map, including one BE
  fix, just in time for the removal of the feature.
2025-09-30 13:23:28 -04:00
Paolo Bonzini
8cbb0df294 Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3

 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

 - Fixes to the TLB match process and TLB invalidation range for
   managing the VCNR pseudo-TLB

 - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

 - Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

 - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

 - Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

 - Call raw_spin_unlock() instead of the internal spinlock API

 - Fix parameter ordering when assigning VBAR_EL1

[Pull into kvm/master to fix conflicts. - Paolo]
2025-09-30 13:23:06 -04:00
Oliver Upton
f677b0efa9 KVM: arm64: selftests: Add basic test for running in VHE EL2
Add an embarrassingly simple selftest for sanity checking KVM's VHE EL2
and test that the ID register bits are consistent with HCR_EL2.E2H being
RES1.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-24 19:23:32 +01:00
Sean Christopherson
9c38ddb3df KVM: selftests: Add an MSR test to exercise guest/host and read/write
Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, and expect success/failure based on whether or not the
vCPU supports the MSR according to supported CPUID.

Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide addition testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all which are extremely difficult to validate in
KUT.

If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
varies wildly.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-46-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-23 09:51:56 -07:00
Quan Zhou
dbe3d1d160 KVM: riscv: selftests: Add common supported test cases
Some common KVM test cases are supported on riscv now as following:

    access_tracking_perf_test
    dirty_log_perf_test
    memslot_modification_stress_test
    memslot_perf_test
    mmu_stress_test
    rseq_test

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Signed-off-by: Dong Yang <dayss1224@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/c447f18115b27562cd65863645e41a5ef89bd37b.1756710918.git.dayss1224@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-09-16 10:53:58 +05:30
Paolo Bonzini
42a0305ab1 Merge tag 'kvmarm-fixes-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2

 - Correctly handle 'invariant' system registers for protected VMs

 - Improved handling of VNCR data aborts, including external aborts

 - Fixes for handling of FEAT_RAS for NV guests, providing a sane
   fault context during SEA injection and preventing the use of
   RASv1p1 fault injection hardware

 - Ensure that page table destruction when a VM is destroyed gives an
   opportunity to reschedule

 - Large fix to KVM's infrastructure for managing guest context loaded
   on the CPU, addressing issues where the output of AT emulation
   doesn't get reflected to the guest

 - Fix AT S12 emulation to actually perform stage-2 translation when
   necessary

 - Avoid attempting vLPI irqbypass when GICv4 has been explicitly
   disabled for a VM

 - Minor KVM + selftest fixes
2025-08-29 12:57:31 -04:00
Fuad Tabba
692f6ecf38 KVM: selftests: Do not use hardcoded page sizes in guest_memfd test
Update the guest_memfd_test selftest to use getpagesize() instead of
hardcoded 4KB page size values.

Using hardcoded page sizes can cause test failures on architectures or
systems configured with larger page sizes, such as arm64 with 64KB
pages. By dynamically querying the system's page size, the test becomes
more portable and robust across different environments.

Additionally, build the guest_memfd_test selftest for arm64.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Suggested-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-27 04:37:03 -04:00
Marc Zyngier
85acc29f90 KVM: arm64: selftest: Add standalone test checking for KVM's own UUID
Tinkering with UUIDs is a perilious task, and the KVM UUID gets
broken at times. In order to spot this early enough, add a selftest
that will shout if the expected value isn't found.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250721130558.50823-1-jackabt.amazon@gmail.com
Link: https://lore.kernel.org/r/20250806171341.1521210-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-08 01:30:16 -07:00
Paolo Bonzini
314b40b3b6 Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Paolo Bonzini
1a14928e2e Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.17

 - Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
   guest.  Failure to honor FREEZE_IN_SMM can bleed host state into the guest.

 - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
   prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.

 - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
   vCPU's CPUID model.

 - Rework the MSR interception code so that the SVM and VMX APIs are more or
   less identical.

 - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
   drop the dedicated "shadow" bitmaps (and their awful "max" size defines).

 - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
   nested SVM MSRPM offsets tracker can't handle an MSR.

 - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
   loosely related to FRED, but is supported and enumerated independently.

 - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
   a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON).  Use
   the same approach KVM uses for dealing with "impossible" emulation when
   running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
   has architecturally impossible state.

 - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
   APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
   APERF/MPERF (with many caveats).

 - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
   frequency is unsupported for VMs with a "secure" TSC, and there's no known
   use case for changing the default frequency for other VM types.
2025-07-29 08:36:43 -04:00
Jim Mattson
df98ce784a KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
For a VCPU thread pinned to a single LPU, verify that interleaved host
and guest reads of IA32_[AM]PERF return strictly increasing values when
APERFMPERF exiting is disabled.

Run the test in both L1 and L2 to verify that KVM passes through the
APERF and MPERF MSRs when L1 doesn't want to intercept them (or any MSRs).

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250530185239.2335185-4-jmattson@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250626001225.744268-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-07-09 09:33:41 -07:00
Oliver Upton
2858ea3083 KVM: arm64: selftests: Add basic SError injection test
Add tests for SError injection considering KVM is more directly involved
in delivery:

 - Pending SErrors are taken at the first CSE after SErrors are unmasked

 - Pending SErrors aren't taken and remain pending if SErrors are masked

 - Unmasked SErrors are taken immediately when injected (implementation
   detail)

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-25-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08 11:36:36 -07:00
Sean Christopherson
7e9b231c40 KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements
Add a selftest to verify that eventfd+irqfd bindings are globally unique,
i.e. that KVM doesn't allow multiple irqfds to bind to a single eventfd,
even across VMs.

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250522235223.3178519-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-23 09:51:01 -07:00
Paolo Bonzini
4e02d4f973 Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.16:

 - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to
   fix a race between AP destroy and VMRUN.

 - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM.

 - Add support for ALLOWED_SEV_FEATURES.

 - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y.

 - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features
   that utilize those bits.

 - Don't account temporary allocations in sev_send_update_data().

 - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27 12:15:49 -04:00
Paolo Bonzini
3e0797f6dd Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.16:

 - Add support for SNP to the various SEV selftests.

 - Add a selftest to verify fastops instructions via forced emulation.

 - Add MGLRU support to the access tracking perf test.
2025-05-27 12:15:26 -04:00
Paolo Bonzini
4d526b02df Merge tag 'kvmarm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16

* New features:

  - Add large stage-2 mapping support for non-protected pKVM guests,
    clawing back some performance.

  - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
    protected modes.

  - Enable nested virtualisation support on systems that support it
    (yes, it has been a long time coming), though it is disabled by
    default.

* Improvements, fixes and cleanups:

  - Large rework of the way KVM tracks architecture features and links
    them with the effects of control bits. This ensures correctness of
    emulation (the data is automatically extracted from the published
    JSON files), and helps dealing with the evolution of the
    architecture.

  - Significant changes to the way pKVM tracks ownership of pages,
    avoiding page table walks by storing the state in the hypervisor's
    vmemmap. This in turn enables the THP support described above.

  - New selftest checking the pKVM ownership transition rules

  - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
    even if the host didn't have it.

  - Fixes for the address translation emulation, which happened to be
    rather buggy in some specific contexts.

  - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
    from the number of counters exposed to a guest and addressing a
    number of issues in the process.

  - Add a new selftest for the SVE host state being corrupted by a
    guest.

  - Keep HCR_EL2.xMO set at all times for systems running with the
    kernel at EL2, ensuring that the window for interrupts is slightly
    bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

  - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
    from a pretty bad case of TLB corruption unless accesses to HCR_EL2
    are heavily synchronised.

  - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
    tables in a human-friendly fashion.

  - and the usual random cleanups.
2025-05-26 16:19:46 -04:00
Bibo Mao
a867688c8c KVM: selftests: Add supported test cases for LoongArch
Some common KVM test cases are supported on LoongArch now as following:
  coalesced_io_test
  demand_paging_test
  dirty_log_perf_test
  dirty_log_test
  guest_print_test
  hardware_disable_test
  kvm_binary_stats_test
  kvm_create_max_vcpus
  kvm_page_table_test
  memslot_modification_stress_test
  memslot_perf_test
  set_memory_region_test

And other test cases are not supported by LoongArch such as rseq_test,
since it is not supported on LoongArch physical machine either.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-05-20 20:20:26 +08:00
Nikunj A Dadhania
72df72e1c6 KVM: selftests: Add test to verify KVM_CAP_X86_BUS_LOCK_EXIT
Add a test case to verify x86's bus lock exit functionality, which is now
supported on both Intel and AMD.  Trigger bus lock exits by performing a
split-lock access, i.e. an atomic access that splits two cache lines.

Verify that the correct number of bus lock exits are generated, and that
the counter is incremented correctly and at the appropriate time based on
the underlying architecture.

Generate bus locks in both L1 and L2 (if nested virtualization is enabled),
as SVM's functionality in particular requires non-trivial logic to do the
right thing when running nested VMs.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
Link: https://lore.kernel.org/r/20250502050346.14274-6-manali.shukla@amd.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-19 11:05:19 -07:00
James Houghton
d166453ebd KVM: selftests: access_tracking_perf_test: Use MGLRU for access tracking
Use MGLRU's debugfs interface to do access tracking instead of
page_idle. The logic to use the page_idle bitmap is left in, as it is
useful for kernels that do not have MGLRU built in.

When MGLRU is enabled, page_idle will report pages as still idle even
after being accessed, as MGLRU doesn't necessarily clear the Idle folio
flag when accessing an idle page, so the test will not attempt to use
page_idle if MGLRU is enabled but otherwise not usable.

Aging pages with MGLRU is much faster than marking pages as idle with
page_idle.

Co-developed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-8-jthoughton@google.com
[sean: print parsed features, not raw string]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16 12:58:21 -07:00
James Houghton
b11fcb51e2 KVM: selftests: Build and link selftests/cgroup/lib into KVM selftests
libcgroup.o is built separately from KVM selftests and cgroup selftests,
so different compiler flags used by the different selftests will not
conflict with each other.

Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250508184649.2576210-7-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16 11:45:16 -07:00