KVM x86 Hyper-V changes for 6.8:
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
The "vmxon_pa == vmcs12_pa == -1ull" test happens to work by accident: as
Enlightened VMCS is always supported, set_default_vmx_state() adds
'KVM_STATE_NESTED_EVMCS' to 'flags' and the following branch of
vmx_set_nested_state() is executed:
if ((kvm_state->flags & KVM_STATE_NESTED_EVMCS) &&
(!guest_can_use(vcpu, X86_FEATURE_VMX) ||
!vmx->nested.enlightened_vmcs_enabled))
return -EINVAL;
as 'enlightened_vmcs_enabled' is false. In fact, "vmxon_pa == vmcs12_pa ==
-1ull" is a valid state when not tainted by wrong flags so the test should
aim for this branch:
if (kvm_state->hdr.vmx.vmxon_pa == INVALID_GPA)
return 0;
Test all this properly:
- Without KVM_STATE_NESTED_EVMCS in the flags, the expected return value is
'0'.
- With KVM_STATE_NESTED_EVMCS flag (when supported) set, the expected
return value is '-EINVAL' prior to enabling eVMCS and '0' after.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20231205103630.1391318-11-vkuznets@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Swap the ordering of parameters to guest asserts related to {RD,WR}MSR
success/failure in the Hyper-V features test. As is, the output will
be mangled and broken due to passing an integer as a string and vice
versa.
Opportunistically fix a benign %u vs. %lu issue as well.
Link: https://lore.kernel.org/r/20231129224916.532431-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Convert %llx to %lx as appropriate in guest asserts. The guest printf
implementation treats them the same as KVM selftests are 64-bit only, but
strictly adhering to the correct format will allow annotating the
underlying helpers with __printf() without introducing new warnings in the
build.
Link: https://lore.kernel.org/r/20231129224916.532431-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Remove x86's mmio_warning_test, as it is unnecessarily complex (there's no
reason to fork, spawn threads, initialize srand(), etc..), unnecessarily
restrictive (triggering triple fault is not unique to Intel CPUs without
unrestricted guest), and provides no meaningful coverage beyond what
basic fuzzing can achieve (running a vCPU with garbage is fuzzing's bread
and butter).
That the test has *all* of the above flaws is not coincidental, as the
code was copy+pasted almost verbatim from the syzkaller reproducer that
originally found the KVM bug (which has long since been fixed).
Cc: Michal Luczaj <mhal@rbox.co>
Link: https://groups.google.com/g/syzkaller/c/lHfau8E3SOE
Link: https://lore.kernel.org/r/20230815220030.560372-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Pass MAGIC_TOKEN to __TEST_REQUIRE() when printing the help message about
needing to pass a magic value to manually run the NX hugepages test,
otherwise the help message will contain garbage.
In file included from x86_64/nx_huge_pages_test.c:15:
x86_64/nx_huge_pages_test.c: In function ‘main’:
include/test_util.h:40:32: error: format ‘%d’ expects a matching ‘int’ argument [-Werror=format=]
40 | ksft_exit_skip("- " fmt "\n", ##__VA_ARGS__); \
| ^~~~
x86_64/nx_huge_pages_test.c:259:9: note: in expansion of macro ‘__TEST_REQUIRE’
259 | __TEST_REQUIRE(token == MAGIC_TOKEN,
| ^~~~~~~~~~~~~~
Signed-off-by: angquan yu <angquan21@gmail.com>
Link: https://lore.kernel.org/r/20231128221105.63093-1-angquan21@gmail.com
[sean: rewrite shortlog+changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
"Testing private access when memslot gets deleted" tests the behavior
of KVM when a private memslot gets deleted while the VM is using the
private memslot. When KVM looks up the deleted (slot = NULL) memslot,
KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT.
In the second test, upon a private access to non-private memslot, KVM
should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
Intentionally don't take a requirement on KVM_CAP_GUEST_MEMFD,
KVM_CAP_MEMORY_FAULT_INFO, KVM_MEMORY_ATTRIBUTE_PRIVATE, etc., as it's a
KVM bug to advertise KVM_X86_SW_PROTECTED_VM without its prerequisites.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
[sean: call out the similarities with set_memory_region_test]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-36-seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a selftest to exercise implicit/explicit conversion functionality
within KVM and verify:
- Shared memory is visible to host userspace
- Private memory is not visible to host userspace
- Host userspace and guest can communicate over shared memory
- Data in shared backing is preserved across conversions (test's
host userspace doesn't free the data)
- Private memory is bound to the lifetime of the VM
Ideally, KVM's selftests infrastructure would be reworked to allow backing
a single region of guest memory with multiple memslots for _all_ backing
types and shapes, i.e. ideally the code for using a single backing fd
across multiple memslots would work for "regular" memory as well. But
sadly, support for KVM_CREATE_GUEST_MEMFD has languished for far too long,
and overhauling selftests' memslots infrastructure would likely open a can
of worms, i.e. delay things even further.
In addition to the more obvious tests, verify that PUNCH_HOLE actually
frees memory. Directly verifying that KVM frees memory is impractical, if
it's even possible, so instead indirectly verify memory is freed by
asserting that the guest reads zeroes after a PUNCH_HOLE. E.g. if KVM
zaps SPTEs but doesn't actually punch a hole in the inode, the subsequent
read will still see the previous value. And obviously punching a hole
shouldn't cause explosions.
Let the user specify the number of memslots in the private mem conversion
test, i.e. don't require the number of memslots to be '1' or "nr_vcpus".
Creating more memslots than vCPUs is particularly interesting, e.g. it can
result in a single KVM_SET_MEMORY_ATTRIBUTES spanning multiple memslots.
To keep the math reasonable, align each vCPU's chunk to at least 2MiB (the
size is 2MiB+4KiB), and require the total size to be cleanly divisible by
the number of memslots. The goal is to be able to validate that KVM plays
nice with multiple memslots, being able to create a truly arbitrary number
of memslots doesn't add meaningful value, i.e. isn't worth the cost.
Intentionally don't take a requirement on KVM_CAP_GUEST_MEMFD,
KVM_CAP_MEMORY_FAULT_INFO, KVM_MEMORY_ATTRIBUTE_PRIVATE, etc., as it's a
KVM bug to advertise KVM_X86_SW_PROTECTED_VM without its prerequisites.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-32-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a "vm_shape" structure to encapsulate the selftests-defined "mode",
along with the KVM-defined "type" for use when creating a new VM. "mode"
tracks physical and virtual address properties, as well as the preferred
backing memory type, while "type" corresponds to the VM type.
Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD
without needing an entirely separate set of helpers. At this time,
guest_memfd is effectively usable only by confidential VM types in the
form of guest private memory, and it's expected that x86 will double down
and require unique VM types for TDX and SNP guests.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20231027182217.3615211-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM x86 misc changes for 6.7:
- Add CONFIG_KVM_MAX_NR_VCPUS to allow supporting up to 4096 vCPUs without
forcing more common use cases to eat the extra memory overhead.
- Add IBPB and SBPB virtualization support.
- Fix a bug where restoring a vCPU snapshot that was taken within 1 second of
creating the original vCPU would cause KVM to try to synchronize the vCPU's
TSC and thus clobber the correct TSC being set by userspace.
- Compute guest wall clock using a single TSC read to avoid generating an
inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads.
- "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain
about a "Firmware Bug" if the bit isn't set for select F/M/S combos.
- Don't apply side effects to Hyper-V's synthetic timer on writes from
userspace to fix an issue where the auto-enable behavior can trigger
spurious interrupts, i.e. do auto-enabling only for guest writes.
- Remove an unnecessary kick of all vCPUs when synchronizing the dirty log
without PML enabled.
- Advertise "support" for non-serializing FS/GS base MSR writes as appropriate.
- Use octal notation for file permissions through KVM x86.
- Fix a handful of typo fixes and warts.
KVM selftests fixes for 6.6:
- Play nice with %llx when formatting guest printf and assert statements.
- Clean up stale test metadata.
- Zero-initialize structures in memslot perf test to workaround a suspected
"may be used uninitialized" false positives from GCC.
Extend x86's state to forcefully load *all* host-supported xfeatures by
modifying xstate_bv in the saved state. Stuffing xstate_bv ensures that
the selftest is verifying KVM's full ABI regardless of whether or not the
guest code is successful in getting various xfeatures out of their INIT
state, e.g. see the disaster that is/was MPX.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expand x86's state test to load XSAVE state into a "dummy" vCPU prior to
KVM_SET_CPUID2, and again with an empty guest CPUID model. Except for
off-by-default features, i.e. AMX, KVM's ABI for KVM_SET_XSAVE is that
userspace is allowed to load xfeatures so long as they are supported by
the host. This is a regression test for a combination of KVM bugs where
the state saved by KVM_GET_XSAVE{2} could not be loaded via KVM_SET_XSAVE
if the saved xstate_bv would load guest-unsupported xfeatures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modify support XSAVE state in the "state test's" guest code so that saving
and loading state via KVM_{G,S}ET_XSAVE actually does something useful,
i.e. so that xstate_bv in XSAVE state isn't empty.
Punt on BNDCSR for now, it's easier to just stuff that xfeature from the
host side.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: x86: Selftests changes for 6.6:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts to use
printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry
Explicitly set the exception vector to #UD when potentially injecting an
exception in sync_regs_test's subtests that try to detect TOCTOU bugs
in KVM's handling of exceptions injected by userspace. A side effect of
the original KVM bug was that KVM would clear the vector, but relying on
KVM to clear the vector (i.e. make it #DE) makes it less likely that the
test would ever find *new* KVM bugs, e.g. because only the first iteration
would run with a legal vector to start.
Explicitly inject #UD for race_events_inj_pen() as well, e.g. so that it
doesn't inherit the illegal 255 vector from race_events_exc(), which
currently runs first.
Link: https://lore.kernel.org/r/20230817233430.1416463-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reload known good vCPU state if the vCPU triple faults in any of the
race_sync_regs() subtests, e.g. if KVM successfully injects an exception
(the vCPU isn't configured to handle exceptions). On Intel, the VMCS
is preserved even after shutdown, but AMD's APM states that the VMCB is
undefined after a shutdown and so KVM synthesizes an INIT to sanitize
vCPU/VMCB state, e.g. to guard against running with a garbage VMCB.
The synthetic INIT results in the vCPU never exiting to userspace, as it
gets put into Real Mode at the reset vector, which is full of zeros (as is
GPA 0 and beyond), and so executes ADD for a very, very long time.
Fixes: 60c4063b47 ("KVM: selftests: Extend x86's sync_regs_test to check for event vector races")
Cc: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230817233430.1416463-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add test cases to verify the handling of unsupported input values for the
PMU event filter. The tests cover unsupported "action" values, unsupported
"flags" values, and unsupported "nevents" values. All these cases should
return an error, as they are currently not supported by the filter.
Furthermore, the tests also cover the case where setting non-existent
fixed counters in the fixed bitmap does not fail.
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Link: https://lore.kernel.org/r/20230810090945.16053-5-cloudliang@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add custom "__kvm_pmu_event_filter" structure to improve pmu event
filter settings. Simplifies event filter setup by organizing event
filter parameters in a cleaner, more organized way.
Alternatively, selftests could use a struct overlay ala vcpu_set_msr()
to avoid dynamically allocating the array:
struct {
struct kvm_msrs header;
struct kvm_msr_entry entry;
} buffer = {};
memset(&buffer, 0, sizeof(buffer));
buffer.header.nmsrs = 1;
buffer.entry.index = msr_index;
buffer.entry.data = msr_value;
but the extra layer added by the nested structs is counterproductive
to writing efficient, clean code.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Link: https://lore.kernel.org/r/20230810090945.16053-4-cloudliang@tencent.com
[sean: massage changelog to explain alternative]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop the param-based guest assert macros and enable the printf versions
for all selftests. Note! This change can affect tests even if they
don't use directly use guest asserts! E.g. via library code, or due to
the compiler making different optimization decisions.
Link: https://lore.kernel.org/r/20230729003643.1053367-33-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Convert x86's CPUID test to use printf-based GUEST_ASSERT_EQ() so that
the test prints out debug information. Note, the test previously used
REPORT_GUEST_ASSERT_2(), but that was pointless because none of the
guest-side code passed any parameters to the assert.
Link: https://lore.kernel.org/r/20230729003643.1053367-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
There is already an ASSERT_EQ macro in the file
tools/testing/selftests/kselftest_harness.h, so currently KVM selftests
can't include test_util.h from the KVM selftests together with that file.
Rename the macro in the KVM selftests to TEST_ASSERT_EQ to avoid the
problem - it is also more similar to the other macros in test_util.h that
way.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20230712075910.22480-2-thuth@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Attempt to set the to-be-queued exception to be both pending and injected
_after_ KVM_CAP_SYNC_REGS's kvm_vcpu_ioctl_x86_set_vcpu_events() squashes
the pending exception (if there's also an injected exception). Buggy KVM
versions will eventually yell loudly about having impossible state when
processing queued excpetions, e.g.
WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
arch/x86/kvm/x86.c:kvm_check_and_inject_events():
WARN_ON_ONCE(vcpu->arch.exception.injected &&
vcpu->arch.exception.pending);
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co
[sean: split to separate patch, massage changelog and comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Attempt to modify the to-be-injected exception vector to an illegal value
_after_ the sanity checks performed by KVM_CAP_SYNC_REGS's
arch/x86/kvm/x86.c:kvm_vcpu_ioctl_x86_set_vcpu_events(). Buggy KVM
versions will eventually yells loudly about attempting to inject a bogus
vector, e.g.
WARNING: CPU: 0 PID: 1107 at arch/x86/kvm/x86.c:547 kvm_check_and_inject_events+0x4a0/0x500 [kvm]
arch/x86/kvm/x86.c:exception_type():
WARN_ON(vector > 31 || vector == NMI_VECTOR)
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co
[sean: split to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by
KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs(). This can lead to some
nonsensical vCPU states accompanied by kernel splats, e.g. disabling PAE
while long mode is enabled makes KVM all kinds of confused:
WARNING: CPU: 0 PID: 1142 at arch/x86/kvm/mmu/paging_tmpl.h:358 paging32_walk_addr_generic+0x431/0x8f0 [kvm]
arch/x86/kvm/mmu/paging_tmpl.h:
KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm)
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230728001606.2275586-3-mhal@rbox.co
[sean: see link]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add coverage to x86's set_sregs_test to verify KVM rejects vendor-agnostic
illegal CR0 values, i.e. CR0 values whose legality doesn't depend on the
current VMX mode. KVM historically has neglected to reject bad CR0s from
userspace, i.e. would happily accept a completely bogus CR0 via
KVM_SET_SREGS{2}.
Punt VMX specific subtests to future work, as they would require quite a
bit more effort, and KVM gets coverage for CR0 checks in general through
other means, e.g. KVM-Unit-Tests.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM selftests changes for 6.5:
- Add a test for splitting and reconstituting hugepages during and after
dirty logging
- Add support for CPU pinning in demand paging test
- Generate dependency files so that partial rebuilds work as expected
- Misc cleanups and fixes
KVM x86 changes for 6.5:
* Move handling of PAT out of MTRR code and dedup SVM+VMX code
* Fix output of PIC poll command emulation when there's an interrupt
* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
* Misc cleanups
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a test for page splitting during dirty logging and for hugepage
recovery after dirty logging.
Page splitting represents non-trivial behavior, which is complicated
by MANUAL_PROTECT mode, which causes pages to be split on the first
clear, instead of when dirty logging is enabled.
Add a test which makes assertions about page counts to help define the
expected behavior of page splitting and to provide needed coverage of the
behavior. This also helps ensure that a failure in eager page splitting
is not covered up by splitting in the vCPU path.
Tested by running the test on an Intel Haswell machine w/wo
MANUAL_PROTECT.
Reviewed-by: Vipin Sharma <vipinsh@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Link: https://lore.kernel.org/r/20230131181820.179033-3-bgardon@google.com
[sean: let the user run without hugetlb, as suggested by Paolo]
Signed-off-by: Sean Christopherson <seanjc@google.com>