Commit Graph

197 Commits

Author SHA1 Message Date
Jani Nikula
2619861c47 drm/i915/gvt: use local INTEL_GVT_OPREGION_SIZE
All of gvt uses INTEL_GVT_OPREGION_SIZE for opregion size. Follow suit
here.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8ae6e10fc0929934a14547a973312e82a4d7f7d1.1704992868.git.jani.nikula@intel.com
2024-01-16 11:22:40 +02:00
Stefan Hajnoczi
a7bea9f4fe vfio: use __aligned_u64 in struct vfio_device_gfx_plane_info
The memory layout of struct vfio_device_gfx_plane_info is
architecture-dependent due to a u64 field and a struct size that is not
a multiple of 8 bytes:
- On x86_64 the struct size is padded to a multiple of 8 bytes.
- On x32 the struct size is only a multiple of 4 bytes, not 8.
- Other architectures may vary.

Use __aligned_u64 to make memory layout consistent. This reduces the
chance of breakage when 32-bit userspace runs on a 64-bit kernel.

This patch increases the struct size on x32 but this is safe because of
the struct's argsz field. The kernel may grow the struct as long as it
still supports smaller argsz values from userspace (e.g. applications
compiled against older kernel headers).
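
The layout difference described above can be reproduced in userspace. The sketch below is only an illustration: the two structs are hypothetical stand-ins, not the real struct vfio_device_gfx_plane_info, and the 4-byte-aligned typedef is an assumption used to emulate a 32-bit x86 ABI on any GCC or Clang host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: emulate a 64-bit integer that only needs 4-byte alignment,
 * as on 32-bit x86 ABIs. Relies on the GCC/Clang typedef alignment attribute. */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

/* Assumption: emulate a 64-bit integer forced to 8-byte alignment,
 * in the spirit of __aligned_u64. */
typedef uint64_t u64_align8 __attribute__((aligned(8)));

/* Hypothetical struct whose size depends on the u64 alignment. */
struct plane_info_loose {
	uint32_t argsz;
	uint32_t flags;
	u64_align4 format_mod;
	uint32_t region_index;
};

/* Hypothetical struct with an aligned u64 and explicit tail padding. */
struct plane_info_fixed {
	uint32_t argsz;
	uint32_t flags;
	u64_align8 format_mod;
	uint32_t region_index;
	uint32_t reserved;	/* explicit padding keeps the size a multiple of 8 */
};

/* Size with 4-byte u64 alignment: no tail padding is added. */
size_t loose_size(void)
{
	return sizeof(struct plane_info_loose);
}

/* Size with 8-byte u64 alignment: identical on every ABI. */
size_t fixed_size(void)
{
	return sizeof(struct plane_info_fixed);
}
```

With the loosely aligned u64 the struct ends on a 4-byte boundary, while the aligned variant has the same size regardless of the ABI's default u64 alignment.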

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20230918205617.1478722-3-stefanha@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-09-28 12:12:08 -06:00
Linus Torvalds
0c02183427 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Clean up vCPU targets, always returning generic v8 as the preferred
     target

   - Trap forwarding infrastructure for nested virtualization (used for
     traps that are taken from an L2 guest and are needed by the L1
     hypervisor)

   - FEAT_TLBIRANGE support to only invalidate specific ranges of
     addresses when collapsing a table PTE to a block PTE. This saves
     the guest from refilling the TLBs for addresses that aren't
     covered by the table PTE.

   - Fix vPMU issues related to handling of PMUver.

   - Don't unnecessarily align non-stack allocations in the EL2 VA space

   - Drop HCR_VIRT_EXCP_MASK, which was never used...

   - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
     parameter instead

   - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()

   - Remove prototypes without implementations

  RISC-V:

   - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest

   - Added ONE_REG interface for SATP mode

   - Added ONE_REG interface to enable/disable multiple ISA extensions

   - Improved error codes returned by ONE_REG interfaces

   - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V

   - Added get-reg-list selftest for KVM RISC-V

  s390:

   - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)

     Allows a PV guest to use crypto cards. Card access is governed by
     the firmware and once a crypto queue is "bound" to a PV VM every
     other entity (PV or not) loses access until it is not bound
     anymore. Enablement is done via flags when creating the PV VM.

   - Guest debug fixes (Ilya)

  x86:

   - Clean up KVM's handling of Intel architectural events

   - Intel bugfixes

   - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
     debug registers and generate/handle #DBs

   - Clean up LBR virtualization code

   - Fix a bug where KVM fails to set the target pCPU during an IRTE
     update

   - Fix fatal bugs in SEV-ES intrahost migration

   - Fix a bug where the recent (architecturally correct) change to
     reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
     skip it)

   - Retry APIC map recalculation if a vCPU is added/enabled

   - Overhaul emergency reboot code to bring SVM up to par with VMX, tie
     the "emergency disabling" behavior to KVM actually being loaded,
     and move all of the logic within KVM

   - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
     TSC ratio MSR cannot diverge from the default when TSC scaling is
     disabled, and clean up related code

   - Add a framework to allow "caching" feature flags so that KVM can
     check if the guest can use a feature without needing to search
     guest CPUID

   - Rip out the ancient MMU_DEBUG crud and replace the useful bits with
     CONFIG_KVM_PROVE_MMU

   - Fix KVM's handling of !visible guest roots to avoid premature
     triple fault injection

   - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
     API surface that is needed by external users (currently only
     KVMGT), and fix a variety of issues in the process

  Generic:

   - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
     events to pass action specific data without needing to constantly
     update the main handlers.

   - Drop unused function declarations

  Selftests:

   - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs

   - Add support for printf() in guest code and convert all guest asserts
     to use printf-based reporting

   - Clean up the PMU event filter test and add new testcases

   - Include x86 selftests in the KVM x86 MAINTAINERS entry"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
  KVM: x86/mmu: Include mmu.h in spte.h
  KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
  KVM: x86/mmu: Disallow guest from using !visible slots for page tables
  KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
  KVM: x86/mmu: Harden new PGD against roots without shadow pages
  KVM: x86/mmu: Add helper to convert root hpa to shadow page
  drm/i915/gvt: Drop final dependencies on KVM internal details
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86: Remove the unused page-track hook track_flush_slot()
  drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  ...
2023-09-07 13:52:20 -07:00
Sean Christopherson
09c8726ffa drm/i915/gvt: Drop final dependencies on KVM internal details
Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h.  KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.
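
As a minimal userspace sketch of the open-coded conversion described above: the names below are hypothetical, and the shift of 12 assumes 4 KiB pages rather than taking PAGE_SHIFT from kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page shift is 12. */
#define SKETCH_PAGE_SHIFT 12

typedef uint64_t sketch_gpa_t;	/* guest physical address */
typedef uint64_t sketch_gfn_t;	/* guest frame number */

/* Open-coded "gfn == gpa >> PAGE_SHIFT": drop the offset within the page. */
sketch_gfn_t sketch_gpa_to_gfn(sketch_gpa_t gpa)
{
	return gpa >> SKETCH_PAGE_SHIFT;
}
```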

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-30-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:08:19 -04:00
Sean Christopherson
f22b1e8500 KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so.  Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9e ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:08:19 -04:00
Sean Christopherson
96316a0670 KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-28-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:08:18 -04:00
Sean Christopherson
7b574863e7 KVM: x86/mmu: Rename page-track APIs to reflect the new reality
Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunistically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-25-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:08:15 -04:00
Sean Christopherson
338068b5be KVM: x86/mmu: Drop infrastructure for multiple page-track modes
Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective).  In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient.  And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Ben Gardon <bgardon@google.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:08:14 -04:00
Yan Zhao
c15fcf12ff drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
Switch from the poorly named and flawed ->track_flush_slot() to the newly
introduced ->track_remove_region().  From KVMGT's perspective, the two
hooks are functionally equivalent, the only difference being that
->track_remove_region() is called only when KVM is 100% certain the
memory region will be removed, i.e. is invoked slightly later in KVM's
memslot modification flow.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: handle name change, massage changelog, rebase]
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:07:25 -04:00
Sean Christopherson
2ee05a4c27 drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
When handling a slot "flush", don't call back into KVM to drop write
protection for gfns in the slot.  Now that KVM rejects attempts to move
memory slots while KVMGT is attached, the only time a slot is "flushed"
is when it's being removed, i.e. the memslot and all its write-tracking
metadata are about to be deleted.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 14:07:24 -04:00
Sean Christopherson
b271e17def KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:49:01 -04:00
Sean Christopherson
3cca6b2628 drm/i915/gvt: Protect gfn hash table with vgpu_lock
Use vgpu_lock instead of KVM's mmu_lock to protect accesses to the hash
table used to track which gfns are write-protected when shadowing the
guest's GTT, and hoist the acquisition of vgpu_lock from
intel_vgpu_page_track_handler() out to its sole caller,
kvmgt_page_track_write().

This fixes a bug where kvmgt_page_track_write(), which doesn't hold
kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
a use-after-free.

Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
sleep when acquiring vgpu->cache_lock deep down the callstack:

  intel_vgpu_page_track_handler()
  |
  |->  page_track->handler / ppgtt_write_protection_handler()
       |
       |-> ppgtt_handle_guest_write_page_table_bytes()
           |
           |->  ppgtt_handle_guest_write_page_table()
                |
                |-> ppgtt_handle_guest_entry_removal()
                    |
                    |-> ppgtt_invalidate_pte()
                        |
                        |-> intel_gvt_dma_unmap_guest_page()
                            |
                            |-> mutex_lock(&vgpu->cache_lock);

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:48:58 -04:00
Sean Christopherson
16735297fd drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
Use an "unsigned long" instead of an "int" when iterating over the gfns
in a memslot.  The number of pages in the memslot is tracked as an
"unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
larger than 16TiB were deleted (2^32 * 4KiB).
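
The arithmetic behind the 16TiB figure can be checked with a small sketch. The helper names are hypothetical and a 4 KiB page size is assumed; the second helper only illustrates what happens when a page count no longer fits in a 32-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Bytes covered by 2^32 pages: the point where a 32-bit counter wraps. */
uint64_t sketch_bytes_at_32bit_limit(void)
{
	return (1ULL << 32) * SKETCH_PAGE_SIZE;
}

/* Store a 64-bit page count in a 32-bit counter; high bits are lost. */
uint32_t sketch_truncate_page_count(uint64_t npages)
{
	return (uint32_t)npages;
}
```

A 64-bit counter, like the "unsigned long" page count on 64-bit kernels, represents such counts without loss.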

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:48:57 -04:00
Sean Christopherson
ba193f62c0 drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
transparent hugepage page, don't use KVM's gfn_to_pfn() to pre-check if a
2MiB GTT entry is possible and instead just try to map the GFN with a 2MiB
entry.  Using KVM to query a pfn that is ultimately managed through VFIO is
odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
exported only because of KVM vendor modules (x86 and PPC).

Open code the check on 2MiB support instead of keeping
is_2MB_gtt_possible() around for a single line of code.

Move the call to intel_gvt_dma_map_guest_page() for a 4KiB entry into its
case statement, i.e. fork the common path into the 4KiB and 2MiB "direct"
shadow paths.  Keeping the call in the "common" path is arguably more in
the spirit of "one change per patch", but retaining the local "page_size"
variable is silly, i.e. the call site will be changed either way, and
jumping around the no-longer-common code is more subtle and rather odd,
i.e. would just need to be immediately cleaned up.

Drop the error message from gvt_pin_guest_page() when KVMGT attempts to
shadow a 2MiB guest page that isn't backed by a compatible hugepage in the
host.  Dropping the pre-check on a THP makes it much more likely that the
"error" will be encountered in normal operation.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:48:56 -04:00
Yan Zhao
a15e61f337 drm/i915/gvt: Don't try to unpin an empty page range
Attempt to unpin pages in the error path of gvt_pin_guest_page() if and
only if at least one page was successfully pinned.  Unpinning doesn't
cause functional problems, but vfio_device_container_unpin_pages()
rightfully warns about being asked to unpin zero pages.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: write changelog]
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:48:53 -04:00
Sean Christopherson
adc7b226b7 drm/i915/gvt: Verify hugepages are contiguous in physical address space
When shadowing a GTT entry with a 2M page, verify that the pfns are
contiguous, not just that the struct page pointers are contiguous.  The
memory map is virtually contiguous if "CONFIG_FLATMEM=y ||
CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y &&
CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct
pages that are virtually contiguous, but not physically contiguous.

In practice, this flaw is likely a non-issue as it would cause functional
problems iff a section isn't 2M aligned _and_ is directly adjacent to
another section with discontiguous pfns.
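
A rough sketch of the kind of check described above, written for userspace: the function name and the plain array of pfns are assumptions for illustration, not the code used by KVMGT. The point is that contiguity is tested on the frame numbers themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true only if every pfn is exactly one
 * greater than its predecessor, i.e. the range is physically contiguous.
 */
bool sketch_pfns_are_contiguous(const uint64_t *pfns, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (pfns[i] != pfns[0] + i)
			return false;
	}
	return true;
}
```

For a 2M mapping built from 4 KiB pages, n would be 512 under this sketch.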

Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20230729013535.1070024-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-31 13:48:52 -04:00
Yi Liu
8cfa718602 vfio-iommufd: Add detach_ioas support for emulated VFIO devices
This prepares for adding DETACH ioctl for emulated VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-25 10:19:18 -06:00
Greg Kroah-Hartman
d989bf543d i915: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gvt-dev@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20230202141309.2293834-1-gregkh@linuxfoundation.org
2023-02-23 13:42:13 +08:00
Zhi Wang
a06d4b9e15 drm/i915/gvt: use atomic operations to change the vGPU status
Several vGPU statuses are used to decide the availability of GVT-g core
logic when creating a vGPU. Use atomic operations when changing the vGPU
status to avoid races.
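
The idea can be sketched in userspace with C11 atomics. Everything here is an assumption for illustration: the struct, the flag names, and the helpers are hypothetical, and the kernel has its own atomic bit operations rather than stdatomic.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status bits, not the kernel's definitions. */
#define SKETCH_STATUS_ATTACHED	(1u << 0)
#define SKETCH_STATUS_ACTIVE	(1u << 1)

/* Hypothetical vGPU with a status word that is only touched atomically. */
struct sketch_vgpu {
	atomic_uint status;
};

/* Atomically set a flag; return whether it was already set. */
bool sketch_test_and_set(struct sketch_vgpu *v, unsigned int flag)
{
	return (atomic_fetch_or(&v->status, flag) & flag) != 0;
}

/* Atomically clear a flag without disturbing the other bits. */
void sketch_clear(struct sketch_vgpu *v, unsigned int flag)
{
	atomic_fetch_and(&v->status, ~flag);
}

/* Atomically read one flag. */
bool sketch_test(struct sketch_vgpu *v, unsigned int flag)
{
	return (atomic_load(&v->status) & flag) != 0;
}
```

Because setting and testing happen in one atomic step, two concurrent callers cannot both observe the flag as previously unset.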

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221110122034.3382-2-zhi.a.wang@intel.com
2023-01-04 23:21:19 +08:00
Linus Torvalds
785d21ba2f Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - Replace deprecated git://github.com link in MAINTAINERS (Palmer
   Dabbelt)

 - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing)

 - Drop unnecessary buffer from ACPI call (Rafael Mendonca)

 - Correct latent missing include issue in iova-bitmap and fix support
   for unaligned bitmaps. Follow-up with better fix through refactor
   (Joao Martins)

 - Rework ccw mdev driver to split private data from parent structure,
   better aligning with the mdev lifecycle and allowing us to remove a
   temporary workaround (Eric Farman)

 - Add an interface to get an estimated migration data size for a
   device, allowing userspace to make informed decisions, ex. more
   accurately predicting VM downtime (Yishai Hadas)

 - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas)

 - Simplify module and Kconfig through consolidating SPAPR/EEH code and
   config options and folding virqfd module into main vfio module (Jason
   Gunthorpe)

 - Fix error path from device_register() across all vfio mdev and sample
   drivers (Alex Williamson)

 - Define migration pre-copy interface and implement for vfio/mlx5
   devices, allowing portions of the device state to be saved while the
   device continues operation, towards reducing the stop-copy state size
   (Jason Gunthorpe, Yishai Hadas, Shay Drory)

 - Implement pre-copy for hisi_acc devices (Shameer Kolothum)

 - Fixes to mdpy mdev driver remove path and error path on probe (Shang
   XiaoJing)

 - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and
   incorrect buffer freeing (Dan Carpenter)

* tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits)
  vfio/mlx5: error pointer dereference in error handling
  vfio/mlx5: fix error code in mlx5vf_precopy_ioctl()
  samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe()
  hisi_acc_vfio_pci: Enable PRE_COPY flag
  hisi_acc_vfio_pci: Move the dev compatibility tests for early check
  hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions
  hisi_acc_vfio_pci: Add support for precopy IOCTL
  vfio/mlx5: Enable MIGRATION_PRE_COPY flag
  vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
  vfio/mlx5: Introduce multiple loads
  vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
  vfio/mlx5: Introduce vfio precopy ioctl implementation
  vfio/mlx5: Introduce SW headers for migration states
  vfio/mlx5: Introduce device transitions of PRE_COPY
  vfio/mlx5: Refactor to use queue based data chunks
  vfio/mlx5: Refactor migration file state
  vfio/mlx5: Refactor MKEY usage
  vfio/mlx5: Refactor PD usage
  vfio/mlx5: Enforce a single SAVE command at a time
  vfio: Extend the device migration protocol with PRE_COPY
  ...
2022-12-15 13:12:15 -08:00
Linus Torvalds
08cdc21579 Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
  support allows creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Linus Torvalds
a594533df0 Merge tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "The biggest highlight is that the accel subsystem framework is merged.
  Hopefully for 6.3 we will be able to line up a driver to use it.

  In drivers land, i915 enables DG2 support by default now, and nouveau
  has a big stability refactoring and initial ampere support, AMD
  includes new hw IP support and should build on ARM again. There is
  also an ofdrm driver to take over offb on platforms where it's used.

  Stuff outside my tree, the dma-buf patches hit a few places, the vc4
  firmware changes also do, and i915 has some interactions with MEI for
  discrete GPUs. I think all of those should have been acked/reviewed by
  relevant parties.

  New driver:
   - ofdrm - replacement for offb

  fbdev:
   - add support for nomodeset

  fourcc:
   - add Vivante tiled modifier

  core:
   - atomic-helpers: CRTC primary plane test fixes, fb access hooks
   - connector: TV API consistency, cmdline parser improvements
   - send connector hotplug on cleanup
   - sort makefile objects

  tests:
   - sort kunit tests
   - improve DP-MST tests
   - add kunit helpers to create a device

  sched:
   - module param for scheduling policy
   - refcounting fix

  buddy:
   - add back random seed log

  ttm:
   - convert ttm_resource to size_t
   - optimize pool allocations

  edid:
   - HFVSDB parsing support fixes
   - logging/debug improvements
   - DSC quirks

  dma-buf:
   - Add unlocked vmap and attachment mapping
   - move drivers to common locking convention
   - locking improvements

  firmware:
   - new API for rPI firmware and vc4

  xilinx:
   - zynqmp: displayport bridge support
   - dpsub fix

  bridge:
   - adv7533: Remove dynamic lane switching
   - it6505: Runtime PM support, sync improvements
   - ps8640: Handle AUX defer messages
   - tc358775: Drop soft-reset over I2C

  panel:
   - panel-edp: Add INX N116BGE-EA2 C2 and C4 support.
   - Jadard JD9365DA-H3
   - NewVision NV3051D

  amdgpu:
   - DCN support on ARM
   - DCN 2.1 secure display
   - Sienna Cichlid mode2 reset fixes
   - new GC 11.x firmware versions
   - drop AMD specific DSC workarounds in favour of drm code
   - clang warning fixes
   - scheduler rework
   - SR-IOV fixes
   - GPUVM locking fixes
   - fix memory leak in CS IOCTL error path
   - flexible array updates
   - enable new GC/PSP/SMU/NBIO IP
   - GFX preemption support for gfx9

  amdkfd:
   - cache size fixes
   - userptr fixes
   - enable cooperative launch on gfx 10.3
   - enable GC 11.0.4 KFD support

  radeon:
   - replace kmap with kmap_local_page
   - ACPI ref count fix
   - HDA audio notifier support

  i915:
   - DG2 enabled by default
   - MTL enablement work
   - hotplug refactoring
   - VBT improvements
   - Display and watermark refactoring
   - ADL-P workaround
   - temp disable runtime_pm for discrete-
   - fix for A380 as a secondary GPU
   - Wa_18017747507 for DG2
   - CS timestamp support fixes for gen5 and earlier
   - never purge busy TTM objects
   - use i915_sg_dma_sizes for all backends
   - demote GuC kernel contexts to normal priority
   - gvt: refactor for new MDEV interface
   - enable DC power states on eDP ports
   - fix gen 2/3 workarounds

  nouveau:
   - fix page fault handling
   - Ampere acceleration support
   - driver stability improvements
   - nva3 backlight support

  msm:
   - MSM_INFO_GET_FLAGS support
   - DPU: XR30 and P010 image formats
   - Qualcomm SM6115 support
   - DSI PHY support for QCM2290
   - HDMI: refactored dev init path
   - remove exclusive-fence hack
   - fix speed-bin detection
   - enable clamp to idle on 7c3
   - improved hangcheck detection

  vmwgfx:
   - fb and cursor refactoring
   - convert to generic hashtable
   - cursor improvements

  etnaviv:
   - hw workarounds
   - softpin MMU fixes

  ast:
   - atomic gamma LUT support
   - convert to SHMEM

  lcdif:
   - support YUV planes
   - Increase DMA burst size
   - FIFO threshold tuning

  meson:
   - fix return type of cvbs mode_valid

  mgag200:
   - fix PLL setup on some revisions

  sun4i:
   - A100 and D1 support

  udl:
   - modesetting improvements
   - hot unplug support

  vc4:
   - support PAL-M
   - fix regression preventing 4K @ 60Hz
   - fix NULL ptr deref

  v3d:
   - switch to drm managed resources

  renesas:
   - RZ/G2L DSI support
   - DU Kconfig cleanup

  mediatek:
   - fixup dpi and hdmi
   - MT8188 dpi support
   - MT8195 AFBC support

  tegra:
   - NVDEC hardware on Tegra234 SoC

  hdlcd:
   - switch to drm managed resources

  ingenic:
   - fix registration error path

  hisilicon:
   - convert to drm_mode_init

  malidp:
   - use managed resources

  mtk:
   - use drm_mode_init

  rockchip:
   - use drm_mode_copy"

* tag 'drm-next-2022-12-13' of git://anongit.freedesktop.org/drm/drm: (1397 commits)
  drm/amdgpu: fix mmhub register base coding error
  drm/amdgpu: add tmz support for GC IP v11.0.4
  drm/amdgpu: enable GFX Clock Gating control for GC IP v11.0.4
  drm/amdgpu: enable GFX Power Gating for GC IP v11.0.4
  drm/amdgpu: enable GFX IP v11.0.4 CG support
  drm/amdgpu: Make amdgpu_ring_mux functions as static
  drm/amdgpu: generally allow over-commit during BO allocation
  drm/amd/display: fix array index out of bound error in DCN32 DML
  drm/amd/display: 3.2.215
  drm/amd/display: set optimized required for comp buf changes
  drm/amd/display: Add debug option to skip PSR CRTC disable
  drm/amd/display: correct DML calc error of UrgentLatency
  drm/amd/display: correct static_screen_event_mask
  drm/amd/display: Ensure commit_streams returns the DC return code
  drm/amd/display: read invalid ddc pin status cause engine busy
  drm/amd/display: Bypass DET swath fill check for max clocks
  drm/amd/display: Disable uclk pstate for subvp pipes
  drm/amd/display: Fix DCN2.1 default DSC clocks
  drm/amd/display: Enable dp_hdmi21_pcon support
  drm/amd/display: prevent seamless boot on displays that don't have the preferred dig
  ...
2022-12-13 11:59:58 -08:00
Jason Gunthorpe
90337f526c Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommufd version.
The rc fix was done a different way when iommufd patches reworked this
code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 12:04:39 -04:00
Jason Gunthorpe
4741f2e941 vfio-iommufd: Support iommufd for emulated VFIO devices
Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
consist of all the mdev drivers.

Like the physical drivers, support for iommufd is provided by the driver
supplying the correct standard ops. Provide ops from the core that
duplicate what vfio_register_emulated_iommu_dev() does.

Emulated drivers are where it is more likely to see variation in the
iommufd support ops. For instance IDXD will probably need to set up both an
iommufd_device context linked to a PASID and an iommufd_access context to
support all their mdev operations.

Link: https://lore.kernel.org/r/7-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 11:52:03 -04:00
Yi Liu
4dc334cab1 i915/gvt: Move gvt mapping cache initialization to intel_vgpu_init_dev()
vfio container registers .dma_unmap() callback after the device is opened.
So it's fine for mdev drivers to initialize internal mapping cache in
.open_device(). See vfio_device_container_register().

Now with iommufd an access ops with an unmap callback is registered when
the device is bound to iommufd which is before .open_device() is
called. This implies gvt's .dma_unmap() could be called before its
internal mapping cache is initialized.

The fix is moving gvt mapping cache initialization to vGPU init. While at
it also move ptable initialization together.

Link: https://lore.kernel.org/r/20221202135402.756470-2-yi.l.liu@intel.com
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02 11:49:26 -04:00
Rodrigo Vivi
164312df95 Merge tag 'gvt-next-2022-11-17' of https://github.com/intel/gvt-linux into drm-intel-next
gvt-next-2022-11-17

- kernel doc fixes
- remove vgpu->released sanity check
- small clean up

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221117064106.GT30028@zhen-hp.sh.intel.com
2022-11-17 08:46:48 -05:00
Zhi Wang
2d3bc87543 drm/i915/gvt: remove the vgpu->released and its sanity check
The life cycle of a vGPU, which is represented by a vfio_device, has been
managed by the VFIO core logic. Remove the vgpu->released, which was used
for a sanity check on the removal path of the vGPU instance. The sanity
check has already been covered in the VFIO core logic.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221104145652.1570-1-zhi.a.wang@intel.com
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2022-11-17 14:07:09 +08:00
Sean Christopherson
3c9fd44b93 drm/i915/gvt: Unconditionally put reference to KVM when detaching vGPU
Always put the KVM reference when closing a vGPU device, as
intel_vgpu_open_device() succeeds if and only if the KVM pointer is
valid and a reference to KVM is acquired.  And if that doesn't hold true,
the call to kvm_page_track_unregister_notifier() a few lines earlier is
doomed.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221111002225.2418386-3-seanjc@google.com
2022-11-11 13:21:52 +08:00
Sean Christopherson
9ed1fdee9e drm/i915/gvt: Get reference to KVM iff attachment to VM is successful
Get a reference to KVM if and only if a vGPU is successfully attached to
the VM to avoid leaking a reference if there's no available vGPU.  On
open_device() failure, vfio_device_open() doesn't invoke close_device().

Fixes: 421cfe6596 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221111002225.2418386-2-seanjc@google.com
2022-11-11 13:13:50 +08:00
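The two KVM reference commits above pair up: take the reference only once attachment has succeeded, so that the close path can drop it unconditionally. A minimal userspace sketch of that pairing follows. Every name in it (demo_kvm, demo_vgpu, demo_open, demo_close) is hypothetical and stands in for the real kernel objects; it is not the gvt code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KVM instance and the vGPU. */
struct demo_kvm { int refcount; };
struct demo_vgpu { struct demo_kvm *kvm; bool attached; };

static void demo_kvm_get(struct demo_kvm *kvm) { kvm->refcount++; }

static void demo_kvm_put(struct demo_kvm *kvm)
{
    assert(kvm->refcount > 0);
    kvm->refcount--;
}

/*
 * Returns 0 on success, -1 on failure. The reference is taken only after
 * every failure check has passed, so a failed open has nothing to undo.
 */
static int demo_open(struct demo_vgpu *vgpu, struct demo_kvm *kvm,
                     bool slot_available)
{
    if (kvm == NULL)
        return -1;
    if (!slot_available)
        return -1;          /* no reference taken yet, so none is leaked */

    demo_kvm_get(kvm);
    vgpu->kvm = kvm;
    vgpu->attached = true;
    return 0;
}

/*
 * Only reached after a successful demo_open(), so the KVM pointer is known
 * to be valid and the reference can be dropped without a conditional.
 */
static void demo_close(struct demo_vgpu *vgpu)
{
    vgpu->attached = false;
    demo_kvm_put(vgpu->kvm);
    vgpu->kvm = NULL;
}
```

In this sketch a failed demo_open() leaves the count untouched, and a successful open followed by demo_close() returns it to its starting value, which is the invariant the two commit messages describe.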
Eric Farman
913447d06f vfio: Remove vfio_free_device
With the "mess" sorted out, we should be able to inline the
vfio_free_device call introduced by commit cb9ff3f3b8
("vfio: Add helpers for unifying vfio_device life cycle")
and remove them from driver release callbacks.

Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>	# vfio-ap part
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20221104142007.1314999-8-farman@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-10 11:30:23 -07:00
Christoph Hellwig
685a1537f4 vfio/mdev: consolidate all the description sysfs into the core code
Every driver just emits a string, simply add a method to the mdev_driver
to return it and provide a standard sysfs show function.

Remove the now unused types_attrs field in struct mdev_driver and the
support code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-14-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
f2fbc72e6d vfio/mdev: consolidate all the available_instance sysfs into the core code
Every driver just prints a number, so simply add a method to the mdev_driver
to return it and provide a standard sysfs show function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-13-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
0bc79069cc vfio/mdev: consolidate all the name sysfs into the core code
Every driver just emits a static string, simply add a field to the
mdev_type for the driver to fill out or fall back to the sysfs name and
provide a standard sysfs show function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-12-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Jason Gunthorpe
290aac5df8 vfio/mdev: consolidate all the device_api sysfs into the core code
Every driver just emits a static string, simply feed it through the ops
and provide a standard sysfs show function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-11-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
062e720cd2 vfio/mdev: remove mdev_parent_dev
Just open code the dereferences in the only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-9-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
da44c340c4 vfio/mdev: simplify mdev_type handling
Instead of abusing struct attribute_group to control initialization of
struct mdev_type, just define the actual attributes in the mdev_driver,
allocate the mdev_type structures in the caller and pass them to
mdev_register_parent.

This allows the caller to use container_of to get at the containing
structure and thus significantly simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
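The container_of pattern mentioned in the mdev_type commit above can be sketched in userspace as follows. The macro is a simplified copy without the kernel's type checking, and demo_mdev_type, demo_vgpu_type and demo_weight are hypothetical names chosen for illustration, not the kernel's or gvt's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical core-owned type, embedded by the driver. */
struct demo_mdev_type {
    const char *sysfs_name;
};

/* Hypothetical driver structure that embeds the core type. */
struct demo_vgpu_type {
    unsigned int weight;
    struct demo_mdev_type type;   /* allocated by the caller, as one object */
};

/*
 * The core hands back a pointer to the embedded member; the driver steps
 * back to its own structure instead of keeping a separate lookup table.
 */
static unsigned int demo_weight(struct demo_mdev_type *mtype)
{
    struct demo_vgpu_type *vt;

    assert(mtype != NULL);
    vt = container_of(mtype, struct demo_vgpu_type, type);
    return vt->weight;
}
```

Because the embedded member and the driver data live in one allocation, the driver needs no extra pointer or list walk to get from one to the other, which is presumably the simplification the commit message refers to.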
Christoph Hellwig
89345d5177 vfio/mdev: embedd struct mdev_parent in the parent data structure
Simplify mdev_{un}register_device by requiring the caller to pass in
a structure allocated as part of the parent device structure.  This
removes the need for a list of parents and the separate mdev_parent
refcount as we can simply rely on the reference to the parent device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
bdef2b7896 vfio/mdev: make mdev.h standalone includable
Include <linux/device.h> and <linux/uuid.h> so that users of this header
don't need to do that and remove those includes that aren't needed
any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Christoph Hellwig
1aa3834f51 drm/i915/gvt: simplify vgpu configuration management
Instead of copying the information from the vgpu_types arrays into each
intel_vgpu_type structure, just reference this constant information
with a pointer to the already existing data structure, and pass it into
the low-level VGPU creation helpers instead of copying the data into yet
another params data structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://lore.kernel.org/r/20220923092652.100656-3-hch@lst.de
[aw: Fold fix from 20220928121110.GA30738@lst.de]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04 12:06:58 -06:00
Jason Gunthorpe
f423fa1bc9 drm/i915/gvt: Add missing vfio_unregister_group_dev() call
When converting to directly create the vfio_device the mdev driver has to
put a vfio_register_emulated_iommu_dev() in the probe() and a pairing
vfio_unregister_group_dev() in the remove.

This was missed for gvt, add it.

Cc: stable@vger.kernel.org
Fixes: 978cf586ac ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-30 11:56:02 -06:00
Kevin Tian
a5ddd2a99a drm/i915/gvt: Use the new device life cycle helpers
Move vfio_device to the start of intel_vgpu as required by the new
helpers.

Change intel_gvt_create_vgpu() to use intel_vgpu as the first param
as other vgpu helpers do.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://lore.kernel.org/r/20220921104401.38898-9-kevin.tian@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21 14:15:11 -06:00
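One plausible reason a helper would require the core device to be the first member, as the commit above describes, is that it allocates the driver's whole structure but only knows about the embedded core part. The sketch below illustrates that idea under this assumption. All names (demo_vfio_device, demo_vgpu, demo_alloc_device, demo_vgpu_alloc) are hypothetical, and this is not the VFIO allocation helper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical core structure owned by the framework. */
struct demo_vfio_device {
    int open_count;
};

/* Hypothetical driver structure; the core device must stay first. */
struct demo_vgpu {
    struct demo_vfio_device vfio_device;
    int id;
};

/* Compile-time check that the embedded core device sits at offset 0. */
_Static_assert(offsetof(struct demo_vgpu, vfio_device) == 0,
               "core device must be the first member");

/* Framework side: allocates 'size' bytes and returns the core device. */
static struct demo_vfio_device *demo_alloc_device(size_t size)
{
    assert(size >= sizeof(struct demo_vfio_device));
    return calloc(1, size);
}

/* Driver side: the cast is only valid because of the offset-0 layout. */
static struct demo_vgpu *demo_vgpu_alloc(int id)
{
    struct demo_vfio_device *core = demo_alloc_device(sizeof(struct demo_vgpu));
    struct demo_vgpu *vgpu;

    if (core == NULL)
        return NULL;
    vgpu = (struct demo_vgpu *)core;
    vgpu->id = id;
    return vgpu;
}
```

With the member at offset 0, the pointer returned by the framework and the pointer to the driver structure are the same address, so a single free of either releases the whole object.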
Nicolin Chen
34a255e676 vfio: Replace phys_pfn with pages for vfio_pin_pages()
Most of the callers of vfio_pin_pages() want "struct page *" and the
low-level mm code to pin pages returns a list of "struct page *" too.
So there's no gain in converting "struct page *" to PFN in between.

Replace the output parameter "phys_pfn" list with a "pages" list, to
simplify callers. This also allows us to replace the vfio_iommu_type1
implementation with a more efficient one.

And drop the pfn_valid check in the gvt code, as there is no need to
do such a check on a page-backed struct page pointer.

For now, also update vfio_iommu_type1 to fit this new parameter too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25 13:41:22 -06:00
Nicolin Chen
44abdd1646 vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
Among all three callers, there was only one caller possibly passing in
a non-contiguous PFN list, which is now ensured to have contiguous PFN
inputs too.

Pass in the starting address with "iova" alone to simplify things, so
callers no longer need to maintain a PFN list or to pin/unpin one page
at a time. This also allows VFIO to use more efficient implementations
of pin/unpin_pages.

For now, also update vfio_iommu_type1 to fit this new parameter too,
while keeping its input intact (being user_iova) since we don't want
to spend too much effort swapping its parameters and local variables
at that level.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25 13:41:22 -06:00
Nicolin Chen
2c9e8c0110 drm/i915/gvt: Replace roundup with DIV_ROUND_UP
It's a bit redundant for the maths here using roundup.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20220723020256.30081-3-nicolinc@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-23 07:29:10 -06:00
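The redundancy named in the DIV_ROUND_UP commit above can be checked with a little arithmetic: rounding a byte count up to a page multiple and then dividing by the page size gives the same page count as a single rounding-up division. The sketch below uses userspace copies of the two macros and an assumed 4096-byte page (DEMO_PAGE_SIZE); the exact expression that the gvt code used is not shown in this log, so the two helper functions are illustrative only.

```c
#include <assert.h>

/* Userspace copies of the kernel-style helpers, for illustration only. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Two-step form: round up to a page multiple, then divide. */
static unsigned long pages_via_roundup(unsigned long bytes)
{
    return DEMO_ROUNDUP(bytes, DEMO_PAGE_SIZE) / DEMO_PAGE_SIZE;
}

/* One-step form: a single division that rounds up. */
static unsigned long pages_via_div_round_up(unsigned long bytes)
{
    return DEMO_DIV_ROUND_UP(bytes, DEMO_PAGE_SIZE);
}

/* Returns 1 when both forms agree for every size from 0 to max_bytes. */
static int forms_agree(unsigned long max_bytes)
{
    unsigned long b;

    for (b = 0; b <= max_bytes; b++) {
        if (pages_via_roundup(b) != pages_via_div_round_up(b))
            return 0;
    }
    return 1;
}
```

The two-step form multiplies by the page size only to divide by it again, so the one-step form does the same job with less arithmetic.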
Nicolin Chen
e8f90717ed vfio: Make vfio_unpin_pages() return void
There's only one caller that checks its return value with a WARN_ON_ONCE,
while all other callers don't check the return value at all. Above that,
an undo function should not fail. So, simplify the API to return void by
embedding similar WARN_ONs.

Also for users to pinpoint which condition fails, separate WARN_ON lines,
yet remove the "driver->ops->unpin_pages" check, since it's unreasonable
for callers to unpin on something totally random that wasn't even pinned.
And remove NULL pointer checks for they would trigger oops vs. warnings.
Note that npage is already validated in the vfio core, thus drop the same
check in the type1 code.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/20220723020256.30081-2-nicolinc@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-23 07:29:10 -06:00
Jason Gunthorpe
ce4b4657ff vfio: Replace the DMA unmapping notifier with a callback
Instead of having drivers register the notifier with explicit code just
have them provide a dma_unmap callback op in their driver ops and rely on
the core code to wire it up.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-20 11:57:59 -06:00
Matthew Rosato
421cfe6596 vfio: remove VFIO_GROUP_NOTIFY_SET_KVM
Rather than relying on a notifier for associating the KVM with
the group, let's assume that the association has already been
made prior to device_open.  The first time a device is opened
associate the group KVM with the device.

This fixes a user-triggerable oops in GVT.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/20220519183311.582380-2-mjrosato@linux.ibm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-24 08:41:18 -06:00
Jason Gunthorpe
5eb20a78c0 drm/i915/gvt: Change from vfio_group_(un)pin_pages to vfio_(un)pin_pages
Use the existing vfio_device versions of vfio_(un)pin_pages(). There is no
reason to use a group interface here, kvmgt has easy access to a
vfio_device.

Delete kvmgt_vdev::vfio_group since these calls were the last users.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/5-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:59 -06:00
Jason Gunthorpe
09ea48efff vfio: Make vfio_(un)register_notifier accept a vfio_device
All callers have a struct vfio_device trivially available, pass it in
directly and avoid calling the expensive vfio_group_get_from_dev().

Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11 13:12:58 -06:00
Jason Gunthorpe
6b42f491e1 vfio/mdev: Remove mdev_parent_ops
The last useful member in this struct is the supported_type_groups, move
it to the mdev_driver and delete mdev_parent_ops.

Replace it with mdev_driver as an argument to mdev_register_device()

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-33-hch@lst.de
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
2022-04-21 07:36:56 -04:00