linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-05 23:05:25 -04:00

Author	SHA1	Message	Date
Antheas Kapenekakis	0a4d00e2e9	iommu: Fix mapping check for 0x0 to avoid re-mapping it Commit `789a5913b2` ("iommu/amd: Use the generic iommu page table") introduces the shared iommu page table for AMD IOMMU. Some bioses contain an identity mapping for address 0x0, which is not parsed properly (e.g., certain Strix Halo devices). This causes the DMA components of the device to fail to initialize (e.g., the NVMe SSD controller), leading to a failed post. Specifically, on the GPD Win 5, the NVME and SSD GPU fail to mount, making collecting errors difficult. While debugging, it was found that a -EADDRINUSE error was emitted and its source was traced to iommu_iova_to_phys(). After adding some debug prints, it was found that phys_addr becomes 0, which causes the code to try to re-map the 0 address and fail, causing a cascade leading to a failed post. This is because the GPD Win 5 contains a 0x0-0x1 identity mapping for DMA devices, causing it to be repeated for each device. The cause of this failure is the following check in iommu_create_device_direct_mappings(), where address aliasing is handled via the following check: ``` phys_addr = iommu_iova_to_phys(domain, addr); if (!phys_addr) { map_size += pg_size; continue; } ```` Obviously, the iommu_iova_to_phys() signature is faulty and aliases unmapped and 0 together, causing the allocation code to try to re-allocate the 0 address per device. However, it has too many instantiations to fix. Therefore, use a ternary so that when addr is 0, the check is done for address 1 instead. Suggested-by: Robin Murphy <robin.murphy@arm.com> Fixes: `789a5913b2` ("iommu/amd: Use the generic iommu page table") Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-03-17 13:33:33 +01:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Can Peng	16e3423fc7	iommu: simplify list initialization in iommu_create_device_direct_mappings() Use LIST_HEAD() to declare and initialize the 'mappings' list head in iommu_create_device_direct_mappings() instead of separate declaration and INIT_LIST_HEAD(). This simplifies the code by combining declaration and initialization into a single idiomatic form, improving readability without changing functionality. Signed-off-by: Can Peng <pengcan@kylinos.cn> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:58:54 +01:00
Mostafa Saleh	ccc21213f0	iommu: Add calls for IOMMU_DEBUG_PAGEALLOC Add calls for the new iommu debug config IOMMU_DEBUG_PAGEALLOC: - iommu_debug_init: Enable the debug mode if configured by the user. - iommu_debug_map: Track iommu pages mapped, using physical address. - iommu_debug_unmap_begin: Track start of iommu unmap operation, with IOVA and size. - iommu_debug_unmap_end: Track the end of unmap operation, passing the actual unmapped size versus the tracked one at unmap_begin. We have to do the unmap_begin/end as once pages are unmapped we lose the information of the physical address. This is racy, but the API is racy by construction as it uses refcounts and doesn't attempt to lock/synchronize with the IOMMU API as that will be costly, meaning that possibility of false negative exists. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:52:26 +01:00
Nicolin Chen	c279e83953	iommu: Introduce pci_dev_reset_iommu_prepare/done() PCIe permits a device to ignore ATS invalidation TLPs while processing a reset. This creates a problem visible to the OS where an ATS invalidation command will time out. E.g. an SVA domain will have no coordination with a reset event and can racily issue ATS invalidations to a resetting device. The OS should do something to mitigate this as we do not want production systems to be reporting critical ATS failures, especially in a hypervisor environment. Broadly, OS could arrange to ignore the timeouts, block page table mutations to prevent invalidations, or disable and block ATS. The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends SW to disable and block ATS before initiating a Function Level Reset. It also mentions that other reset methods could have the same vulnerability as well. Provide a callback from the PCI subsystem that will enclose the reset and have the iommu core temporarily change all the attached RID/PASID domains group->blocking_domain so that the IOMMU hardware would fence any incoming ATS queries. And IOMMU drivers should also synchronously stop issuing new ATS invalidations and wait for all ATS invalidations to complete. This can avoid any ATS invaliation timeouts. However, if there is a domain attachment/replacement happening during an ongoing reset, ATS routines may be re-activated between the two function calls. So, introduce a new resetting_domain in the iommu_group structure to reject any concurrent attach_dev/set_dev_pasid call during a reset for a concern of compatibility failure. Since this changes the behavior of an attach operation, update the uAPI accordingly. Note that there are two corner cases: 1. Devices in the same iommu_group Since an attachment is always per iommu_group, this means that any sibling devices in the iommu_group cannot change domain, to prevent race conditions. 2. An SR-IOV PF that is being reset while its VF is not In such case, the VF itself is already broken. So, there is no point in preventing PF from going through the iommu reset. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:26:44 +01:00
Nicolin Chen	a75b2be249	iommu: Add iommu_driver_get_domain_for_dev() helper There is a need to stage a resetting PCI device to temporarily the blocked domain and then attach back to its previously attached domain after reset. This can be simply done by keeping the "previously attached domain" in the iommu_group->domain pointer while adding an iommu_group->resetting_domain, which gives troubles to IOMMU drivers using the iommu_get_domain_for_dev() for a device's physical domain in order to program IOMMU hardware. And in such for-driver use cases, the iommu_group->mutex must be held, so it doesn't fit in external callers that don't hold the iommu_group->mutex. Introduce a new iommu_driver_get_domain_for_dev() helper, exclusively for driver use cases that hold the iommu_group->mutex, to separate from those external use cases. Add a lockdep_assert_not_held to the existing iommu_get_domain_for_dev() and highlight that in a kdoc. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:26:43 +01:00
Nicolin Chen	4a73abb965	iommu: Tidy domain for iommu_setup_dma_ops() This function can only be called on the default_domain. Trivally pass it in. In all three existing cases, the default domain was just attached to the device. This avoids iommu_setup_dma_ops() calling iommu_get_domain_for_dev() that will be used by external callers. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:26:43 +01:00
Nicolin Chen	5d5388b0e1	iommu: Lock group->mutex in iommu_deferred_attach() The iommu_deferred_attach() function invokes __iommu_attach_device(), but doesn't hold the group->mutex like other __iommu_attach_device() callers. Though there is no pratical bug being triggered so far, it would be better to apply the same locking to this __iommu_attach_device(), since the IOMMU drivers nowaday are more aware of the group->mutex -- some of them use the iommu_group_mutex_assert() function that could be potentially in the path of an attach_dev callback function invoked by the __iommu_attach_device(). Worth mentioning that the iommu_deferred_attach() will soon need to check group->resetting_domain that must be locked also. Thus, grab the mutex to guard __iommu_attach_device() like other callers. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2026-01-10 10:26:42 +01:00
Nicolin Chen	fd714986e4	iommu: Pass in old domain to attach_dev callback functions The IOMMU core attaches each device to a default domain on probe(). Then, every new "attach" operation has a fundamental meaning of two-fold: - detach from its currently attached (old) domain - attach to a given new domain Modern IOMMU drivers following this pattern usually want to clean up the things related to the old domain, so they call iommu_get_domain_for_dev() to fetch the old domain. Pass in the old domain pointer from the core to drivers, aligning with the set_dev_pasid op that does so already. Ensure all low-level attach fcuntions in the core can forward the correct old domain pointer. Thus, rework those functions as well. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-10-27 13:55:35 +01:00
Nicolin Chen	2b33598e66	iommu: Do not revert set_domain for the last gdev The last gdev is the device that failed the __iommu_device_set_domain(). So, it doesn't need to be reverted, given it's attached to group->domain already. This is not a problem currently, since it's a simply re-attach. However, the core will need to pass in the old domain to __iommu_device_set_domain so the old domain pointers would be inconsistent between a failed device and all its prior succeeded devices, as all the prior devices need to be reverted. Avoid the re-attach for the last gdev, by breaking before the revert. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-10-27 13:55:35 +01:00
Jason Gunthorpe	e94160488e	iommu: Generic support for RMRs during device release Generally an IOMMU driver should leave the translation as BLOCKED until the translation entry is probed onto a struct device. When the struct device is removed, the translation should be put back to BLOCKED. Drivers that are able to work like this can set their release_domain to the blocking domain, and the core code handles this work. The exception is when the device has an IOMMU_RESV_DIRECT region, in which case the OS should continuously allow translations for the given range. And the core code generally prevents using a BLOCKED domain with this device. Continue this logic for the device release and hoist some open coding from drivers. If the device has dev->iommu->require_direct and the driver uses a BLOCKED release_domain, override it to IDENTITY to preserve the semantics. The only remaining required driver code for IOMMU_RESV_DIRECT should preset an IDENTITY translation during early IOMMU startup for those devices. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-10-27 13:55:34 +01:00
Guixin Liu	2a918911ed	iommufd: Register iommufd mock devices with fwspec Since the bus ops were retired the iommu subsystem changed to using fwspec to match the iommu driver to the iommu device. If a device has a NULL fwspec then it is matched to the first iommu driver with a NULL fwspec, effectively disabling support for systems with more than one non-fwspec iommu driver. Thus, if the iommufd selfest are run in an x86 system that registers a non-fwspec iommu driver they fail to bind their mock devices to the mock iommu driver. Fix this by allocating a software fwnode for mock iommu driver's iommu_device, and set it to the device which mock iommu driver created. This is done by adding a new helper iommu_mock_device_add() which abuses the internals of the fwspec system to establish a fwspec before the device is added and is careful not to leak it. A matching dummy fwspec is automatically added to the mock iommu driver. Test by "make -C toosl/testing/selftests TARGETS=iommu run_tests": PASSED: 229 / 229 tests passed. In addition, this issue is also can be found on amd platform, and also tested on a amd machine. Link: https://patch.msgid.link/r/20250925054730.3877-1-kanie@linux.alibaba.com Fixes: `17de3f5fdd` ("iommu: Retire bus ops") Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-30 09:54:12 -03:00
Jason Gunthorpe	792ea7b6ca	iommu: Remove ops->pgsize_bitmap No driver uses it now, remove the core code. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/7-v2-68a2e1ba507c+1fb-iommu_rm_ops_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2025-06-27 17:34:11 +02:00
Linus Torvalds	8477ab1430	Merge tag 'iommu-updates-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Introduction of iommu-pages infrastructure to consolitate page-table allocation code among hardware drivers. This is ground-work for more generalization in the future - Remove IOMMU_DEV_FEAT_SVA and IOMMU_DEV_FEAT_IOPF feature flags - Convert virtio-iommu to domain_alloc_paging() - KConfig cleanups - Some small fixes for possible overflows and race conditions Intel VT-d driver: - Restore WO permissions on second-level paging entries - Use ida to manage domain id - Miscellaneous cleanups AMD-Vi: - Make sure notifiers finish running before module unload - Add support for HTRangeIgnore feature - Allow matching ACPI HID devices without matching UIDs ARM-SMMU: - SMMUv2: - Recognise the compatible string for SAR2130P MDSS in the Qualcomm driver, as this device requires an identity domain - Fix Adreno stall handling so that GPU debugging is more robust and doesn't e.g. result in deadlock - SMMUv3: - Fix ->attach_dev() error reporting for unrecognised domains - IO-pgtable: - Allow clients (notably, drivers that process requests from userspace) to silence warnings when mapping an already-mapped IOVA S390: - Add support for additional table regions Mediatek: - Add support for MT6893 MM IOMMU And some smaller fixes and improvements in various other drivers" * tag 'iommu-updates-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (75 commits) iommu/vt-d: Restore context entry setup order for aliased devices iommu/mediatek: Fix compatible typo for mediatek,mt6893-iommu-mm iommu/arm-smmu-qcom: Make set_stall work when the device is on iommu/arm-smmu: Move handing of RESUME to the context fault handler iommu/arm-smmu-qcom: Enable threaded IRQ for Adreno SMMUv2/MMU500 iommu/io-pgtable-arm: Add quirk to quiet WARN_ON() iommu: Clear the freelist after iommu_put_pages_list() iommu/vt-d: Change dmar_ats_supported() to return boolean iommu/vt-d: Eliminate pci_physfn() in dmar_find_matched_satc_unit() iommu/vt-d: Replace spin_lock with mutex to protect domain ida iommu/vt-d: Use ida to manage domain id iommu/vt-d: Restore WO permissions on second-level paging entries iommu/amd: Allow matching ACPI HID devices without matching UIDs iommu: make inclusion of arm/arm-smmu-v3 directory conditional iommu: make inclusion of riscv directory conditional iommu: make inclusion of amd directory conditional iommu: make inclusion of intel directory conditional iommu: remove duplicate selection of DMAR_TABLE iommu/fsl_pamu: remove trailing space after \n iommu/arm-smmu-qcom: Add SAR2130P MDSS compatible ...	2025-05-30 10:44:20 -07:00
Linus Torvalds	23022f5456	Merge tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: "New two step DMA mapping API, which is is a first step to a long path to provide alternatives to scatterlist and to remove hacks, abuses and design mistakes related to scatterlists. This new approach optimizes some calls to DMA-IOMMU layer and cache maintenance by batching them, reduces memory usage as it is no need to store mapped DMA addresses to unmap them, and reduces some function call overhead. It is a combination effort of many people, lead and developed by Christoph Hellwig and Leon Romanovsky" * tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: docs: core-api: document the IOVA-based API dma-mapping: add a dma_need_unmap helper dma-mapping: Implement link/unlink ranges API iommu/dma: Factor out a iommu_dma_map_swiotlb helper dma-mapping: Provide an interface to allow allocate IOVA iommu: add kernel-doc for iommu_unmap_fast iommu: generalize the batched sync after map interface dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h PCI/P2PDMA: Refactor the p2pdma mapping helpers	2025-05-27 20:09:06 -07:00
Joerg Roedel	879b141b7c	Merge branches 'fixes', 'apple/dart', 'arm/smmu/updates', 'arm/smmu/bindings', 'fsl/pamu', 'mediatek', 'renesas/ipmmu', 's390', 'intel/vt-d', 'amd/amd-vi' and 'core' into next	2025-05-23 17:14:32 +02:00
Tushar Dave	b3f6fcd840	iommu: Skip PASID validation for devices without PASID capability Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly. pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing. However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices. This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID. Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored. Fixes: `c404f55c26` ("iommu: Validate the PASID in iommu_attach_device_pasid()") Cc: stable@vger.kernel.org Signed-off-by: Tushar Dave <tdave@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-05-22 09:10:00 +02:00
Leon Romanovsky	dc2e692943	iommu: add kernel-doc for iommu_unmap_fast Add kernel-doc section for iommu_unmap_fast to document existing limitation of underlying functions which can't split individual ranges. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Christoph Hellwig	5c87cffe2d	iommu: generalize the batched sync after map interface For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API. Add a wrapper for the map_sync as iommu_sync_map() so that callers don't need to poke into the methods directly. Formalize __iommu_map() into iommu_map_nosync() which requires the caller to call iommu_sync_map() after all maps are completed. Refactor the existing sanity checks from all the different layers into iommu_map_nosync(). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-05-06 08:36:53 +02:00
Jason Gunthorpe	e586e22974	iommu: Protect against overflow in iommu_pgsize() On a 32 bit system calling: iommu_map(0, 0x40000000) When using the AMD V1 page table type with a domain->pgsize of 0xfffff000 causes iommu_pgsize() to miscalculate a result of: size=0x40000000 count=2 count should be 1. This completely corrupts the mapping process. This is because the final test to adjust the pagesize malfunctions when the addition overflows. Use check_add_overflow() to prevent this. Fixes: `b1d99dc5f9` ("iommu: Hook up '->unmap_pages' driver callback") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-3ad28fc2e3a3+163327-iommu_overflow_pgsize_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:33:30 +02:00
Robin Murphy	da33e87bd2	iommu: Handle yet another race around registration Next up on our list of race windows to close is another one during iommu_device_register() - it's now OK again for multiple instances to run their bus_iommu_probe() in parallel, but an iommu_probe_device() can still also race against a running bus_iommu_probe(). As Johan has managed to prove, this has now become a lot more visible on DT platforms wth driver_async_probe where a client driver is attempting to probe in parallel with its IOMMU driver - although commit `b46064a188` ("iommu: Handle race with default domain setup") resolves this from the client driver's point of view, this isn't before of_iommu_configure() has had the chance to attempt to "replay" a probe that the bus walk hasn't even tried yet, and so still cause the out-of-order group allocation behaviour that we're trying to clean up (and now warning about). The most reliable thing to do here is to explicitly keep track of the "iommu_device_register() is still running" state, so we can then special-case the ops lookup for the replay path (based on dev->iommu again) to let that think it's still waiting for the IOMMU driver to appear at all. This still leaves the longstanding theoretical case of iommu_bus_notifier() being triggered during bus_iommu_probe(), but it's not so simple to defer a notifier, and nobody's ever reported that being a visible issue, so let's quietly kick that can down the road for now... Reported-by: Johan Hovold <johan@kernel.org> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/88d54c1b48fed8279aa47d30f3d75173685bb26a.1745516488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:31:18 +02:00
Lu Baolu	6f7340120a	iommu: Allow attaching static domains in iommu_attach_device_pasid() The idxd driver attaches the default domain to a PASID of the device to perform kernel DMA using that PASID. The domain is attached to the device's PASID through iommu_attach_device_pasid(), which checks if the domain->owner matches the iommu_ops retrieved from the device. If they do not match, it returns a failure. if (ops != domain->owner \|\| pasid == IOMMU_NO_PASID) return -EINVAL; The static identity domain implemented by the intel iommu driver doesn't specify the domain owner. Therefore, kernel DMA with PASID doesn't work for the idxd driver if the device translation mode is set to passthrough. Generally the owner field of static domains are not set because they are already part of iommu ops. Add a helper domain_iommu_ops_compatible() that checks if a domain is compatible with the device's iommu ops. This helper explicitly allows the static blocked and identity domains associated with the device's iommu_ops to be considered compatible. Fixes: `2031c469f8` ("iommu/vt-d: Add support for static identity domain") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031 Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250424034123.2311362-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:23:35 +02:00
Jason Gunthorpe	21c03574df	iommu: Hide ops.domain_alloc behind CONFIG_FSL_PAMU fsl_pamu is the last user of domain_alloc(), and it is using it to create something weird that doesn't really fit into the iommu subsystem architecture. It is a not a paging domain since it doesn't have any map/unmap ops. It may be some special kind of identity domain. For now just leave it as is. Wrap it's definition in CONFIG_FSL_PAMU to discourage any new drivers from attempting to use it. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/5-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:59 +02:00
Jason Gunthorpe	07107e7444	iommu/virtio: Move to domain_alloc_paging() virtio has the complication that it sometimes wants to return a paging domain for IDENTITY which makes this conversion a little different than other drivers. Add a viommu_domain_alloc_paging() that combines viommu_domain_alloc() and viommu_domain_finalise() to always return a fully initialized and finalized paging domain. Use viommu_domain_alloc_identity() to implement the special non-bypass IDENTITY flow by calling viommu_domain_alloc_paging() then viommu_domain_map_identity(). Remove support for deferred finalize and the vdomain->mutex. Remove core support for domain_alloc() IDENTITY as virtio was the last driver using it. Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/3-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:58 +02:00
Jason Gunthorpe	0d609a1450	iommu: Add domain_alloc_identity() virtio-iommu has a mode where the IDENTITY domain is actually a paging domain with an identity mapping covering some of the system address space manually created. To support this add a new domain_alloc_identity() op that accepts the struct device so that virtio can allocate and fully finalize a paging domain to return. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/2-v4-ff5fb6b03bd1+288-iommu_virtio_domains_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:14:57 +02:00
Lu Baolu	f984fb09e6	iommu: Remove iommu_dev_enable/disable_feature() No external drivers use these interfaces anymore. Furthermore, no existing iommu drivers implement anything in the callbacks. Remove them to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250418080130.1844424-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-28 13:04:35 +02:00
Robin Murphy	0c8e9c148e	iommu: Avoid introducing more races Although the lock-juggling is only a temporary workaround, we don't want it to make things avoidably worse. Jason was right to be nervous, since bus_iommu_probe() doesn't care which IOMMU instance it's probing for, so it probably is possible for one walk to finish a probe which a different walk started, thus we do want to check for that. Also there's no need to drop the lock just to have of_iommu_configure() do nothing when a fwspec already exists; check that directly and avoid opening a window at all in that (still somewhat likely) case. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/09d901ad11b3a410fbb6e27f7d04ad4609c3fe4a.1741706365.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-17 16:31:07 +02:00
Robin Murphy	280e5a3010	iommu: Clear iommu-dma ops on cleanup If iommu_device_register() encounters an error, it can end up tearing down already-configured groups and default domains, however this currently still leaves devices hooked up to iommu-dma (and even historically the behaviour in this area was at best inconsistent across architectures/drivers...) Although in the case that an IOMMU is present whose driver has failed to probe, users cannot necessarily expect DMA to work anyway, it's still arguable that we should do our best to put things back as if the IOMMU driver was never there at all, and certainly the potential for crashing in iommu-dma itself is undesirable. Make sure we clean up the dev->dma_iommu flag along with everything else. Reported-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Closes: https://lore.kernel.org/all/CAGXv+5HJpTYmQ2h-GD7GjyeYT7bL9EBCvu0mz5LgpzJZtzfW0w@mail.gmail.com/ Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/e788aa927f6d827dd4ea1ed608fada79f2bab030.1744284228.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:15:59 +02:00
Fedor Pchelkin	df4bf3fa1b	iommu: Fix crash in report_iommu_fault() The following crash is observed while handling an IOMMU fault with a recent kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffff8c708299f700 PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3 Oops: Oops: 0011 [#1] SMP NOPTI CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy) Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024 RIP: 0010:0xffff8c708299f700 Call Trace: <TASK> ? report_iommu_fault+0x78/0xd3 ? amd_iommu_report_page_fault+0x91/0x150 ? amd_iommu_int_thread+0x77/0x180 ? __pfx_irq_thread_fn+0x10/0x10 ? irq_thread_fn+0x23/0x60 ? irq_thread+0xf9/0x1e0 ? __pfx_irq_thread_dtor+0x10/0x10 ? __pfx_irq_thread+0x10/0x10 ? kthread+0xfc/0x240 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ? ret_from_fork_asm+0x1a/0x30 </TASK> report_iommu_fault() checks for an installed handler comparing the corresponding field to NULL. It can (and could before) be called for a domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically. Cookie is represented as a union so we may end up with a garbage value treated there if this happens for a domain with another cookie type. Formerly there were two exclusive cookie types in the union. IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault(). Call the fault handler only if the passed domain has a required cookie type. Found by Linux Verification Center (linuxtesting.org). Fixes: `6aa63a4ec9` ("iommu: Sort out domain user data") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250408213342.285955-1-pchelkin@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-04-11 09:03:44 +02:00
Linus Torvalds	48552153cf	Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT\|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at\|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...	2025-04-01 18:03:46 -07:00
Yi Liu	8a9e1e773f	iommu: Introduce a replace API for device pasid Provide a high-level API to allow replacements of one domain with another for specific pasid of a device. This is similar to iommu_replace_group_handle() and it is expected to be used only by IOMMUFD. Link: https://patch.msgid.link/r/20250321171940.7213-3-yi.l.liu@intel.com Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Yi Liu	ada14b9f1a	iommu: Require passing new handles to APIs supporting handle Add kdoc to highligt the caller of iommu_[attach\|replace]_group_handle() and iommu_attach_device_pasid() should always provide a new handle. This can avoid race with lockless reference to the handle. e.g. the find_fault_handler() and iommu_report_device_fault() in the PRI path. Link: https://patch.msgid.link/r/20250321171940.7213-2-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:30 -03:00
Nicolin Chen	06d54f00f3	iommu: Drop sw_msi from iommu_domain There are only two sw_msi implementations in the entire system, thus it's not very necessary to have an sw_msi pointer. Instead, check domain->cookie_type to call the two sw_msi implementations directly from the core code. Link: https://patch.msgid.link/r/7ded87c871afcbaac665b71354de0a335087bf0f.1742871535.git.nicolinc@nvidia.com Suggested-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:19 -03:00
Robin Murphy	6aa63a4ec9	iommu: Sort out domain user data When DMA/MSI cookies were made first-class citizens back in commit `46983fcd67` ("iommu: Pull IOVA cookie management into the core"), there was no real need to further expose the two different cookie types. However, now that IOMMUFD wants to add a third type of MSI-mapping cookie, we do have a nicely compelling reason to properly dismabiguate things at the domain level beyond just vaguely guessing from the domain type. Meanwhile, we also effectively have another "cookie" in the form of the anonymous union for other user data, which isn't much better in terms of being vague and unenforced. The fact is that all these cookie types are mutually exclusive, in the sense that combining them makes zero sense and/or would be catastrophic (iommu_set_fault_handler() on an SVA domain, anyone?) - the only combination which might be reasonable is perhaps a fault handler and an MSI cookie, but nobody's doing that at the moment, so let's rule it out as well for the sake of being clear and robust. To that end, we pull DMA and MSI cookies apart a little more, mostly to clear up the ambiguity at domain teardown, then for clarity (and to save a little space), move them into the union, whose ownership we can then properly describe and enforce entirely unambiguously. [nicolinc: rebase on latest tree; use prefix IOMMU_COOKIE_; merge unions in iommu_domain; add IOMMU_COOKIE_IOMMUFD for iommufd_hwpt] Link: https://patch.msgid.link/r/1ace9076c95204bbe193ee77499d395f15f44b23.1742871535.git.nicolinc@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:18 -03:00
Robin Murphy	73d2f10957	iommu: Don't warn prematurely about dodgy probes The warning for suspect probe conditions inadvertently got moved too early in a prior respin - it happened to work out OK for fwspecs, but in general still needs to be after the ops->probe_device call so drivers which filter devices for themselves have a chance to do that. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Fixes: `bcb81ac6ae` ("iommu: Get DT/ACPI parsing into the proper probe path") Link: https://lore.kernel.org/r/72a4853e7ef36e7c1c4ca171ed4ed8e1a463a61a.1741791691.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-12 16:58:27 +01:00
Robin Murphy	bcb81ac6ae	iommu: Get DT/ACPI parsing into the proper probe path In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should not have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:43 +01:00
Robin Murphy	3832862eb9	iommu: Keep dev->iommu state consistent At the moment, if of_iommu_configure() allocates dev->iommu itself via iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the fwspec but leaves the empty dev_iommu hanging around. So far this is benign (if a tiny bit wasteful), but we'd like to be able to reason about dev->iommu having a consistent and unambiguous lifecycle. Thus make sure that the of_iommu cleanup undoes precisely whatever it did. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/d219663a3f23001f23d520a883ac622d70b4e642.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:42 +01:00
Robin Murphy	fd598f71b6	iommu: Resolve ops in iommu_init_device() Since iommu_init_device() was factored out, it is in fact the only consumer of the ops which __iommu_probe_device() is resolving, so let it do that itself rather than passing them in. This also puts the ops lookup at a more logical point relative to the rest of the flow through __iommu_probe_device(). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/fa4b6cfc67a352488b7f4e0b736008307ce9ac2e.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:41 +01:00
Robin Murphy	b46064a188	iommu: Handle race with default domain setup It turns out that deferred default domain creation leaves a subtle race window during iommu_device_register() wherein a client driver may asynchronously probe in parallel and get as far as performing DMA API operations with dma-direct, only to be switched to iommu-dma underfoot once the default domain attachment finally happens, with obviously disastrous consequences. Even the wonky of_iommu_configure() path is at risk, since iommu_fwspec_init() will no longer defer client probe as the instance ops are (necessarily) already registered, and the "replay" iommu_probe_device() call can see dev->iommu_group already set and so think there's nothing to do either. Fortunately we already have the right tool in the right place in the form of iommu_device_use_default_domain(), which just needs to ensure that said default domain is actually ready to be used. Deferring the client probe shouldn't have too much impact, given that this only happens while the IOMMU driver is probing, and thus due to kick the deferred probe list again once it finishes. Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Fixes: `98ac73f99b` ("iommu: Require a default_domain for all iommu drivers") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/e88b94c9b575034a2c98a48b3d383654cbda7902.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:40 +01:00
Robin Murphy	29c6e1c2b9	iommu: Unexport iommu_fwspec_free() The drivers doing their own fwspec parsing have no need to call iommu_fwspec_free() since fwspecs were moved into dev_iommu, as returning an error from .probe_device will tear down the whole lot anyway. Move it into the private interface now that it only serves for of_iommu to clean up in an error case. I have no idea what mtk_v1 was doing in effectively guaranteeing a NULL fwspec would be dereferenced if no "iommus" DT property was found, so add a check for that to at least make the code look sane. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/36e245489361de2d13db22a510fa5c79e7126278.1740667667.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-03-11 14:05:39 +01:00
Joerg Roedel	59b6c3469e	Merge tag 'for-joerg' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API	2025-02-28 16:35:19 +01:00
Yi Liu	5e9f822c9c	iommu: Swap the order of setting group->pasid_array and calling attach op of iommu drivers The current implementation stores entry to the group->pasid_array before the underlying iommu driver has successfully set the new domain. This can lead to issues where PRIs are received on the new domain before the attach operation is completed. This patch swaps the order of operations to ensure that the domain is set in the underlying iommu driver before updating the group->pasid_array. Link: https://patch.msgid.link/r/20250226011849.5102-5-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	e1ea9d30d8	iommu: Store either domain or handle in group->pasid_array iommu_attach_device_pasid() only stores handle to group->pasid_array when there is a valid handle input. However, it makes the iommu_attach_device_pasid() unable to detect if the pasid has been attached or not previously. To be complete, let the iommu_attach_device_pasid() store the domain to group->pasid_array if no valid handle. The other users of the group->pasid_array should be updated to be consistent. e.g. the iommu_attach_group_handle() and iommu_replace_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-4-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	473ec072a6	iommu: Drop iommu_group_replace_domain() iommufd does not use it now, so drop it. Link: https://patch.msgid.link/r/20250226011849.5102-3-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Yi Liu	237603a46a	iommu: Make @handle mandatory in iommu_{attach\|replace}_group_handle() Caller of the two APIs always provide a valid handle, make @handle as mandatory parameter. Take this chance incoporate the handle->domain set under the protection of group->mutex in iommu_attach_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-2-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-28 10:07:14 -04:00
Jason Gunthorpe	288683c92b	iommu: Make iommu_dma_prepare_msi() into a generic operation SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-02-21 10:04:12 -04:00
Easwar Hariharan	78be7f0453	iommu: Fix a spelling error Fix spelling error IDENITY -> IDENTITY in drivers/iommu/iommu.c. Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250128190522.70800-1-eahariha@linux.microsoft.com [ joro: Add commit message ] Signed-off-by: Joerg Roedel <jroedel@suse.de>	2025-02-14 09:12:46 +01:00
Joerg Roedel	125f34e4c1	Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', 'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next	2025-01-17 09:02:35 +01:00

1 2 3 4 5 ...

529 Commits