linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-30 04:22:32 -04:00

Author	SHA1	Message	Date
Matthew Brost	8af13c3fc1	drm/xe: Store process name and pid in xe file An xe file can outlive the associated process as the GPU cleanup is just triggered upon file close (process kill) and completes sometime later. If the file close triggers error conditions (GPU hangs) the process cannot be safely referenced to retrieve the name and pid for debug information. Store the process name and pid directly in the xe file to be safe. v2: - Access file->pid via rcu_access_pointer (Matthew Auld) Fixes: `b10d0c5e9d` ("drm/xe: Add process name to devcoredump") Fixes: `f6ca930d97` ("drm/xe: Add process name and PID to job timedout message") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240723151045.1725417-1-matthew.brost@intel.com	2024-07-23 10:45:40 -07:00
Matthew Brost	d930c19fdf	drm/xe: Build PM into GuC CT layer Take PM ref when any G2H are outstanding, drop when none are outstanding. To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs to be held when scheduler messages are pending too. v2: - Add outer PM protections to xe_file_close (CI) v3: - Only take PM ref 0->1 and drop on 1->0 (Matthew Auld) v4: - Add assert to G2H increment function v5: - Rebase v6: - Declare xe as local variable in xe_file_close (CI) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-5-matthew.brost@intel.com	2024-07-19 19:45:34 -07:00
Umesh Nerlige Ramappa	ce8c161cba	drm/xe: Add ref counting for xe_file Add ref counting for xe_file. v2: - Add kernel doc for exported functions (Matt) - Instead of xe_file_destroy, export the get/put helpers (Lucas) v3: Fixup the kernel-doc format and description (Matt, Lucas) Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-3-umesh.nerlige.ramappa@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-07-19 00:31:08 -07:00
Umesh Nerlige Ramappa	3d0c4a62cc	drm/xe: Move part of xe_file cleanup to a helper In order to make xe_file ref counted, move destruction of xe_file members to a helper. v2: Move xe_vm_close_and_put back into xe_file_close (Matt) Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-2-umesh.nerlige.ramappa@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-07-19 00:31:07 -07:00
Matthew Brost	452bca0edb	drm/xe: Don't suspend device upon wedge When wedging a device we shouldn't be suspending device as state for debug will be lost. Also this appears to not work as the below stack trace pops upon trying to resume a wedged device: [ 304.245044] INFO: task cat:12115 blocked for more than 151 seconds. [ 304.251333] Tainted: G W 6.10.0-rc7-xe+ #3518 [ 304.257617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 304.265459] task:cat state:D stack:13384 pid:12115 tgid:12115 ppid:3986 flags:0x00000006 [ 304.265465] Call Trace: [ 304.265467] <TASK> [ 304.265469] __schedule+0x3c4/0xdf0 [ 304.265478] schedule+0x3c/0x140 [ 304.265481] rpm_resume+0x1cc/0x740 [ 304.265484] ? __pfx_autoremove_wake_function+0x10/0x10 [ 304.265489] __pm_runtime_resume+0x49/0x80 [ 304.265494] guc_info+0x6b/0xb0 [xe] [ 304.265538] ? __pfx___drm_printfn_seq_file+0x10/0x10 [ 304.265541] ? __pfx___drm_puts_seq_file+0x10/0x10 [ 304.265545] seq_read_iter+0x111/0x4c0 [ 304.265551] seq_read+0xfc/0x140 [ 304.265556] full_proxy_read+0x58/0x80 [ 304.265560] vfs_read+0xa7/0x360 [ 304.265563] ? find_held_lock+0x2b/0x80 [ 304.265568] ksys_read+0x64/0xe0 [ 304.265571] do_syscall_64+0x68/0x140 [ 304.265575] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 304.265578] RIP: 0033:0x7f4254d14992 [ 304.265580] RSP: 002b:00007ffc558666f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 304.265583] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f4254d14992 [ 304.265584] RDX: 0000000000020000 RSI: 00007f4254ebb000 RDI: 0000000000000003 [ 304.265586] RBP: 00007f4254ebb000 R08: 00007f4254eba010 R09: 00007f4254eba010 [ 304.265587] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [ 304.265588] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [ 304.265593] </TASK> [ 304.265594] Showing all locks held in the system: [ 304.265598] 1 lock held by khungtaskd/57: [ 304.265599] #0: ffffffff8273b860 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x36/0x1c0 [ 304.265607] 3 locks held by kworker/6:1/90: [ 304.265610] 1 lock held by in:imklog/547: [ 304.265611] #0: ffff88810498cd88 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x76/0xc0 [ 304.265620] 1 lock held by dmesg/1310: v2: Drop local 'err' variable (Jonathan) Fixes: `8ed9aaae39` ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-2-matthew.brost@intel.com	2024-07-17 12:01:34 -07:00
Matthew Brost	7dbe8af13c	drm/xe: Wedge the entire device Wedge the entire device, not just GT which may have triggered the wedge. To implement this, cleanup the layering so xe_device_declare_wedged() calls into the lower layers (GT) to ensure entire device is wedged. While we are here, also signal any pending GT TLB invalidations upon wedging device. Lastly, short circuit reset wait if device is wedged. v2: - Short circuit reset wait if device is wedged (Local testing) Fixes: `8ed9aaae39` ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-1-matthew.brost@intel.com	2024-07-17 11:58:26 -07:00
Tejas Upadhyay	71733b8d7f	drm/xe/xe2: Make subsequent L2 flush sequential Issuing the flush on top of an ongoing flush is not desirable. Lets use lock to make it sequential. Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240710052750.3031586-1-tejas.upadhyay@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-07-11 12:36:55 +02:00
Matthew Auld	01570b4469	drm/xe/bmg: implement Wa_16023588340 This involves enabling l2 caching of host side memory access to VRAM through the CPU BAR. The main fallout here is with display since VRAM writes from CPU can now be cached in GPU l2, and display is never coherent with caches, so needs various manual flushing. In the case of fbc we disable it due to complications in getting this to work correctly (in a later patch). Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703124338.208220-3-matthew.auld@intel.com	2024-07-05 09:53:12 +01:00
Michal Wajdeczko	3078d9c8b6	drm/xe: Use VF_CAP_REG for device wmb To force a write barrier on the device memory, we write to the SOFTWARE_FLAGS_SPR33 register, but this particular register was selected because it was one of the writable and unused register. Since a write barrier should also work if we use the read-only register, switch to VF_CAP_REG register that is also marked as accessible for VFs. While at it, add simple kernel-doc for xe_device_wmb() function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240702183704.1022-4-michal.wajdeczko@intel.com	2024-07-04 11:55:40 +02:00
Ashutosh Dixit	8169b2097d	drm/xe/uapi: Rename xe perf layer as xe observation layer In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessarily so. Also, the name "perf" is a carryover from i915 and is not preferred. Here we propose the name "observation" for this common layer which allows capture of different types of these counter streams. v2: Rename observability layer to observation layer (Lucas/Rodrigo) v3: Rename sysctl file to "observation_paranoid" (Jose) Fixes: `52c2e956dc` ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types") Fixes: `fe8929bdf8` ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl") Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703164801.2561423-1-ashutosh.dixit@intel.com	2024-07-03 16:46:02 -07:00
Ilia Levi	80bab5c503	drm/xe/irq: remove xe_irq_shutdown The cleanup is done by devres in irq_uninstall. Commit `bbc9651fe9` ("drm/xe/irq: move irq_uninstall over to devm") resolved the ordering issue where irq_uninstall (registered with drmm) was called after pci_free_irq_vectors (registered with devm upon calling pci_alloc_irq_vectors). This happened because drmm action list is registered with devm very early in the init flow - before pci_alloc_irq_vectors. Now that irq_uninstall is registered with devm, it will be called before pci_free_irq_vectors and we can remove xe_irq_shutdown. Signed-off-by: Ilia Levi <illevi@habana.ai> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240606124705.822451-1-illevi@habana.ai Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:25:22 -04:00
Vinay Belgaumkar	3b1592fb78	drm/xe/lnl: Apply Wa_22019338487 This WA requires us to limit media GT frequency requests to a certain cap value during driver load. Freq limits are restored after load completes, so perf will not be affected during normal operations. During normal driver operation, this WA requires dummy writes to media offset 0x380D8C after every ~63 GGTT writes. This will ensure completion of the LMEM writes originating from Gunit. During driver unload(before FLR), the WA requires that we set requested frequency to the cap value again. v3: Do not use WA number in function name. Call WA wrapper from xe_device. Rename some variables, check for locks in the correct function (Rodrigo). Ensure reset path is also covered for this WA. v4: Fix BAT failure v5: Add a function pointer for ggtt_ops (Michal W) v6: Fix name collision and use static function (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620224928.3986377-2-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:23:45 -04:00
Rodrigo Vivi	8664e76373	Merge drm/drm-next into drm-xe-next Need to sync some header include that propagated through drm-intel-next. v2: After some changes in drm/drm-next Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:22:52 -04:00
Michal Wajdeczko	65336c3fa2	drm/xe/vf: Disable features that do not apply to VFs We already maintain several flags that control the availability of features on a given device. Disable features, like PCODE or GuC PC or GSC, that do not apply to a VF device. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620100147.949-1-michal.wajdeczko@intel.com	2024-06-20 19:49:34 +02:00
Jani Nikula	d754ed2821	Merge drm/drm-next into drm-intel-next Sync to v6.10-rc3. Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-06-19 11:38:31 +03:00
Ashutosh Dixit	cdf02fe1a9	drm/xe/oa/uapi: Add/remove OA config perf ops Introduce add/remove config perf ops for OA. OA configurations consist of a set of event/counter select register address/value pairs. The add_config perf op validates and stores such configurations and also exposes them in the metrics sysfs. These configurations will be programmed to OA unit HW when an OA stream using a configuration is opened. The OA stream can also switch to other stored configurations. v2: Start config id's from 1 and other minor review comments (Umesh) v3: Add 32 bit build v4: Add kernel doc for non-static functions (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-6-ashutosh.dixit@intel.com	2024-06-18 12:40:31 -07:00
Ashutosh Dixit	67977882a2	drm/xe/oa/uapi: Add OA data formats Add and initialize supported OA data formats for various platforms (including Xe2). User can request OA data in any supported format. Bspec: 52198, 60942, 61101 v2: Start 'xe_oa_format_name' enum from 0 (Umesh) Fix error rewind with OA (Umesh) v3: Use graphics versions rather than absolute platform names v4: Add missing kernel doc for struct memebers and enum and other minor changes (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-4-ashutosh.dixit@intel.com	2024-06-18 12:40:27 -07:00
Ashutosh Dixit	52c2e956dc	drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types In Xe, the plan is to support multiple types of perf counter streams (OA is only one type of these streams). Rather than introduce NxM ioctls for these (N perf streams with M ioctl's per perf stream), we decide to multiplex these (N different stream types and the M ops for each of these stream types) through a single PERF ioctl. This multiplexing is the purpose of the PERF layer. In addition to PERF DRM ioctl's, another set of ioctl's on the PERF fd are defined. These are expected to be common to different PERF stream types and therefore defined at the PERF layer itself. v2: Add param_size to 'struct drm_xe_perf_param' (Umesh) v3: Rename 'enum drm_xe_perf_ops' to 'enum drm_xe_perf_ioctls' (Guy Zadicario) Add DRM_ prefix to ioctl names to indicate uapi names v4: Add 'enum drm_xe_perf_op' previously missed out (Guy Zadicario) v5: Squash the ops and PERF layer patches into a single patch (Umesh) Remove param_size from struct 'drm_xe_perf_param' (Umesh) v6: Add DRM_XE_PERF_IOCTL_STATUS v7: Add DRM_XE_PERF_IOCTL_INFO v8: Fix Copyright years, fix DRM_XE_PERF_TYPE_MAX, move '#include "xe_perf.h"' to xe_perf.c, add kernel doc (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Guy Zadicario <gzadicario@habana.ai> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-2-ashutosh.dixit@intel.com	2024-06-18 12:40:24 -07:00
Dave Airlie	7957066ca6	Merge tag 'drm-xe-next-2024-06-06' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Expose the L3 bank mask (Francois) Cross-subsystem Changes: - Update Xe driver maintainers (Oded) Display (i915): - Add missing include to intel_vga.c (Michal Wajdeczko) Driver Changes: - Fix Display (xe-only) detection for ADL-N (Lucas) - Runtime PM fixes that enabled PC-10 and D3Cold (Francois, Rodrigo) - Fix unexpected silent drm backmerge issues (Thomas) - More (a lot more) preparation for SR-IOV support (Michal Wajdeczko) - Devcoredump fixes and improvements (Jose, Tejas, Matt Brost) - Introduce device 'wedged' state (Rodrigo) - Improve debug and info messages (Michal Wajdeczko, Rodrigo, Nirmoy) - Adding or fixing workarounds (Tejas, Shekhar, Lucas, Bommu) - Check result of drmm_mutex_init (Michal Wajdeczko) - Enlarge the critical dma fence area for preempt fences (Matt Auld) - Prevent UAF in VM's rebind work (Matt Auld) - GuC submit related clean-ups and fixes (Matt Brost, Himal, Jonathan, Niranjana) - Prefer local helpers to perform dma reservation locking (Himal) - Spelling and typo fixes (Colin, Francois) - Prep patches for 1 job per VM bind IOCTL (no uapi change yet) (Matt Brost) - Remove uninitialized end var from xe_gt_tlb_invalidation_range (Nirmoy) - GSC related changes targeting LNL support (Daniele) - Fix assert in L3 bank mask generation (Francois) - Perform dma_map when moving system buffer objects to TT (Thomas) - Add helpers for manipulating macro arguments (Michal Wajdeczko) - Refactor default device atomic settings (Nirmoy) - Add debugfs node to dump mocs (Janga) - Use ordered WQ for G2H handler (Matt Brost) - Clean up and fixes in header includes (Michal Wajdeczko) - Prefer flexible-array over deprecated zero-lenght ones (Lucas) - Add Indirect Ring State support (Niranjana) - Fix UBSAN shift-out-of-bounds failure (Shuicheng) - HWMon fixes and additions (Karthik) - Clean-up refactor around probe init functions (Lucas, Michal Wajdeczko) - Fix PCODE init function (Himal) - Only use reserved BCS instances for usm migrate exec queue (Matt Brost) - Only zap PTEs as needed (Matt Brost) - Per client usage info (Lucas) - Core hotunplug improvements converting stuff towards devm (Matt Auld) - Don't emit false error if running in execlist mode (Michal Wajdeczko) - Remove unused struct (Dr. David) - Support/debug for slow GuC loads (John Harrison) - Decouple job seqno and lrc seqno (Matt Brost) - Allow migrate vm gpu submissions from reclaim context (Thomas) - Rename drm-client running time to run_ticks and fix a UAF (Umesh) - Check empty pinned BO list with lock held (Nirmoy) - Drop undesired prefix from the platform name (Michal Wajdeczko) - Remove unwanted mutex locking on xe file close (Niranjana) - Replace format-less snprintf() with strscpy() (Arnd) - Other general clean-ups on registers definitions and function names (Michal Wajdeczko) - Add kernel-doc to some xe_lrc interfaces (Niranajana) - Use missing lock in relay_needs_worker (Nirmoy) - Drop redundant W=1 warnings from Makefile (Jani) - Simplify if condition in preempt fences code (Thorsten) - Flush engine buffers before signalling user fence on all engines (Andrzej) - Don't overmap identity VRAM mapping (Matt Brost) - Do not dereference NULL job->fence in trace points (Matt Brost) - Add synchronous gt reset debugfs (Jonathan) - Xe gt_idle fixes (Riana) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZmItmuf7vq_xvRjJ@intel.com	2024-06-11 09:09:07 +10:00
Michal Wajdeczko	638d1c79cb	drm/xe: Promote VRAM initialization function to own file There is no point in mixing register access and VRAM code in the same file. Move and rename the VRAM probe function to a new file (there are no other changes other then new simple kernel-doc). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530133527.1328-5-michal.wajdeczko@intel.com	2024-05-30 23:50:28 +02:00
Daniele Ceraolo Spurio	1c4324793e	Revert "drm/xe: make gt_remove use devm" This reverts commit `cd506a33b0`. The gt_remove function was explicitly added as part of the remove flow instead of using drmm/devm automatic cleanup due to it being illegal to remove a component after the driver has been detached from the pci device; the GSC proxy component is removed as part of gt_remove, so we need to do it in the pci cleanup flow. The function already has a comment above it to explain this. Note that the change to use the devm also caused an invalid pointer deref in the gsc_proxy unbind function, but I didn't bother to debug which pointer was bad since we shouldn't be calling the unbind that late anyway and this revert fixes it. Both issue were not seen in CI because the GSC loading is temporarily disabled due to a critical bug, which means we're not binding the component. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240528182354.1200424-1-daniele.ceraolospurio@intel.com	2024-05-30 13:24:36 -07:00
Niranjana Vishwanathapura	0568a4086a	drm/xe: Remove unwanted mutex locking Do not hold xef->exec_queue.lock mutex while parsing the xarray xef->exec_queue.xa in xe_file_close() as it is not needed and will cause an unwanted dependency between this lock and the vm->lock. This lock protects the exec queue lookup and reference taking which doesn't apply to this code path. When FD is closing, IOCTLs presumably can't be modifying the xarray. v2: Update commit text (Matt Brost) v3: Add more code comment (Rodrigo Vivi) v4: Further expand code comment (Rodirgo Vivi) Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240529221639.23117-1-niranjana.vishwanathapura@intel.com	2024-05-29 23:44:31 -07:00
Michal Wajdeczko	ea797cf4b7	drm/xe/vf: Read VF configuration prior to GGTT initialization Each VF will be assigned with only a limited range of the GGTT address space. Make sure that VF driver will read its own GGTT configuration before starting any GGTT initialization. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524113714.932-2-michal.wajdeczko@intel.com	2024-05-27 18:46:26 +02:00
Matthew Auld	c711741978	drm/xe: reset mmio mappings with devm Set our various mmio mappings to NULL. This should make it easier to catch something rogue trying to mess with mmio after device removal. For example, we might unmap everything and then start hitting some mmio address which has already been unmamped by us and then remapped by something else, causing all kinds of carnage. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	cd506a33b0	drm/xe: make gt_remove use devm No need to hand roll the onion unwind here, just move gt_remove over to devm which will already have the correct ordering. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-31-matthew.auld@intel.com	2024-05-22 13:22:40 +01:00
Matthew Auld	cee70645a7	drm/xe/device: move xe_device_sanitize over to devm Disable GuC submission when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-28-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	bc54f42c0e	drm/xe/device: move flr to devm Should be called when driver is removed, not when this particular driver instance is destroyed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-27-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	4465b8c6d3	drm/xe/pci: remove broken driver_release This is quite broken since we are nuking the pdev link to the private driver struct, but note here that driver_release is called when the drm_device is released (poor mans drmm), which can be long after the device has been removed. So here what we are actually doing is nuking the pdev link for what is potentially bound to a different drm_device. If that happens before our pci remove callback is triggered (for the new drm_device) we silently exit and skip some important cleanup steps, resulting in hilarity. There should be no reason to implement driver_release, when we already have nicer stuff like drmm, so just remove completely. The actual pdev link is already nuked when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-19-matthew.auld@intel.com	2024-05-22 13:22:38 +01:00
Michal Wajdeczko	26a22952c8	drm/xe: Don't rely on indirect includes from xe_mmio.h These compilation units use udelay() or some GT oriented printk functions without explicitly including proper header files, and relying on #includes from the xe_mmio.h instead. Fix that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com	2024-05-22 12:11:26 +02:00
Lucas De Marchi	d1855d284e	drm/xe: Move sw-only pcode initialization Move it to xe_gt_init_early() that initializes the sw-only part for each gt. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-13 21:21:13 -07:00
Lucas De Marchi	45b9066ec3	drm/xe: Move xe_force_wake_init_gt() inside gt initialization xe_force_wake_init_gt() is a software-only initialization and doesn't need to be called from xe_device_probe(). Move it to initialize together with the gt. Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-13 21:21:13 -07:00
Lucas De Marchi	65c4de2a91	drm/xe: Move xe_gt_init_early() where it belongs Early shall be early enough, stop doing other things with gt before it. Now that xe_gt_init_early() doesn't need forcewake and doesn't depend on the fake engine_mask initialization, move it where it belongs: it doesn't need to be after hwconfig config anymore. Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-13 21:21:13 -07:00
Michal Wajdeczko	62010b3cd6	drm/xe: Move xe_gpu_commands.h file to instructions/ All other files with commands definitions are in instructions/ folder. Move xe_gpu_commands.h also there. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com	2024-05-09 21:17:57 +02:00
Michal Wajdeczko	b7f6318a9c	drm/xe: Fix xe_device.h Some explicit includes are needed only from the xe_device.c. And there is no need for redundant forward declarations. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240507110959.2747-4-michal.wajdeczko@intel.com	2024-05-07 23:21:21 +02:00
Nirmoy Das	c01c6066e6	drm/xe/device: implement transient flush Display surfaces can be tagged as transient by mapping it using one of the various L3:XD PAT index modes on Xe2. The expectation is that KMD needs to request transient data flush at the start of flip sequence to ensure all transient data in L3 cache is flushed to memory. Add a routine for this which we can then call from the display code. v2: rebase(RK) Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240430172850.1881525-18-radhakrishna.sripada@intel.com	2024-05-03 13:15:54 -07:00
Dave Airlie	9f9039c6ef	Merge tag 'drm-intel-next-2024-04-30' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-next Core DRM: - Export drm_client_dev_unregister (Thomas Zimmermann) Display i915: - More initial work to make display code more independent from i915 (Jani) - Convert i915/xe fbdev to DRM client (Thomas Zimmermann) - VLV/CHV DPIO register cleanup (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZjFPcSCTd_5c0XU_@intel.com	2024-05-02 14:30:31 +10:00
Colin Ian King	445237d67a	drm/xe: Fix spelling mistake "forcebly" -> "forcibly" There is a spelling mistake in a drm_dbg message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240426094904.816033-1-colin.i.king@gmail.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-26 05:54:05 -07:00
Thomas Zimmermann	f3a36cb5d9	drm/{i915,xe}: Unregister in-kernel clients Unregister all in-kernel clients before unloading the i915 driver. For other drivers, drm_dev_unregister() does this automatically. As i915 and xe do not use this helper, they have to perform the call by themselves. Note that there are currently no in-kernel clients in i915 or xe. The patch prepares the drivers for a related update of their fbdev support. v8: - unregister clients in intel_display_driver_unregister() (Jani) - mention xe in commit message (Rodrigo, Jani) v7: - update xe driver Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409081029.17843-5-tzimmermann@suse.de Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-04-25 14:25:50 +03:00
Michal Wajdeczko	cbf7579304	drm/xe: Check result of drmm_mutex_init() Although it's unlikely that drmm_mutex_init() will fail during driver initialization, however we shouldn't ignore this case. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409153132.1111-1-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-24 15:37:06 -07:00
Rodrigo Vivi	6b8ef44cc0	drm/xe: Introduce the wedged_mode debugfs So, the wedged mode can be selected per device at runtime, before the tests or before reproducing the issue. v2: - s/busted/wedged - some locking consistency v3: - remove mutex - toggle guc reset policy on any mode change Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	8ed9aaae39	drm/xe: Force wedged state and block GT reset upon any GPU hang In many validation situations when debugging GPU Hangs, it is useful to preserve the GT situation from the moment that the timeout occurred. This patch introduces a module parameter that could be used on situations like this. If xe.wedged module parameter is set to 2, Xe will be declared wedged on every single execution timeout (a.k.a. GPU hang) right after devcoredump snapshot capture and without attempting any kind of GT reset and blocking entirely any kind of execution. v2: Really block gt_reset from guc side. (Lucas) s/wedged/busted (Lucas) v3: - s/busted/wedged - Really use global_flags (Dafna) - More robust timeout handling when wedging it. v4: A really robust clean exit done by Matt Brost. No more kernel warns on unbind. v5: Simplify error message (Lucas) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dafna Hirschfeld <dhirschfeld@habana.ai> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himanshu Somaiya <himanshu.somaiya@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	fb74b205cd	drm/xe: Introduce a simple wedged state Introduce a very simple 'wedged' state where any attempt to access the GPU is entirely blocked. On some critical cases, like on gt_reset failure, we need to block any other attempt to use the GPU. Otherwise we are at a risk of reaching cases that would force us to reboot the machine. So, when this cases are identified we corner and block any GPU access. No IOCTL and not even another GT reset should be attempted. The 'wedged' state in Xe is an end state with no way back. Only a device "re-probe" (unbind + bind) can restore the GPU access. v2: - s/wedged/busted (Lucas) - use unbind+bind instead of module reload (Lucas) - added more info on unbind operations and instruction on bug report - only print the message once. v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas) - don't assume user has sudo and tee available (Lucas) v4: - remove unnecessary cases around ct communication or migration. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	783d6cdc82	drm/xe: Kill xe_device_mem_access_{get*,put} Let's simply convert all the current callers towards direct xe_pm_runtime access and remove this extra layer of indirection. No functional change is expected with this patch since xe_mem_access_get was already using the xe_pm_runtime_get_noresume at this point. v2: Convert all the current callers instead of a big refactor at once. v3: - Rebased - Squashed the GSC/HDCP - Added a new case: sriov_pf_policy - Improved commit message to highlight that there's no functional change in this patch. Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v2 Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240418143049.43231-1-rodrigo.vivi@intel.com	2024-04-22 09:03:09 -04:00
Himal Prasad Ghimiray	5a73dd61a0	drm/xe: Simplify function return using drmm_add_action_or_reset() Instead of assigning the value of drmm_add_action_or_reset() to err and returning err in case of failure and 0 in case of success, simply return the result of drmm_add_action_or_reset(). -v2: cleanup in xe_display too. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-2-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-18 13:26:34 -07:00
Rodrigo Vivi	e1feade077	drm/xe: Ensure all the inner access are using the _noresume variant At this point mem_access references should be only used as inner points of the execution and a get with synchronous resume previously called at an outer point. So, before killing mem_acces in favor of direct accsess, let's ensure that we first convert them towards the new _noresume variant that will WARN us if no inner caller happened. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-9-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	16b57c90bb	drm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active Now that assert_mem_access is relying directly on the pm_runtime state instead of the counters, there's no reason why we cannot use the pm_runtime functions directly. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-8-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	a382291017	drm/xe: Removing extra mem_access protection from runtime pm This is not needed any longer, now that we have all the protection in place with the runtime pm itself. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:40 -04:00
Rodrigo Vivi	8ae84a2744	drm/xe: Move lockdep protection from mem_access to xe_pm_runtime The mem_access itself is not holding any lock, but attempting to train lockdep with possible scarring locks happening during runtime pm. We are going soon to kill the mem_access get and put helpers in favor of direct xe_pm_runtime calls, so let's just move this lock around to where it now belongs. v2: s/lockdep_training/lockdep_prime (Matt Auld) Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-18 08:31:39 -04:00
José Roberto de Souza	4209d635a8	drm/xe: Remove devcoredump during driver release This will remove devcoredump from file system and free its resources during driver unload. This fix the driver unload after gpu hang happened, otherwise this it would report that Xe KMD is still in use and it would leave the kernel in a state that Xe KMD can't be unload without a reboot. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240409200206.108452-2-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-11 09:39:48 -04:00
Riana Tauro	797b0e9be0	drm/xe: re-order lmem init check and wait for initialization to complete Lmem init check should be done only after pcode initialization status is complete. Move lmem init check after pcode status check. Also wait for a short while after pcode status check to allow completion of the task. Failing to do so, can lead to aborting the module load leaving the system unusable. Wait until the lmem initialization is complete within a timeout (60s) or till the user aborts. v2: use bool as return type re-order the code comment (Rodrigo) add comment for deferring probe (Himal) v3: rebase Signed-off-by: Riana Tauro <riana.tauro@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410085005.1126343-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-10 12:32:15 -04:00

1 2 3

122 Commits