linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-19 23:34:00 -04:00

Author	SHA1	Message	Date
Brian Welty	19c0222524	drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to initial value by the backend, but is now set within __xe_exec_queue_create(). Fixes: `b4eecedc75` ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> (cherry picked from commit `a8004af338`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-15 15:36:50 +01:00
Brian Welty	fef257eb6d	drm/xe: Fix guc_exec_queue_set_priority We need to set q->priority prior to calling guc_exec_queue_add_msg() as that will call init_policies() and sets the scheduling properties to those stored in the exec_queue. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> (cherry picked from commit `b16483f9f8`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-15 15:36:48 +01:00
Bommu Krishnaiah	e670f0b4ef	drm/xe/uapi: Return correct error code for xe_wait_user_fence_ioctl Currently xe_wait_user_fence_ioctl is not checking exec_queue state and blocking until timeout, with this patch wakeup the blocking wait if exec_queue reset happen and returning proper error code Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com> Cc: Oak Zeng <oak.zeng@intel.com> Cc: Kempczynski Zbigniew <Zbigniew.Kempczynski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Mateusz Naklicki <mateusz.naklicki@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:20 -05:00
Michal Wajdeczko	b67cb798e4	drm/xe/guc: Include only required GuC ABI headers On i915 we were adding new GuC ABI headers directly to guc_fwif.h file since we were replacing old definitions from that file. On xe driver we could do more and better by including ABI headers only in files that need those definitions. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/741 Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20231128203203.1147-3-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:08 -05:00
Thomas Hellström	fdb6a05383	drm/xe: Internally change the compute_mode and no_dma_fence mode naming The name "compute_mode" can be confusing since compute uses either this mode or fault_mode to achieve the long-running semantics, and compute_mode can, moving forward, enable fault_mode under the hood to work around hardware limitations. Also the name no_dma_fence_mode really refers to what we elsewhere call long-running mode and the mode contrary to what its name suggests allows dma-fences as in-fences. So in an attempt to be more consistent, rename no_dma_fence_mode -> lr_mode compute_mode -> preempt_fence_mode And adjust flags so that preempt_fence_mode sets XE_VM_FLAG_LR_MODE fault_mode sets XE_VM_FLAG_LR_MODE \| XE_VM_FLAG_FAULT_MODE v2: - Fix a typo in the commit message (Oak Zeng) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231127123349.23698-1-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:44:58 -05:00
Michal Wajdeczko	3d78923bd0	drm/xe/guc: Promote guc_to_gt/xe helpers to .h Duplicating these helpers in almost every .c file is a bad idea. Define them as inlines in .h file to allow proper reuse. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:44:32 -05:00
Matthew Brost	a839e365ac	drm/xe: Use pool of ordered wq for GuC submission To appease lockdep, use a pool of ordered wq for GuC submission rather tha leaving the ordered wq allocation to the drm sched. Without this change eventually lockdep runs out of hash entries (MAX_LOCKDEP_CHAINS is exceeded) as each user allocated exec queue adds more hash table entries to lockdep. A pool old of 256 ordered wq should be enough to have similar behavior with and without lockdep enabled. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:39 -05:00
Bommithi Sakeena	28b1d9155c	drm/xe: Ensure mutex are destroyed Add missing mutex_destroy calls to fini functions or convert to drmm_mutex_init where fini function is not available. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Bommithi Sakeena <bommithi.sakeena@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:21 -05:00
Daniele Ceraolo Spurio	cb90d46918	drm/xe: Add child contexts to the GuC context lookup The CAT_ERROR message from the GuC provides the guc id of the context that caused the problem, which can be a child context. We therefore need to be able to match that id to the exec_queue that owns it, which we do by adding child context to the context lookup. While at it, fix the error path of the guc id allocation code to correctly free the ids allocated for parallel queues. v2: rebase on s/XE_WARN_ON/xe_assert Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/590 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:14 -05:00
Daniele Ceraolo Spurio	c4991ee01d	drm/xe/uc: Rename guc_submission_enabled() to uc_enabled() The guc_submission_enabled() function is being used as a boolean toggle for all firmwares and all related features, not just GuC submission. We could add additional flags/functions to distinguish and allow different use-cases (e.g. loading HuC but not using GuC submission), but given that not using GuC is a debug-only scenario having a global switch for all FWs is enough. However, we want to make it clear that this switch turns off everything, so rename it to uc_enabled(). v2: rebase on s/XE_WARN_ON/xe_assert Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:13 -05:00
Francois Dugast	c73acc1eeb	drm/xe: Use Xe assert macros instead of XE_WARN_ON macro The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:08 -05:00
Francois Dugast	5c0553cdc8	drm/xe: Replace XE_WARN_ON with drm_warn when just printing a string Use the generic drm_warn instead of the driver-specific XE_WARN_ON in cases where XE_WARN_ON is used to unconditionally print a debug message. v2: Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:07 -05:00
Daniele Ceraolo Spurio	923e423817	drm/xe: split kernel vs permanent engine flags If an engine is only destroyed on driver unload, we can skip its clean-up steps with the GuC because the GuC is going to be tuned off as well, so it doesn't matter if we're in sync with it or not. Currently, we apply this optimization to all engines marked as kernel, but this stops us to supporting kernel engines that don't stick around until unload. To remove this limitation, add a separate flag to indicate if the engine is expected to only be destryed on driver unload and use that to trigger the optimzation. While at it, add a small comment to explain what each engine flag represents. v2: s/XE_BUG_ON/XE_WARN_ON, s/ENGINE/EXEC_QUEUE v3: rebased Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-3-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:27 -05:00
Daniele Ceraolo Spurio	1c66c0f391	drm/xe: fix submissions without vm Kernel queues can submit privileged batches directly in GGTT, so they don't always need a vm. The submission front-end already supports creating and submitting jobs without a vm, but some parts of the back-end assume the vm is always there. Fix this by handling a lack of vm in the back-end as well. v2: s/XE_BUG_ON/XE_WARN_ON, s/engine/exec_queue Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:27 -05:00
Daniele Ceraolo Spurio	0b1d1473b3	drm/xe: common function to assign queue name The queue name assignment is identical in both GuC and execlists backends, so we can move it to a common function. This will make adding a new entry in the next patch slightly cleaner. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230817201831.1583172-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:20 -05:00
Matthew Brost	a20c75dba1	drm/xe: Call __guc_exec_queue_fini_async direct for KERNEL exec_queues Usually we call __guc_exec_queue_fini_async via a worker as the exec_queue fini can be done from within the GPU scheduler which creates a circular dependency without a worker. Kernel exec_queues are fini'd at driver unload (not from within the GPU scheduler) so it is safe to directly call __guc_exec_queue_fini_async. Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:19 -05:00
Matthew Auld	ef6ea97228	drm/xe/guc_submit: fixup deregister in job timeout Rather check if the engine is still registered before proceeding with deregister steps. Also the engine being marked as disabled doesn't mean the engine has been disabled or deregistered from GuC pov, and here we are signalling fences so we need to be sure GuC is not still using this context. v2: - Drop the read_stopped() for this path. Since we are signalling fences on error here, best play it safe and wait for the GT reset to mark the engine as disabled, rather than it just being queued. v3 (Matt Brost): - Keep the read_stopped() on the wait event, since there is no need to wait for an already scheduled GT reset. If it is set we can then just bail without signalling anything. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:19 -05:00
Matthew Auld	17d28aa8bd	drm/xe: don't warn for bogus pagefaults This appears to be easily user triggerable so warning is perhaps too much. Rather just make it debug print. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/534 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:19 -05:00
Matthew Auld	31b57683de	drm/xe/guc_submit: prevent repeated unregister It seems that various things can trigger the lr cleanup worker, including CAT error, engine reset and destroying the actual engine, so seems plausible to end up triggering the worker more than once in some cases. If that does happen we can race with an ongoing engine deregister before it has completed, thus triggering it again and also changing the state back into pending_disable. Checking if the engine has been marked as destroyed looks like it should prevent this. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:29 -05:00
Tejas Upadhyay	eef55700f3	drm/xe: Add sysfs for default engine scheduler properties For each HW engine under GT we are adding defaults sysfs entry to list all engine scheduler properties and its default values. So that it will be easier for user to fetch default values of these properties anytime to go back to default. For example, DUT# cat /sys/class/drm/card1/device/tileN/gtN/engines/bcs/.defaults/ job_timeout_ms preempt_timeout_us timeslice_duration_us where, @job_timeout_ms: The time after which a job is removed from the scheduler. @preempt_timeout_us: How long to wait (in microseconds) for a preemption event to occur when submitting a new context. @timeslice_duration_us: Each context is scheduled for execution for the timeslice duration, before switching to the next context. V12: - Add missing drmm_add_action_or_reset and remove sysfs files V11: - Rebase V10: - Remove xe_gt.h inclusion from .h - Matt V9 : - Remove jiffies for job_timeout_ms - Matt V8 : - replace xe_engine with xe_hw_engine - Matt V7 : - Push all errors to one error path at every places - Niranjana - Describe struct member to resolve kernel doc err - CI hooks V6 : - Use engine class interface instead of hw engine in sysfs for better interfacing readability - Niranjana V5 : - Scheduling props should apply per class engine not per hardware engine - Matt - Do not record value of job_timeout_ms if changed based on dma_fence - Matt V4 : - Resolve merge conflicts - CI V3 : - Rearrange code in its own file - Rebase - Update commit message to reflect tile addition V2 : - Use sysfs_create_files in this patch - Niranjana - Handle prototype error for xe_add_engine_defaults - CI hooks - Remove unused member sysfs_hwe - Niranjana Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:21 -05:00
Francois Dugast	9b9529ce37	drm/xe: Rename engine to exec_queue Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:20 -05:00
Francois Dugast	c22a4ed0c3	drm/xe: Rename xe_engine.[ch] to xe_exec_queue.[ch] This is a preparation commit for a larger renaming of engine to exec queue. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Francois Dugast	99fea68288	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Matthew Brost	e3828ebf6c	drm/xe: Add define WQ_HEADER_SIZE Previously used a a magic '+ 3', use define instead. Suggested-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:16 -05:00
Francois Dugast	3e8e7ee6a3	drm/xe: Cleanup style warnings Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by addressing those warnings types: LEADING_SPACE LINE_SPACING BRACES TRAILING_SEMICOLON CONSTANT_COMPARISON BLOCK_COMMENT_STYLE RETURN_VOID ONE_SEMICOLON SUSPECT_CODE_INDENT LINE_CONTINUATIONS UNNECESSARY_ELSE UNSPECIFIED_INT UNNECESSARY_INT MISORDERED_TYPE Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:31 -05:00
Francois Dugast	763931d25c	drm/xe: Cleanup CODE_INDENT style issues Remove all existing style issues of type CODE_INDENT reported by checkpatch. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:30 -05:00
Matthew Brost	8ae8a2e8dd	drm/xe: Long running job update For long running (LR) jobs with the DRM scheduler we must return NULL in run_job which results in signaling the job's finished fence immediately. This prevents LR jobs from creating infinite dma-fences. Signaling job's finished fence immediately breaks flow controlling ring with the DRM scheduler. To work around this, the ring is flow controlled and written in the exec IOCTL. Signaling job's finished fence immediately also breaks the TDR which is used in reset / cleanup entity paths so write a new path for LR entities. v2: Better commit, white space, remove rmb(), better comment next to emit_job() v3 (Thomas): Change LR reference counting, fix working in commit Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:34:44 -05:00
Rodrigo Vivi	a4db555587	drm/xe: Convert Xe HW Engine print to snapshot capture and print. The goal is to allow for a snapshot capture to be taken at the time of the crash, while the print out can happen at a later time through the exposed devcoredump virtual device. v2: Addressing these Matthew comments: - Handle memory allocation failures. - Do not use GFP_ATOMIC on cases like debugfs prints. - placement of @reg doc. - identation issues. v3: checkpatch v4: Rebase and get back to GFP_ATOMIC only. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Rodrigo Vivi	bbdf97c140	drm/xe: Convert GuC Engine print to snapshot capture and print. The goal is to allow for a snapshot capture to be taken at the time of the crash, while the print out can happen at a later time through the exposed devcoredump virtual device. v2: Handle memory allocation failures. (Matthew) Do not use GFP_ATOMIC on cases like debugfs prints. (Matthew) v3: checkpatch v4: pending_list allocation needs to be atomic because of the spin_lock. (Matthew) get back to GFP_ATOMIC only. (lockdep). Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Rodrigo Vivi	1825c492da	drm/xe: Introduce guc_submit_types.h with relevant structs. These structs and definitions are only used for the guc_submit and they were added specifically for the parallel submission. While doing that also delete the unused struct guc_wq_item. v2: checkpatch fixes. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Rodrigo Vivi	513260dfd1	drm/xe: Convert GuC CT print to snapshot capture and print. The goal is to allow for a snapshot capture to be taken at the time of the crash, while the print out can happen at a later time through the exposed devcoredump virtual device. v2: Handle memory allocation failures. (Matthew) Do not use GFP_ATOMIC on cases like debugfs prints. (Matthew) v3: checkpatch fixes v4: Do not use atomic in the g2h_worker_func (Matthew) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:52 -05:00
Rodrigo Vivi	e799485044	drm/xe: Introduce the dev_coredump infrastructure. The goal is to use devcoredump infrastructure to report error states captured at the crash time. The error state will contain useful information for GPU hang debug, such as INSTDONE registers and the current buffers getting executed, as well as any other information that helps user space and allow later replays of the error. The proposal here is to avoid a Xe only error_state like i915 and use a standard dev_coredump infrastructure to expose the error state. For our own case, the data is only useful if it is a snapshot of the time when the GPU crash has happened, since we reset the GPU immediately after and the registers might have changed. So the proposal here is to have an internal snapshot to be printed out later. Also, usually a subsequent GPU hang can be only a cause of the initial one. So we only save the 'first' hang. The dev_coredump has a delayed work queue where it remove the coredump and free all the data within a few moments of the error. When that happens we also reset our capture state and allow further snapshots. Right now this infra only print out the time of the hang. More information will be migrated here on subsequent work. Also, in order to organize the dump better, the goal is to propose dev_coredump changes itself to allow multiple files and different controls. But for now we start Xe usage of it without any dependency on dev_coredump core changes. v2: Add dma_fence annotation for capture that might happen during long running. (Thomas and Matt) Use xe->drm.primary->index on drm_info msg. (Jani) v3: checkpatch fixes v4: Fix building and locking issues found by Francois. Actually let's kill all of the locking in here. gt_reset serialization already guarantee that there will be only one capture at the same time. Also, the devcoredump has its own locking to protect the free and reads and drivers don't need to duplicate it. Besides this, the dma_fence locking was pushed to a following patch since it is not needed in this one. Fix a use after free identified by KASAN: Do not stash the faulty_engine since that will be freed somewhere else. v5: Fix Uptime - ktime_get_boottime actually returns the Uptime. (Francois) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2023-12-19 18:33:51 -05:00
Lucas De Marchi	541623a406	drm/xe: Fix typo persitent->persistent Fix typo as noticed by Matt Roper: git grep -l persitent \| xargs sed -i 's/persitent/persistent/g' ... and then fix coding style issues. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20230302013411.3262608-2-lucas.demarchi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:43 -05:00
Matt Roper	f659ac1564	drm/xe/mocs: LNCF MOCS settings only need to be restored on pre-Xe_HP Reprogramming the LNCF MOCS registers on render domain reset is not intended to be regular driver programming, but rather the implementation of a specific workaround (Wa_1607983814). This workaround no longer applies on Xe_HP any beyond, so we can expect that these registers, like the rest of the LNCF/LBCF registers, will maintain their values through all engine resets. We should only add these registers to the GuC's save/restore list on platforms that need the workaround. Furthermore, xe_mocs_init_engine() appears to be another attempt to satisfy this same workaround. This is unnecessary on the Xe driver since even on platforms where the workaround is necessary, all single-engine resets are initiated by the GuC and thus the GuC will take care of saving/restoring these registers. The only host-initiated resets we have in Xe are full GT resets which will already (re)initialize these registers as part of the regular xe_mocs_init() flow. v2: - Add needs_wa_1607983814() so that calculate_regset_size() doesn't overallocate regset space when the workaround isn't needed. (Lucas) - On platforms affected by Wa_1607983814, only add the LNCF MOCS registers to the render engine's GuC save/restore list; resets of other engines don't need to save/restore these. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:43 -05:00
Lucas De Marchi	0992884d09	drm/xe: Remove dependency on intel_lrc_reg.h Create regs/xe_lrc_layout.h file with all the offsets used by the xe driver. Eventually the xe driver may use a different way to define them since it doesn't supported below gen12. v2: Rename file to intel_lrc_layout.h since it's not really about registers (Matt Roper) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:21 -05:00
Lucas De Marchi	ea9f879d03	drm/xe: Sort includes Sort includes and split them in blocks: 1) .h corresponding to the .c. Example: xe_bb.c should have a "#include "xe_bb.h" first. 2) #include <linux/...> 3) #include <drm/...> 4) local includes 5) i915 includes This is accomplished by running `clang-format --style=file -i --sort-includes drivers/gpu/drm/xe/*.[ch]` and ignoring all the changes after the includes. There are also some manual tweaks to split the blocks. v2: Also sort includes in headers Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:20 -05:00
Lucas De Marchi	857912c37e	drm/xe: Fix some log messages on 32b Either use the proper format or cast up to 64b depending on the case. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-12 14:06:01 -05:00
Matthew Brost	dd08ebf6c3	drm/xe: Introduce a new DRM driver for Intel GPUs Xe, is a new driver for Intel GPUs that supports both integrated and discrete platforms starting with Tiger Lake (first Intel Xe Architecture). The code is at a stage where it is already functional and has experimental support for multiple platforms starting from Tiger Lake, with initial support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO (for OpenCL and Level0). The new Xe driver leverages a lot from i915. As for display, the intent is to share the display code with the i915 driver so that there is maximum reuse there. But it is not added in this patch. This initial work is a collaboration of many people and unfortunately the big squashed patch won't fully honor the proper credits. But let's get some git quick stats so we can at least try to preserve some of the credits: Co-developed-by: Matthew Brost <matthew.brost@intel.com> Co-developed-by: Matthew Auld <matthew.auld@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Co-developed-by: Francois Dugast <francois.dugast@intel.com> Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com> Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Co-developed-by: Jani Nikula <jani.nikula@intel.com> Co-developed-by: José Roberto de Souza <jose.souza@intel.com> Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Co-developed-by: Dave Airlie <airlied@redhat.com> Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com>	2023-12-12 14:05:48 -05:00

38 Commits