linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-21 00:04:01 -04:00

Author	SHA1	Message	Date
Daniele Ceraolo Spurio	dba7d17d50	drm/xe/vf: Fix guc_info debugfs for VFs The guc_info debugfs attempts to read a bunch of registers that the VFs doesn't have access to, so fix it by skipping the reads. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4775 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250423173908.1571412-1-daniele.ceraolospurio@intel.com	2025-04-29 13:20:57 -07:00
Satyanarayana K V P	e9dea328e8	drm/xe: Introduce fault injection for guc mmio send/recv. Fault can be injected with below steps. FAILTYPE=fail_function FAILFUNC=xe_guc_mmio_send_recv echo > /sys/kernel/debug/$FAILTYPE/inject echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject printf %#x -5 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250403120641.7258-2-satyanarayana.k.v.p@intel.com	2025-04-17 22:11:16 +02:00
Matthew Brost	045448da87	drm/xe: Add XE_BO_FLAG_PINNED_NORESTORE Not all BOs need to be restored on resume / d3cold exit, add XE_BO_FLAG_PINNED_NO_RESTORE which skips restoring of BOs rather just allocates VRAM for the BO. This should slightly speedup resume / d3cold exit flows. Marking GuC ADS, GuC CT, GuC log, GuC PC, and SA as NORESTORE. v2: - s/WONTNEED/NORESTORE (Vivi) - Rebase on newly added g2g and backup object flow Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-11-matthew.auld@intel.com	2025-04-04 11:41:01 +01:00
Rodrigo Vivi	fc858ddf9c	drm/xe/guc_pc: Remove duplicated pc_start call xe_guc_pc_start() was getting called from both xe_uc_init_hw() and from xe_guc_start(). But both are called from do_gt_restart() and only xe_uc_init_hw() is called at initialization. So, let's remove the duplication in the regular gt_restart path. The only place where xe_guc_pc_start() won't get called now is on the gt_reset failure path. However, if gt_reset has failed, it is really unlikely that the PC start will work or is desired. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306220643.1014049-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-03-07 11:10:05 -05:00
Riana Tauro	6978c5f5a6	drm/xe/xe_pmu: Add PMU support for engine activity PMU provides two counters (engine-active-ticks, engine-total-ticks) to calculate engine activity. When querying engine activity, user must group these 2 counters using the perf_event group mechanism to ensure both counters are sampled together. To list the events ./perf list xe_0000_03_00.0/engine-active-ticks/ [Kernel PMU event] xe_0000_03_00.0/engine-total-ticks/ [Kernel PMU event] The formats to be used with the above are engine_instance - config:12-19 engine_class - config:20-27 gt - config:60-63 The events can then be read using perf tool ./perf stat -e xe_0000_03_00.0/engine-active-ticks,gt=0, engine_class=0,engine_instance=0/, xe_0000_03_00.0/engine-total-ticks,gt=0, engine_class=0,engine_instance=0/ -I 1000 Engine activity can then be calculated as below engine activity % = (engine active ticks/engine total ticks) * 100 v2: validate gt rename total-ticks to engine-total-ticks add helper to get hwe (Umesh) v3: fix checkpatch warning add details to documentation (Umesh) remove ascii formats from documentation (Lucas) v4: remove unnecessary warn within raw_spinlock (Lucas) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-5-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:23:57 -08:00
Michal Wajdeczko	696bfdf273	drm/xe/guc: Introduce the GuC Buffer Cache The purpose of the GuC Buffer Cache is to maintain a set ofreusable buffers that could be used while sending some of the CTB H2G actions that require separate buffer with indirect data. Currently only few PF actions need this so initialize it only when running as a PF. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241220194205.995-9-michal.wajdeczko@intel.com	2025-01-19 00:12:03 +01:00
Daniele Ceraolo Spurio	d9a1ae0d17	drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms The DUAL_QUEUE_WA tells the GuC to not allow concurrent submissions on RCS and CCSes with different address spaces, which on DG2 is required as a WA for an HW bug. On newer platforms, this block has been moved in HW at the CS level, by stalling the RCS/CCS context switch when one of the other RCS/CCSes is busy with a different address space. While functionally correct, having a submission stalled on the HW limits the GuC ability to shuffle things around and can cause complications if the non-stalled submission runs for a long time, because the GuC doesn't know that the stalled submission isn't actually running and might declare it as hung. Therefore, we enable the DUAL_QUEUE_WA on all newer platforms to move management back to the GuC. Note that the GuC specs also recommend enabling this for all platforms starting from MTL that have a CCS. v2: only apply the WA on GTs that have CCS engines v3: split comment (Jonathan) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241213181012.2178794-1-daniele.ceraolospurio@intel.com	2024-12-16 13:24:27 -08:00
John Harrison	a9f7b97dda	drm/xe/guc: Add support for G2G communications Some features require inter-GuC communication channels on multi-tile devices. So allocate and enable such. v2: Correct use of xe_bo_get/put (review feedback from Matthew Brost) Add extra assert, re-order a calculation for better clarity and add comments to slot calculation (review feedback from Daniele). Also slightly re-work the slot calc to avoid code duplication. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241120000222.204095-3-John.C.Harrison@Intel.com	2024-11-22 19:10:56 -08:00
Tomasz Lis	f7278da76d	drm/xe/guc: Do not assert CTB state while sending MMIO During VF post-migration recovery, MMIO communication channel to GuC is used, despite CTB channel being enabled. This behavior is rooted in the save-restore architecture specification. Therefore, a VF driver cannot assert that CTB is disabled while sending MMIO messages to GuC. Such assertion needs to be PF only, or be removed. This patch simply removes the assertion. Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241107151357.1623733-2-tomasz.lis@intel.com	2024-11-09 15:11:13 +01:00
Tomasz Lis	6e6d7b41f9	drm/xe/vf: React to MIGRATED interrupt To properly support VF Save/Restore procedure, fixups need to be applied after PF driver finishes its part of VF Restore. The fixups are required to adjust the ongoing execution for a hardware switch that happened, because some GFX resources are not fully virtualized, and assigned to a VF as range from a global pool. The VF on which a VM is restored will often have different ranges provisioned than the VF on which save process happened. Those resource fixups are applied by the VF driver within a restored VM. A VF driver gets informed that it was migrated by receiving an interrupt from each GuC. The interrupt assigned for that purpose is "GUC SW interrupt 0". Seeing that fields set from within the irq handler should be the trigger for fixups. The VF can safely do post-migration fixups on resources associated to each GuC only after that GuC issued the MIGRATED interrupt. This change introduces a worker to be used for post-migration fixups, and a mechanism to schedule said worker when all GuCs sent the irq. v2: renamed and moved functions, updated logged messages, removed unused includes, used anon struct (Michal) v3: ordering, kerneldoc, asserts, debug messages, on_all_tiles -> on_all_gts (Michal) v4: fixed missing header include v5: Explained what fixups are, explained which IRQ is used, style fixes (Michal) Bspec: 50868 Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-2-tomasz.lis@intel.com Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>	2024-11-06 14:53:35 +01:00
John Harrison	db38fdb7bf	drm/xe/guc: Separate full CTB content from guc_info debugfs The guc_info debugfs file is meant to be a quick view of the current software state of the GuC interface. Including the full CTB contents makes the file as a whole much less human readable and is not partiular useful in the general case. So don't pollute the info dump with the full buffers. Instead, move those into a separate debugfs entry that can be read when that information is actually required. Also, improve the human readability by adding a few extra blank lines to delimt the sections. v2: Hide the internal capture/print params from external callers that don't need to know (review feedback from Matthew Brost). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241024002554.1983101-3-John.C.Harrison@Intel.com	2024-10-29 13:11:34 -07:00
Fei Yang	6ef3bb6055	drm/xe: enable lite restore The lite restore is a performance improvement feature which avoids unnecessary context switch (flush, save and restore) if the incoming context has a ContextID matching that of the outgoing context. The scheduling is done by the GuC firmware, so on the driver side it's just a matter of setting corresponding GUC_CTL_FEATURE flag. This is supposed to be enabled by default, thus the flag is set unconditionally. Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241017162710.942553-2-fei.yang@intel.com	2024-10-21 10:34:45 -07:00
Himal Prasad Ghimiray	31a5dce0a3	drm/xe/guc: Update handling of xe_force_wake_get return xe_force_wake_get() now returns the reference count-incremented domain mask. If it fails for individual domains, the return value will always be 0. However, for XE_FORCEWAKE_ALL, it may return a non-zero value even in the event of failure. Use helper xe_force_wake_ref_has_domain to verify all domains are initialized or not. Update the return handling of xe_force_wake_get() to reflect this behavior, and ensure that the return value is passed as input to xe_force_wake_put(). v3 - return xe_wakeref_t instead of int in xe_force_wake_get() - xe_force_wake_put() error doesn't need to be checked. It internally WARNS on domain ack failure. v5 - return unsigned int from xe_force_wake_get() - Remove redundant xe_gt_WARN_ON v6 - use helper xe_force_wake_ref_has_domain() v7 - Fix commit message v9 - Rebase Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014075601.2324382-17-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-10-17 10:17:08 -04:00
Zhanjun Dong	9c8c7a7e6f	drm/xe/guc: Prepare GuC register list and update ADS size for error capture Add referenced registers defines and list of registers. Update GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware. Ensure we allocate a persistent storage for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com	2024-10-08 09:34:04 -07:00
John Harrison	d2c5a5a926	drm/xe/guc: Dead CT helper Add a worker function helper for asynchronously dumping state when an internal/fatal error is detected in CT processing. Being asynchronous is required to avoid deadlocks and scheduling-while-atomic or process-stalled-for-too-long issues. Also check for a bunch more error conditions and improve the handling of some existing checks. v2: Use compile time CONFIG check for new (but not directly CT_DEAD related) checks and use unsigned int for a bitmask, rename CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments, rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W. Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the entire mask to zero. v3: Fix kerneldoc v4: Nullify some floating pointers after free. v5: Add section headings and device info to make the state dump look more like a devcoredump to allow parsing by the same tools (eventual aim is to just call the devcoredump code itself, but that currently requires an xe_sched_job, which is not available in the CT code). v6: Fix potential for leaking snapshots with concurrent error conditions (review feedback from Julia F). v7: Don't complain about unexpected G2H messages yet because there is a known issue causing them. Fix bit shift bug with v6 change. Add GT id to fake coredump headers and use puts instead of printf. v8: Disable the head mis-match check in g2h_read because it is failing on various discrete platforms due to unknown reasons. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-9-John.C.Harrison@Intel.com	2024-10-07 18:34:59 -07:00
Matt Roper	ee615c2bac	drm/xe: Move IRQ-related registers to dedicated header IRQ registers have a well-defined scope and make sense to collect in a dedicated header file. This also reduces confusion about the GT IRQ registers --- even though those registers relate to the GTs, they actually live outside the GT (in the sgunit) and thus do not need to worry about GT-specific register concepts like forcewake, steering, etc. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923214514.2031410-2-matthew.d.roper@intel.com	2024-09-26 10:27:07 -07:00
Ilia Levi	4157849ca3	drm/xe: move memirq out of VF Up until now only VF used Memory Based Interrupts (memirq). Moving it out of VF to cater for other usages, specifically MSI-X. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-4-illevi@habana.ai	2024-09-19 10:12:47 +02:00
Matt Roper	c18d4193b5	drm/xe/guc: Convert register access to use xe_mmio Stop using GT pointers for register access. v2: - Don't drop the _Generic wrapper macro for xe_mmio_wait32_not() yet. Defer that to the final patch of the series instead. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-66-matthew.d.roper@intel.com	2024-09-11 15:32:50 -07:00
Jani Nikula	5ea28f921a	drm/xe: use IS_ENABLED() instead of defined() on config options Prefer IS_ENABLED() instead of defined() for checking whether a kconfig option is enabled. Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240904145231.3902289-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-09-06 16:32:49 +03:00
Nitin Gote	cd89de14bb	drm/xe: Replace double space with single space after comma Avoid using double space, ", " in function or macro parameters where it's not required by any alignment purpose. Replace it with a single space, ", ". Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823080643.2461992-1-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-09-05 18:20:00 +02:00
Matthew Brost	b5de6a5ced	drm/xe: Set firmware state to loadable before registering guc_fini_hw The guc_fini_hw registered calls __xe_uc_fw_status which is only expected to be called after initializing fw state. Move this before registering guc_fini_hw. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-3-matthew.brost@intel.com	2024-08-23 09:54:11 -07:00
Matthew Brost	7dbe8af13c	drm/xe: Wedge the entire device Wedge the entire device, not just GT which may have triggered the wedge. To implement this, cleanup the layering so xe_device_declare_wedged() calls into the lower layers (GT) to ensure entire device is wedged. While we are here, also signal any pending GT TLB invalidations upon wedging device. Lastly, short circuit reset wait if device is wedged. v2: - Short circuit reset wait if device is wedged (Local testing) Fixes: `8ed9aaae39` ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-1-matthew.brost@intel.com	2024-07-17 11:58:26 -07:00
Michal Wajdeczko	20baedb803	drm/xe/vf: Skip attempt to start GuC PC if VF We have already marked the GuC PC feature as not applicable for VF devices, but we missed the fact that there may be still some privileged activities performed by this component, who does much more than its name suggests. Explicitly skip xe_guc_pc_start() if running as a VF driver and use a GT oriented message to report any error. v2: also skip xe_guc_pc_stop (Vinay) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240622094253.1081-1-michal.wajdeczko@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:24:51 -04:00
Vinay Belgaumkar	9d2ab8623e	drm/xe/guc: Request max GT freq during resume We already request max freq in the load path, moving it to __xe_guc_upload will ensure this speeds up GuC load in the resume path as well. v2: Rename xe_guc_pc_init_early since we now call it per GuC load (Michal W) v3: Keep pc_init_early() and init RPx values there (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620224928.3986377-3-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:23:45 -04:00
Michal Wajdeczko	7a893345a4	drm/xe/guc: Move ARAT interrupts enabling to the upload step Even though ARAT interrupts are enabled by default, we still want to keep the code that enables them. But instead doing that in the CTB enabling step, move this code to the upload step, where we already setup few other registers related to GuC. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240619163413.817-1-michal.wajdeczko@intel.com	2024-06-20 11:03:03 +02:00
Michal Wajdeczko	f0ccd2d805	drm/xe/vf: Don't touch GuC irq registers if using memory irqs On platforms where VFs are using memory based interrupts, we missed invalid access to no longer existing interrupt registers, as we keep them marked with XE_REG_OPTION_VF. To fix that just either setup memirq vectors in GuC or enable legacy interrupts. Fixes: `aef4eb7c7d` ("drm/xe/vf: Setup memory based interrupts in GuC") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240617154736.685-1-michal.wajdeczko@intel.com	2024-06-18 12:33:06 +02:00
Michal Wajdeczko	0d2ca8fd28	drm/xe/uc: Fix and start using xe_uc_fw_sanitize() Helper xe_uc_fw_sanitize() was defined but never used. First fix it by properly exiting also from the LOAD_FAIL state, then use it in GuC and HuC sanitize code. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240613153424.2120-1-michal.wajdeczko@intel.com	2024-06-14 23:33:22 +02:00
Michal Wajdeczko	5bfae679d3	drm/xe/vf: Custom GuC reset The VF drivers can't trigger real GuC firmware reset using GDRST register, but for the VF drivers it is sufficient to send VF_RESET message to reset any VF specific state maintained by the GuC. Use our existing VF bootstrap function as VF_RESET is part of it. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240604212231.1416-4-michal.wajdeczko@intel.com	2024-06-06 16:02:05 +02:00
John Harrison	fa171d49e4	drm/xe/guc: Fix uninitialised count in GuC load debug prints The debug prints about how long the GuC load takes have a loop counter. However that was neither initialised nor incremented! Plus, counting loops is no longer meaningful given the wait function returns early for any change in the status value. So fix it to only count loops due to actual timeouts. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250151.IbH0l8FG-lkp@intel.com/ Fixes: `b0ac1b42db` ("drm/xe/guc: Port over the slow GuC loading support from i915") Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Fei Yang <fei.yang@intel.com> Cc: intel-xe@lists.freedesktop.org Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524202603.4011656-1-John.C.Harrison@Intel.com	2024-05-28 14:53:09 -07:00
John Harrison	b0ac1b42db	drm/xe/guc: Port over the slow GuC loading support from i915 GuC loading can take longer than it is supposed to for various reasons. So add in the code to cope with that and to report it when it happens. There are also many different reasons why GuC loading can fail, so add in the code for checking for those and for reporting issues in a meaningful manner rather than just hitting a timeout and saying 'fail: status = %x'. Also, remove the 'FIXME' comment about an i915 bug that has never been applicable to Xe! v2: Actually report the requested and granted frequencies rather than showing granted twice (review feedback from Badal). v3: Locally code all the timeout and end condition handling because a helper function is not allowed (review feedback from Lucas/Rodrigo). v4: Add more documentation comments and rename a define to add units (review feedback from Lucas). v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from Lucas) and rebase (no more return value from guc_wait_ucode). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com	2024-05-23 10:55:31 -07:00
Rodrigo Vivi	8d490e019b	drm/xe: Stop checking for power_lost on D3Cold GuC reset status is not reliable for this purpose and it is once in a while ending up in a situation of D3Cold, where power_reset is false and without the proper memory restoration the GuC reload and Display will fail to come back from D3Cold. So, let's do a full restoration of everything if we have a risk of losing power, without further optimizations. v2: also remove the gut_in_reset function (Anshuman) Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-05-23 11:54:07 -04:00
Matthew Auld	19fa7aa4d2	drm/xe/guc: s/guc_fini/guc_fini_hw/ Make it clear that is about cleaning up the HW/FW side, and not software state. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-23-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Matthew Auld	241f5d25ff	drm/xe/guc: move guc_fini over to devm Make sure to actually call this when the device is removed. Currently we only trigger it when the driver instance goes away, but that doesn't work too well with hotunplug, since device can be removed and re-probed with a new driver instance, where the guc_fini() is called too late. Move the fini over to devm to ensure this is called when device is removed. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-22-matthew.auld@intel.com	2024-05-22 13:22:39 +01:00
Michal Wajdeczko	d8a417c4bd	drm/xe/vf: Custom GuC initialization if VF The GuC firmware is loaded and initialized by the PF driver. Make sure VF drivers only perform permitted operations. For submission initialization, use number of GuC context IDs from self config. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-3-michal.wajdeczko@intel.com	2024-05-22 12:53:45 +02:00
Michal Wajdeczko	7065b19bd5	drm/xe/guc: Allow to initialize submission with limited set of IDs While PF and native drivers may initialize submission code to use all available GuC contexts IDs, the VF driver may only use limited number of IDs. Update init function to accept number of context IDs available for use. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-2-michal.wajdeczko@intel.com	2024-05-22 12:53:43 +02:00
Michal Wajdeczko	25275c8a4f	drm/xe/vf: Custom hardware config load step if VF The VF drivers may immediately communicate with the GuC to obtain the hardware config since the firmware shall already be running. With the GuC communication established, VFs can also obtain the values of the runtime registers (fuses) from the PF driver. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-6-michal.wajdeczko@intel.com	2024-05-16 20:18:38 +02:00
Michal Wajdeczko	4071e0872f	drm/xe/uc: Move GuC submission init to post hwconfig step We shouldn't need anything from the GuC submission code until we finish GuC initialization in post hwconfig step. While around add diagnostic message if we fail uC init. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240510203810.1952-3-michal.wajdeczko@intel.com	2024-05-14 23:19:27 +02:00
Himal Prasad Ghimiray	c832541ca8	drm/xe: Change xe_guc_submit_stop return to void The function xe_guc_submit_stop consistently returns 0 without an error state, prompting the caller to verify it, which is redundant. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240424041911.2184868-1-himal.prasad.ghimiray@intel.com	2024-04-25 20:38:49 -07:00
Rodrigo Vivi	692818678e	drm/xe: declare wedged upon GuC load failure Let's block the device upon any GuC load failure. But let's continue with the probe so guc logs can be read from the debugfs. v2: - s/wedged/busted - do not block probe or we lose guc_logs in debugfs (Matt) v3: - s/busted/wedged v4: Do not change __xe_guc_upload return. (Himal) Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Michal Wajdeczko	f2b81483d3	drm/xe/vf: Don't try to read legacy GuC MMIO notification if VF Legacy SOFT_SCRATCH registers are not accessible from the VF. Any G2H notification posted there will be handled by the PF driver. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405133936.891-4-michal.wajdeczko@intel.com	2024-04-08 14:33:15 +02:00
Michal Wajdeczko	f73155654d	drm/xe/guc: Reuse code while debugging GuC params There is no need to duplicate code to print GuC parameters. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240404155046.627-2-michal.wajdeczko@intel.com	2024-04-05 12:15:52 +02:00
Michal Wajdeczko	12f95f9900	drm/xe/guc: Prefer GT oriented logs for GuC messages A platform can have more than one GuC, so we should use GT-oriented logs to correctly identify the source of the message. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240404155046.627-1-michal.wajdeczko@intel.com	2024-04-05 12:15:50 +02:00
Michal Wajdeczko	7da3f561cb	drm/xe: Move HW GGTT definitions to dedicated file It's better to keep all hardware GGTT definitions separated from the driver code. It also helps to avoid duplicated definitions. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240326131042.319-1-michal.wajdeczko@intel.com	2024-03-27 11:31:11 +01:00
Daniele Ceraolo Spurio	649a125a88	drm/xe: Always check force_wake_get return code A force_wake_get failure means that the HW might not be awake for the access we're doing; this can lead to an immediate error or it can be a more subtle problem (e.g. a register read might return an incorrect value that is still valid, leading the driver to make a wrong choice instead of flagging an error). We avoid an error from the force_wake function because callers might handle or tolerate the error, but this only works if all callers are checking the error code. The majority already do, but a few are not. These are mainly falling into 3 categories, which are each handled differently: 1) error capture: in this case we want to continue the capture, but we log an info message in dmesg to notify the user that the capture might have incorrect data. 2) ioctl: in this case we return a -EIO error to userspace 3) unabortable actions: these are scenarios where we can't simply abort and retry and so it's better to just try it anyway because there is a chance the HW is awake even with the failure. In this case we throw a warning so we know there was a forcewake problem if something fails down the line. v2: use gt_WARN_ON where appropriate Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240318154924.3453513-1-daniele.ceraolospurio@intel.com	2024-03-20 14:13:58 -07:00
Daniele Ceraolo Spurio	43c4ff3ca2	drm/xe/guc: Don't support older GuC 70.x releases Supporting older GuC versions comes with baggage, both on the coding side (due to interfaces only being available from a certain version onwards) and on the testing side (due to having to make sure the driver works as expected with older GuCs). Since all of our Xe platform are still under force probe, we haven't committed to support any specific GuC version and we therefore don't need to support the older once, which means that we can force a bottom limit to what GuC we accept. This allows us to remove any conditional statements based on older GuC versions and also to approach newer additions knowing that we'll never attempt to load something older than our minimum requirement. As an initial value, the minimum expected version is set to 70.19, which is the version currently in the firmware table, but the expectation is that this will be bumbed every time we update the table, until we remove the force probe. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240304162616.824884-1-daniele.ceraolospurio@intel.com	2024-03-19 12:14:02 -07:00
Lucas De Marchi	e89f4967d9	drm/xe: Drop WA 16015675438 With dynamic load-balancing disabled on the compute side, there's no reason left to enable WA 16015675438. Drop it from both PVC and DG2. Note that this can be done because now the driver always set a fixed partition of EUs during initialization via the ccs_mode configuration. Cc: Mateusz Jablonski <mateusz.jablonski@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240304233103.1687412-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-03-06 05:27:08 -08:00
Dafna Hirschfeld	a24d909977	drm/xe: Do not include current dir for generated/xe_wa_oob.h The generated file 'generated/xe_wa_oob.h' is included using: "generated/xe_wa_oob.h" which first look inside the source code. But the file resides in the build directory and should therefore be included using: <generated/xe_wa_oob.h> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai	2024-02-21 21:53:15 -08:00
Michał Winiarski	8a4587ef9f	drm/xe/guc: Move GuC power control init to "post-hwconfig" SLPC is not used at "hwconfig" stage. Move the initialization of data structures used for SLPC to a later point in probe. Also - move the xe_guc_pc_init_early to happen just prior to initial "hwconfig" load. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-3-michal.winiarski@intel.com	2024-02-20 14:13:46 -05:00
Michał Winiarski	a44bbace73	drm/xe/guc: Allocate GuC data structures in system memory for initial load GuC load will need to happen at an earlier point in probe, where local memory is not yet available. Use system memory for GuC data structures used for initial "hwconfig" load, and realloc at a later, "post-hwconfig" load if needed, when local memory is available. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com	2024-02-20 14:13:42 -05:00
Karthik Poosa	cd43106c9b	drm/xe/guc: Reduce a print from warn to debug Reduce debug print from warn to debug to avoid unnecessary warning message in dmesg: the firmware loading logic already has the right printk priority level when checking the firmware version. Fixes: `c5a06c9169` ("drm/xe/guc: Enable WA 14018913170") Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240125165652.3764711-1-karthik.poosa@intel.com [ slightly reword debug and commit messages ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-01-30 11:37:38 -08:00

1 2 3

116 Commits