Commit Graph

96 Commits

Author SHA1 Message Date
Matthew Brost
b5de6a5ced drm/xe: Set firmware state to loadable before registering guc_fini_hw
The guc_fini_hw registered calls __xe_uc_fw_status which is only
expected to be called after initializing fw state. Move this before
registering guc_fini_hw.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240820172958.1095143-3-matthew.brost@intel.com
2024-08-23 09:54:11 -07:00
Matthew Brost
7dbe8af13c drm/xe: Wedge the entire device
Wedge the entire device, not just GT which may have triggered the wedge.
To implement this, cleanup the layering so xe_device_declare_wedged()
calls into the lower layers (GT) to ensure entire device is wedged.

While we are here, also signal any pending GT TLB invalidations upon
wedging device.

Lastly, short circuit reset wait if device is wedged.

v2:
 - Short circuit reset wait if device is wedged (Local testing)

Fixes: 8ed9aaae39 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240716063902.1390130-1-matthew.brost@intel.com
2024-07-17 11:58:26 -07:00
Michal Wajdeczko
20baedb803 drm/xe/vf: Skip attempt to start GuC PC if VF
We have already marked the GuC PC feature as not applicable for
VF devices, but we missed the fact that there may be still some
privileged activities performed by this component, who does much
more than its name suggests.

Explicitly skip xe_guc_pc_start() if running as a VF driver and
use a GT oriented message to report any error.

v2: also skip xe_guc_pc_stop (Vinay)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240622094253.1081-1-michal.wajdeczko@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-06-26 18:24:51 -04:00
Vinay Belgaumkar
9d2ab8623e drm/xe/guc: Request max GT freq during resume
We already request max freq in the load path, moving it
to __xe_guc_upload will ensure this speeds up GuC load in
the resume path as well.

v2: Rename xe_guc_pc_init_early since we now call it per
GuC load (Michal W)

v3: Keep pc_init_early() and init RPx values there (Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240620224928.3986377-3-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-06-26 18:23:45 -04:00
Michal Wajdeczko
7a893345a4 drm/xe/guc: Move ARAT interrupts enabling to the upload step
Even though ARAT interrupts are enabled by default, we still want
to keep the code that enables them. But instead doing that in the
CTB enabling step, move this code to the upload step, where we
already setup few other registers related to GuC.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240619163413.817-1-michal.wajdeczko@intel.com
2024-06-20 11:03:03 +02:00
Michal Wajdeczko
f0ccd2d805 drm/xe/vf: Don't touch GuC irq registers if using memory irqs
On platforms where VFs are using memory based interrupts, we
missed invalid access to no longer existing interrupt registers,
as we keep them marked with XE_REG_OPTION_VF. To fix that just
either setup memirq vectors in GuC or enable legacy interrupts.

Fixes: aef4eb7c7d ("drm/xe/vf: Setup memory based interrupts in GuC")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240617154736.685-1-michal.wajdeczko@intel.com
2024-06-18 12:33:06 +02:00
Michal Wajdeczko
0d2ca8fd28 drm/xe/uc: Fix and start using xe_uc_fw_sanitize()
Helper xe_uc_fw_sanitize() was defined but never used. First fix
it by properly exiting also from the LOAD_FAIL state, then use it
in GuC and HuC sanitize code.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240613153424.2120-1-michal.wajdeczko@intel.com
2024-06-14 23:33:22 +02:00
Michal Wajdeczko
5bfae679d3 drm/xe/vf: Custom GuC reset
The VF drivers can't trigger real GuC firmware reset using GDRST
register, but for the VF drivers it is sufficient to send VF_RESET
message to reset any VF specific state maintained by the GuC.
Use our existing VF bootstrap function as VF_RESET is part of it.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240604212231.1416-4-michal.wajdeczko@intel.com
2024-06-06 16:02:05 +02:00
John Harrison
fa171d49e4 drm/xe/guc: Fix uninitialised count in GuC load debug prints
The debug prints about how long the GuC load takes have a loop
counter. However that was neither initialised nor incremented! Plus,
counting loops is no longer meaningful given the wait function returns
early for any change in the status value. So fix it to only count
loops due to actual timeouts.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250151.IbH0l8FG-lkp@intel.com/
Fixes: b0ac1b42db ("drm/xe/guc: Port over the slow GuC loading support from i915")
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: intel-xe@lists.freedesktop.org
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524202603.4011656-1-John.C.Harrison@Intel.com
2024-05-28 14:53:09 -07:00
John Harrison
b0ac1b42db drm/xe/guc: Port over the slow GuC loading support from i915
GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.

Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!

v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com
2024-05-23 10:55:31 -07:00
Rodrigo Vivi
8d490e019b drm/xe: Stop checking for power_lost on D3Cold
GuC reset status is not reliable for this purpose and it is
once in a while ending up in a situation of D3Cold, where
power_reset is false and without the proper memory restoration
the GuC reload and Display will fail to come back from D3Cold.

So, let's do a full restoration of everything if we have a risk
of losing power, without further optimizations.

v2: also remove the gut_in_reset function (Anshuman)

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23 11:54:07 -04:00
Matthew Auld
19fa7aa4d2 drm/xe/guc: s/guc_fini/guc_fini_hw/
Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-23-matthew.auld@intel.com
2024-05-22 13:22:39 +01:00
Matthew Auld
241f5d25ff drm/xe/guc: move guc_fini over to devm
Make sure to actually call this when the device is removed. Currently we
only trigger it when the driver instance goes away, but that doesn't
work too well with hotunplug, since device can be removed and re-probed
with a new driver instance, where the guc_fini() is called too late.
Move the fini over to devm to ensure this is called when device is
removed.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-22-matthew.auld@intel.com
2024-05-22 13:22:39 +01:00
Michal Wajdeczko
d8a417c4bd drm/xe/vf: Custom GuC initialization if VF
The GuC firmware is loaded and initialized by the PF driver. Make
sure VF drivers only perform permitted operations. For submission
initialization, use number of GuC context IDs from self config.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-3-michal.wajdeczko@intel.com
2024-05-22 12:53:45 +02:00
Michal Wajdeczko
7065b19bd5 drm/xe/guc: Allow to initialize submission with limited set of IDs
While PF and native drivers may initialize submission code to use
all available GuC contexts IDs, the VF driver may only use limited
number of IDs. Update init function to accept number of context
IDs available for use.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521092518.624-2-michal.wajdeczko@intel.com
2024-05-22 12:53:43 +02:00
Michal Wajdeczko
25275c8a4f drm/xe/vf: Custom hardware config load step if VF
The VF drivers may immediately communicate with the GuC to obtain
the hardware config since the firmware shall already be running.

With the GuC communication established, VFs can also obtain the
values of the runtime registers (fuses) from the PF driver.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-6-michal.wajdeczko@intel.com
2024-05-16 20:18:38 +02:00
Michal Wajdeczko
4071e0872f drm/xe/uc: Move GuC submission init to post hwconfig step
We shouldn't need anything from the GuC submission code until we
finish GuC initialization in post hwconfig step.

While around add diagnostic message if we fail uC init.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510203810.1952-3-michal.wajdeczko@intel.com
2024-05-14 23:19:27 +02:00
Himal Prasad Ghimiray
c832541ca8 drm/xe: Change xe_guc_submit_stop return to void
The function xe_guc_submit_stop consistently returns 0 without an error
state, prompting the caller to verify it, which is redundant.

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424041911.2184868-1-himal.prasad.ghimiray@intel.com
2024-04-25 20:38:49 -07:00
Rodrigo Vivi
692818678e drm/xe: declare wedged upon GuC load failure
Let's block the device upon any GuC load failure.
But let's continue with the probe so guc logs can be read
from the debugfs.

v2: - s/wedged/busted
    - do not block probe or we lose guc_logs in debugfs (Matt)

v3: - s/busted/wedged

v4: Do not change __xe_guc_upload return. (Himal)

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Michal Wajdeczko
f2b81483d3 drm/xe/vf: Don't try to read legacy GuC MMIO notification if VF
Legacy SOFT_SCRATCH registers are not accessible from the VF. Any
G2H notification posted there will be handled by the PF driver.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405133936.891-4-michal.wajdeczko@intel.com
2024-04-08 14:33:15 +02:00
Michal Wajdeczko
f73155654d drm/xe/guc: Reuse code while debugging GuC params
There is no need to duplicate code to print GuC parameters.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404155046.627-2-michal.wajdeczko@intel.com
2024-04-05 12:15:52 +02:00
Michal Wajdeczko
12f95f9900 drm/xe/guc: Prefer GT oriented logs for GuC messages
A platform can have more than one GuC, so we should use GT-oriented
logs to correctly identify the source of the message.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404155046.627-1-michal.wajdeczko@intel.com
2024-04-05 12:15:50 +02:00
Michal Wajdeczko
7da3f561cb drm/xe: Move HW GGTT definitions to dedicated file
It's better to keep all hardware GGTT definitions separated from
the driver code. It also helps to avoid duplicated definitions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240326131042.319-1-michal.wajdeczko@intel.com
2024-03-27 11:31:11 +01:00
Daniele Ceraolo Spurio
649a125a88 drm/xe: Always check force_wake_get return code
A force_wake_get failure means that the HW might not be awake for the
access we're doing; this can lead to an immediate error or it can be a
more subtle problem (e.g. a register read might return an incorrect
value that is still valid, leading the driver to make a wrong choice
instead of flagging an error).
We avoid an error from the force_wake function because callers might
handle or tolerate the error, but this only works if all callers
are checking the error code. The majority already do, but a few are not.
These are mainly falling into 3 categories, which are each handled
differently:

1) error capture: in this case we want to continue the capture, but we
   log an info message in dmesg to notify the user that the capture
   might have incorrect data.

2) ioctl: in this case we return a -EIO error to userspace

3) unabortable actions: these are scenarios where we can't simply abort
   and retry and so it's better to just try it anyway because there is a
   chance the HW is awake even with the failure. In this case we throw a
   warning so we know there was a forcewake problem if something fails
   down the line.

v2: use gt_WARN_ON where appropriate

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240318154924.3453513-1-daniele.ceraolospurio@intel.com
2024-03-20 14:13:58 -07:00
Daniele Ceraolo Spurio
43c4ff3ca2 drm/xe/guc: Don't support older GuC 70.x releases
Supporting older GuC versions comes with baggage, both on the coding
side (due to interfaces only being available from a certain version
onwards) and on the testing side (due to having to make sure the driver
works as expected with older GuCs).
Since all of our Xe platform are still under force probe, we haven't
committed to support any specific GuC version and we therefore don't
need to support the older once, which means that we can force a bottom
limit to what GuC we accept. This allows us to remove any conditional
statements based on older GuC versions and also to approach newer
additions knowing that we'll never attempt to load something older
than our minimum requirement.

As an initial value, the minimum expected version is set to 70.19,
which is the version currently in the firmware table, but the
expectation is that this will be bumbed every time we update the
table, until we remove the force probe.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240304162616.824884-1-daniele.ceraolospurio@intel.com
2024-03-19 12:14:02 -07:00
Lucas De Marchi
e89f4967d9 drm/xe: Drop WA 16015675438
With dynamic load-balancing disabled on the compute side, there's no
reason left to enable WA 16015675438. Drop it from both PVC and DG2.

Note that this can be done because now the driver always set a fixed
partition of EUs during initialization via the ccs_mode configuration.

Cc: Mateusz Jablonski <mateusz.jablonski@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240304233103.1687412-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-06 05:27:08 -08:00
Dafna Hirschfeld
a24d909977 drm/xe: Do not include current dir for generated/xe_wa_oob.h
The generated file 'generated/xe_wa_oob.h' is included using:
"generated/xe_wa_oob.h"
which first look inside the source code. But the file resides
in the build directory and should therefore be included using:
<generated/xe_wa_oob.h>

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai
2024-02-21 21:53:15 -08:00
Michał Winiarski
8a4587ef9f drm/xe/guc: Move GuC power control init to "post-hwconfig"
SLPC is not used at "hwconfig" stage. Move the initialization of data
structures used for SLPC to a later point in probe.
Also - move the xe_guc_pc_init_early to happen just prior to initial
"hwconfig" load.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-3-michal.winiarski@intel.com
2024-02-20 14:13:46 -05:00
Michał Winiarski
a44bbace73 drm/xe/guc: Allocate GuC data structures in system memory for initial load
GuC load will need to happen at an earlier point in probe, where local
memory is not yet available. Use system memory for GuC data structures
used for initial "hwconfig" load, and realloc at a later,
"post-hwconfig" load if needed, when local memory is available.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com
2024-02-20 14:13:42 -05:00
Karthik Poosa
cd43106c9b drm/xe/guc: Reduce a print from warn to debug
Reduce debug print from warn to debug to avoid unnecessary warning
message in dmesg: the firmware loading logic already has the right
printk priority level when checking the firmware version.

Fixes: c5a06c9169 ("drm/xe/guc: Enable WA 14018913170")
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240125165652.3764711-1-karthik.poosa@intel.com
[ slightly reword debug and commit messages ]
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-01-30 11:37:38 -08:00
Matthew Brost
dc75d03716 drm/xe/guc: Add more GuC CT states
The Guc CT has more than enabled / disables states rather it has 4. The
4 states are not initialized, disabled, stopped, and enabled. Change the
code to reflect this. These states will enable proper return codes from
functions and therefore enable proper error messages.

v2:
- s/XE_GUC_CT_STATE_DROP_MESSAGES/XE_GUC_CT_STATE_STOPPED (Michal)
- Add assert for CT being initialized (Michal)
- Fix kernel for CT state enum (Michal)

v3:
- Kernel doc (Michal)
- s/reiecved/received (Michal)
- assert CT state not initialized in xe_guc_ct_init (Michal)
- add argument xe_guc_ct_set_state to clear g2h (Michal)

v4:
- Drop clear_outstanding_g2h argument (Michal)

v5:
- Move xa_destroy outside of fast lock (CI)

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-2-matthew.brost@intel.com
2024-01-26 15:12:28 -05:00
Rodrigo Vivi
06af1954ae drm/xe: Do not flood dmesg with guc log
This information is already present at
/sys/kernel/debug/dri/0/gt0/uc/guc_log if needed.

v2: add missing chunk
v3: remove spurious line

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240118214856.399952-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-01-19 09:52:18 -05:00
Karthik Poosa
c5a06c9169 drm/xe/guc: Enable WA 14018913170
The GuC handles the WA, the KMD just needs to set the flag to enable
it on the appropriate platforms.

v2:
  - Fixed CI checkpatch warning, alignment should match open parenthesis.
  - Fixed GUC FW version check to use XE_UC_FW_VER_RELEASE which points to
    current GUC FW version instead of XE_UC_FW_VER_COMPATIBILITY which
    holds GUC FW I/F version (Badal).
v3:
  - Removed extra character in debug print.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117055035.2417711-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-01-18 09:05:37 -05:00
Michal Wajdeczko
3c01e01214 drm/xe/guc: Treat non-response message after BUSY as unexpected
Once GuC replied with GUC_HXG_TYPE_NO_RESPONSE_BUSY message then
we may expect that only RESPONSE_SUCCESS or FAILURE message will
be sent, anything else is a violation of the HXG protocol.

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111154838.541-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-01-11 19:16:09 +01:00
Michal Wajdeczko
88cbf85020 drm/xe: Split GuC communication initialization
Soon we will be trying to communicate with the GuC firmware very
early during VF driver probe, before we finish normal init steps.
Split GuC communication initialization code so the GuC MMIO based
communication xe_guc_mmio_send() functions will work where needed.

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111162051.585-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-01-11 19:14:19 +01:00
Vinay Belgaumkar
69cac0a8f3 drm/xe: Check skip_guc_pc before setting SLPC flag
Don't set SLPC GuC feature ctl flag if skip_guc_pc is true.

v2: Skip the freq related sysfs creation as well (Badal)
v3: Remove unnecessary parenthesis (Lucas)

Fixes: 975e4a3795 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Fixes: bef52b5c7a ("drm/xe: Create a xe_gt_freq component for raw management and sysfs")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20240108225842.966066-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-01-09 17:48:31 -05:00
Michal Wajdeczko
811fe9f556 drm/xe/guc: Introduce Relay Communication for SR-IOV
There are scenarios where SR-IOV Virtual Function (VF) driver will
need to get additional data that is not available over VF MMIO BAR
nor could be queried from the GuC firmware and must be obtained
from the Physical Function (PF) driver.

To allow such communication between VF and PF drivers, GuC supports
set of H2G and G2H actions which allows relaying embedded messages,
that are otherwise opaque for the GuC.

To allow use of this communication mechanism, provide functions for
sending requests and handling replies and placeholder where we will
put handlers for incoming requests.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20240104222031.277-8-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2024-01-05 16:25:53 +01:00
Michal Wajdeczko
aef4eb7c7d drm/xe/vf: Setup memory based interrupts in GuC
When Memory Based Interrupts are used, the VF driver must provide
to the GuC references to the Source and Status Report Pages.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-10-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2023-12-21 16:31:29 -05:00
Matt Roper
4e124151fc drm/xe/dg2: Drop pre-production workarounds
Pre-production hardware is anything before C0 (for DG2-G10), before B1
(for DG2-G11), or before A1 (for DG2-G12).  Workarounds specific to such
hardware was already removed from i915 in commit eaeb4b3614
("drm/i915/dg2: Drop pre-production GT workarounds") and there's even
less value keeping these around in the Xe driver.

v2:
 - Drop Wa_14011441408 from xe_mocs.c.  (Gustavo)
 - Drop Wa_14010648519, Wa_14010198302, and Wa_1608949956 which were
   mis-implemented; they were only supposed to apply to early steppings
   of DG2-G10, but were being applied unconditionally on all DG2.
   (Gustavo)
 - Drop reference to Wa_16011620976; the implementation stays because it
   still matches Wa_22015475538.  (Gustavo)

Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20231215214531.2576215-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2023-12-21 16:31:29 -05:00
Michał Winiarski
4d637a1de2 drm/xe/guc: Split GuC params used for "hwconfig" and "post-hwconfig"
Move params that are not used for initial "hwconfig" load to
"post-hwconfig" phase.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:12 -05:00
Michal Wajdeczko
b67cb798e4 drm/xe/guc: Include only required GuC ABI headers
On i915 we were adding new GuC ABI headers directly to guc_fwif.h
file since we were replacing old definitions from that file.

On xe driver we could do more and better by including ABI headers
only in files that need those definitions.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/741
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20231128203203.1147-3-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:08 -05:00
Lucas De Marchi
0bc519d20f drm/xe: Remove GEN[0-9]*_ prefixes
After noticing in logs there were still mentions to GEN6 registers, it
was clear commit d9b79ad275 ("drm/xe: Drop gen afixes from registers")
didn't take care of all the afixes. Some were added later, but there are
also constants and strings still using that.  Continue the cleanup
removing the remaining ones.

To keep it consistent with code nearby, a few other changes are made:

	- Remove prefix in INTEL_LEGACY_64B_CONTEXT
	- Remove GEN8_CTX_L3LLC_COHERENT since it's unused
	- Rename GEN9_FREQ_SCALER to GT_FREQUENCY_SCALER

v2: Use XELP_ as prefix for NUM_MOCS_ENTRIES and remove changes to
    MOCS_ENTRIES as this is now done as part of a previous commit
    (Matt Roper)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231117174049.527192-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:39 -05:00
Michal Wajdeczko
cac74742fa drm/xe/guc: Use valid scratch register for posting read
There are only 4 scratch registers VF_SW_FLAG(0..3) on each GuC.
We shouldn't use non-existing register VF_SW_FLAG(4) for posting
read.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:39 -05:00
Michal Wajdeczko
1d087cb7d8 drm/xe/guc: Fix handling of GUC_HXG_TYPE_NO_RESPONSE_BUSY
If GuC responds with the NO_RESPONSE_BUSY message, we extend
our timeout while waiting for the actual response, but we wrongly
assumed that the next message will be RESPONSE_SUCCESS, missing
that we still can get RESPONSE_FAILURE.

Change the condition for the expected message type, using only
common bits from RESPONSE_SUCCESS and RESPONSE_FAILURE (as they
differ, by ABI design, only by the last bit).

v2: add comment/checks to the code (Matt)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:38 -05:00
Michal Wajdeczko
cd1c9c54c3 drm/xe/guc: Copy response data from proper registers
While copying GuC response from the scratch registers to the buffer,
formula to identify next scratch register is broken. Fix it.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:38 -05:00
Michal Wajdeczko
3d78923bd0 drm/xe/guc: Promote guc_to_gt/xe helpers to .h
Duplicating these helpers in almost every .c file is a bad idea.
Define them as inlines in .h file to allow proper reuse.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:44:32 -05:00
Daniele Ceraolo Spurio
757308471d drm/xe/uc: Fix uC status tracking
The current uC status tracking has a few issues:

1) the HuC is moved to "disabled" instead of "not supported"

2) the status is left uninitialized instead of "disabled" when the
   modparam is used to disable support

3) due to #1, a number of checks are done against "disabled" instead of
   the appropriate status.

Address all of those by making sure to follow the appropriate state
transition and checking against the required state.

v2: rebase on s/guc_submission_enabled/uc_enabled/

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:14 -05:00
Francois Dugast
c73acc1eeb drm/xe: Use Xe assert macros instead of XE_WARN_ON macro
The XE_WARN_ON macro maps to WARN_ON which is not justified
in many cases where only a simple debug check is needed.
Replace the use of the XE_WARN_ON macro with the new xe_assert
macros which relies on drm_*. This takes a struct drm_device
argument, which is one of the main changes in this commit. The
other main change is that the condition is reversed, as with
XE_WARN_ON a message is displayed if the condition is true,
whereas with xe_assert it is if the condition is false.

v2:
- Rebase
- Keep WARN splats in xe_wopcm.c (Matt Roper)

v3:
- Rebase

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:41:08 -05:00
Matthew Auld
1da0702c17 drm/xe: nuke GuC on unload
On PVC unloading followed by reloading the module often results in a
completely dead machine (seems to be plaguing CI). Resetting the GuC
like we do at load seems to cure it at least when locally testing this.

v2:
  - Move pc_fini into guc_fini. We want to do the GuC reset just after
    calling pc_fini, otherwise we encounter communication failures. It
    also seems like a good idea to do the reset before we start releasing
    the various other GuC resources. In the case of pc_fini there is an
    explicit stop, but for other stuff like logs, ads, ctb there is not.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/542
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/597
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:40:28 -05:00
Matt Roper
0993b22f93 drm/xe/xe2: Program GuC's MOCS on Xe2 and beyond
As with PVC, Xe2 platforms require that the index of an uncached MOCS
entry be programmed into the GUC_SHIM_CONTROL register.  This will
likely be needed on future platforms as well.

Xe2 also extends the size of the MOCS index register field from two bits
to four bits.  Since these extra bits were unused on PVC, it should be
safe to just increase the size of the mask.

Bspec: 60592
Cc: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:40:26 -05:00