linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-03 14:02:43 -04:00

Author	SHA1	Message	Date
Lucas De Marchi	305b3bb522	drm/i915/gt: rename wa_write_masked_or() The use of "masked" in this function is due to its history. Once upon a time it received a mask and a value as parameter. Since commit `eeec73f8a4` ("drm/i915/gt: Skip rmw for masked registers") that is not true anymore and now there is a clear and a set parameter. Depending on the case, that can still be thought as a mask and value, but there are some subtle differences: what we clear doesn't need to be the same bits we are setting, particularly when we are using masked registers. The fact that we also have "masked registers", i.e. registers whose mask is stored in the upper 16 bits of the register, makes it even more confusing, because "masked" in wa_write_masked_or() has little to do with masked registers, but rather refers to the old mask parameter the function received (that can also, but not exclusively, be used to write to masked register). Avoid the ambiguity and misnomer by renaming it to something else, hopefully less confusing: wa_write_clr_set(), to designate that we are doing both clr and set operations in the register. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201209045246.2905675-2-lucas.demarchi@intel.com	2020-12-09 11:37:02 +00:00
Lucas De Marchi	61b3b0d100	drm/i915/gt: stop ignoring read with wa_masked_field_set When using masked registers, there is nothing to clear since a masked register has the mask in the upper 16b: we can just write to the location we want and use the mask to control what bits we are writing to. However that doesn't mean we don't want to read back the register and check the value actually matched what we wanted to write, i.e. that the WA stick. That should be an explicit opt-out for registers that are either write-only or that are affected by hardware misbehavior. Moreover both wa_masked_en() and wa_masked_dis() check the WA stick, so skipping the check just because the field is more than 1 bit is surprising and error-prone. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201209045246.2905675-1-lucas.demarchi@intel.com	2020-12-09 11:37:01 +00:00
Chris Wilson	7c5c15dffe	drm/i915/gt: Declare gen9 has 64 mocs entries! We checked the table size against a hardcoded number of entries, and that number was excluding the special mocs registers at the end. Fixes: `777a7717d6` ("drm/i915/gt: Program mocs:63 for cache eviction on gen9") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201127102540.13117-1-chris@chris-wilson.co.uk (cherry picked from commit `444fbf5d70`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [backported and updated the Fixes sha]	2020-12-08 07:09:58 -08:00
Colin Ian King	88c52d805e	drm/i915: fix size_t greater or equal to zero comparison Currently the check that the unsigned size_t variable i is >= 0 is always true because the unsigned variable will never be negative, causing the loop to run forever. Fix this by changing the pre-decrement check to a zero check on i followed by a decrement of i. Addresses-Coverity: ("Unsigned compared against 0") Fixes: `bfed6708d6` ("drm/i915: use vmap in shmem_pin_map") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201002170354.94627-1-colin.king@canonical.com (cherry picked from commit `e70956a249`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-08 07:03:43 -08:00
Chris Wilson	0fe8bf4d3e	drm/i915/gt: Cancel the preemption timeout on responding to it We currently presume that the engine reset is successful, cancelling the expired preemption timer in the process. However, engine resets can fail, leaving the timeout still pending and we will then respond to the timeout again next time the tasklet fires. What we want is for the failed engine reset to be promoted to a full device reset, which is kicked by the heartbeat once the engine stops processing events. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168 Fixes: `3a7a92aba8` ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk (cherry picked from commit `d997e240ce`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-08 07:03:38 -08:00
Chris Wilson	5419d93ffd	drm/i915/gt: Ignore repeated attempts to suspend request flow across reset Before reseting the engine, we suspend the execution of the guilty request, so that we can continue execution with a new context while we slowly compress the captured error state for the guilty context. However, if the reset fails, we will promptly attempt to reset the same request again, and discover the ongoing capture. Ignore the second attempt to suspend and capture the same request. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168 Fixes: `32ff621fd7` ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk (cherry picked from commit `b969540500`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-08 07:03:32 -08:00
Colin Ian King	e70956a249	drm/i915: fix size_t greater or equal to zero comparison Currently the check that the unsigned size_t variable i is >= 0 is always true because the unsigned variable will never be negative, causing the loop to run forever. Fix this by changing the pre-decrement check to a zero check on i followed by a decrement of i. Addresses-Coverity: ("Unsigned compared against 0") Fixes: `bfed6708d6` ("drm/i915: use vmap in shmem_pin_map") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201002170354.94627-1-colin.king@canonical.com	2020-12-07 10:44:17 +00:00
Lucas De Marchi	6ca07255ac	drm/i915: remove WA_SET_FIELD_MASKED() Remove the last macro and implement it as a function like the rest of the operations that don't assume there is a `wal` list, but rather receive it as argument. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201205092542.2325477-4-lucas.demarchi@intel.com	2020-12-05 16:12:41 +00:00
Lucas De Marchi	6690161428	drm/i915: remove WA_CLR_BIT_MASKED() Just ommitting the list it's operating on doesn't save much typing and adds another way to do the same thing. Just replace it with wa_masked_dis(). Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201205092542.2325477-3-lucas.demarchi@intel.com	2020-12-05 16:12:40 +00:00
Lucas De Marchi	b9bdccd51a	drm/i915: remove WA_SET_BIT_MASKED() Just ommitting the list it's operating on doesn't save much typing and adds another way to do the same thing. Just replace it with wa_masked_en(). Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201205092542.2325477-2-lucas.demarchi@intel.com	2020-12-05 16:12:39 +00:00
Swathi Dhanavanthri	1efa473e65	drm/i915/dg1: Implement WA_16011163337 Set GS Timer to 224 to prevent a HS/DS hang. Bspec: 53508 v2: reword commit message and add comment explaining why read verification is ignored (Chris) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201205092542.2325477-1-lucas.demarchi@intel.com	2020-12-05 16:12:39 +00:00
Chris Wilson	f867b66e47	drm/i915/gt: Clear the execlists timers upon reset Across a reset, we stop the engine but not the timers. This leaves a window where the timers have inconsistent state with the engine, but should only result in a spurious timeout. As we cancel the outstanding events, also cancel their timers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-4-chris@chris-wilson.co.uk	2020-12-04 15:41:33 +00:00
Chris Wilson	cb56a07d2f	drm/i915/gt: Include reset failures in the trace The GT and engine reset failures are completely invisible when looking at a trace for a bug, but are vital to understanding the incomplete flow. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-3-chris@chris-wilson.co.uk	2020-12-04 15:41:33 +00:00
Chris Wilson	d997e240ce	drm/i915/gt: Cancel the preemption timeout on responding to it We currently presume that the engine reset is successful, cancelling the expired preemption timer in the process. However, engine resets can fail, leaving the timeout still pending and we will then respond to the timeout again next time the tasklet fires. What we want is for the failed engine reset to be promoted to a full device reset, which is kicked by the heartbeat once the engine stops processing events. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168 Fixes: `3a7a92aba8` ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk	2020-12-04 15:41:32 +00:00
Chris Wilson	b969540500	drm/i915/gt: Ignore repeated attempts to suspend request flow across reset Before reseting the engine, we suspend the execution of the guilty request, so that we can continue execution with a new context while we slowly compress the captured error state for the guilty context. However, if the reset fails, we will promptly attempt to reset the same request again, and discover the ongoing capture. Ignore the second attempt to suspend and capture the same request. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168 Fixes: `32ff621fd7` ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk	2020-12-04 15:41:31 +00:00
Dave Airlie	46fe37b98e	Merge tag 'drm-intel-next-queued-2020-11-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 features for v5.11: Highlights: - Enable big joiner to join two pipes to one port to overcome pipe restrictions (Manasi, Ville, Maarten) Display: - More DG1 enabling (Lucas, Aditya) - Fixes to cases without display (Lucas, José, Jani) - Initial PSR state improvements (José) - JSL eDP vswing updates (Tejas) - Handle EDID declared max 16 bpc (Ville) - Display refactoring (Ville) Other: - GVT features - Backmerge Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87czzzkk1s.fsf@intel.com	2020-12-03 13:01:44 +10:00
Chris Wilson	aff76ab795	drm/i915/gt: Limit frequency drop to RPe on parking We treat idling the GT (intel_rps_park) as a downclock event, and reduce the frequency we intend to restart the GT with. Since the two workloads are likely related (e.g. a compositor rendering every 16ms), we want to carry the frequency and load information from across the idling. However, we do also need to update the frequencies so that workloads that run for less than 1ms are autotuned by RPS (otherwise we leave compositors running at max clocks, draining excess power). Conversely, if we try to run too slowly, the next workload has to run longer. Since there is a hysteresis in the power graph, below a certain frequency running a short workload for longer consumes more energy than running it slightly higher for less time. The exact balance point is unknown beforehand, but measurements with 30fps media playback indicate that RPe is a better choice. Reported-by: Edward Baker <edward.baker@intel.com> Tested-by: Edward Baker <edward.baker@intel.com> Fixes: `043cd2d14e` ("drm/i915/gt: Leave rps->cur_freq on unpark") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Edward Baker <edward.baker@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk (cherry picked from commit `f7ed83cc19`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-02 17:05:58 -08:00
Venkata Ramana Nayana	78b2eb8a1f	drm/i915/gt: Retain default context state across shrinking As we use a shmemfs file to hold the context state, when not in use it may be swapped out, such as across suspend. Since we wrote into the shmemfs without marking the pages as dirty, the contents may be dropped instead of being written back to swap. On re-using the shmemfs file, such as creating a new context after resume, the contents of that file were likely garbage and so the new context could then hang the GPU. Simply mark the page as being written when copying into the shmemfs file, and it the new contents will be retained across swapout. Fixes: `be1cb55a07` ("drm/i915/gt: Keep a no-frills swappable copy of the default context state") Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.8+ Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com (cherry picked from commit `a9d71f76cc`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-02 17:05:58 -08:00
Chris Wilson	2bfdf30246	drm/i915/gt: Split the breadcrumb spinlock between global and contexts As we funnel more and more contexts into the breadcrumbs on an engine, the hold time of b->irq_lock grows. As we may then contend with the b->irq_lock during request submission, this increases the burden upon the engine->active.lock and so directly impacts both our execution latency and client latency. If we split the b->irq_lock by introducing a per-context spinlock to manage the signalers within a context, we then only need the b->irq_lock for enabling/disabling the interrupt and can avoid taking the lock for walking the list of contexts within the signal worker. Even with the current setup, this greatly reduces the number of times we have to take and fight for b->irq_lock. Furthermore, this closes the race between enabling the signaling context while it is in the process of being signaled and removed: <4>[ 416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870). <4>[ 416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4>[ 416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G U 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.208627] RIP: 0010:__list_add_valid+0x4d/0x70 <4>[ 416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8 <4>[ 416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086 <4>[ 416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105 <4>[ 416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff <4>[ 416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8 <4>[ 416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910 <4>[ 416.208669] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.208671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.208675] PKRU: 55555554 <4>[ 416.208677] Call Trace: <4>[ 416.208679] <IRQ> <4>[ 416.208751] i915_request_enable_breadcrumb+0x278/0x400 [i915] <4>[ 416.208839] __i915_request_submit+0xca/0x2a0 [i915] <4>[ 416.208892] __execlists_submission_tasklet+0x480/0x1830 [i915] <4>[ 416.208942] execlists_submission_tasklet+0xc4/0x130 [i915] <4>[ 416.208947] tasklet_action_common.isra.17+0x6c/0x1c0 <4>[ 416.208954] __do_softirq+0xdf/0x498 <4>[ 416.208960] ? handle_fasteoi_irq+0x150/0x150 <4>[ 416.208964] asm_call_on_stack+0xf/0x20 <4>[ 416.208966] </IRQ> <4>[ 416.208969] do_softirq_own_stack+0xa1/0xc0 <4>[ 416.208972] irq_exit_rcu+0xb5/0xc0 <4>[ 416.208976] common_interrupt+0xf7/0x260 <4>[ 416.208980] asm_common_interrupt+0x1e/0x40 <4>[ 416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410 <4>[ 416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48 <4>[ 416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206 <4>[ 416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000 <4>[ 416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f <4>[ 416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000 <4>[ 416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f <4>[ 416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003 <4>[ 416.209012] cpuidle_enter+0x24/0x40 <4>[ 416.209016] do_idle+0x22f/0x2d0 <4>[ 416.209022] cpu_startup_entry+0x14/0x20 <4>[ 416.209025] start_secondary+0x158/0x1a0 <4>[ 416.209030] secondary_startup_64+0xa4/0xb0 <4>[ 416.209039] irq event stamp: 10186977 <4>[ 416.209042] hardirqs last enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0 <4>[ 416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50 <4>[ 416.209047] softirqs last enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70 <4>[ 416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20 <4>[ 416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100) <4>[ 416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G U W 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b <4>[ 416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086 <4>[ 416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104 <4>[ 416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff <4>[ 416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18 <4>[ 416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880 <4>[ 416.209317] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.209317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.209317] PKRU: 55555554 <4>[ 416.209317] Call Trace: <4>[ 416.209317] <IRQ> <4>[ 416.209317] remove_signaling_context.isra.13+0xd/0x70 [i915] <4>[ 416.209513] signal_irq_work+0x1f7/0x4b0 [i915] This is caused by virtual engines where although we take the breadcrumb lock on each of the active engines, they may be different engines on different requests, It turns out that the b->irq_lock was not a sufficient proxy for the engine->active.lock in the case of more than one request, so introduce an explicit lock around ce->signals. v2: ce->signal_lock is acquired with only RCU protection and so must be treated carefully and not cleared during reallocation. We also then need to confirm that the ce we lock is the same as we found in the breadcrumb list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276 Fixes: `c18636f763` ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: `2854d86632` ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-4-chris@chris-wilson.co.uk (cherry picked from commit `c744d50363`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-02 17:05:58 -08:00
Chris Wilson	9261a1db80	drm/i915/gt: Protect context lifetime with RCU Allow a brief period for continued access to a dead intel_context by deferring the release of the struct until after an RCU grace period. As we are using a dedicated slab cache for the contexts, we can defer the release of the slab pages via RCU, with the caveat that individual structs may be reused from the freelist within an RCU grace period. To handle that, we have to avoid clearing members of the zombie struct. This is required for a later patch to handle locking around virtual requests in the signaler, as those requests may want to move between engines and be destroyed while we are holding b->irq_lock on a physical engine. v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we don't need to reset the debug code, at the loss of having the mutex debug code spot us attempting to destroy a locked mutex. v3: As the intended use will remain strongly referenced counted, with very little inflight access across reuse, drop the ctor. v4: Drop the unrequired change to remove the temporary reference around dropping the active context, and add back some more missing ctor operations. v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced later] maybe accessed under RCU and so needs special care not to be reinitialised. v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk (cherry picked from commit `14d1eaf088`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-12-02 16:53:00 -08:00
Swathi Dhanavanthri	5ac84806f5	drm/i915/tgl, rkl, dg1: Apply WA_1406941453 to TGL, RKL and DG1 This workaround is applicable only for tgl,rkl and dg1. Bspec: 52890, 53273, 53508. Signed-off-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201201175735.1377372-1-lucas.demarchi@intel.com	2020-12-01 16:34:45 -08:00
Chris Wilson	777a7717d6	drm/i915/gt: Program mocs:63 for cache eviction on gen9 Ville noticed that the last mocs entry is used unconditionally by the HW when it performs cache evictions, and noted that while the value is not meant to be writable by the driver, we should program it to a reasonable value nevertheless. As it turns out, we can change the value of mocs:63 and the value we were programming into it would cause hard hangs in conjunction with atomic operations. v2: Add details from bspec about how it is used by HW Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707 Fixes: `3bbaba0cea` ("drm/i915: Added Programming of the MOCS") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140841.1982-1-chris@chris-wilson.co.uk (cherry picked from commit `977933b5da`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-11-30 16:31:30 -08:00
Chris Wilson	f7ed83cc19	drm/i915/gt: Limit frequency drop to RPe on parking We treat idling the GT (intel_rps_park) as a downclock event, and reduce the frequency we intend to restart the GT with. Since the two workloads are likely related (e.g. a compositor rendering every 16ms), we want to carry the frequency and load information from across the idling. However, we do also need to update the frequencies so that workloads that run for less than 1ms are autotuned by RPS (otherwise we leave compositors running at max clocks, draining excess power). Conversely, if we try to run too slowly, the next workload has to run longer. Since there is a hysteresis in the power graph, below a certain frequency running a short workload for longer consumes more energy than running it slightly higher for less time. The exact balance point is unknown beforehand, but measurements with 30fps media playback indicate that RPe is a better choice. Reported-by: Edward Baker <edward.baker@intel.com> Tested-by: Edward Baker <edward.baker@intel.com> Fixes: `043cd2d14e` ("drm/i915/gt: Leave rps->cur_freq on unpark") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Edward Baker <edward.baker@intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v5.8+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201124183521.28623-1-chris@chris-wilson.co.uk	2020-11-30 17:48:18 +00:00
Venkata Ramana Nayana	a9d71f76cc	drm/i915/gt: Retain default context state across shrinking As we use a shmemfs file to hold the context state, when not in use it may be swapped out, such as across suspend. Since we wrote into the shmemfs without marking the pages as dirty, the contents may be dropped instead of being written back to swap. On re-using the shmemfs file, such as creating a new context after resume, the contents of that file were likely garbage and so the new context could then hang the GPU. Simply mark the page as being written when copying into the shmemfs file, and it the new contents will be retained across swapout. Fixes: `be1cb55a07` ("drm/i915/gt: Keep a no-frills swappable copy of the default context state") Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Signed-off-by: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v5.8+ Link: https://patchwork.freedesktop.org/patch/msgid/20201127120718.454037-161-matthew.auld@intel.com	2020-11-27 20:12:55 +00:00
Chris Wilson	444fbf5d70	drm/i915/gt: Declare gen9 has 64 mocs entries! We checked the table size against a hardcoded number of entries, and that number was excluding the special mocs registers at the end. Fixes: `977933b5da` ("drm/i915/gt: Program mocs:63 for cache eviction on gen9") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201127102540.13117-1-chris@chris-wilson.co.uk	2020-11-27 12:27:37 +00:00
Chris Wilson	85cc2917a3	drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel If while we are cancelling the breadcrumb signaling, we find that the request is already completed, move it to the irq signaler and let it be signaled. v2: Tweak reference counting so that we only acquire a new reference on adding to a signal list, as opposed to a hidden i915_request_put of the caller's reference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-5-chris@chris-wilson.co.uk	2020-11-26 18:35:17 +00:00
Chris Wilson	c744d50363	drm/i915/gt: Split the breadcrumb spinlock between global and contexts As we funnel more and more contexts into the breadcrumbs on an engine, the hold time of b->irq_lock grows. As we may then contend with the b->irq_lock during request submission, this increases the burden upon the engine->active.lock and so directly impacts both our execution latency and client latency. If we split the b->irq_lock by introducing a per-context spinlock to manage the signalers within a context, we then only need the b->irq_lock for enabling/disabling the interrupt and can avoid taking the lock for walking the list of contexts within the signal worker. Even with the current setup, this greatly reduces the number of times we have to take and fight for b->irq_lock. Furthermore, this closes the race between enabling the signaling context while it is in the process of being signaled and removed: <4>[ 416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870). <4>[ 416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4>[ 416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G U 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.208627] RIP: 0010:__list_add_valid+0x4d/0x70 <4>[ 416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8 <4>[ 416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086 <4>[ 416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105 <4>[ 416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff <4>[ 416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8 <4>[ 416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910 <4>[ 416.208669] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.208671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.208675] PKRU: 55555554 <4>[ 416.208677] Call Trace: <4>[ 416.208679] <IRQ> <4>[ 416.208751] i915_request_enable_breadcrumb+0x278/0x400 [i915] <4>[ 416.208839] __i915_request_submit+0xca/0x2a0 [i915] <4>[ 416.208892] __execlists_submission_tasklet+0x480/0x1830 [i915] <4>[ 416.208942] execlists_submission_tasklet+0xc4/0x130 [i915] <4>[ 416.208947] tasklet_action_common.isra.17+0x6c/0x1c0 <4>[ 416.208954] __do_softirq+0xdf/0x498 <4>[ 416.208960] ? handle_fasteoi_irq+0x150/0x150 <4>[ 416.208964] asm_call_on_stack+0xf/0x20 <4>[ 416.208966] </IRQ> <4>[ 416.208969] do_softirq_own_stack+0xa1/0xc0 <4>[ 416.208972] irq_exit_rcu+0xb5/0xc0 <4>[ 416.208976] common_interrupt+0xf7/0x260 <4>[ 416.208980] asm_common_interrupt+0x1e/0x40 <4>[ 416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410 <4>[ 416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48 <4>[ 416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206 <4>[ 416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000 <4>[ 416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f <4>[ 416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000 <4>[ 416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f <4>[ 416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003 <4>[ 416.209012] cpuidle_enter+0x24/0x40 <4>[ 416.209016] do_idle+0x22f/0x2d0 <4>[ 416.209022] cpu_startup_entry+0x14/0x20 <4>[ 416.209025] start_secondary+0x158/0x1a0 <4>[ 416.209030] secondary_startup_64+0xa4/0xb0 <4>[ 416.209039] irq event stamp: 10186977 <4>[ 416.209042] hardirqs last enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0 <4>[ 416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50 <4>[ 416.209047] softirqs last enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70 <4>[ 416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20 <4>[ 416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100) <4>[ 416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G U W 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b <4>[ 416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086 <4>[ 416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104 <4>[ 416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff <4>[ 416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18 <4>[ 416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880 <4>[ 416.209317] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.209317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.209317] PKRU: 55555554 <4>[ 416.209317] Call Trace: <4>[ 416.209317] <IRQ> <4>[ 416.209317] remove_signaling_context.isra.13+0xd/0x70 [i915] <4>[ 416.209513] signal_irq_work+0x1f7/0x4b0 [i915] This is caused by virtual engines where although we take the breadcrumb lock on each of the active engines, they may be different engines on different requests, It turns out that the b->irq_lock was not a sufficient proxy for the engine->active.lock in the case of more than one request, so introduce an explicit lock around ce->signals. v2: ce->signal_lock is acquired with only RCU protection and so must be treated carefully and not cleared during reallocation. We also then need to confirm that the ce we lock is the same as we found in the breadcrumb list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276 Fixes: `c18636f763` ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: `2854d86632` ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-4-chris@chris-wilson.co.uk	2020-11-26 18:35:03 +00:00
Chris Wilson	14d1eaf088	drm/i915/gt: Protect context lifetime with RCU Allow a brief period for continued access to a dead intel_context by deferring the release of the struct until after an RCU grace period. As we are using a dedicated slab cache for the contexts, we can defer the release of the slab pages via RCU, with the caveat that individual structs may be reused from the freelist within an RCU grace period. To handle that, we have to avoid clearing members of the zombie struct. This is required for a later patch to handle locking around virtual requests in the signaler, as those requests may want to move between engines and be destroyed while we are holding b->irq_lock on a physical engine. v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we don't need to reset the debug code, at the loss of having the mutex debug code spot us attempting to destroy a locked mutex. v3: As the intended use will remain strongly referenced counted, with very little inflight access across reuse, drop the ctor. v4: Drop the unrequired change to remove the temporary reference around dropping the active context, and add back some more missing ctor operations. v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced later] maybe accessed under RCU and so needs special care not to be reinitialised. v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-3-chris@chris-wilson.co.uk	2020-11-26 18:26:02 +00:00
Chris Wilson	a58559898a	drm/i915/gt: Check for a completed last request once Pull the repeated check for the last active request being completed to a single spot, when deciding whether or not execlist preemption is required. In doing so, we remove the tasklet kick, introduced with the completion checks in commit `35f3fd8182` ("drm/i915/execlists: Workaround switching back to a completed context"), if we find the request was completed but have not yet seen the corresponding CS event. This was devolving into a busy spin of the tasklet while we waited for the event as the delivery was not as instantaneous as expected. Under load this is sufficient to exhaust the tasklet softirq timeslice, and force ksoftirqd. Quite noticeable overhead for no apparent improvement in latency. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-2-chris@chris-wilson.co.uk	2020-11-26 18:26:02 +00:00
Chris Wilson	b8e2bd98a2	drm/i915/gt: Decouple completed requests on unwind Since the introduction of preempt-to-busy, requests can complete in the background, even while they are not on the engine->active.requests list. As such, the engine->active.request list itself is not in strict retirement order, and we have to scan the entire list while unwinding to not miss any. However, if the request is completed we currently leave it on the list [until retirement], but we could just as simply remove it and stop treating it as active. We would only have to then traverse it once while unwinding in quick succession. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk	2020-11-26 18:26:01 +00:00
Chris Wilson	977933b5da	drm/i915/gt: Program mocs:63 for cache eviction on gen9 Ville noticed that the last mocs entry is used unconditionally by the HW when it performs cache evictions, and noted that while the value is not meant to be writable by the driver, we should program it to a reasonable value nevertheless. As it turns out, we can change the value of mocs:63 and the value we were programming into it would cause hard hangs in conjunction with atomic operations. v2: Add details from bspec about how it is used by HW Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707 Fixes: `3bbaba0cea` ("drm/i915: Added Programming of the MOCS") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: <stable@vger.kernel.org> # v4.3+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140841.1982-1-chris@chris-wilson.co.uk	2020-11-26 14:17:50 +00:00
Chris Wilson	280ffdb6dd	drm/i915/gt: Free stale request on destroying the virtual engine Since preempt-to-busy, we may unsubmit a request while it is still on the HW and completes asynchronously. That means it may be retired and in the process destroy the virtual engine (as the user has closed their context), but that engine may still be holding onto the unsubmitted compelted request. Therefore we need to potentially cleanup the old request on destroying the virtual engine. We also have to keep the virtual_engine alive until after the sibling's execlists_dequeue() have finished peeking into the virtual engines, for which we serialise with RCU. v2: Be paranoid and flush the tasklet as well. v3: And flush the tasklet before the engines, as the tasklet may re-attach an rb_node after our removal from the siblings. Fixes: `6d06779e86` ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk (cherry picked from commit `46eecfccb4`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-11-24 09:30:57 -08:00
Chris Wilson	2e6ce8313a	drm/i915/gt: Don't cancel the interrupt shadow too early We currently want to keep the interrupt enabled until the interrupt after which we have no more work to do. This heuristic was broken by us kicking the irq-work on adding a completed request without attaching a signaler -- hence it appearing to the irq-worker that an interrupt had fired when we were idle. Fixes: `2854d86632` ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-3-chris@chris-wilson.co.uk (cherry picked from commit `3aef910d26`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-11-24 09:30:56 -08:00
Chris Wilson	eb0104ee49	drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Make b->signaled_requests a lockless-list so that we can manipulate it outside of the b->irq_lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-2-chris@chris-wilson.co.uk (cherry picked from commit `6cfe66eb71`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-11-24 09:30:56 -08:00
Chris Wilson	08b49e14ec	drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission Move the register slow register write and readback from out of the critical path for execlists submission and delay it until the following worker, shaving off around 200us. Note that the same signal_irq_work() is allowed to run concurrently on each CPU (but it will only be queued once, once running though it can be requeued and reexecuted) so we have to remember to lock the global interactions as we cannot rely on the signal_irq_work() itself providing the serialisation (in constrast to a tasklet). By pushing the arm/disarm into the central signaling worker we can close the race for disarming the interrupt (and dropping its associated GT wakeref) on parking the engine. If we loose the race, that GT wakeref may be held indefinitely, preventing the machine from sleeping while the GPU is ostensibly idle. v2: Move the self-arming parking of the signal_irq_work to a flush of the irq-work from intel_breadcrumbs_park(). Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271 Fixes: `e23005604b` ("drm/i915/gt: Hold context/request reference while breadcrumbs are active") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk (cherry picked from commit `9d5612ca16`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-11-24 09:17:41 -08:00
Tvrtko Ursulin	2f87c053ac	drm/i915/guc: Use correct lock for CT event handler CT event handler is called under the gt->irq_lock from the interrupt handling paths so make it the same from the init path. I don't think this mismatch caused any functional issue but we need to wean the code of the global i915->irq_lock. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201120095636.1987395-2-tvrtko.ursulin@linux.intel.com	2020-11-24 09:11:09 +00:00
Tvrtko Ursulin	016669752c	drm/i915/guc: Use correct lock for accessing guc->mmio_msg Guc->mmio_msg is set under the guc->irq_lock in guc_get_mmio_msg so it should be consumed under the same lock from guc_handle_mmio_msg. I am not sure if the overall flow here makes complete sense but at least the correct lock is now used. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201120095636.1987395-1-tvrtko.ursulin@linux.intel.com	2020-11-24 09:10:37 +00:00
Chris Wilson	46eecfccb4	drm/i915/gt: Free stale request on destroying the virtual engine Since preempt-to-busy, we may unsubmit a request while it is still on the HW and completes asynchronously. That means it may be retired and in the process destroy the virtual engine (as the user has closed their context), but that engine may still be holding onto the unsubmitted compelted request. Therefore we need to potentially cleanup the old request on destroying the virtual engine. We also have to keep the virtual_engine alive until after the sibling's execlists_dequeue() have finished peeking into the virtual engines, for which we serialise with RCU. v2: Be paranoid and flush the tasklet as well. v3: And flush the tasklet before the engines, as the tasklet may re-attach an rb_node after our removal from the siblings. Fixes: `6d06779e86` ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-4-chris@chris-wilson.co.uk	2020-11-23 17:26:38 +00:00
Chris Wilson	3aef910d26	drm/i915/gt: Don't cancel the interrupt shadow too early We currently want to keep the interrupt enabled until the interrupt after which we have no more work to do. This heuristic was broken by us kicking the irq-work on adding a completed request without attaching a signaler -- hence it appearing to the irq-worker that an interrupt had fired when we were idle. Fixes: `2854d86632` ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-3-chris@chris-wilson.co.uk	2020-11-23 17:26:38 +00:00
Chris Wilson	6cfe66eb71	drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Make b->signaled_requests a lockless-list so that we can manipulate it outside of the b->irq_lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-2-chris@chris-wilson.co.uk	2020-11-23 17:26:38 +00:00
Chris Wilson	9d5612ca16	drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission Move the register slow register write and readback from out of the critical path for execlists submission and delay it until the following worker, shaving off around 200us. Note that the same signal_irq_work() is allowed to run concurrently on each CPU (but it will only be queued once, once running though it can be requeued and reexecuted) so we have to remember to lock the global interactions as we cannot rely on the signal_irq_work() itself providing the serialisation (in constrast to a tasklet). By pushing the arm/disarm into the central signaling worker we can close the race for disarming the interrupt (and dropping its associated GT wakeref) on parking the engine. If we loose the race, that GT wakeref may be held indefinitely, preventing the machine from sleeping while the GPU is ostensibly idle. v2: Move the self-arming parking of the signal_irq_work to a flush of the irq-work from intel_breadcrumbs_park(). Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271 Fixes: `e23005604b` ("drm/i915/gt: Hold context/request reference while breadcrumbs are active") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201123113717.20500-1-chris@chris-wilson.co.uk	2020-11-23 17:26:28 +00:00
Chris Wilson	4ee7379257	drm/i915/gt: Plug IPS into intel_rps_set The old IPS interface did not match the RPS interface that we tried to plug it into (bool vs int return). Once repaired, our minimal selftesting is finally happy! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201121190352.15996-1-chris@chris-wilson.co.uk	2020-11-21 19:38:56 +00:00
Chris Wilson	16cfcb0f3c	drm/i915/selftests: Small tweak to put the termination conditions together If we run out of ring space, or exceed the desired runtime, we wish to stop the subtest. Put these checks together, so that we always keep the requests flushed on completion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201120140314.24749-3-chris@chris-wilson.co.uk	2020-11-20 16:25:23 +00:00
Chris Wilson	8005f37ca9	drm/i915/selftests: Improve granularity for mocs reset checks Allow us to validate mocs configurations after reset if we have either engine or global reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201120140314.24749-2-chris@chris-wilson.co.uk	2020-11-20 16:24:43 +00:00
Chris Wilson	b5b349b93b	drm/i915: Lift waiter/signaler iterators Lift the list iteration defines for traversing the signaler/waiter lists into i915_scheduler.h for reuse. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-5-chris@chris-wilson.co.uk	2020-11-19 20:34:18 +00:00
Chris Wilson	0986317a45	drm/i915/gt: Show all active timelines for debugging Include the active timelines for debugfs/i915_engine_info, so that we can see which have unready requests inflight which are not shown otherwise. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-4-chris@chris-wilson.co.uk	2020-11-19 20:34:18 +00:00
Chris Wilson	562675d09a	drm/i915/gt: Update request status flags for debug pretty-printer We plan to expand upon the number of available statuses for when we pretty-print the requests along the timelines, and so need a new set of flags. We have settled upon: Unready [U] - initial status after being submitted, the request is not ready for execution as it is waiting for external fences Ready [R] - all fences the request was waiting on have been signaled, and the request is now ready for execution and will be in a backend queue - a ready request may still need to wait on semaphores [internal fences] Ready/virtual [V] - same as ready, but queued over multiple backends Executing [E] - the request has been transferred from the backend queue and submitted for execution on HW - a completed request may still be regarded as executing, its status may not be updated until it is retired and removed from the lists Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk	2020-11-19 20:34:14 +00:00
Chris Wilson	1f0e785a9c	drm/i915: Lift i915_request_show() Extract i915_request_show for reuse in other request chain pretty printers. For a bonus point, quietly change the seqno format from %llx to %lld to match everywhere else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk	2020-11-19 20:31:04 +00:00
Chris Wilson	14cb9a7763	drm/i915/gt: Include semaphore status in print_request() When pretty-printing the requests for debug, also show the status of any semaphore waits as part of its runnable status. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-1-chris@chris-wilson.co.uk	2020-11-19 20:31:03 +00:00
Chris Wilson	be33805c65	drm/i915/gt: Fixup tgl mocs for PTE tracking Forcing mocs:1 [used for our winsys follows-pte mode] to be cached caused display glitches. Though it is documented as deprecated (and so likely behaves as uncached) use the follow-pte bit and force it out of L3 cache. Testcase: igt/kms_frontbuffer_tracking Testcase: igt/kms_big_fb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk (cherry picked from commit `a04ac82736`) Fixes: `849c0fe9e8` ("drm/i915/gt: Initialize reserved and unspecified MOCS indices") Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo: Updated Fixes tag]	2020-11-19 15:10:49 -05:00

... 34 35 36 37 38 ...

3179 Commits