INSTPM is saved in the logical context so we should initialize it using
LRIs on gen8. It actually defaults to 1 starting from HSW, but let's
keep the write around anyway.
Also drop the INSTPM_FORCE_ORDERING setup entirely on gen9+ since it's
now a reserved bit.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
This commit makes sure that on process termination, after
we're destroying all the active queues, we're killing all the
existing wave front of the current process.
By doing this we're making sure that if any of the CUs were blocked
by infinite loop we're enforcing it to end the shader explicitly.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds three new interfaces to kfd2kgd interface file of radeon.
The interfaces are:
- Check if a specific VMID has a valid PASID mapping
- Retrieve the PASID which is mapped to a specific VMID
- Issue a VMID invalidation request to the ATC
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
v2:
- rename get_dbgmgr_mutex to kfd_get_dbgmgr_mutex to namespace it
- change void* to uint64_t inside ioctl arguments
- use kmalloc instead of kzalloc because we use copy_from_user
immediately after it
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
v2:
- rename get_dbgmgr_mutex to kfd_get_dbgmgr_mutex to namespace it
- change void* to uint64_t inside ioctl arguments
- use kmalloc instead of kzalloc because we use copy_from_user
immediately after it
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The address watch operation gives the ability to specify watch points
which will generate a shader breakpoint, based on a specified single
address or range of addresses.
There is support for read/write/any access modes.
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The wave control operation supports several command types executed upon
existing wave fronts that belong to the currently debugged process.
The available commands are:
HALT - Freeze wave front(s) execution
RESUME - Resume freezed wave front(s) execution
KILL - Kill existing wave front(s)
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the skeleton H/W debugger module support. This code
enables registration and unregistration of a single HSA process at a
time.
The module saves the process's pasid and use it to verify that only the
registered process is allowed to execute debugger operations through the
kernel driver.
v2: rename get_dbgmgr_mutex to kfd_get_dbgmgr_mutex to namespace it
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds support for static user-mode queues in QCM.
Queues which are designated as static can NOT be preempted by
the CP microcode when it is executing its scheduling algorithm.
This is needed for supporting the debugger feature, because we
can't allow the CP to preempt queues which are currently being debugged.
The number of queues that can be designated as static is limited by the
number of HQDs (Hardware Queue Descriptors).
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds four new IOCTLs to amdkfd. These IOCTLs expose a H/W
debugger functionality to the userspace.
The IOCTLs are:
- AMDKFD_IOC_DBG_REGISTER:
The purpose of this IOCTL is to notify amdkfd that a process wants to use
GPU debugging facilities on itself only.
It is expected that this IOCTL would be called before any other H/W
debugger requests are sent to amdkfd and for each GPU where the H/W
debugging needs to be enabled. The use of this IOCTL ensures that only
one instance of a debugger is active in the system.
- AMDKFD_IOC_DBG_UNREGISTER:
This IOCTL detaches the debugger/debugged process from the H/W
Debug which was established by the AMDKFD_IOC_DBG_REGISTER IOCTL.
- AMDKFD_IOC_DBG_ADDRESS_WATCH:
This IOCTL allows to set different watchpoints with various conditions as
indicated by the IOCTL's arguments. The available number of watchpoints
is retrieved from topology. This operation is confined to the current
debugged process, which was registered through AMDKFD_IOC_DBG_REGISTER.
- AMDKFD_IOC_DBG_WAVE_CONTROL:
This IOCTL allows to control a wavefront as indicated by the IOCTL's
arguments. For example, you can halt/resume or kill either a
single wavefront or a set of wavefronts. This operation is confined to
the current debugged process, which was registered through
AMDKFD_IOC_DBG_REGISTER.
Because the arguments for the address watch IOCTL and wave control IOCTL
are dynamic, meaning that they could vary in size, the userspace passes a
pointer to a structure (in userspace) that contains the value of the
arguments. The kernel driver is responsible to parse this structure and
validate its contents.
v2: change void* to uint64_t inside ioctl arguments
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds new interface functions to the kfd2kgd interface file. The
new functions allow to perform H/W debugger operations by writing to GPU
registers.
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Use the generic mechanism to declare a bitmap instead of unsigned long.
It seems that "struct kfd_process.allocated_queue_bitmap" is unused.
Maybe it could be deleted instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
BXT supports following intermediate link rates for edp:
2.16GHz, 2.43GHz, 3.24GHz, 4.32GHz.
Adding support for programming the intermediate rates.
v2: Adding clock in bxt_clk_div struct and then look for the entry with
required rate (Ville)
v3: 'clock' has the selected value, no need to use link_bw or rate_select
for selecting pll(Ville)
v4: Make bxt_dp_clk_val const and remove size (Ville)
v5: Rebased
v6: Removed setting of vco while rebasing in v5, adding it back
Signed-off-by: Sonika Jindal <sonika.jindal@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v4)
Reviewed-by: Vandana Kannan <vandana.kannan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Fixes for 4.2. Nothing too serious (given that it's still pre merge
window). With that it's off for 2 weeks of vacation for me and taking care
of 4.2 fixes for Jani.
* tag 'drm-intel-next-fixes-2015-05-29' of git://anongit.freedesktop.org/drm-intel:
drm/i915: limit PPGTT size to 2GB in 32-bit platforms
drm/i915: Another fbdev hack to avoid PSR on fbcon.
drm/i915: Return the frontbuffer flip to enable intel_crtc_enable_planes.
drm/i915: disable IPS while getting the sink CRCs
drm/i915: Disable 12bpc hdmi for now
drm/i915: Adjust sideband locking a bit for CHV/VLV
drm/i915: s/dpio_lock/sb_lock/
drm/i915: Kill intel_flush_primary_plane()
drm/i915: Throw out WIP CHV power well definitions
drm/i915: Use the default 600ns LDO programming sequence delay
drm/i915: Remove unnecessary null check in execlists_context_unqueue
drm/i915: Use spinlocks for checking when to waitboost
drm/i915: Fix the confusing comment about the ioctl limits
Revert "drm/i915: Force clean compilation with -Werror"
dma_alloc_coherent() can return memory in the vmalloc range.
virt_to_page() cannot handle such addresses and crashes. This
patch detects such cases and obtains the struct page * using
vmalloc_to_page() instead.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Apparently we can have requests even if though the active list is empty,
so do the request retirement regardless of whether there's anything
on the active list.
The way it happened here is that during suspend intel_ring_idle()
notices the olr hanging around and then proceeds to get rid of it by
adding a request. However since there was nothing on the active lists
i915_gem_retire_requests() didn't clean those up, and so the idle work
never runs, and we leave the GPU "busy" during suspend resulting in a
WARN later.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
According to the HSW b-spec we need to try clock divisors of 63
and 72, each 3 or more times, when attempting DP AUX channel
communication on a server chipset. This actually wasn't happening
due to a short-circuit that only checked the DP_AUX_CH_CTL_DONE bit
in status rather than checking that the operation was done and
that DP_AUX_CH_CTL_TIME_OUT_ERROR was not set.
[v2] Implemented alternate solution suggested by Jani Nikula.
Cc: stable@vger.kernel.org
Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We already set this limit for the GGTT.
This is a temporary patch until a full replacement of size_t variables
(inadequate in 32-bit kernel) is in place.
Regression from:
commit a4e0bedca6
Author: Michel Thierry <michel.thierry@intel.com>
Date: Wed Apr 8 12:13:35 2015 +0100
drm/i915: Use complete address space in true PPGTT
v2: Prettify code and explain why this is needed. (Chris)
v3: Don't hide the compilation warning in 32-bit. (Chris)
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Silence the following -Wmaybe-uninitialized warnings and make the code
more clear.
drivers/gpu/drm/i915/intel_display.c: In function ‘__intel_set_mode’:
drivers/gpu/drm/i915/intel_display.c:11844:14: warning: ‘crtc_state’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return state->mode_changed || state->active_changed;
^
drivers/gpu/drm/i915/intel_display.c:11854:25: note: ‘crtc_state’ was declared here
struct drm_crtc_state *crtc_state;
^
drivers/gpu/drm/i915/intel_display.c:11868:6: warning: ‘crtc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (crtc != intel_encoder->base.crtc)
^
drivers/gpu/drm/i915/intel_display.c:11853:19: note: ‘crtc’ was declared here
struct drm_crtc *crtc;
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With unified modeset and flip paths introduced recently when switching
to fbcon PSR was being disabled on fb_set_par path but re-enabled on
fb_pan_display one, causing missed screen updates and un unusable
console.
Regression introduced with:
commit bb54662350
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date: Tue Apr 21 17:13:13 2015 +0300
drm/i915: Unify modeset and flip paths of intel_crtc_set_config()
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Without this frontbuffer flip when enabling planes PSR got compromised
and wasn't being enabled waiting forever on the flush that never
arrived.
Another solution would to create a enable_cursor function and split this
frontbuffer flip among the different plane enable and disable functions.
But if necessary this can be done in a follow up work. For now let's
just fix the regression.
It was removed by:
commit 87d4300a7d
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Tue Apr 21 17:12:54 2015 +0300
drm/i915: Move intel_(pre_disable/post_enable)_primary to intel_display.c, and use it there.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The hotplug status is cached in hp_control, and will be passed on to
bottom halves through intel_hpd_irq_handler(), so we can clear the
sticky bits earlier.
While at it, drop the redundant logging of the hotplug status, which
will also be logged by pch_get_hpd_pins().
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Split intel_hpd_irq_handler into platforms specific and platform
agnostic parts. The platform specific parts decode the registers into
information about which hpd pins triggered, and if they were long
pulses. The platform agnostic parts do further processing, such as
interrupt storm mitigation and scheduling bottom halves.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Nothing in the two consecutive loops over hpd pins depends on state in a
larger context than the single hpd pin. If we skip the rest of the loop
on short hpd pulses, we can merge the two loops into one.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In an unfortunate back and forth stepping, retract the earlier change to
reduce indent. This is to make merging the two loops easier. No
functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Multiple positive and negative checks for hpd[i] & hotplug_trigger gets
hard to read. Simplify. This should make follow-up patches merging the
two loops easier. No functional changes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Print a warning if we fall through the .get_display_clock_speed() function
pointer setup. We end up assuming a 133MHz cdclk which should mean that
at least we avoid any 0 deivisions and whatnot. But this could at least
help remind people that they have to provide this function for new platforms.
v2: Rebased to the latest
v3: Rebased to the latest
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Implement cdclk extraction for g33, 965gm and g4x platforms. The details
came from configdb. Sadly there isn't anything there for other gen3/gen4
chipsets.
So far I've tested this on one ELK where it gave me a HPLL VCO of 5333
MHz and cdclk of 444 MHz which seems perfectly sane for this machine.
v2: Rebased to the latest
v3: Rebased to the latest
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It seems 852GM/GMV uses a different HPLLCC encoding than the other
85x platforms. For 852GM/GMV cdclk is always 133MHz. Try to detect that
using the PCI revision (sinc the device ID seems useless for that). I'm
not at all sure this is a good idea, but according to the specs it
should work.
v2: Rebased to the latest
v3: Rebased to the latest
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Actually read the HPLLCC register insted of assuming it's 0. Fix the
HPLLCC bit definitions and all the missing ones from the 852GME spec.
852GME, 854 and 855 all seem to match the same HPLLC encoding even
though only some of the values are valid is some of the platforms.
v2: Rebased to the latest
v3: Rebased to the latest
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The orignal code started by storing the actual central frequency (in Hz,
using a uint64_t) in a uint32_t which codes for the register value. That
can't be right.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Those functions were the only one in existence when they were
introduced. We now know they are only valid for HSW/BDW.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
div_u64() can be either a inline function or a define, but in either
case it's safe to provide expressions as parameters without outer ()
around them.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This part doesn't depend on how we compute the DPLL dividers (p and
p0/p1/p2) and can be reused even if we change the algorithm to do so.
(something that is planned for a followup patch)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We can coalesce the WARN() condition with the WARN() itself and, as we
are returning early, we can de-intent the rest of the function.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At the moment, even if we fail to find a suitable divider, we'll still
try to set the mode with bogus parameters.
Just fail the modeset if we can't generate the frequency.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>