Commit Graph

18 Commits

Author SHA1 Message Date
Karthik Poosa
49a4983384 drm/xe/hwmon: Expose individual VRAM channel temperature
Expose individual VRAM temperature attributes.
Update Xe hwmon documentation for this entry.

v2:
 - Avoid using default switch case for VRAM individual temperatures.
 - Append labels with VRAM channel number.
 - Update kernel version in Xe hwmon documentation.

v3:
 - Add missing brackets in Xe hwmon documentation from VRAM channel sysfs.
 - Reorder BMG_VRAM_TEMPERATURE_N macro in xe_pcode_regs.h.
 - Add api to check if VRAM is available on the channel.

v4:
 - Improve VRAM label handling to eliminate temp variable by
   introducing a dedicated array vram_label in xe_hwmon_thermal_info.
 - Remove a magic number.
 - Change the label from vram_X to vram_ch_X.

v5:
 - Address review comments from Raag.
 - Change vram to VRAM in commit title and subject.
 - Refactor BMG_VRAM_TEMPERATURE_N macro.
 - Refactor is_vram_ch_available().
 - Rephrase a comment.
 - Check individual VRAM temperature limits in addition to VRAM
   availability in xe_hwmon_temp_is_visible. (Raag)
 - Move VRAM label change out of this patch.

v6:
 - Use in_range() for VRAM_N index check instead of if check. (Raag)
 - Minor aesthetic changes.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260112203521.1014388-5-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-12 17:00:29 -05:00
Karthik Poosa
8d2511686e drm/xe/hwmon: Expose GPU PCIe temperature
Expose GPU PCIe average temperature and its limits via hwmon sysfs entry
temp5_xxx.
Update Xe hwmon sysfs documentation for this.

v2: Update kernel version in Xe hwmon documentation. (Raag)

v3:
 - Address review comments from Raag.
 - Remove redundant debug log.
 - Update kernel version in Xe hwmon documentation. (Raag)

v4:
 - Address review comments from Raag.
 - Group new temperature attributes with existing temperature attributes
   as per channel index in Xe hwmon documentation.
 - Use TEMP_MASK instead of TEMP_MASK_MAILBOX.
 - Add PCIE_SENSOR_MASK which uses REG_FIELD_GET as replacement of
   PCIE_SENSOR_SHIFT.

v5:
 - Address review comments from Raag.
 - Use REG_FIELD_GET to get PCIe temperature.
 - Move PCIE_SENSOR_GROUP_ID and PCIE_SENSOR_MASK to xe_pcode_api.h
 - Cosmetic change.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260112203521.1014388-4-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-12 17:00:29 -05:00
Karthik Poosa
3a0cb885e1 drm/xe/hwmon: Expose memory controller temperature
Expose GPU memory controller average temperature and its limits under
temp4_xxx.
Update Xe hwmon documentation for this.

v2:
 - Rephrase commit message. (Badal)
 - Update kernel version in Xe hwmon documentation. (Raag)

v3:
 - Update kernel version in Xe hwmon documentation.
 - Address review comments from Raag.
 - Remove obvious comments.
 - Remove redundant debug logs.
 - Remove unnecessary checks.
 - Avoid magic numbers.
 - Add new comments.
 - Use temperature sensors count to make memory controller visible.
 - Use temperature limits of package for memory controller.

v4:
 - Address review comments from Raag.
 - Group new temperature attributes with existing temperature attributes
   as per channel index in Xe hwmon documentation.
 - Use DIV_ROUND_UP to calculate dwords needed for temperature limits.
 - Minor aesthetic refinements.
 - Remove unused TEMP_MASK_MAILBOX.

v5:
 - Use REG_FIELD_GET to get count from READ_THERMAL_DATA output. (Raag)
 - Change count print from decimal to hexadecimal.
 - Cosmetic changes.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260112203521.1014388-3-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-12 17:00:29 -05:00
Karthik Poosa
c332fba805 drm/xe/hwmon: Expose temperature limits
Read temperature limits using pcode mailbox and expose shutdown
temperature limit as tempX_emergency, critical temperature limit as
tempX_crit and GPU max temperature limit as temp2_max.

Update Xe hwmon documentation with above entries.

v2:
 - Resolve a documentation warning.
 - Address below review comments from Raag.
 - Update date and kernel version in Xe hwmon documentation.
 - Remove explicit disable of has_mbx_thermal_info for unsupported
   platforms.
 - Remove unnecessary default case in switches.
 - Remove obvious comments.
 - Use TEMP_LIMIT_MAX to compute number of dwords needed in
   xe_hwmon_thermal_info.
 - Remove THERMAL_LIMITS_DWORDS macro.
 - Use has_mbx_thermal_info for checking thermal mailbox support.

v3:
 - Address below minor comments. (Raag)
 - Group new temperature attributes with existing temperature attributes
   as per channel index in Xe hwmon documentation.
 - Rename enums of xe_temp_limit to improve clarity.
 - Use DIV_ROUND_UP to calculate dwords needed for temperature limits.
 - Use return instead of breaks in xe_hwmon_temp_read.
 - Minor aesthetic refinements.

v4:
 - Remove a redundant break. (Raag)
 - Update drm_dbg to drm_warn to inform user of unavailability for
   thermal mailbox on expected platforms.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260112203521.1014388-2-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-12 17:00:29 -05:00
Karthik Poosa
719d8a5959 drm/xe/hwmon: Expose powerX_cap_interval
Expose powerX_cap_interval to manage burst power limit time window.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-5-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-30 11:30:01 -04:00
Karthik Poosa
c713b9a23c drm/xe/hwmon: Add support to manage PL2 though mailbox
Add support to manage power limit PL2 (burst limit) through
pcode mailbox commands.

v2:
 - Update power1_cap definition in hwmon documentation. (Badal)
 - Clamp PL2 power limit to GPU firmware default value.

v3:
 - Activate the power label when either the PL1 or PL2 power
   limit is enabled.

v4:
 - Update description of pl2_on_boot variable to fix kernel-doc
   error.

v5:
 - Remove unnecessary drm_warn.
 - Rectify powerX_label permission to read-only on platforms
   without mailbox power limits support.
 - Expose powerX_cap entries only on platforms with mailbox
   support.

v6:
 - Improve commit message, refer to BIOS as GPU firmware.
 - Refer to card firmware as GPU firmware in code.
 - Remove unnecessary drm_dbg.
 - Print supported and unsupported power limits. (Rodrigo)
 - Enable powerN_cap/max_xxx entries only when power limits
   supported in GPU firmware.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-4-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-30 11:30:01 -04:00
Karthik Poosa
25e963a09e drm/xe/hwmon: Move card reactive critical power under channel card
Move power2/curr2_crit to channel 1 i.e power1/curr1_crit as this
represents the entire card critical power/current.

v2: Update the date of curr1_crit also in hwmon documentation.

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 345dadc4f6 ("drm/xe/hwmon: Add infra to support card power and energy attributes")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250529163458.2354509-3-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-30 11:30:01 -04:00
Lucas De Marchi
f9e4d8bb6a drm/xe/hwmon: Fix kernel version documentation for fan speed
The version in the sysfs attribute should correspond to the version in
which this is enabled and visible for end users. It usually doesn't
correspond to the version in which the patch was developed, but rather a
release that will contain it. Update them to 6.16.

Fixes: 28f79ac609 ("drm/xe/hwmon: expose fan speed")
Reported-by: Ulisses Furquim <ulisses.furquim@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4841
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250421-hwmon-doc-fix-v1-2-9f68db702249@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-24 08:49:08 -07:00
Lucas De Marchi
8500393a8e drm/xe/hwmon: Fix kernel version documentation for temperature
The version in the sysfs attribute should correspond to the version in
which this is enabled and visible for end users. It usually doesn't
correspond to the version in which the patch was developed, but rather a
release that will contain it. Update them to 6.15.

Fixes: dac328dea7 ("drm/xe/hwmon: expose package and vram temperature")
Reported-by: Ulisses Furquim <ulisses.furquim@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4840
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250421-hwmon-doc-fix-v1-1-9f68db702249@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-24 08:49:08 -07:00
Raag Jadav
28f79ac609 drm/xe/hwmon: expose fan speed
Add hwmon support for fan1_input, fan2_input and fan3_input attributes,
which will expose fan speed of respective channels in RPM when supported
by hardware. With this in place we can monitor fan speed using lm-sensors
tool.

v2: Rely on platform checks instead of mailbox error (Aravind, Rodrigo)
v3: Introduce has_fan_control flag (Rodrigo)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250312085909.755073-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-14 14:08:44 -04:00
Raag Jadav
dac328dea7 drm/xe/hwmon: expose package and vram temperature
Add hwmon support for temp2_input and temp3_input attributes, which will
expose package and vram temperature in millidegree Celsius. With this in
place we can monitor temperature using lm-sensors tool.

v2: Reuse existing channels (Badal, Karthik)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131054502.1528555-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-02-05 08:44:27 -05:00
Karthik Poosa
345dadc4f6 drm/xe/hwmon: Add infra to support card power and energy attributes
Add infra to support card power and energy attributes through channel 0.
Package attributes will be now exposed through channel 1 rather than
channel 0 as shown below.

Channel 0 i.e power1/energy1_xxx used for card and
channel 1 i.e power2/energy2_xxx used for package power,energy attributes.

power1/curr1_crit and in0_input are moved to channel 1, i.e.
power2/curr2_crit and in1_input as these are available for package only.

This would be needed for future platforms where they might be
separate registers for package and card power and energy.

Each discrete GPU supported by Xe driver, would have a directory in
/sys/class/hwmon/ with multiple channels under it.
Each channel would have attributes for power, energy etc.

Ex: /sys/class/hwmon/hwmon2/power1_max
                           /power1_label
                           /energy1_input
                           /energy1_label

Attributes will have a label to get more description of it.
Labelling is as below.
		power1_label/energy1_label - "card",
		power2_label/energy2_label - "pkg".

v2: Fix checkpatch errors.

v3:
 - Update intel-xe-hwmon documentation. (Riana, Badal)
 - Rename hwmon card channel enum from CHANNEL_PLATFORM
   to CHANNEL_CARD. (Riana)

v4:
 - Remove unrelated changes from patch. (Anshuman)
 - Fix typo in commit msg.

v5:
 - Update commit message and intel-xe-hwmon documentation with "Xe"
   instead of xe when using it as a name. (Rodrigo)

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328175435.3870957-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-29 11:27:21 -04:00
Badal Nilawar
20485e3a81 drm/hwmon: Fix abi doc warnings
This fixes warnings in xe, i915 hwmon docs:

Warning: /sys/devices/.../hwmon/hwmon<i>/curr1_crit is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:35  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:52
Warning: /sys/devices/.../hwmon/hwmon<i>/energy1_input is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:54  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:65
Warning: /sys/devices/.../hwmon/hwmon<i>/in0_input is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:46  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:0
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_crit is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:22  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:39
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_max is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:0  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:8
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:62  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:30
Warning: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max is defined 2 times:  Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:14  Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:22

Use a path containing the driver name to differentiate the documentation
of each entry.

Fixes: fb1b70607f ("drm/xe/hwmon: Expose power attributes")
Fixes: 92d44a422d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fbcdc9d3bf ("drm/xe/hwmon: Expose input voltage attribute")
Fixes: 71d0a32524 ("drm/xe/hwmon: Expose hwmon energy attribute")
Fixes: 4446fcf220 ("drm/xe/hwmon: Expose power1_max_interval")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240125113345.291118ff@canb.auug.org.au/
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240127165040.2348009-1-badal.nilawar@intel.com
2024-01-29 20:41:10 -08:00
Badal Nilawar
4446fcf220 drm/xe/hwmon: Expose power1_max_interval
Expose power1_max_interval, that is the tau corresponding to PL1, as a
custom hwmon attribute. Some bit manipulation is needed because of the
format of PKG_PWR_LIM_1_TIME in
PACKAGE_RAPL_LIMIT register (1.x * power(2,y))

v2: Get rpm wake ref while accessing power1_max_interval
v3: %s/hwmon/xe_hwmon/
v4:
 - As power1_max_interval is rw attr take lock in read function as well
 - Refine comment about val to fix point conversion (Andi)
 - Update kernel version and date in doc
v5: Fix review comments (Anshuman)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231030115618.1382200-4-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:32 -05:00
Badal Nilawar
71d0a32524 drm/xe/hwmon: Expose hwmon energy attribute
Expose hwmon energy attribute to show device level energy usage

v2:
  - %s/hwm_/hwmon_/
  - Convert enums to upper case
v3:
  - %s/hwmon_/xe_hwmon
  - Remove gt specific hwmon attributes
v4:
 - %s/REG_PKG_ENERGY_STATUS/REG_ENERGY_STATUS_ALL (Riana)
 - %s/hwmon_energy_info/xe_hwmon_energy_info (Riana)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230925081842.3566834-5-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:08 -05:00
Badal Nilawar
fbcdc9d3bf drm/xe/hwmon: Expose input voltage attribute
Use Xe HWMON subsystem to display the input voltage.

v2:
  - Rename hwm_get_vltg to hwm_get_voltage (Riana)
  - Use scale factor SF_VOLTAGE (Riana)
v3:
  - %s/gt_perf_status/REG_GT_PERF_STATUS/
  - Remove platform check from hwmon_get_voltage()
v4:
  - Fix review comments (Andi)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230925081842.3566834-4-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:08 -05:00
Badal Nilawar
92d44a422d drm/xe/hwmon: Expose card reactive critical power
Expose the card reactive critical (I1) power. I1 is exposed as
power1_crit in microwatts (typically for client products) or as
curr1_crit in milliamperes (typically for server).

v2: Move PCODE_MBOX macro to pcode file (Riana)
v3: s/IS_DG2/(gt_to_xe(gt)->info.platform == XE_DG2)
v4: Fix review comments (Andi)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230925081842.3566834-3-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:08 -05:00
Badal Nilawar
fb1b70607f drm/xe/hwmon: Expose power attributes
Expose Card reactive sustained (pl1) power limit as power_max and
card default power limit (tdp) as power_rated_max.

v2:
  - Fix review comments (Riana)
v3:
  - Use drmm_mutex_init (Matt Brost)
  - Print error value (Matt Brost)
  - Convert enums to uppercase (Matt Brost)
  - Avoid extra reg read in hwmon_is_visible function (Riana)
  - Use xe_device_assert_mem_access when applicable (Matt Brost)
  - Add intel-xe@lists.freedesktop.org in Documentation (Matt Brost)
v4:
  - Use prefix xe_hwmon prefix for all functions (Matt Brost/Andi)
  - %s/hwmon_reg/xe_hwmon_reg (Andi)
  - Fix review comments (Guenter/Andi)
v5:
  - Fix review comments (Riana)
v6:
  - Use drm_warn in default case (Rodrigo)
  - s/ENODEV/EOPNOTSUPP (Andi)

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20230925081842.3566834-2-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:08 -05:00