H_HTM (Hardware Trace Macro) hypervisor call is an HCALL to export data
from Hardware Trace Macro (HTM) function. The debugfs interface to
export the HTM function data in an lpar currently supports only dumping
of HTM data in an lpar. To add support for setup, configuration and
control of HTM function via debugfs interface, update the hcall wrapper
function. Rename and update htm_get_dump_hardware to htm_hcall_wrapper()
so that it can be used for other HTM operations as well. Additionally
include parameter "htm_op". Update htmdump module to check the return
code of hcall in a separate function so that it can be reused for other
option too. Add check to disable the interface in guest environment.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250420180844.53128-2-atrajeev@linux.ibm.com
The Fixes commit below tried to add CONFIG_PPC_BOOK3S to one of the
conditions to enable the build of ppc_save_regs.o. But it failed to do
so, in fact. The commit omitted to add a dollar sign.
Therefore, ppc_save_regs.o is built always these days (as
"(CONFIG_PPC_BOOK3S)" is never an empty string).
Fix this by adding the missing dollar sign.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Fixes: fc2a5a6161 ("powerpc/64s: ppc_save_regs is now needed for all 64s builds")
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250417105305.397128-1-jirislaby@kernel.org
When a system is being suspended to RAM, the PCI devices are also
suspended and the PPC code ends up calling pseries_msi_compose_msg() and
this triggers the BUG_ON() in __pci_read_msi_msg() because the device at
this point is in reduced power state. In reduced power state, the memory
mapped registers of the PCI device are not accessible.
To replicate the bug:
1. Make sure deep sleep is selected
# cat /sys/power/mem_sleep
s2idle [deep]
2. Make sure console is not suspended (so that dmesg logs are visible)
echo N > /sys/module/printk/parameters/console_suspend
3. Suspend the system
echo mem > /sys/power/state
To fix this behaviour, read the cached msi message of the device when the
device is not in PCI_D0 power state instead of touching the hardware.
Fixes: a5f3d2c17b ("powerpc/pseries/pci: Add MSI domains")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250305090237.294633-1-gautam@linux.ibm.com
arch_bpf_trampoline_size() provides JIT size of the BPF trampoline
before the buffer for JIT'ing it is allocated. The total number of
instructions emitted for BPF trampoline JIT code depends on where
the final image is located. So, the size arrived at with the dummy
pass in arch_bpf_trampoline_size() can vary from the actual size
needed in arch_prepare_bpf_trampoline(). When the instructions
accounted in arch_bpf_trampoline_size() is less than the number of
instructions emitted during the actual JIT compile of the trampoline,
the below warning is produced:
WARNING: CPU: 8 PID: 204190 at arch/powerpc/net/bpf_jit_comp.c:981 __arch_prepare_bpf_trampoline.isra.0+0xd2c/0xdcc
which is:
/* Make sure the trampoline generation logic doesn't overflow */
if (image && WARN_ON_ONCE(&image[ctx->idx] >
(u32 *)rw_image_end - BPF_INSN_SAFETY)) {
So, during the dummy pass, instead of providing some arbitrary image
location, account for maximum possible instructions if and when there
is a dependency with image location for JIT'ing.
Fixes: d243b62b7b ("powerpc64/bpf: Add support for bpf trampolines")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/6168bfc8-659f-4b5a-a6fb-90a916dde3b3@linux.ibm.com/
Cc: stable@vger.kernel.org # v6.13+
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250422082609.949301-1-hbathini@linux.ibm.com
The use of the term "glue" in filenames is a Crypto API-ism that rarely
shows up elsewhere in lib/ or arch/*/lib/. I think adopting it there
was a mistake. The library just uses standard functions, so the amount
of code that could be considered "glue" is quite small. And while often
the C functions just wrap the assembly functions, there are also cases
like crc32c_arch() in arch/x86/lib/crc32-glue.c that blur the line by
in-lining the actual implementation into the C function. That's not
"glue code", but rather the actual code.
Therefore, let's drop "glue" from the filenames and instead use e.g.
crc32.c instead of crc32-glue.c.
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250424002038.179114-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Now that the architecture-optimized Poly1305 kconfig symbols are defined
regardless of CRYPTO, there is no need for CRYPTO_LIB_POLY1305 to select
CRYPTO. So, remove that. This makes the indirection through the
CRYPTO_LIB_POLY1305_INTERNAL symbol unnecessary, so get rid of that and
just use CRYPTO_LIB_POLY1305 directly. Finally, make the fallback to
the generic implementation use a default value instead of a select; this
makes it consistent with how the arch-optimized code gets enabled and
also with how CRYPTO_LIB_BLAKE2S_GENERIC gets enabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the architecture-optimized ChaCha kconfig symbols are defined
regardless of CRYPTO, there is no need for CRYPTO_LIB_CHACHA to select
CRYPTO. So, remove that. This makes the indirection through the
CRYPTO_LIB_CHACHA_INTERNAL symbol unnecessary, so get rid of that and
just use CRYPTO_LIB_CHACHA directly. Finally, make the fallback to the
generic implementation use a default value instead of a select; this
makes it consistent with how the arch-optimized code gets enabled and
also with how CRYPTO_LIB_BLAKE2S_GENERIC gets enabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Continue disentangling the crypto library functions from the generic
crypto infrastructure by moving the powerpc ChaCha and Poly1305 library
functions into a new directory arch/powerpc/lib/crypto/ that does not
depend on CRYPTO. This mirrors the distinction between crypto/ and
lib/crypto/.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch/powerpc/crypto/Kconfig is sourced only when CONFIG_PPC=y, so there
is no need for the symbols defined inside it to depend on PPC.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'commit b2accfe7ca ("powerpc/boot: Check for ld-option support")' suppressed
linker warnings, but the expressed used did not go well with POSIX shell (dash)
resulting with this warning
arch/powerpc/boot/wrapper: 237: [: 0: unexpected operator
ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions
Fix the check to handle the reported warning. Patch also fixes
couple of shellcheck reported errors for the same line.
In arch/powerpc/boot/wrapper line 237:
if [ $(${CROSS}ld -v --no-warn-rwx-segments &>/dev/null; echo $?) -eq 0 ]; then
^-- SC2046 (warning): Quote this to prevent word splitting.
^------^ SC2086 (info): Double quote to prevent globbing and word splitting.
^---------^ SC3020 (warning): In POSIX sh, &> is undefined.
Fixes: b2accfe7ca ("powerpc/boot: Check for ld-option support")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250423082154.30625-1-maddy@linux.ibm.com
Though the module_exit functions are now no-ops, they should still be
defined, since otherwise the modules become unremovable.
Fixes: 1f81c58279 ("crypto: arm/poly1305 - remove redundant shash algorithm")
Fixes: f4b1a73aec ("crypto: arm64/poly1305 - remove redundant shash algorithm")
Fixes: 378a337ab4 ("crypto: powerpc/poly1305 - implement library instead of shash")
Fixes: 21969da642 ("crypto: x86/poly1305 - remove redundant shash algorithm")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Though the module_exit functions are now no-ops, they should still be
defined, since otherwise the modules become unremovable.
Fixes: 08820553f3 ("crypto: arm/chacha - remove the redundant skcipher algorithms")
Fixes: 8c28abede1 ("crypto: arm64/chacha - remove the skcipher algorithms")
Fixes: f7915484c0 ("crypto: powerpc/chacha - remove the skcipher algorithms")
Fixes: ceba0eda83 ("crypto: riscv/chacha - implement library instead of skcipher")
Fixes: 632ab0978f ("crypto: x86/chacha - remove the skcipher algorithms")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The RTAS call ibm,physical-attestation is used to retrieve
information about the trusted boot state of the firmware and
hypervisor on the system, and also Trusted Platform Modules (TPM)
data if the system is TCG 2.0 compliant.
This RTAS interface expects the caller to define different command
structs such as RetrieveTPMLog, RetrievePlatformCertificat and etc,
in a work area with a maximum size of 4K bytes and the response
buffer will be returned in the same work area.
The current implementation of this RTAS function is in the user
space but allocation of the work area is restricted with the system
lockdown. So this patch implements this RTAS function in the kernel
and expose to the user space with open/ioctl/read interfaces.
PAPR (2.13+ 21.3 ibm,physical-attestation) defines RTAS function:
- Pass the command struct to obtain the response buffer for the
specific command.
- This RTAS function is sequence RTAS call and has to issue RTAS
call multiple times to get the complete response buffer (max 64K).
The hypervisor expects the first RTAS call with the sequence 1 and
the subsequent calls with the sequence number returned from the
previous calls.
Expose these interfaces to user space with a
/dev/papr-physical-attestation character device using the following
programming model:
int devfd = open("/dev/papr-physical-attestation");
int fd = ioctl(devfd, PAPR_PHY_ATTEST_IOC_HANDLE,
struct papr_phy_attest_io_block);
- The user space defines the command struct and requests the
response for any command.
- Obtain the complete response buffer and returned the buffer as
blob to the command specific FD.
size = read(fd, buf, len);
- Can retrieve the response buffer once or multiple times until the
end of BLOB buffer.
Implemented this new kernel ABI support in librtas library for
system lockdown
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-8-haren@linux.ibm.com
ibm,platform-dump RTAS call in combination with writable mapping
/dev/mem is issued to collect platform dump from the hypervisor
and may need multiple calls to get the complete dump. The current
implementation uses rtas_platform_dump() API provided by librtas
library to issue these RTAS calls. But /dev/mem access by the
user space is prohibited under system lockdown.
The solution should be to restrict access to RTAS function in user
space and provide kernel interfaces to collect dump. This patch
adds papr-platform-dump character driver and expose standard
interfaces such as open / ioctl/ read to user space in ways that
are compatible with lockdown.
PAPR (7.3.3.4.1 ibm,platform-dump) provides a method to obtain
the complete dump:
- Each dump will be identified by ID called dump tag.
- A sequence of RTAS calls have to be issued until retrieve the
complete dump. The hypervisor expects the first RTAS call with
the sequence 0 and the subsequent calls with the sequence
number returned from the previous calls.
- The hypervisor returns "dump complete" status once the complete
dump is retrieved. But expects one more RTAS call from the
partition with the NULL buffer to invalidate dump which means
the dump will be removed in the hypervisor.
- Sequence of calls are allowed with different dump IDs at the
same time but not with the same dump ID.
Expose these interfaces to user space with a /dev/papr-platform-dump
character device using the following programming model:
int devfd = open("/dev/papr-platform-dump", O_RDONLY);
int fd = ioctl(devfd,PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id)
- Restrict user space to access with the same dump ID.
Typically we do not expect user space requests the dump
again for the same dump ID.
char *buf = malloc(size);
length = read(fd, buf, size);
- size should be minimum 1K based on PAPR and <= 4K based
on RTAS work area size. It will be restrict to RTAS work
area size. Using 4K work area based on the current
implementation in librtas library
- Each read call issue RTAS call to get the data based on
the size requirement and returns bytes returned from the
hypervisor
- If the previous call returns dump complete status, the
next read returns 0 like EOF.
ret = ioctl(PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id)
- RTAS call with NULL buffer to invalidates the dump.
The read API should use the file descriptor obtained from ioctl
based on dump ID so that gets dump contents for the corresponding
dump ID. Implemented support in librtas (rtas_platform_dump()) for
this new ABI to support system lockdown.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-7-haren@linux.ibm.com
The RTAS call ibm,get-dynamic-sensor-state is used to get the
sensor state identified by the location code and the sensor
token. The librtas library provides an API
rtas_get_dynamic_sensor() which uses /dev/mem access for work
area allocation but is restricted under system lockdown.
This patch provides an interface with new ioctl
PAPR_DYNAMIC_SENSOR_IOC_GET to the papr-indices character
driver which executes this HCALL and copies the sensor state
in the user specified ioctl buffer.
Refer PAPR 7.3.19 ibm,get-dynamic-sensor-state for more
information on this RTAS call.
- User input parameters to the RTAS call: location code string
and the sensor token
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int fd = open("/dev/papr-indices", O_RDWR);
int ret = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET,
struct papr_indices_io_block)
- The user space specifies input parameters in
papr_indices_io_block struct
- Returned state for the specified sensor is copied to
papr_indices_io_block.dynamic_param.state
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-6-haren@linux.ibm.com
The RTAS call ibm,set-dynamic-indicator is used to set the new
indicator state identified by a location code. The current
implementation uses rtas_set_dynamic_indicator() API provided by
librtas library which allocates RMO buffer and issue this RTAS
call in the user space. But /dev/mem access by the user space
is prohibited under system lockdown.
This patch provides an interface with new ioctl
PAPR_DYNAMIC_INDICATOR_IOC_SET to the papr-indices character
driver and expose this interface to the user space that is
compatible with lockdown.
Refer PAPR 7.3.18 ibm,set-dynamic-indicator for more
information on this RTAS call.
- User input parameters to the RTAS call: location code
string, indicator token and new state
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int fd = open("/dev/papr-indices", O_RDWR);
int ret = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET,
struct papr_indices_io_block)
- The user space passes input parameters in papr_indices_io_block
struct
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-5-haren@linux.ibm.com
The RTAS call ibm,get-indices is used to obtain indices and
location codes for a specified indicator or sensor token. The
current implementation uses rtas_get_indices() API provided by
librtas library which allocates RMO buffer and issue this RTAS
call in the user space. But writable mapping /dev/mem access by
the user space is prohibited under system lockdown.
To overcome the restricted access in the user space, the kernel
provide interfaces to collect indices data from the hypervisor.
This patch adds papr-indices character driver and expose standard
interfaces such as open / ioctl/ read to user space in ways that
are compatible with lockdown.
PAPR (2.13 7.3.17 ibm,get-indices RTAS Call) describes the
following steps to retrieve all indices data:
- User input parameters to the RTAS call: sensor or indicator,
and indice type
- ibm,get-indices is sequence RTAS call which means has to issue
multiple times to get the entire list of indicators or sensors
of a particular type. The hypervisor expects the first RTAS call
with the sequence 1 and the subsequent calls with the sequence
number returned from the previous calls.
- The OS may not interleave calls to ibm,get-indices for different
indicator or sensor types. Means other RTAS calls with different
type should not be issued while the previous type sequence is in
progress. So collect the entire list of indices and copied to
buffer BLOB during ioctl() and expose this buffer to the user
space with the file descriptor.
- The hypervisor fills the work area with a specific format but
does not return the number of bytes written to the buffer.
Instead of parsing the data for each call to determine the data
length, copy the work area size (RTAS_GET_INDICES_BUF_SIZE) to
the buffer. Return work-area size of data to the user space for
each read() call.
Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
int devfd = open("/dev/papr-indices", O_RDONLY);
int fd = ioctl(devfd, PAPR_INDICES_IOC_GET,
struct papr_indices_io_block)
- Collect all indices data for the specified token to the buffer
char *buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
length = read(fd, buf, RTAS_GET_INDICES_BUF_SIZE)
- RTAS_GET_INDICES_BUF_SIZE of data is returned to the user
space.
- The user space retrieves the indices and their location codes
from the buffer
- Should issue multiple read() calls until reaches the end of
BLOB buffer.
The read() should use the file descriptor obtained from ioctl to
get the data that is exposed to file descriptor. Implemented
support in librtas (rtas_get_indices()) for this new ABI for
system lockdown.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-4-haren@linux.ibm.com
To issue ibm,get-indices, ibm,set-dynamic-indicator and
ibm,get-dynamic-sensor-state in the user space, the RMO buffer is
allocated for the work area which is restricted under system
lockdown. So instead of user space execution, the kernel will
provide /dev/papr-indices interface to execute these RTAS calls.
The user space assigns data in papr_indices_io_block struct
depends on the specific HCALL and passes to the following ioctls:
PAPR_INDICES_IOC_GET: Use for ibm,get-indices. Returns a
get-indices handle fd to read data.
PAPR_DYNAMIC_SENSOR_IOC_GET: Use for ibm,get-dynamic-sensor-state.
Updates the sensor state in
papr_indices_io_block.dynamic_param.state
PAPR_DYNAMIC_INDICATOR_IOC_SET: Use for ibm,set-dynamic-indicator.
Sets the new state for the input
indicator.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-3-haren@linux.ibm.com
The RTAS call can be normal where retrieves the data form the
hypervisor once or sequence based RTAS call which has to
issue multiple times until the complete data is obtained. For
some of these sequence RTAS calls, the OS should not interleave
calls with different input until the sequence is completed.
The data is collected for each call and copy to the buffer
for the entire sequence during ioctl() handle and then expose
this buffer to the user space with read() handle.
One such sequence RTAS call is ibm,get-vpd and its support is
already included in the current code. To add the similar support
for other sequence based calls, move the common functions in to
separate file and update papr_rtas_sequence struct with the
following callbacks so that RTAS call specific code will be
defined and executed to complete the sequence.
struct papr_rtas_sequence {
int error;
void params;
void (*begin) (struct papr_rtas_sequence *);
void (*end) (struct papr_rtas_sequence *);
const char * (*work) (struct papr_rtas_sequence *, size_t *);
};
params: Input parameters used to pass for RTAS call.
Begin: RTAS call specific function to initialize data
including work area allocation.
End: RTAS call specific function to free up resources
(free work area) after the sequence is completed.
Work: The actual RTAS call specific function which collects
the data from the hypervisor.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Tested-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416225743.596462-2-haren@linux.ibm.com
The powerpc crc code was relying on pagefault_disable from being
pulled in by random header files.
Fix this by explicitly including uaccess.h. Also add other missing
header files to prevent similar problems in future.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7ba8df4781 ("asm-generic: Make simd.h more resilient")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Once the lazy preemption is supported, it would be desirable to change
the preemption models at runtime. So add support for dynamic preemption
using DYNAMIC_KEY.
::Tested lightly on Power10 LPAR
Performance numbers indicate that, preempt=none(no dynamic) and
preempt=none(dynamic) are close.
cat /sys/kernel/debug/sched/preempt
(none) voluntary full lazy
perf stat -e probe:__cond_resched -a sleep 1
Performance counter stats for 'system wide':
1,253 probe:__cond_resched
echo full > /sys/kernel/debug/sched/preempt
cat /sys/kernel/debug/sched/preempt
none voluntary (full) lazy
perf stat -e probe:__cond_resched -a sleep 1
Performance counter stats for 'system wide':
0 probe:__cond_resched
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250210184334.567383-2-sshegde@linux.ibm.com
Update 'kvm-hv-pmu.c' to add five new perf-events mapped to the five
Hostwide counters. Since these newly introduced perf events are at system
wide scope and can be read from any L1-Lpar CPU, 'kvmppc_pmu' scope and
capabilities are updated appropriately.
Also introduce two new helpers. First is kvmppc_update_l0_stats() that uses
the infrastructure introduced in previous patches to issues the
H_GUEST_GET_STATE hcall L0-PowerVM to fetch guest-state-buffer holding the
latest values of these counters which is then parsed and 'l0_stats'
variable updated.
Second helper is kvmppc_pmu_event_update() which is called from
'kvmppv_pmu' callbacks and uses kvmppc_update_l0_stats() to update
'l0_stats' and the update the 'struct perf_event's event-counter.
Some minor updates to kvmppc_pmu_{add, del, read}() to remove some debug
scaffolding code.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416162740.93143-7-vaibhav@linux.ibm.com
Implement and setup necessary structures to send a prepolulated
Guest-State-Buffer(GSB) requesting hostwide counters to L0-PowerVM and have
the returned GSB holding the values of these counters parsed. This is done
via existing GSB implementation and with the newly added support of
Hostwide elements in GSB.
The request to L0-PowerVM to return Hostwide counters is done using a
pre-allocated GSB named 'gsb_l0_stats'. To be able to populate this GSB
with the needed Guest-State-Elements (GSIDs) a instance of 'struct
kvmppc_gs_msg' named 'gsm_l0_stats' is introduced. The 'gsm_l0_stats' is
tied to an instance of 'struct kvmppc_gs_msg_ops' named 'gsb_ops_l0_stats'
which holds various callbacks to be compute the size ( hostwide_get_size()
), populate the GSB ( hostwide_fill_info() ) and
refresh ( hostwide_refresh_info() ) the contents of
'l0_stats' that holds the Hostwide counters returned from L0-PowerVM.
To protect these structures from simultaneous access a spinlock
'lock_l0_stats' has been introduced. The allocation and initialization of
the above structures is done in newly introduced kvmppc_init_hostwide() and
similarly the cleanup is performed in newly introduced
kvmppc_cleanup_hostwide().
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416162740.93143-6-vaibhav@linux.ibm.com
Introduce a new PMU named 'kvm-hv' inside a new module named 'kvm-hv-pmu'
to report Book3s kvm-hv specific performance counters. This will expose
KVM-HV specific performance attributes to user-space via kernel's PMU
infrastructure and would enableusers to monitor active kvm-hv based guests.
The patch creates necessary scaffolding to for the new PMU callbacks and
introduces the new kernel module name 'kvm-hv-pmu' which is built with
CONFIG_KVM_BOOK3S_HV_PMU. The patch doesn't introduce any perf-events yet,
which will be introduced in later patches
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416162740.93143-5-vaibhav@linux.ibm.com
Update 'test-guest-state-buffer.c' to add two new KUNIT test cases for
validating correctness of changes to Guest-state-buffer management
infrastructure for adding support for Hostwide GSB elements.
The newly introduced test test_gs_hostwide_msg() checks if the Hostwide
elements can be set and parsed from a Guest-state-buffer. The second kunit
test test_gs_hostwide_counters() checks if the Hostwide GSB elements can be
send to the L0-PowerVM hypervisor via the H_GUEST_SET_STATE hcall and
ensures that the returned guest-state-buffer has all the 5 Hostwide stat
counters present.
Below is the KATP test report with the newly added KUNIT tests:
KTAP version 1
# Subtest: guest_state_buffer_test
# module: test_guest_state_buffer
1..7
ok 1 test_creating_buffer
ok 2 test_adding_element
ok 3 test_gs_bitmap
ok 4 test_gs_parsing
ok 5 test_gs_msg
ok 6 test_gs_hostwide_msg
# test_gs_hostwide_counters: Guest Heap Size=0 bytes
# test_gs_hostwide_counters: Guest Heap Size Max=10995367936 bytes
# test_gs_hostwide_counters: Guest Page-table Size=2178304 bytes
# test_gs_hostwide_counters: Guest Page-table Size Max=2147483648 bytes
# test_gs_hostwide_counters: Guest Page-table Reclaim Size=0 bytes
ok 7 test_gs_hostwide_counters
# guest_state_buffer_test: pass:7 fail:0 skip:0 total:7
# Totals: pass:7 fail:0 skip:0 total:7
ok 1 guest_state_buffer_test
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416162740.93143-4-vaibhav@linux.ibm.com
Add support for adding and parsing Hostwide elements to the
Guest-state-buffer data structure used in apiv2. These elements are used to
share meta-information pertaining to entire L1-Lpar and this
meta-information is maintained by L0-PowerVM hypervisor. Example of this
include the amount of the page-table memory currently used by L0-PowerVM
for hosting the Shadow-Pagetable of all active L2-Guests. More of the are
documented in kernel-documentation at [1]. The Hostwide GSB elements are
currently only support with H_GUEST_SET_STATE hcall with a special flag
namely 'KVMPPC_GS_FLAGS_HOST_WIDE'.
The patch introduces new defs for the 5 new Hostwide GSB elements including
their GSIDs as well as introduces a new class of GSB elements namely
'KVMPPC_GS_CLASS_HOSTWIDE' to indicate to GSB construction/parsing
infrastructure in 'kvm/guest-state-buffer.c'. Also
gs_msg_ops_vcpu_get_size(), kvmppc_gsid_type() and
kvmppc_gse_{flatten,unflatten}_iden() are updated to appropriately indicate
the needed size for these Hostwide GSB elements as well as how to
flatten/unflatten their GSIDs so that they can be marked as available in
GSB bitmap.
[1] Documention/arch/powerpc/kvm-nested.rst
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250416162740.93143-3-vaibhav@linux.ibm.com
Commit 579aee9fc5 ("powerpc: suppress some linker warnings in recent linker versions")
enabled support to add linker option "--no-warn-rwx-segments",
if the version is greater than 2.39. Similar build warning were
reported recently from linker version 2.35.2.
ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions
ld: warning: arch/powerpc/boot/zImage.pseries has a LOAD segment with RWX permissions
Fix the warning by checking for "--no-warn-rwx-segments"
option support in linker to enable it, instead of checking
for the version range.
Fixes: 579aee9fc5 ("powerpc: suppress some linker warnings in recent linker versions")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/linuxppc-dev/61cf556c-4947-4bd6-af63-892fc0966dad@linux.ibm.com/
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250401004218.24869-1-maddy@linux.ibm.com
Following the example of the crc32, crc32c, and chacha code, make the
crypto subsystem register both generic and architecture-optimized
poly1305 shash algorithms, both implemented on top of the appropriate
library functions. This eliminates the need for every architecture to
implement the same shash glue code.
Note that the poly1305 shash requires that the key be prepended to the
data, which differs from the library functions where the key is simply a
parameter to poly1305_init(). Previously this was handled at a fairly
low level, polluting the library code with shash-specific code.
Reorganize things so that the shash code handles this quirk itself.
Also, to register the architecture-optimized shashes only when
architecture-optimized code is actually being used, add a function
poly1305_is_arch_optimized() and make each arch implement it. Change
each architecture's Poly1305 module_init function to arch_initcall so
that the CPU feature detection is guaranteed to run before
poly1305_is_arch_optimized() gets called by crypto/poly1305.c. (In
cases where poly1305_is_arch_optimized() just returns true
unconditionally, using arch_initcall is not strictly needed, but it's
still good to be consistent across architectures.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>