Pull bitmap updates from Yury Norov:
- new API: bitmap_weight_from() and bitmap_weighted_xor() (Yury)
- drop unused __find_nth_andnot_bit() (Yury)
- new tests and test improvements (Andy, Akinobu, Yury)
- fixes for count_zeroes API (Yury)
- cleanup bitmap_print_to_pagebuf() mess (Yury)
- documentation updates (Andy, Kai, Kit).
* tag 'bitmap-for-v7.1' of https://github.com/norov/linux: (24 commits)
bitops: Update kernel-doc for sign_extendXX()
powerpc/xive: simplify xive_spapr_debug_show()
thermal: intel: switch cpumask_get() to using cpumask_print_to_pagebuf()
coresight: don't use bitmap_print_to_pagebuf()
lib/prime_numbers: drop temporary buffer in dump_primes()
drm/xe: switch xe_pagefault_queue_init() to using bitmap_weighted_or()
ice: use bitmap_empty() in ice_vf_has_no_qs_ena
ice: use bitmap_weighted_xor() in ice_find_free_recp_res_idx()
bitmap: introduce bitmap_weighted_xor()
bitmap: add test_zero_nbits()
bitmap: exclude nbits == 0 cases from bitmap test
bitmap: test bitmap_weight() for more
asm-generic/bitops: Fix a comment typo in instrumented-atomic.h
bitops: fix kernel-doc parameter name for parity8()
lib: count_zeros: unify count_{leading,trailing}_zeros()
lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros()
lib: crypto: fix comments for count_leading_zeros()
x86/topology: use bitmap_weight_from()
bitmap: add bitmap_weight_from()
lib/find_bit_benchmark: avoid clearing randomly filled bitmap in test_find_first_bit()
...
Pull crypto library updates from Eric Biggers:
- Migrate more hash algorithms from the traditional crypto subsystem to
lib/crypto/
Like the algorithms migrated earlier (e.g. SHA-*), this simplifies
the implementations, improves performance, enables further
simplifications in calling code, and solves various other issues:
- AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC)
- Support these algorithms in lib/crypto/ using the AES library
and the existing arm64 assembly code
- Reimplement the traditional crypto API's "cmac(aes)",
"xcbc(aes)", and "cbcmac(aes)" on top of the library
- Convert mac80211 to use the AES-CMAC library. Note: several
other subsystems can use it too and will be converted later
- Drop the broken, nonstandard, and likely unused support for
"xcbc(aes)" with key lengths other than 128 bits
- Enable optimizations by default
- GHASH
- Migrate the standalone GHASH code into lib/crypto/
- Integrate the GHASH code more closely with the very similar
POLYVAL code, and improve the generic GHASH implementation to
resist cache-timing attacks and use much less memory
- Reimplement the AES-GCM library and the "gcm" crypto_aead
template on top of the GHASH library. Remove "ghash" from the
crypto_shash API, as it's no longer needed
- Enable optimizations by default
- SM3
- Migrate the kernel's existing SM3 code into lib/crypto/, and
reimplement the traditional crypto API's "sm3" on top of it
- I don't recommend using SM3, but this cleanup is worthwhile
to organize the code the same way as other algorithms
- Testing improvements:
- Add a KUnit test suite for each of the new library APIs
- Migrate the existing ChaCha20Poly1305 test to KUnit
- Make the KUnit all_tests.config enable all crypto library tests
- Move the test kconfig options to the Runtime Testing menu
- Other updates to arch-optimized crypto code:
- Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine
- Remove some MD5 implementations that are no longer worth keeping
- Drop big endian and voluntary preemption support from the arm64
code, as those configurations are no longer supported on arm64
- Make jitterentropy and samples/tsm-mr use the crypto library APIs
* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits)
lib/crypto: arm64: Assume a little-endian kernel
arm64: fpsimd: Remove obsolete cond_yield macro
lib/crypto: arm64/sha3: Remove obsolete chunking logic
lib/crypto: arm64/sha512: Remove obsolete chunking logic
lib/crypto: arm64/sha256: Remove obsolete chunking logic
lib/crypto: arm64/sha1: Remove obsolete chunking logic
lib/crypto: arm64/poly1305: Remove obsolete chunking logic
lib/crypto: arm64/gf128hash: Remove obsolete chunking logic
lib/crypto: arm64/chacha: Remove obsolete chunking logic
lib/crypto: arm64/aes: Remove obsolete chunking logic
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit
lib/crypto: sparc: Drop optimized MD5 code
lib/crypto: mips: Drop optimized MD5 code
lib: Move crypto library tests to Runtime Testing menu
crypto: sm3 - Remove 'struct sm3_state'
crypto: sm3 - Remove the original "sm3_block_generic()"
crypto: sm3 - Remove sm3_base.h
...
Since support for big-endian arm64 kernels was removed, the CPU_LE()
macro now unconditionally emits the code it is passed, and the CPU_BE()
macro now unconditionally discards the code it is passed.
Simplify the assembly code in lib/crypto/arm64/ accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401003331.144065-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the SHA-3 code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the SHA-512 code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the SHA-256 code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the SHA-1 code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the Poly1305 code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the GHASH and POLYVAL code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the ChaCha code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since commit aefbab8e77 ("arm64: fpsimd: Preserve/restore kernel mode
NEON at context switch"), kernel-mode NEON sections have been
preemptible on arm64. And since commit 7dadeaa6e8 ("sched: Further
restrict the preemption modes"), voluntary preemption is no longer
supported on arm64 either. Therefore, there's no longer any need to
limit the length of kernel-mode NEON sections on arm64.
Simplify the AES-CBC-MAC code accordingly.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401000548.133151-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since the lib/crypto/ files that include <crypto/algapi.h> need it only
for the transitive inclusion of <crypto/utils.h> (and not all the
traditional crypto API stuff that the rest of <crypto/algapi.h> is
filled with), replace these inclusions with direct inclusions of
<crypto/utils.h>.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260331024438.51783-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c4 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Link: https://lore.kernel.org/r/20260331024430.51755-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c4 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260331024414.51545-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Move the ChaCha20Poly1305 test from an ad-hoc self-test to a KUnit test.
Keep the same test logic for now, just translated to KUnit.
Moving to KUnit has multiple benefits, such as:
- Consistency with the rest of the lib/crypto/ tests.
- Kernel developers familiar with KUnit, which is used kernel-wide, can
quickly understand the test and how to enable and run it.
- The test will be automatically run by anyone using
lib/crypto/.kunitconfig or KUnit's all_tests.config.
- Results are reported using the standard KUnit mechanism.
- It eliminates one of the few remaining back-references to crypto/ from
lib/crypto/, specifically a reference to CONFIG_CRYPTO_SELFTESTS.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260327224229.137532-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
MD5 is obsolete. Continuing to maintain architecture-optimized
implementations of MD5 is unnecessary and risky. It diverts resources
from the modern algorithms that are actually important.
While there was demand for continuing to maintain the PowerPC optimized
MD5 code to accommodate userspace programs that are misusing AF_ALG
(https://lore.kernel.org/linux-crypto/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/),
no such demand has been seen for the SPARC optimized MD5 code.
Thus, let's drop it and focus effort on the more modern SHA algorithms,
which already have optimized code for SPARC.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260326203341.60393-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
MD5 is obsolete. Continuing to maintain architecture-optimized
implementations of MD5 is unnecessary and risky. It diverts resources
from the modern algorithms that are actually important.
While there was demand for continuing to maintain the PowerPC optimized
MD5 code to accommodate userspace programs that are misusing AF_ALG
(https://lore.kernel.org/linux-crypto/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/),
no such demand has been seen for the MIPS Cavium Octeon optimized MD5
code. Note that this code runs on only one particular line of SoCs.
Thus, let's drop it and focus effort on the more modern SHA algorithms,
which already have optimized code for the same SoCs.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260326204824.62010-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since the ChaCha permutation is invertible, the local variable
'permuted_state' is sufficient to compute the original 'state', and thus
the key, even after the permutation has been done.
While the kernel is quite inconsistent about zeroizing secrets on the
stack (and some prominent userspace crypto libraries don't bother at all
since it's not guaranteed to work anyway), the kernel does try to do it
as a best practice, especially in cases involving the RNG.
Thus, explicitly zeroize 'permuted_state' before it goes out of scope.
Fixes: c08d0e6473 ("crypto: chacha20 - Add a generic ChaCha20 stream cipher implementation")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260326032920.39408-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Currently the kconfig options for the crypto library KUnit tests appear
in the menu:
-> Library routines
-> Crypto library routines
However, this is the only content of "Crypto library routines". I.e.,
it is empty when CONFIG_KUNIT=n. This is because the crypto library
routines themselves don't have (or need to have) prompts.
Since this usually ends up as an unnecessary empty menu, let's remove
this menu and instead source the lib/crypto/tests/Kconfig file from
lib/Kconfig.debug inside the "Runtime Testing" menu:
-> Kernel hacking
-> Kernel Testing and Coverage
-> Runtime Testing
This puts the prompts alongside the ones for most of the other lib/
KUnit tests. This seems to be a much better match to how the kconfig
menus are organized.
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20260322032438.286296-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Since the architecture-optimized SM3 code was migrated into lib/crypto/,
sm3_block_generic() is no longer called. Remove it. Then, since this
frees up the name, rename sm3_transform() to sm3_block_generic()
(matching the naming convention used in other hash algorithms).
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Instead of exposing the x86-optimized SM3 code via an x86-specific
crypto_shash algorithm, instead just implement the sm3_blocks() library
function. This is much simpler, it makes the SM3 library functions be
x86-optimized, and it fixes the longstanding issue where the
x86-optimized SM3 code was disabled by default. SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.
Tweak the prototype of sm3_transform_avx() to match what the library
expects, including changing the block count to size_t. Note that the
assembly code actually already treated this argument as size_t.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Instead of exposing the riscv-optimized SM3 code via a riscv-specific
crypto_shash algorithm, instead just implement the sm3_blocks() library
function. This is much simpler, it makes the SM3 library functions be
riscv-optimized, and it fixes the longstanding issue where the
riscv-optimized SM3 code was disabled by default. SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.
Tweak the prototype of sm3_transform_zvksh_zvkb() to match what the
library expects, including changing the block count to size_t.
Note that the assembly code already treated it as size_t.
Note: to see the diff from arch/riscv/crypto/sm3-riscv64-glue.c to
lib/crypto/riscv/sm3.h, view this commit with 'git show -M10'.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Instead of exposing the arm64-optimized SM3 code via arm64-specific
crypto_shash algorithms, instead just implement the sm3_blocks() library
function. This is much simpler, it makes the SM3 library functions be
arm64-optimized, and it fixes the longstanding issue where the
arm64-optimized SM3 code was disabled by default. SM3 still remains
available through crypto_shash, but individual architectures no longer
need to handle it.
Tweak the SM3 assembly function prototypes to match what the library
expects, including changing the block count from 'int' to 'size_t'.
sm3_ce_transform() had to be updated to access 'x2' instead of 'w2',
while sm3_neon_transform() already used 'x2'.
Remove the CFI stubs which are no longer needed because the SM3 assembly
functions are no longer ever indirectly called.
Remove the dependency on KERNEL_MODE_NEON. It was unnecessary, because
KERNEL_MODE_NEON is always enabled on arm64.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add a straightforward library API for SM3, mirroring the ones for the
other hash algorithms. It uses the existing generic implementation of
SM3's compression function in lib/crypto/sm3.c. Hooks are added for
architecture-optimized implementations, which later commits will wire up
to the existing optimized SM3 code for arm64, riscv, and x86.
Note that the rationale for this is *not* that SM3 should be used, or
that any kernel subsystem currently seems like a candidate for switching
from the sm3 crypto_shash to SM3 library. (SM3, in fact, shouldn't be
used. Likewise you shouldn't use MD5, SHA-1, RC4, etc...)
Rather, it's just that this will simplify how the kernel's existing SM3
code is integrated and make it much easier to maintain and test. SM3 is
one of the only hash algorithms with arch-optimized code that is still
integrated in the old way. By converting it to the new lib/crypto/ code
organization, we'll only have to keep track of one way of doing things.
The library will also get a KUnit test suite (as usual for lib/crypto/),
so it will become more easily and comprehensively tested as well.
Skip adding functions for HMAC-SM3 for now, though. There's not as much
point in adding those right now.
Note: similar to the other hash algorithms, the library API uses
'struct sm3_ctx', not 'struct sm3_state'. The existing 'struct
sm3_state' and the sm3_block_generic() function which uses it are
temporarily kept around until their users are updated by later commits.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260321040935.410034-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Make the AES-GCM library use the GHASH library instead of directly
calling gf128mul_lle(). This allows the architecture-optimized GHASH
implementations to be used, or the improved generic implementation if no
architecture-optimized implementation is usable.
Note: this means that <crypto/gcm.h> no longer needs to include
<crypto/gf128mul.h>. Remove that inclusion, and include
<crypto/gf128mul.h> explicitly from arch/x86/crypto/aesni-intel_glue.c
which previously was relying on the transitive inclusion.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-20-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the 4k_lle multiplication functions and the associated
gf128mul_table_le data table. Their only user was the generic
implementation of GHASH, which has now been changed to use a different
implementation based on standard integer multiplication.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-18-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "ghash-pclmulqdqni" crypto_shash algorithm. Move the
corresponding assembly code into lib/crypto/, and wire it up to the
GHASH library.
This makes the GHASH library be optimized with x86's carryless
multiplication instructions. It also greatly reduces the amount of
x86-specific glue code that is needed, and it fixes the issue where this
GHASH optimization was disabled by default.
Rename and adjust the prototypes of the assembly functions to make them
fit better with the library. Remove the byte-swaps (pshufb
instructions) that are no longer necessary because the library keeps the
accumulator in POLYVAL format rather than GHASH format.
Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it
really does a POLYVAL style multiplication. Wire it up to both
ghash_mul_arch() and polyval_mul_arch().
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "ghash-s390" crypto_shash algorithm, and replace it with an
implementation of ghash_blocks_arch() for the GHASH library.
This makes the GHASH library be optimized with CPACF. It also greatly
reduces the amount of s390-specific glue code that is needed, and it
fixes the issue where this GHASH optimization was disabled by default.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "ghash-riscv64-zvkg" crypto_shash algorithm. Move the
corresponding assembly code into lib/crypto/, modify it to take the
length in blocks instead of bytes, and wire it up to the GHASH library.
This makes the GHASH library be optimized with the RISC-V Vector
Cryptography Extension. It also greatly reduces the amount of
riscv-specific glue code that is needed, and it fixes the issue where
this optimized GHASH code was disabled by default.
Note that this RISC-V code has multiple opportunities for improvement,
such as adding more parallelism, providing an optimized multiplication
function, and directly supporting POLYVAL. But for now, this commit
simply tweaks ghash_zvkg() slightly to make it compatible with the
library, then wires it up to ghash_blocks_arch().
ghash_preparekey_arch() is also implemented to store the copy of the raw
key needed by the vghsh.vv instruction.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "p8_ghash" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized for POWER8. It also greatly
reduces the amount of powerpc-specific glue code that is needed, and it
fixes the issue where this optimized GHASH code was disabled by default.
Note that previously the C code defined the POWER8 GHASH key format as
"u128 htable[16]", despite the assembly code only using four entries.
Fix the C code to use the correct key format. To fulfill the library
API contract, also make the key preparation work in all contexts.
Note that the POWER8 assembly code takes the accumulator in GHASH
format, but it actually byte-reflects it to get it into POLYVAL format.
The library already works with POLYVAL natively. For now, just wire up
this existing code by converting it to/from GHASH format in C code.
This should be cleaned up to eliminate the unnecessary conversion later.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized on arm64 (though only with
NEON, not PMULL; for now the goal is just parity with crypto_shash). It
greatly reduces the amount of arm64-specific glue code that is needed,
and it fixes the issue where this optimization was disabled by default.
To integrate the assembly code correctly with the library, make the
following tweaks:
- Change the type of 'blocks' from int to size_t
- Change the types of 'dg' and 'h' to polyval_elem. Note that this
simply reflects the format that the code was already using.
- Remove the 'head' argument, which is no longer needed.
- Remove the CFI stubs, as indirect calls are no longer used.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding
assembly code into lib/crypto/, and wire it up to the GHASH library.
This makes the GHASH library be optimized on arm (though only with NEON,
not PMULL; for now the goal is just parity with crypto_shash). It
greatly reduces the amount of arm-specific glue code that is needed, and
it fixes the issue where this optimization was disabled by default.
To integrate the assembly code correctly with the library, make the
following tweaks:
- Change the type of 'blocks' from int to size_t.
- Change the types of 'dg' and 'h' to polyval_elem. Note that this
simply reflects the format that the code was already using, at least
on little endian CPUs. For big endian CPUs, add byte-swaps.
- Remove the 'head' argument, which is no longer needed.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add GHASH support to the gf128hash module.
This will replace the GHASH support in the crypto_shash API. It will be
used by the "gcm" template and by the AES-GCM library (when an
arch-optimized implementation of the full AES-GCM is unavailable).
This consists of a simple API that mirrors the existing POLYVAL API, a
generic implementation of that API based on the existing efficient and
side-channel-resistant polyval_mul_generic(), and the framework for
architecture-optimized implementations of the GHASH functions.
The GHASH accumulator is stored in POLYVAL format rather than GHASH
format, since this is what most modern GHASH implementations actually
need. The few implementations that expect the accumulator in GHASH
format will just convert the accumulator to/from GHASH format
temporarily. (Supporting architecture-specific accumulator formats
would be possible, but doesn't seem worth the complexity.)
However, architecture-specific formats of struct ghash_key will be
supported, since a variety of formats will be needed there anyway. The
default format is just the key in POLYVAL format.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Currently, some architectures (arm64 and x86) have optimized code for
both GHASH and POLYVAL. Others (arm, powerpc, riscv, and s390) have
optimized code only for GHASH. While POLYVAL support could be
implemented on these other architectures, until then we need to support
the case where arch-optimized functions are present only for GHASH.
Therefore, update the support for arch-optimized POLYVAL functions to
allow architectures to opt into supporting these functions individually.
The new meaning of CONFIG_CRYPTO_LIB_GF128HASH_ARCH is that some level
of GHASH and/or POLYVAL acceleration is provided.
Also provide an implementation of polyval_mul() based on
polyval_blocks_arch(), for when polyval_mul_arch() isn't implemented.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Currently, the standalone GHASH code is coupled with crypto_shash. This
has resulted in unnecessary complexity and overhead, as well as the code
being unavailable to library code such as the AES-GCM library. Like was
done with POLYVAL, it needs to find a new home in lib/crypto/.
GHASH and POLYVAL are closely related and can each be implemented in
terms of each other. Optimized code for one can be reused with the
other. But also since GHASH tends to be difficult to implement directly
due to its unnatural bit order, most modern GHASH implementations
(including the existing arm, arm64, powerpc, and x86 optimized GHASH
code, and the new generic GHASH code I'll be adding) actually
reinterpret the GHASH computation as an equivalent POLYVAL computation,
pre and post-processing the inputs and outputs to map to/from POLYVAL.
Given this close relationship, it makes sense to group the GHASH and
POLYVAL code together in the same module. This gives us a wide range of
options for implementing them, reusing code between the two and properly
utilizing whatever instructions each architecture provides.
Thus, GHASH support will be added to the library module that is
currently called "polyval". Rename it to an appropriate name:
"gf128hash". Rename files, options, functions, etc. where appropriate
to reflect the upcoming sharing with GHASH. (Note: polyval_kunit is not
renamed, as ghash_kunit will be added alongside it instead.)
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260319061723.1140720-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
count_leading_zeros() is based on fls(), which is defined for x == 0,
contrary to __ffs() family. The comment in crypto/mpi erroneously
states that the function may return undef in such case.
Fix the comment together with the outdated function signature, and now
that COUNT_LEADING_ZEROS_0 is not referenced in the codebase, get rid of
it too.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
CONFIG_KERNEL_MODE_NEON is always enabled on arm64, and it always has
been since its introduction in 2013. Given that and the fact that the
usefulness of kernel-mode NEON has only been increasing over time,
checking for this option in arm64-specific code is unnecessary. Remove
these checks from lib/crypto/ to simplify the code and prevent any
future bugs where e.g. code gets disabled due to a typo in this logic.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260314175049.26931-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Defaulting the crypto KUnit tests to KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
instead of simply KUNIT_ALL_TESTS was originally intended to make it
easy to enable all the crypto KUnit tests. This additional default is
nonstandard for KUnit tests, though, and it can cause all the KUnit
tests to be built-in unexpectedly if CRYPTO_SELFTESTS is set. It also
constitutes a back-reference to crypto/ from lib/crypto/, which is
something that we should be avoiding in order to get clean layering.
Now that we provide a lib/crypto/.kunitconfig file that enables all
crypto KUnit tests, let's consider that to be the supported way to
enable all these tests, and drop the default of CRYPTO_SELFTESTS.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260317040626.5697-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
For kunit.py to run all the crypto library tests when passed the
--alltests option, tools/testing/kunit/configs/all_tests.config needs to
enable options that satisfy the test dependencies.
This is the same as what lib/crypto/.kunitconfig already does.
However, the strategy that lib/crypto/.kunitconfig currently uses to
select all the hidden library options isn't going to scale up well when
it needs to be repeated in two places.
Instead let's go ahead and introduce an option
CRYPTO_LIB_ENABLE_ALL_FOR_KUNIT that depends on KUNIT and selects all
the crypto library options that have corresponding KUnit tests.
Update lib/crypto/.kunitconfig to use this option.
Link: https://lore.kernel.org/r/20260314035927.51351-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add a FIPS cryptographic algorithm self-test for AES-CMAC to fulfill the
self-test requirement when this code is built into a FIPS 140
cryptographic module. This provides parity with the traditional crypto
API, which uses crypto/testmgr.c to meet the FIPS self-test requirement.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Instead of exposing the arm64-optimized CMAC, XCBC-MAC, and CBC-MAC code
via arm64-specific crypto_shash algorithms, instead just implement the
aes_cbcmac_blocks_arch() library function. This is much simpler, it
makes the corresponding library functions be arm64-optimized, and it
fixes the longstanding issue where this optimized code was disabled by
default. The corresponding algorithms still remain available through
crypto_shash, but individual architectures no longer need to handle it.
Note that to be compatible with the library using 'size_t' lengths, the
type of the return value and 'blocks' parameter to the assembly
functions had to be changed to 'size_t', and the assembly code had to be
updated accordingly to use the corresponding 64-bit registers.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
To migrate the support for CBC-based MACs into libaes, the corresponding
arm64 assembly code needs to be moved there. However, the arm64 AES
assembly code groups many AES modes together; individual modes aren't
easily separable. (This isn't unique to arm64; other architectures
organize their AES modes similarly.)
Since the other AES modes will be migrated into the library eventually
too, just move the full assembly files for the AES modes into the
library. (This is similar to what I already did for PowerPC and SPARC.)
Specifically: move the assembly files aes-ce.S, aes-modes.S, and
aes-neon.S and their build rules; declare the assembly functions in
<crypto/aes.h>; and export the assembly functions from libaes.
Note that the exports and public declarations of the assembly functions
are temporary. They exist only to keep arch/arm64/crypto/ working until
the AES modes are fully moved into the library.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add support for CBC-based MACs to the AES library, specifically
AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC.
Of these three algorithms, AES-CMAC is the most modern and the most
commonly used. Use cases for the AES-CMAC library include the kernel's
SMB client and server, and the bluetooth and mac80211 drivers.
Support for AES-XCBC-MAC and AES-CBC-MAC is included so that there will
be no performance regression in the "xcbc(aes)" and "ccm(aes)" support
in the traditional crypto API once the arm64-optimized code is migrated
into the library. AES-XCBC-MAC is given its own key preparation
function but is otherwise identical to AES-CMAC and just reuses the
AES-CMAC structs and functions.
The implementation automatically uses the optimized AES key expansion
and single block en/decryption functions. It also allows architectures
to provide an optimized implementation of aes_cbcmac_blocks(), which
allows the existing arm64-optimized code for these modes to be used.
Just put the code for these modes directly in the libaes module rather
than in a separate module. This is simpler, it makes it easier to share
code between AES modes, and it increases the amount of inlining that is
possible. (Indeed, for these reasons, most of the
architecture-optimized AES code already provides multiple modes per
module. x86 for example has only a single aesni-intel module. So to a
large extent, this design choice just reflects the status quo.)
However, since there are a lot of AES modes, there's still some value in
omitting modes that are not needed at all in a given kernel. Therefore,
make these modes an optional feature of libaes, controlled by
CONFIG_CRYPTO_LIB_AES_CBC_MACS. This seems like a good middle ground.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260218213501.136844-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Add a .kunitconfig file to the lib/crypto/ directory so that the crypto
library tests can be run more easily using kunit.py. Example with UML:
tools/testing/kunit/kunit.py run --kunitconfig=lib/crypto
Example with QEMU:
tools/testing/kunit/kunit.py run --kunitconfig=lib/crypto --arch=arm64 --make_options LLVM=1
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260301040140.490310-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
The convention for KUnit tests is to have the test kconfig options
visible only when the code they depend on is already enabled. This way
only the tests that are relevant to the particular kernel build can be
enabled, either manually or via KUNIT_ALL_TESTS.
Update lib/crypto/tests/Kconfig to follow that convention, i.e. depend
on the corresponding library options rather than selecting them. This
fixes an issue where enabling KUNIT_ALL_TESTS enabled non-test code.
This does mean that it becomes a bit more difficult to enable *all* the
crypto library tests (which is what I do as a maintainer of the code),
since doing so will now require enabling other options that select the
libraries. Regardless, we should follow the standard KUnit convention.
I'll also add a .kunitconfig file that does enable all these options.
Note: currently most of the crypto library options are selected by
visible options in crypto/Kconfig, which can be used to enable them
without too much trouble. If in the future we end up with more cases
like CRYPTO_LIB_CURVE25519 which is selected only by WIREGUARD (thus
making CRYPTO_LIB_CURVE25519_KUNIT_TEST effectively depend on WIREGUARD
after this commit), we could consider adding a new kconfig option that
enables all the library code specifically for testing.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/r/CAMuHMdULzMdxuTVfg8_4jdgzbzjfx-PHkcgbGSthcUx_sHRNMg@mail.gmail.com
Fixes: 4dcf6cadda ("lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256")
Fixes: 571eaeddb6 ("lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512")
Fixes: 6dd4d9f791 ("lib/crypto: tests: Add KUnit tests for Poly1305")
Fixes: 66b1306079 ("lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1")
Fixes: d6b6aac0cd ("lib/crypto: tests: Add KUnit tests for MD5 and HMAC-MD5")
Fixes: afc4e4a5f1 ("lib/crypto: tests: Migrate Curve25519 self-test to KUnit")
Fixes: 6401fd334d ("lib/crypto: tests: Add KUnit tests for BLAKE2b")
Fixes: 15c64c47e4 ("lib/crypto: tests: Add SHA3 kunit tests")
Fixes: b3aed551b3 ("lib/crypto: tests: Add KUnit tests for POLYVAL")
Fixes: ed894faccb ("lib/crypto: tests: Add KUnit tests for ML-DSA verification")
Fixes: 7246fe6cd6 ("lib/crypto: tests: Add KUnit tests for NH")
Cc: stable@vger.kernel.org
Reviewed-by: David Gow <david@davidgow.net>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260226191749.39397-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>