pub fn sm4ed<const BS: u8>(x: u32, a: u32) -> u32
🔬 This is a nightly-only experimental API. (stdsimd #48556)
This is supported on target_arch="riscv32" only.

Accelerates the round function F in the SM4 block cipher algorithm.

This instruction is included in extension Zksed. It’s defined as:

SM4ED(x, a, BS) = x ⊕ (T(ai) ≪ (BS * 8))
... where
ai = a.bytes[BS]
T(ai) = L(τ(ai))
bi = τ(ai) = SM4-S-Box(ai), zero-extended to 32 bits
ci = L(bi) = bi ⊕ (bi ≪ 2) ⊕ (bi ≪ 10) ⊕ (bi ≪ 18) ⊕ (bi ≪ 24)
SM4ED = (ci ≪ (BS * 8)) ⊕ x

where ⊕ represents 32-bit XOR and ≪ k represents a left rotation by k bits. As defined above, T is the composition of the non-linear S-box transform τ and the linear transform L.
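A minimal software transcription of the formula above may help when reading it step by step. It is a sketch under stated assumptions, not the intrinsic: the `sbox` parameter is a hypothetical stand-in for SM4-S-Box (the real 256-entry table is not reproduced here), `a.bytes[BS]` is assumed to mean byte `BS` counted from the least significant end, and the model has not been checked bit for bit against the instruction's definition in the RISC-V specification. The names `l` and `sm4ed_model` are illustrative.

```rust
/// Linear transform L: the 32-bit word XORed with four left rotations of itself.
fn l(b: u32) -> u32 {
    b ^ b.rotate_left(2) ^ b.rotate_left(10) ^ b.rotate_left(18) ^ b.rotate_left(24)
}

/// Software transcription of SM4ED(x, a, BS) as written in the formula.
/// `sbox` is a hypothetical stand-in for SM4-S-Box; pass the real table in practice.
/// Assumes `a.bytes[BS]` means byte BS counted from the least significant end.
fn sm4ed_model<const BS: u8>(x: u32, a: u32, sbox: fn(u8) -> u8) -> u32 {
    assert!(BS <= 3, "BS selects one of the four bytes of `a`");
    let shift = u32::from(BS) * 8;
    let ai = (a >> shift) as u8; // ai = a.bytes[BS]
    let bi = u32::from(sbox(ai)); // bi = τ(ai), zero-extended to 32 bits
    let ci = l(bi); // ci = L(bi)
    ci.rotate_left(shift) ^ x // (ci ≪ (BS * 8)) ⊕ x
}
```

With an identity S-box, `sm4ed_model::<1>(0, 0x0000_0100, |b: u8| b)` isolates L: byte 1 of `a` is `0x01`, L(0x01) = `0x0104_0405`, and rotating left by 8 bits gives `0x0404_0501`.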

In the SM4 algorithm, the round function F is defined as:

F(x0, x1, x2, x3, rk) = x0 ⊕ T(x1 ⊕ x2 ⊕ x3 ⊕ rk)
... where
A = x1 ⊕ x2 ⊕ x3 ⊕ rk, with bytes (a0, a1, a2, a3)
T(A) = L(τ(A))
B = τ(A) = (SM4-S-Box(a0), SM4-S-Box(a1), SM4-S-Box(a2), SM4-S-Box(a3))
C = L(B) = B ⊕ (B ≪ 2) ⊕ (B ≪ 10) ⊕ (B ≪ 18) ⊕ (B ≪ 24)
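Splitting T into four per-byte steps relies on two properties of L: it distributes over XOR, and it commutes with rotations by whole bytes, so L(B) equals the XOR of L applied to each S-box output byte rotated into place. The sketch below, with illustrative function names, checks both properties and makes the rotate semantics concrete: in `l(0x8000_0000)` the top bit wraps around instead of being shifted out.

```rust
/// Linear transform L: the 32-bit word XORed with four left rotations of itself.
fn l(b: u32) -> u32 {
    b ^ b.rotate_left(2) ^ b.rotate_left(10) ^ b.rotate_left(18) ^ b.rotate_left(24)
}

/// Checks the two properties that let T be evaluated one byte at a time:
/// L distributes over XOR, and L commutes with rotation by 0, 8, 16 or 24 bits.
fn l_is_byte_separable(u: u32, v: u32) -> bool {
    let linear = l(u ^ v) == l(u) ^ l(v);
    let commutes = (0..4).all(|k| l(u.rotate_left(8 * k)) == l(u).rotate_left(8 * k));
    linear && commutes
}
```

Both properties follow from L being an XOR of rotations, so they hold for every input; the function only spot-checks them.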

The round function F can be implemented with four sm4ed instructions, one for each byte of A:

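```rust
#![feature(stdsimd)]
use core::arch::riscv32::sm4ed;

/// One SM4 round: F(x0, x1, x2, x3, rk) = x0 ⊕ T(x1 ⊕ x2 ⊕ x3 ⊕ rk).
fn sm4_round(x0: u32, x1: u32, x2: u32, x3: u32, rk: u32) -> u32 {
    let a = x1 ^ x2 ^ x3 ^ rk;
    let c0 = sm4ed::<0>(x0, a);
    let c1 = sm4ed::<1>(c0, a); // c1 represents c[0..=1], etc.
    let c2 = sm4ed::<2>(c1, a);
    sm4ed::<3>(c2, a) // represents c[0..=3], i.e. x0 ⊕ T(a)
}
```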
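The equivalence between four chained sm4ed steps and the whole-word round function can be checked in portable software without the intrinsic. The sketch below is a model of the formulas on this page, not of the hardware: `bs` is a runtime stand-in for the const parameter `BS`, `sbox` is a hypothetical stand-in for SM4-S-Box, and the function names are illustrative. It compares the chained steps against x0 ⊕ L(τ(a)) computed on the full word.

```rust
/// Linear transform L (32-bit XOR of left rotations).
fn l(b: u32) -> u32 {
    b ^ b.rotate_left(2) ^ b.rotate_left(10) ^ b.rotate_left(18) ^ b.rotate_left(24)
}

/// One per-byte step following the SM4ED formula; `bs` stands in for the const
/// parameter BS, and `sbox` is a hypothetical stand-in for SM4-S-Box.
fn sm4ed_step(bs: u32, x: u32, a: u32, sbox: fn(u8) -> u8) -> u32 {
    let ci = l(u32::from(sbox((a >> (8 * bs)) as u8)));
    ci.rotate_left(8 * bs) ^ x
}

/// Mirrors the four chained sm4ed calls (c0, c1, c2, c3).
fn round_via_steps(x0: u32, a: u32, sbox: fn(u8) -> u8) -> u32 {
    (0..4).fold(x0, |acc, bs| sm4ed_step(bs, acc, a, sbox))
}

/// Whole-word reference: x0 ⊕ L(τ(a)), with τ applying the S-box to every byte.
/// Byte order does not matter here because the same S-box is applied to each byte.
fn round_reference(x0: u32, a: u32, sbox: fn(u8) -> u8) -> u32 {
    let b = u32::from_le_bytes(a.to_le_bytes().map(sbox));
    x0 ^ l(b)
}
```

Testing with any non-linear byte permutation (for example `|b: u8| b.rotate_left(3) ^ 0xA5`, which is not the SM4 S-box) should give identical results from both functions, because each step contributes L of one S-box output rotated into its byte position.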

According to the RISC-V Cryptography Extensions, Volume I, the execution latency of this instruction must always be independent of the data it operates on.