pub fn sm4ks<const BS: u8>(x: u32, k: u32) -> u32
🔬 This is a nightly-only experimental API. (stdsimd #48556)
This is supported on target_arch="riscv32" only.

Accelerates the key schedule operation in the SM4 block cipher algorithm.

This instruction is included in extension Zksed. It’s defined as:

SM4KS(x, k, BS) = x ⊕ T'(ki)
... where
ki = k.bytes[BS]
T'(ki) = L'(τ(ki))
bi = τ(ki) = SM4-S-Box(ki)
ci = L'(bi) = bi ⊕ (bi ≪ 13) ⊕ (bi ≪ 23)
SM4KS = (ci ≪ (BS * 8)) ⊕ x

where ⊕ represents 32-bit xor, and ≪ k represents rotate left by k bits. As defined above, T' combines the non-linear S-Box transform τ with the linear layer transform L', which takes the place of the linear layer used in the SM4 round function.
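The definition above can be sketched as a portable software model, which may help when checking results on a host without Zksed. This is only an illustration: the names `l_prime` and `sm4ks_model` are hypothetical and not part of this API, the selector is passed as a runtime argument rather than a const generic, and the S-box is supplied by the caller (the 256-entry SM4 S-box from the SM4 standard would be passed in) instead of being reproduced here.

```rust
/// Key-schedule linear layer L': b ^ (b <<< 13) ^ (b <<< 23).
fn l_prime(b: u32) -> u32 {
    b ^ b.rotate_left(13) ^ b.rotate_left(23)
}

/// Hypothetical software model of SM4KS(x, k, BS).
/// `sbox` is a caller-supplied table standing in for the SM4 S-box.
fn sm4ks_model(sbox: &[u8; 256], x: u32, k: u32, bs: u8) -> u32 {
    // Assumption: like the intrinsic's BS, the selector must be 0..=3.
    assert!(bs < 4, "byte selector must be in 0..=3");
    let shift = 8 * u32::from(bs);
    let ki = (k >> shift) & 0xff;          // ki = k.bytes[BS]
    let bi = u32::from(sbox[ki as usize]); // bi = τ(ki)
    let ci = l_prime(bi);                  // ci = L'(bi)
    ci.rotate_left(shift) ^ x              // (ci <<< (BS * 8)) ^ x
}
```

Because rotation and xor commute with L', rotating ci by BS * 8 bits puts the contribution of byte BS where a full-word T' would place it, which is why four chained calls can accumulate T' of a whole word.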

In the SM4 algorithm, the key schedule is defined as:

rk[i] = K[i+4] = K[i] ⊕ T'(K[i+1] ⊕ K[i+2] ⊕ K[i+3] ⊕ CK[i])
... where
K[0..=3] = MK[0..=3] ⊕ FK[0..=3]
T'(K) = L'(τ(K))
B = τ(K) = (SM4-S-Box(k0), SM4-S-Box(k1), SM4-S-Box(k2), SM4-S-Box(k3))
C = L'(B) = B ⊕ (B ≪ 13) ⊕ (B ≪ 23)

where MK is the 128-bit input encryption key, and FK and CK are fixed constants defined by the SM4 algorithm. Hence, with k0 through k3 holding the words K[i] through K[i+3] and ck_i holding CK[i], one step of the key schedule can be implemented with the sm4ks instruction as follows:

let k = k1 ^ k2 ^ k3 ^ ck_i;
let c0 = sm4ks::<0>(k0, k);
let c1 = sm4ks::<1>(c0, k); // c1 represents c[0..=1], etc.
let c2 = sm4ks::<2>(c1, k);
let c3 = sm4ks::<3>(c2, k);
return c3; // c3 represents c[0..=3]
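As a rough sketch of how that four-call sequence could sit inside the full 32-round key schedule defined above, the function below slides a four-word window over K. The name `expand_key` and the `step` closure are hypothetical: `step(x, k, bs)` stands in for `sm4ks::<BS>(x, k)`, so the loop can be exercised on any host, and the FK and CK constants are taken as parameters rather than assumed here.

```rust
/// Hypothetical helper: expands a 128-bit key into 32 round keys.
/// `step(x, k, bs)` stands in for `sm4ks::<BS>(x, k)`.
fn expand_key<F>(mk: [u32; 4], fk: [u32; 4], ck: [u32; 32], step: F) -> [u32; 32]
where
    F: Fn(u32, u32, u8) -> u32,
{
    // K[0..=3] = MK[0..=3] ^ FK[0..=3]
    let mut k = [mk[0] ^ fk[0], mk[1] ^ fk[1], mk[2] ^ fk[2], mk[3] ^ fk[3]];
    let mut rk = [0u32; 32];
    for (slot, &ck_i) in rk.iter_mut().zip(ck.iter()) {
        let t = k[1] ^ k[2] ^ k[3] ^ ck_i;
        // Four chained steps, one per byte of `t`, accumulate K[i] ^ T'(t).
        let mut c = k[0];
        for bs in 0..4u8 {
            c = step(c, t, bs);
        }
        *slot = c; // rk[i] = K[i+4]
        // Slide the window so that K[i+1..=i+4] feeds the next round.
        k = [k[1], k[2], k[3], c];
    }
    rk
}
```

On a riscv32 target with Zksed, the closure would presumably dispatch on `bs` to the four const-generic instantiations of `sm4ks`, or the inner loop would simply be unrolled by hand as in the snippet above.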

According to RISC-V Cryptography Extensions, Volume I, the execution latency of this instruction must always be independent of the data it operates on.