This is supported on x86-64 and target feature avx512bw only.
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst.
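The description above can be modelled in portable scalar Rust, which may help when checking which bytes end up where. This is only an illustrative sketch, not the intrinsic itself: the names `alignr_block` and `alignr_512` are hypothetical, vectors are represented as plain byte arrays with index 0 as the least significant byte, and the 64-byte (four-block) width is an assumption based on the avx512bw requirement.

```rust
/// Scalar model of the operation on one pair of 16-byte blocks.
/// Hypothetical helper for illustration; index 0 is the least significant byte.
fn alignr_block(a: [u8; 16], b: [u8; 16], imm8: usize) -> [u8; 16] {
    // 32-byte temporary: b occupies the low half, a the high half.
    let mut tmp = [0u8; 32];
    tmp[..16].copy_from_slice(&b);
    tmp[16..].copy_from_slice(&a);

    let mut dst = [0u8; 16];
    for (i, out) in dst.iter_mut().enumerate() {
        // A right shift by imm8 bytes moves tmp[i + imm8] down to position i.
        // Positions past the end of the temporary are filled with zero.
        *out = tmp.get(i + imm8).copied().unwrap_or(0);
    }
    dst
}

/// Applies the block operation independently to each 16-byte block.
/// Assumes 64-byte vectors, i.e. four blocks per input.
fn alignr_512(a: [u8; 64], b: [u8; 64], imm8: usize) -> [u8; 64] {
    let mut dst = [0u8; 64];
    for lane in 0..4 {
        let r = lane * 16..lane * 16 + 16;
        let mut la = [0u8; 16];
        let mut lb = [0u8; 16];
        la.copy_from_slice(&a[r.clone()]);
        lb.copy_from_slice(&b[r.clone()]);
        dst[r].copy_from_slice(&alignr_block(la, lb, imm8));
    }
    dst
}
```

In this model, a shift of 4 yields bytes 4 to 15 of the b block followed by bytes 0 to 3 of the a block; a shift of 0 returns b unchanged, a shift of 16 returns a, and a shift of 32 or more gives all zeros. Blocks never mix: each 16-byte block of the output depends only on the blocks at the same position in a and b.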