This is supported on x86 and target feature avx only.

Moves double-precision values from a 256-bit vector of [4 x double]
to a 32-byte aligned memory location. To minimize caching, the data is
flagged as non-temporal (unlikely to be used again soon).
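
The following is a minimal sketch of how such a non-temporal store might be used, assuming the intrinsic described here is `_mm256_stream_pd` from `std::arch::x86_64` (the name is inferred from the description, and the example targets x86_64 rather than 32-bit x86). The `Aligned32` wrapper and the `stream_four` and `stream_store_demo` functions are hypothetical names introduced only for illustration. The sketch checks for AVX at run time and returns `None` when the feature or architecture is unavailable.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_setr_pd, _mm256_stream_pd, _mm_sfence};

/// Hypothetical wrapper that guarantees the 32-byte alignment
/// the non-temporal store requires.
#[repr(C, align(32))]
struct Aligned32([f64; 4]);

/// Hypothetical helper: builds a [4 x double] vector and streams it to `dst`.
///
/// Safety: the caller must ensure the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn stream_four(dst: &mut Aligned32, vals: [f64; 4]) {
    unsafe {
        // `setr` places its arguments in memory order (lowest lane first).
        let v = _mm256_setr_pd(vals[0], vals[1], vals[2], vals[3]);
        // Non-temporal store; `dst` is 32-byte aligned by construction.
        _mm256_stream_pd(dst.0.as_mut_ptr(), v);
        // Non-temporal stores are weakly ordered; the fence orders this
        // store before any later stores as observed by other threads.
        _mm_sfence();
    }
}

/// Returns the stored values, or `None` if AVX is not available.
fn stream_store_demo() -> Option<[f64; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            let mut out = Aligned32([0.0; 4]);
            // Safe to call because AVX support was just verified.
            unsafe { stream_four(&mut out, [1.0, 2.0, 3.0, 4.0]) };
            return Some(out.0);
        }
    }
    None
}
```

Calling `stream_store_demo()` on an AVX-capable machine should yield the four values in memory order. In practice, streaming stores tend to pay off only for large buffers that will not be read again soon; for a single 32-byte write like this one, an ordinary store is usually the better choice.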