This is supported on x86-64 and target feature fma only.
Multiplies the lower single-precision (32-bit) floating-point elements in a and b, and subtracts the lower element in c from the negated intermediate result. Stores the result in the lower element of the returned value, and copies the 3 upper elements from a to the upper elements of the result.
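
The following is a minimal sketch of that behaviour, not an excerpt from the library. It assumes the intrinsic described here is `_mm_fnmsub_ss` from `std::arch::x86_64`; the helper names `fnmsub_ss_reference` and `fnmsub_ss_intrinsic` are hypothetical and exist only for illustration. The scalar reference computes `-(a[0] * b[0]) - c[0]` with a single rounding via `f32::mul_add` and passes the upper three lanes of `a` through unchanged; the intrinsic path runs only when the `fma` feature is detected at run time.

```rust
/// Scalar model of the operation (hypothetical helper, for illustration only).
/// Lane 0 is -(a[0] * b[0]) - c[0]; lanes 1..=3 are copied from `a`.
fn fnmsub_ss_reference(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    // mul_add(x, y, z) computes x * y + z with one rounding, so
    // (-a0) * b0 + (-c0) equals -(a0 * b0) - c0.
    let low = (-a[0]).mul_add(b[0], -c[0]);
    [low, a[1], a[2], a[3]]
}

/// Calls the hardware intrinsic when the CPU reports `fma` support.
/// Returns `None` if the feature is not available at run time.
#[cfg(target_arch = "x86_64")]
fn fnmsub_ss_intrinsic(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> Option<[f32; 4]> {
    use std::arch::x86_64::*;

    #[target_feature(enable = "fma")]
    unsafe fn run(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
        unsafe {
            // Unaligned loads: array index 0 becomes the lowest lane.
            let va = _mm_loadu_ps(a.as_ptr());
            let vb = _mm_loadu_ps(b.as_ptr());
            let vc = _mm_loadu_ps(c.as_ptr());
            let r = _mm_fnmsub_ss(va, vb, vc);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), r);
            out
        }
    }

    if is_x86_feature_detected!("fma") {
        // Sound because the required target feature was just detected.
        Some(unsafe { run(a, b, c) })
    } else {
        None
    }
}

/// Fallback for other architectures, where the intrinsic does not exist.
#[cfg(not(target_arch = "x86_64"))]
fn fnmsub_ss_intrinsic(_a: [f32; 4], _b: [f32; 4], _c: [f32; 4]) -> Option<[f32; 4]> {
    None
}

fn main() {
    let a = [2.0, 10.0, 20.0, 30.0];
    let b = [3.0, 100.0, 200.0, 300.0];
    let c = [4.0, 1000.0, 2000.0, 3000.0];

    let expected = fnmsub_ss_reference(a, b, c);
    println!("reference: {:?}", expected);

    match fnmsub_ss_intrinsic(a, b, c) {
        Some(actual) => {
            assert_eq!(actual, expected);
            println!("intrinsic: {:?}", actual);
        }
        None => println!("fma not available; intrinsic path skipped"),
    }
}
```

Only lane 0 of `b` and `c` takes part in the computation; the other lanes of those inputs are ignored, which is why the example fills them with values that would be easy to spot if they leaked into the result.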