Performance regression in Vector<T>.Vector<T>(T) on x86/x64
#108929
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
in-pr
There is an active PR which will close this issue when it is merged
Description
When using .NET 9 RC2, the Vector<T> constructor that broadcasts a scalar to all elements of a vector is not optimized to a broadcast instruction on x86/x64. The .NET 8 compiler makes this optimization.
Reproduction Steps
The regression can be reproduced by compiling the following function:
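The original listing is not preserved in this copy of the issue. A minimal function of the kind described might look like the sketch below. The class name Repro, the method name Broadcast, and the element type int are assumptions; int is chosen only because vpbroadcastd, mentioned under Expected behavior, operates on 32-bit elements.

```csharp
using System.Numerics;

public static class Repro
{
    // Hypothetical reproduction, not the author's original code:
    // broadcast one scalar to every element via the Vector<T>(T) constructor.
    public static Vector<int> Broadcast(int value) => new Vector<int>(value);
}
```

Compiling this in Release mode for x64 and inspecting the JIT output for Broadcast should show whether a broadcast instruction is emitted.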
Expected behavior
I would expect the compiler to use only a few instructions to broadcast the scalar to all elements. This is what the .NET 8 compiler produces:
So a single vpbroadcastd does the job when AVX2 is enabled.
Actual behavior
With .NET 9 RC2, the following machine code is generated:
As you can see, the compiler writes the elements one by one into an array on the stack, which is much slower.
Regression?
No response
Known Workarounds
Use .NET 8, or select the Vector128/256/512.Create method based on the Vector<T> length:
This workaround results in the following machine code with .NET 9 RC2:
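The workaround's source is not preserved in this copy either. As a rough sketch of the length-based selection it describes, assuming an int element type and a hypothetical helper named Broadcast, the idea is to compare Vector<int>.Count with the lane count of each fixed-width type and call the matching Create method:

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

public static class BroadcastWorkaround
{
    // Hypothetical helper, not the author's original code.
    // Picks the fixed-width Create method whose lane count matches
    // Vector<int>.Count, then reinterprets the result as Vector<int>.
    public static Vector<int> Broadcast(int value)
    {
        if (Vector<int>.Count == Vector512<int>.Count)
            return Vector512.Create(value).AsVector();
        if (Vector<int>.Count == Vector256<int>.Count)
            return Vector256.Create(value).AsVector();
        if (Vector<int>.Count == Vector128<int>.Count)
            return Vector128.Create(value).AsVector();

        // Unexpected vector width: fall back to the regular constructor.
        return new Vector<int>(value);
    }
}
```

Vector512 requires .NET 8 or later. The comparisons involve only lane counts, which the JIT is expected to treat as constants, so the untaken branches should normally be removed from the generated code.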
Configuration
No response
Other information
No response