Skip to content

Conversation

@ylpoonlg
Copy link
Contributor

@ylpoonlg ylpoonlg commented Dec 8, 2025

Fixes #121961.

Rename Sve2.AddSaturateWithUnsignedAddend/AddSaturateWithSignedAddend to Sve2.AddSaturate so that it aligns with the approved SVE2 API.

Since the Sve2.AddSaturate intrinsic is used for multiple instructions, special intrinsics are used internally for the signed/unsigned addend variants. The AddSaturate intrinsic is marked with the SpecialImport flag so that the actual instruction can be decided based on the operand types during importing.

@dotnet/arm64-contrib @a74nh

Rename Sve2.AddSaturateWithUnsignedAddend/AddSaturateWithSignedAddend to
Sve2.AddSaturate so that it aligns with the approved SVE2 API.
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 8, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

// always predicated. Therefore, separate HWIntrinsic IDs are used here instead of emitting the
// corresponding instructions in the codegen with the same HWIntrinsic.

if (op1BaseJitType != op2BaseJitType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear why this is needed instead of using the auxiliaryJitType field, which exists to allow such special differentiations.

We tend to prefer fewer IDs where we can as it simplifies the handling and matching of patterns elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My consideration was that AddSaturate can be either predicated (for SVE2 only) or unpredicated (for SVE/SVE2), while AddSaturateWithSigned/UnsignedAddend is only in predicated form in SVE2. Their hwintrinsic flags are slightly different (OptionalEmbeddedMasked vs EmbeddedMasked). I'm not sure how I should implement the different embedded mask behaviors using the same intrinsic ID. Is there a better way to handle this in the codegen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't there already be NI_Sve_AddSaturate (unpredicated) vs NI_Sve2_AddSaturate (either), in which case the latter fits into the same general buckets as any other intrinsic that can potentially be predicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when using HW_Flag_OptionalEmbeddedMaskedOperation, it does not wrap the node with a ConditionalSelect (embedded mask) during lowering, but the SUQADD/USQADD instructions always require a mask (even if not given explicitly). Would it be better to mark NI_Sve2_AddSaturate as HW_Flag_EmbeddedMaskedOperation then check if it meets the criteria to use the unpredicated SQADD/UQADD?

Copy link
Member

@tannergooding tannergooding Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would likely be the route I'd go, yes.

This just seems like one other case where we have an intrinsic with special semantics and so it requires minor additional handling when doing lowering and instruction selection; otherwise all the other handling in the front end should be the same.

The flags are mostly supposed to be hints and we often have helpers that query the flag and specially handle any remaining edges, so that other code isn't querying the flags directly. Likewise we tend to try to keep all the HIR (pre-lowering) IR simpler, as it makes pattern matching, value numbering, constant folding, and other phases simpler. While LIR is when we start really getting into machine and microarchitecture specific concepts (like register allocation, codegen, and emit) and so that's where we largely expect these complexities to start surfacing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SVE2: Fixup SVE2 Math/bitmanipulate/FP APIs

3 participants