Allow multiple effects to be packed into a single buffer again. #480

pcwalton · 2025-06-10T01:51:15Z

This commit restores the functionality present in older versions of Hanabi that allows multiple effects to be packed into a single buffer. This is in preparation for reenabling batching.

I use the base_instance field on the effects metadata to supply the offset within each indirect buffer to the shader. This is convenient, because it naturally paves the way to use multi-draw as a follow-up, and it avoids having to add more fields to the effects metadata.

As suggested in the code, I rewrote sorting to be prior to batching instead of after batching. It now uses a modified version of Tarjan's algorithm to sort the effects into the optimum batching order. Not only does it make sure that parents precede children, but it also groups buffers together, and sorts by offset, in order to make batching easier as a follow-up. Sorting by offset within the buffer should enable the use of binary search in order for the shader to map from the global thread ID to the effect instance.
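The binary search mentioned above can be sketched on the CPU side as follows. This is only an illustration under stated assumptions: the real lookup would live in a WGSL shader, `EffectRange` and `effect_for_thread` are hypothetical names, and only `base_instance` mirrors a field named in this PR. It assumes effects in one buffer are sorted by ascending `base_instance` and that each effect's range runs up to the next effect's base.

```rust
/// Hypothetical per-effect record; only `base_instance` mirrors a name from the PR.
#[derive(Debug, Clone, Copy)]
struct EffectRange {
    /// Offset of the effect's first instance within the shared buffer.
    base_instance: u32,
}

/// Returns the index of the effect containing `thread_id`, assuming `effects`
/// is sorted by ascending `base_instance`.
///
/// Returns `None` if `thread_id` lies before the first effect. No upper bound
/// is checked here; a real shader would also compare against the instance count.
fn effect_for_thread(effects: &[EffectRange], thread_id: u32) -> Option<usize> {
    // Count the effects that start at or before `thread_id` (binary search).
    let count = effects.partition_point(|e| e.base_instance <= thread_id);
    // The last of those effects is the one that contains the thread.
    count.checked_sub(1)
}
```

For example, with bases 0, 20 and 30, thread 19 maps to effect 0 and thread 20 maps to effect 1.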

[Tarjan's algorithm]: https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search

djeedai · 2025-06-18T19:03:03Z

@pcwalton did you see that some examples fail with what appears to be a genuine error?

thread '<unnamed>' panicked at /Users/runner/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wgpu-24.0.5/src/backend/wgpu_core.rs:2750:26:
wgpu error: Validation Error

Caused by:
  In ComputePass::end
    In a set_bind_group command
      Dynamic binding index 0 (targeting BindGroup with 'hanabi:bind_group:sort_fill' label 0, binding 1) with value 3200, does not respect device's requested `min_storage_buffer_offset_alignment` limit: 256

The sort shaders weren't updated, which caused failures in the `ribbons` example. It looks like the plan was to use dynamic offsets in order to move the window around within the buffer; however, that doesn't work due to alignment limitations. So I removed the dynamic offsets and instead use the new field in the effects metadata. Much of the complexity comes from the fact that the render shader now needs access to the effects metadata (in just a single place), because there's no special WGSL shader variable available for the base instance. Thus I had to update a bunch of bind groups and such in order to make the effects metadata accessible to the render shader. The good news is that, if we ever need to add more fields to the effects metadata that the render shader needs to access in the future, doing so will be straightforward.
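The validation error quoted earlier comes down to simple arithmetic: 3200 is not a multiple of 256. A minimal sketch of that check, using hypothetical helper names (`is_aligned` and `align_up` are not wgpu APIs):

```rust
/// Returns true if `offset` is a multiple of `alignment` (which must be nonzero).
fn is_aligned(offset: u32, alignment: u32) -> bool {
    offset % alignment == 0
}

/// Rounds `offset` up to the next multiple of `alignment`.
/// Assumes `alignment` is a power of two, as 256 is.
fn align_up(offset: u32, alignment: u32) -> u32 {
    debug_assert!(alignment.is_power_of_two());
    (offset + alignment - 1) & !(alignment - 1)
}
```

With the values from the log, 3200 = 12 * 256 + 128, so the offset is rejected; the nearest valid offsets are 3072 and 3328. An effect packed at an arbitrary position inside a shared buffer will rarely land on such a boundary, which is why an index stored in the effects metadata is the more flexible choice.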

pcwalton · 2025-06-19T04:57:36Z

@djeedai Thanks, it should be fixed now with the latest commit. This was because the sorting shaders previously weren't updated to deal with multiple effects in the same buffer. See the commit message for more details.

djeedai · 2025-06-19T21:18:03Z

The ribbon example is completely broken.

ribbon.mp4

pcwalton · 2025-06-21T01:55:41Z

@djeedai Hmm, it works for me. What system are you on?

pcwalton · 2025-06-21T04:33:29Z

ribbon is tested and working on both Windows 11 (NVIDIA GeForce RTX 4070 Laptop GPU) and macOS (Apple M2 Mac mini).

djeedai · 2025-06-21T15:46:29Z

src/render/mod.rs

-            effect_slice,
+            effect_slice: effect_slice.clone(),
            init_and_update_pipeline_ids,
-            parent_buffer_index,


If we don't have the index of the parent buffer, we can't batch together updates for effects that have a parent, because the shader only has one bind point for a parent. It might not be used yet, but conceptually we need that info for batching. Or am I missing something?

Prior to this patch, we only used the parent_buffer_index field for the sorting. When I moved the sorting to be before batching instead of after it, the field ended up becoming unused, so I deleted it. I believe you're right that yes, we do conceptually need that info when we re-add batching later, but I figured you'd prefer not having unused fields.

djeedai

See individual comments; there are multiple bugs in the sorter.

src/render/batch.rs

djeedai · 2025-06-21T16:21:41Z

src/render/batch.rs

-mod tests {
-    use super::*;
-
-    fn make_batch(buffer_index: u32, parent_buffer_index: Option<u32>) -> EffectBatch {


I'd rather keep tests like that. The best example of why is that I tried to re-add it, and that's how I realized:

the sorter in your change was sorting in the wrong order.

the algorithm is broken in the first place; it doesn't even assign the correct levels to parents and children.

Here's my test

    #[cfg(test)]
    mod tests {
        use super::*;

        fn insert_entry(
            sorter: &mut EffectSorter,
            entity: Entity,
            buffer_index: u32,
            base_instance: u32,
            parent: Option<Entity>,
        ) {
            sorter.effects.push(EffectToBeSorted {
                entity,
                base_instance,
                buffer_index,
            });
            if let Some(parent) = parent {
                sorter.child_to_parent.insert(entity, parent);
            }
        }

        #[test]
        fn toposort_batches() {
            let mut sorter = EffectSorter::new();

            // Some "parent" effect
            let e1 = Entity::from_raw(1);
            insert_entry(&mut sorter, e1, 42, 0, None);
            assert_eq!(sorter.effects.len(), 1);
            assert_eq!(sorter.effects[0].entity, e1);
            assert!(sorter.child_to_parent.is_empty());

            // Some "child" effect in a different buffer
            let e2 = Entity::from_raw(2);
            insert_entry(&mut sorter, e2, 5, 30, Some(e1));
            assert_eq!(sorter.effects.len(), 2);
            assert_eq!(sorter.effects[0].entity, e1);
            assert_eq!(sorter.effects[1].entity, e2);
            assert_eq!(sorter.child_to_parent.len(), 1);
            assert_eq!(sorter.child_to_parent[&e2], e1);

            sorter.sort();
            assert_eq!(sorter.effects.len(), 2);
            assert_eq!(sorter.effects[0].entity, e2); // child first
            assert_eq!(sorter.effects[1].entity, e1); // parent after
            assert_eq!(sorter.child_to_parent.len(), 1); // unchanged
            assert_eq!(sorter.child_to_parent[&e2], e1); // unchanged

            // Some "child" effect in the same buffer as its parent
            let e3 = Entity::from_raw(3);
            insert_entry(&mut sorter, e3, 5, 20, Some(e1));
            assert_eq!(sorter.effects.len(), 3);
            assert_eq!(sorter.effects[0].entity, e2); // from previous sort
            assert_eq!(sorter.effects[1].entity, e1); // from previous sort
            assert_eq!(sorter.effects[2].entity, e3); // simply appended
            assert_eq!(sorter.child_to_parent.len(), 2);
            assert_eq!(sorter.child_to_parent[&e2], e1);
            assert_eq!(sorter.child_to_parent[&e3], e1);

            sorter.sort();
            assert_eq!(sorter.effects.len(), 3);
            // Note: Ideally this would want e1 to be first and e3 after, so that e3 and e2
            // are grouped together since they're in the same buffer. But sorting by
            // topological order AND by buffer at the same time is in general impossible
            // (conflicting requirements; no solution).
            if sorter.effects[0].entity == e1 {
                assert_eq!(sorter.effects[1].entity, e3); // child first
            } else {
                assert_eq!(sorter.effects[0].entity, e3); // child first
                assert_eq!(sorter.effects[1].entity, e1); // child first
            }
            assert_eq!(sorter.effects[2].entity, e2); // parent after all children
            assert_eq!(sorter.child_to_parent.len(), 2);
            assert_eq!(sorter.child_to_parent[&e2], e1);
            assert_eq!(sorter.child_to_parent[&e3], e1);
        }
    }

Are you sure that last test is right? We have:

    e1 -> e2 (i.e. e2 is a child of e1)
    e1 -> e3 (i.e. e3 is a child of e1)
    e1 buffer = 42
    e2 buffer = 5
    e3 buffer = 5

So a possible order is:

e3, e2, e1

because this satisfies the constraints of children before parents (e3 and e2 before e1) and buffers grouped together (e2 = 5, e3 = 5). But this code doesn't admit that order:

    // Note: Ideally this would want e1 to be first and e3 after, so that e3 and e2
    // are grouped together since they're in the same buffer. But sorting by
    // topological order AND by buffer at the same time is in general impossible
    // (conflicting requirements; no solution).
    if sorter.effects[0].entity == e1 {
        assert_eq!(sorter.effects[1].entity, e3); // child first
    } else {
        assert_eq!(sorter.effects[0].entity, e3); // child first
        assert_eq!(sorter.effects[1].entity, e1); // child first
    }

I pushed a new commit that changes the test to match the comments. I think this is what you were going for.

Sorry I think I changed my mental model mid-way. The test starts with "e1 is parent" but my mental model was that e2 was. The test is likely wrong.
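The two ordering constraints debated in this thread (children before parents, and effects sharing a buffer kept adjacent) can be checked mechanically. The sketch below is hypothetical: it uses plain `u32` IDs instead of `Entity`, and the helper names `children_before_parents` and `buffers_grouped` are not part of Hanabi.

```rust
use std::collections::{HashMap, HashSet};

/// True if every child appears earlier in `order` than its parent.
/// Effects missing from `order` count as a failure.
fn children_before_parents(order: &[u32], child_to_parent: &HashMap<u32, u32>) -> bool {
    let position: HashMap<u32, usize> =
        order.iter().enumerate().map(|(i, &e)| (e, i)).collect();
    child_to_parent.iter().all(|(child, parent)| {
        match (position.get(child), position.get(parent)) {
            (Some(c), Some(p)) => c < p,
            _ => false,
        }
    })
}

/// True if each buffer index forms a single contiguous run in `order`.
fn buffers_grouped(order: &[u32], buffer_of: &HashMap<u32, u32>) -> bool {
    let mut finished = HashSet::new(); // buffers whose run has already ended
    let mut current: Option<u32> = None;
    for effect in order {
        let Some(&buffer) = buffer_of.get(effect) else {
            return false;
        };
        if current != Some(buffer) {
            if let Some(prev) = current {
                finished.insert(prev);
            }
            if finished.contains(&buffer) {
                return false; // this buffer's run was interrupted earlier
            }
            current = Some(buffer);
        }
    }
    true
}
```

With e2 and e3 as children of e1 and buffers 42, 5, 5, the order e3, e2, e1 satisfies both checks, while e3, e1, e2 fails both.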

djeedai · 2025-06-21T16:57:07Z

src/render/batch.rs

+        while let Some(effect_index) = visited.zeroes().next() {
+            visit(
+                &mut levels,
+                &mut visiting,
+                &mut visited,
+                &parent_to_children,
+                effect_index,
+                0,
+            );


This algorithm simply doesn't work. I wrote a small test with 3 entities, inserted in that order:

[0], child of [1]

[1], parent of both [0] and [2]

[2], child of [1]

Here's what happens:

On the first while iteration, index 0 is returned because it has never been visited. visit() then marks [0] as visited, skips the for loop since it has no children, and inserts [0] with level=0.

On the second while iteration, index 1 is returned (the common parent). This time the for loop is executed, but child [0] is already visited so it's skipped. Child [2] is recursively visited with level=1. Finally, parent [1] is inserted with level=0.

We end up with:

[0] level=0

[1] level=0

[2] level=1

This is wrong; [0] and [2] should have the same level, and that level should differ from the level of their parent [1] (larger or smaller depending on convention).
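The walkthrough above can be reproduced with a small stand-alone sketch. This is a hypothetical reconstruction based only on the description in this comment, not the PR's actual code: it uses plain index vectors instead of the bitset and `visiting` set seen in the diff, and the names `visit_buggy` and `assign_levels_buggy` are made up for illustration.

```rust
/// Single-pass visit as described above: the level is fixed by the path
/// through which a node happens to be reached first.
fn visit_buggy(
    levels: &mut [u32],
    visited: &mut [bool],
    children: &[Vec<usize>],
    index: usize,
    level: u32,
) {
    visited[index] = true;
    for &child in &children[index] {
        // Children that were already visited keep their old level.
        if !visited[child] {
            visit_buggy(levels, visited, children, child, level + 1);
        }
    }
    levels[index] = level;
}

/// Starts a visit at level 0 from every node that is still unvisited.
fn assign_levels_buggy(children: &[Vec<usize>]) -> Vec<u32> {
    let mut levels = vec![0; children.len()];
    let mut visited = vec![false; children.len()];
    while let Some(index) = visited.iter().position(|&v| !v) {
        visit_buggy(&mut levels, &mut visited, children, index, 0);
    }
    levels
}
```

Calling `assign_levels_buggy` with the children lists `[[], [0, 2], []]` yields levels `[0, 0, 1]`, matching the result listed above: the two siblings end up on different levels because [0] was reached as a root before its parent.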

OK, this should be fixed now. Now it's two phases. First, I do a topological sort (in the reverse order I had before). Second, I assign levels, based on the constraint that, for each (parent, child) pair, the level of the parent must be at least the level of the child + 1. This seems correct.
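A minimal sketch of the two-phase approach just described, assuming an acyclic parent/child graph stored as children lists indexed by effect. The function names are hypothetical and the PR's actual implementation may differ in data structures and tie-breaking.

```rust
/// Depth-first post-order walk: a node is pushed only after all its children.
fn visit(node: usize, children: &[Vec<usize>], visited: &mut [bool], order: &mut Vec<usize>) {
    if visited[node] {
        return;
    }
    visited[node] = true;
    for &child in &children[node] {
        visit(child, children, visited, order);
    }
    order.push(node);
}

/// Phase 1: topological order with children before parents.
fn topo_children_first(children: &[Vec<usize>]) -> Vec<usize> {
    let mut visited = vec![false; children.len()];
    let mut order = Vec::with_capacity(children.len());
    for node in 0..children.len() {
        visit(node, children, &mut visited, &mut order);
    }
    order
}

/// Phase 2: the smallest levels such that level(parent) >= level(child) + 1.
fn assign_levels(children: &[Vec<usize>]) -> Vec<u32> {
    let mut levels = vec![0u32; children.len()];
    // Children precede parents in the order, so their levels are already final.
    for &node in &topo_children_first(children) {
        for &child in &children[node] {
            levels[node] = levels[node].max(levels[child] + 1);
        }
    }
    levels
}
```

On the three-entity case from the review comment (children lists `[[], [0, 2], []]`), this gives the order 0, 2, 1 and levels `[0, 1, 0]`: both children share level 0 and the parent sits one level above them, regardless of insertion order.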

Now children are sorted before parents, and the sorting logic has been rewritten. The test isn't the same as the one suggested, because I believe the suggested test had a bug in it. I changed the test to follow the comments.

pcwalton · 2025-06-22T21:52:30Z

Addressed review comments.

pcwalton · 2025-06-22T21:54:09Z

Marking this as a draft because I can reproduce the ribbon failure now.

pcwalton · 2025-06-22T22:33:12Z

So it's very strange: I saw the bad behavior once, but can't reproduce it again. Whatever it is is very nondeterministic.

pcwalton · 2025-06-23T01:26:50Z

I only ever reproduced the problem once. Incredibly obnoxious.

@djeedai Sorry about this, I hate to ask someone else to debug my code, but if you can reproduce it reliably it might just be faster for you to look into it. I'm kind of wondering whether it was a pre-existing off-by-one error that suddenly became problematic when multiple effects were packed into the same buffer, looking at your video.

pcwalton · 2025-06-23T02:07:16Z

I've tested and ribbon is working on:

AMD Ryzen 9 8945HS, NVIDIA GeForce RTX 4070 Laptop GPU, Windows, Vulkan
AMD Ryzen 9 8945HS, NVIDIA GeForce RTX 4070 Laptop GPU, Windows, OpenGL
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, NVIDIA GeForce RTX 2060, Windows, Vulkan
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, NVIDIA GeForce RTX 2060, Windows, OpenGL
Apple M2 Mac mini, macOS, Metal

Those are all the machines I have easy access to. It doesn't work on DX12, but that's expected because of the downlevel flag VERTEX_AND_INSTANCE_INDEX_RESPECTS_RESPECTIVE_FIRST_VALUE_IN_INDIRECT_DRAW (which I think is gone in newer wgpu).

djeedai · 2025-06-23T19:55:39Z

Thanks for the extensive testing. I also couldn't reproduce after the first try, not sure why. But this looks like one of the bugs we had in the past: the indexing is subtly wrong, but since it addresses a large buffer and doesn't go out of bounds, we only see occasional visual artifacts. I will try to find some time this week, or more likely this weekend, to debug once more, and if I don't find anything I'll merge as is.

offsets. Currently, we use a dynamic offset to point to the `Spawner`. This prevents batching, so this PR changes the shaders so that the effect metadata contains an index to the spawner, and the shader picks the spawner out of an array. This is built on top of djeedai#480.

pcwalton · 2025-08-21T23:49:26Z

Thought: Maybe the random failure is due to an assumption that the head of a ribbon is the first instance, which is usually but not always the case?

djeedai · 2025-09-11T20:27:49Z

Re:

Those are all the machines I have easy access to. It doesn't work on DX12, but that's expected because of the downlevel flag VERTEX_AND_INSTANCE_INDEX_RESPECTS_RESPECTIVE_FIRST_VALUE_IN_INDIRECT_DRAW (which I think is gone in newer wgpu).

Yes and no. As I read the docs, this will only work if you keep the VALIDATION_INDIRECT_CALL feature, which I think is that monstrosity that injects an extra compute dispatch before every indirect compute dispatch, and does so without caring about the bindings. That is, it lets the user code set its bindings, then overrides them with its own bindings, does the injected call, restores the user bindings, and finally lets the user call dispatch.

So I want to disable that ASAP.

djeedai added C - enhancement New feature or request A - internal Internal change on a core system labels Jun 18, 2025

djeedai reviewed Jun 21, 2025


djeedai requested changes Jun 21, 2025


Address review comments.

2cdfbbc


pcwalton marked this pull request as draft June 22, 2025 21:53

pcwalton marked this pull request as ready for review June 23, 2025 01:26

pcwalton mentioned this pull request Jul 4, 2025

Put spawners into an array in the shader instead of using dynamic offsets. #487

Open
