-
Notifications
You must be signed in to change notification settings - Fork 1.2k
auto enabling priority and fairness #8650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c778149 to
87ae80e
Compare
method. Even with the channel providing guardrails on initialization, the previous attempt would still cause spurious panic's with newPM being nil by the time we attempt to call the method on it. I didn't take a look at the generated assembly but I assume that even with the channel block there was still a race as to when the address of newPM was copied from the outer scope's stack into the closure's stack. This could likely be avoided by taking the address of the pointer and passing that instead, but started to feel too clever to me.
2bf07fe to
b96901b
Compare
The test code here needs to half initializaton, and then set the defaultQueue itself. With the setting of config moved over to the actual initialization we need to replicate some of this code in the test to keep the same behavior as before.
dnr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the PR title: it should just talk about auto-enabling priority/fairness. the fact that there's a dynamic config to enable the auto-enabling is the least interesting part.
proto/internal/temporal/server/api/persistence/v1/task_queues.proto
Outdated
Show resolved
Hide resolved
And some other misc PR comments
dnr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(trying out new code review tool, apologies if any of the comments end up on the wrong lines)
proto/internal/temporal/server/api/matchingservice/v1/request_response.proto
Outdated
Show resolved
Hide resolved
| int64 version = 2; | ||
| } | ||
|
|
||
| enum FairnessState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this at the top of the file between imports and the first message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it this way originally as well, but the proto linter disagrees with this (a bit silly I know):
- file_path: temporal/server/api/persistence/v1/task_queues.proto
problems:
- message: Messages should precede all top-level enums.
location:
start_position:
line_number: 11
column_number: 1
end_position:
line_number: 16
column_number: 1
rule_id: core::0191::file-layout
rule_doc_uri: https://linter.aip.dev/191/file-layout
| pm.versionedQueuesLock.Lock() | ||
| defer pm.versionedQueuesLock.Unlock() | ||
| pm.initCancel() | ||
| queue, err := pm.defaultQueueFuture.Get(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could block? or is initialize() guaranteed to resolve the future quickly after initCtx is canceled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the later, its a bit tricky, at some point in writing this I had a comment block explaining my thinking about but I must have removed it before pushing. You asking this is probably more evidence that I should put something back there.
My thinking on this is initialization can be in one of three states, it could be before we make the call to wait on the userdatainitialization, currently blocked on initialization, or finishing up the initialization after the wait call returns. In the first two, cancelling the context will short circuit the rest of initialization and not block Stop. However the last one is a bit more complex, there isn't a great spot to short circuit the initialization after the wait call, and we don't want to race with the subscriptions. So my thought was that as long as there is nothing that can block us for long after that first Wait call, then we should just let it finish first before tearing down. The idea being that the rest of initialization should not have any more blocking calls.
I'm not in love with this solution, and I'm open to discussing alternatives, but I think this works "ok" as-is.
|
|
||
| ctx, cancel := context.WithCancel(context.Background()) | ||
| cancel() | ||
| defaultQ, err := pm.defaultQueueFuture.Get(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use defaultQueue()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided it because if the queue is not ready defaultQueue() has that softassert in it, in this case it not being ready yet is more expected and not really an error, so I didn't want to spam the output for it. Perhaps that should be in a comment there? Or maybe defaultQueue takes another flag? If we add a GetIfReady() to our future implementation I think my preference would be to use that and a comment why we dont use defaultQueue()
What changed?
Add a new dynamic config for Auto Enabling fairness and priority if we see the relevant tasks coming in.
Why?
Seamlessly start to transition users who start using the fields over to the new code path.
How did you test it?
Potential risks
Due to storing this in the userdata we are using that interface a bit more, we also need to change the initialization such that we start it before being able to substantiate the defaultQ, this change in initialization might have unintended side effects that I'm not currently seeing.
Notes
I've left comments for pieces of code that I am less sure or unsure about, or where things different slightly from our original plan. Leaving as a draft until I can get some new tests for testing this more thoroughly.