Don't use MADV_RANDOM #940

sdodson · 2025-04-08T19:07:23Z

In addition to the behavior explicitly documented in posix_madvise(2), since
Linux 6.4 this call also causes the kernel to aggressively free pages from the
page cache by short-circuiting the LRU second-chance mechanism. As a result,
compaction events that took 900ms now take up to 20s, and a system that
generally operated with near-zero major page faults sees 600 or more major
faults per second during compaction events.
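
For anyone who wants to check the major-fault numbers on their own workload, here is a minimal sketch of one way to count major page faults around a piece of work from inside a Go process. It relies on getrusage(2) through the standard `syscall` package; the helper names `majorFaults` and `faultsDuring` are hypothetical and are not part of bbolt or of this patch.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// majorFaults returns the number of major page faults (faults that
// required I/O) charged to the current process so far.
func majorFaults() (int64, error) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, err
	}
	return int64(ru.Majflt), nil
}

// faultsDuring runs fn and reports how many major faults the process took
// while fn was running, together with the elapsed wall-clock time.
func faultsDuring(fn func()) (int64, time.Duration, error) {
	before, err := majorFaults()
	if err != nil {
		return 0, 0, err
	}
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	after, err := majorFaults()
	if err != nil {
		return 0, 0, err
	}
	return after - before, elapsed, nil
}

func main() {
	// Placeholder workload; in practice this would be the operation
	// under investigation, such as a compaction.
	work := func() {
		buf := make([]byte, 1<<20)
		for i := range buf {
			buf[i] = byte(i)
		}
	}

	faults, elapsed, err := faultsDuring(work)
	if err != nil {
		fmt.Println("getrusage failed:", err)
		return
	}
	fmt.Printf("major faults: %d in %s\n", faults, elapsed)
	if secs := elapsed.Seconds(); secs > 0 {
		fmt.Printf("rate: %.1f major faults/s\n", float64(faults)/secs)
	}
}
```

The counter is process-wide, so faults taken by other goroutines during the measured window are included in the result.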

We've tested this change on older kernels and observed no negative impact
on typical cloud instances.

Fixes #939

k8s-ci-robot · 2025-04-08T19:07:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-04-08T19:07:33Z

Hi @sdodson. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sdodson · 2025-04-08T19:07:46Z

CC @dusk125


sdodson · 2025-12-09T18:14:54Z

I'll be revisiting this in the coming months as OpenShift completes its transition to RHEL10 (6.12 baseline kernel). I'm OK with it being closed and would re-open it once I've confirmed that we still see a high rate of page faults. Maybe something else in RHEL10 and/or the upstream kernel mitigated the unique behavior seen on RHEL 9 kernels; who knows.

k8s-ci-robot added size/XS needs-ok-to-test labels Apr 8, 2025

sdodson force-pushed the remove-madv_random branch from 43efcdf to 849eb4a Compare April 8, 2025 19:16

github-actions bot added the stale label Jul 8, 2025

github-actions bot closed this Aug 12, 2025

ahrtr reopened this Aug 13, 2025

ahrtr removed the stale label Aug 13, 2025

glycerine pushed a commit to glycerine/bbolt that referenced this pull request Oct 13, 2025

apply etcd-io#940 to fix etcd-io#939

f2f2949

github-actions bot added the stale label Nov 12, 2025

github-actions bot closed this Dec 9, 2025

ahrtr reopened this Dec 9, 2025

ahrtr added stage/tracked and removed stale labels Dec 9, 2025
