Nodes that haven't finished creation are abandoned and left for the registration TTL (15m) when Karpenter restarts #1211

@comtalyst

Description

Version

Karpenter Version: v1.6.3

Kubernetes Version: v1.32

Expected Behavior

  • Karpenter decides to scale up and starts creating a node
  • Karpenter restarts (e.g., due to configuration changes from a cluster update, if using NAP)
    ...
  • Node creation completes as if there had been no restart

Actual Behavior

  • Karpenter decides to scale up and starts creating a node
  • Karpenter restarts (e.g., due to configuration changes from a cluster update, if using NAP)
    ...
  • The ongoing node creation is abandoned
  • After the registration TTL (15 minutes), the node is cleaned up
    • This means 15 minutes of downtime, during which user workloads are left unschedulable
  • Karpenter creates a new node
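
To make the impact concrete, the two timelines above can be compared with a small sketch. This is only an illustrative model, not Karpenter code: the function name `unschedulable_window` and the creation duration used below are hypothetical, and it assumes (following the order of events listed above) that the replacement node is created only after the abandoned one is cleaned up at the registration TTL.

```python
from datetime import timedelta

# Registration TTL as stated in this issue (15 minutes).
REGISTRATION_TTL = timedelta(minutes=15)


def unschedulable_window(node_create_duration, restart_offset=None,
                         registration_ttl=REGISTRATION_TTL):
    """Estimate how long pending pods wait for a usable node.

    node_create_duration: hypothetical end-to-end time to create one node.
    restart_offset: time after creation starts at which Karpenter restarts,
        or None if Karpenter does not restart.
    """
    if restart_offset is None or restart_offset >= node_create_duration:
        # No restart during creation: the node is ready after one creation.
        return node_create_duration
    # Restart mid-creation (the reported behavior): the in-flight node is
    # abandoned, cleaned up only once the registration TTL expires, and a
    # replacement is created afterwards (assumed ordering, per the issue).
    return registration_ttl + node_create_duration


if __name__ == "__main__":
    create = timedelta(minutes=3)  # hypothetical creation time
    print("expected:", unschedulable_window(create))
    print("actual:  ", unschedulable_window(create, timedelta(minutes=1)))
```

Under these assumptions, a node that would normally take 3 minutes to create leaves the workload unschedulable for 18 minutes if Karpenter restarts 1 minute into the creation.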

Steps to Reproduce the Problem

Restart Karpenter while a node is being created, per the timeline above.
With NAP, Karpenter may be restarted by a cluster update. A node image upgrade on the system pool is one of the operations that trigger a restart.

Resource Specs and Logs

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
