
[Bug]: Fault quarantine should not process old events when circuit breaker is closed #450

@lalitadithya


Prerequisites

  • I searched existing issues
  • I can reproduce this issue

Bug Description

We had a cluster whose circuit breaker was in a tripped state for many days. When we closed the breaker, the FQ (fault quarantine) module started processing very old events. As a result, nodes were cordoned and uncordoned for issues that were no longer current. Ideally, the FQ module should only cordon nodes that have a fault at the current time.
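
To make the expected behavior concrete, here is a minimal sketch of one way to get it: when the breaker closes, collapse the backlog to the most recent event per node and cordon only the nodes whose latest state is unhealthy. The `HealthEvent` type and the `nodes_to_cordon` function are hypothetical names used for illustration. They are not taken from the NVSentinel codebase, and an actual fix might also need an age cutoff or a fresh health check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class HealthEvent:
    """Hypothetical event shape; not NVSentinel's actual schema."""
    node: str
    healthy: bool
    timestamp: datetime


def nodes_to_cordon(backlog):
    """Return the nodes whose most recent queued event is unhealthy."""
    latest = {}
    for event in backlog:
        current = latest.get(event.node)
        # Keep only the newest event seen so far for each node.
        if current is None or event.timestamp > current.timestamp:
            latest[event.node] = event
    return sorted(node for node, event in latest.items() if not event.healthy)


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
backlog = [
    # node-a faulted days ago and has recovered since: leave it alone.
    HealthEvent("node-a", False, now - timedelta(days=3)),
    HealthEvent("node-a", True, now - timedelta(days=1)),
    # node-b faulted and never reported healthy again.
    HealthEvent("node-b", False, now - timedelta(days=2)),
    # node-c faulted a minute ago.
    HealthEvent("node-c", False, now - timedelta(minutes=1)),
]
print(nodes_to_cordon(backlog))  # ['node-b', 'node-c']
```

With this approach node-a is never cordoned, because its old fault is superseded by a later healthy event before any action is taken.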

Component

Fault Management

Steps to Reproduce

  1. Trip the circuit breaker
  2. Send many unhealthy events for X hours/days
  3. Send many healthy events for X hours/days
  4. Close the circuit breaker
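
The steps above can be modeled in a few lines to show the churn. This is a toy simulation, not NVSentinel code: `replay_in_order` is a hypothetical stand-in for a consumer that applies every queued event in arrival order once the breaker closes.

```python
def replay_in_order(backlog):
    """Naively apply each queued (node, healthy) event in arrival order."""
    cordoned = set()
    actions = []
    for node, healthy in backlog:
        if not healthy and node not in cordoned:
            cordoned.add(node)
            actions.append(("cordon", node))
        elif healthy and node in cordoned:
            cordoned.remove(node)
            actions.append(("uncordon", node))
    return actions, cordoned


# Events queued while the breaker was tripped: unhealthy first, healthy later.
backlog = [
    ("node-a", False),
    ("node-b", False),
    ("node-a", True),
    ("node-b", True),
]
actions, cordoned = replay_in_order(backlog)
print(len(actions))  # 4
print(cordoned)      # set()
```

Every node ends up uncordoned, yet each one was cordoned and uncordoned along the way, which matches the behavior described in this report.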

Environment

  • NVSentinel version: 0.2.0
  • Kubernetes version: all
  • Deployment method: ArgoCD

Logs/Output

N/A


Labels

bug: Something isn't working
