Skip to content

Snapshot row streaming timeout too short (5s) causes flaky acceptance tests #4866

@e-gineer

Description

@e-gineer

Problem

The snapshot acceptance tests are flaky, contributing to the ~50% CI failure rate. Analysis shows the snapshot row streaming goroutine has a 5-second timeout per row, which is too aggressive for CI environments.

Root Cause

In pkg/snapshot/snapshot.go:214, the SnapshotToQueryResult function streams snapshot data rows through a channel with a 5-second timeout:

case <-time.After(5 * time.Second):
    // Timeout - consumer likely stopped reading, exit to prevent leak
    return

In slow CI environments or when the system is under load, this timeout can expire before data is fully consumed, resulting in incomplete snapshot output. This manifests as acceptance test failures where:

  • Expected: 15 lines (header + 12 data rows)
  • Actual: 3 lines (header only, no data)

Evidence

From PR #4865 CI run (snapshot test failure):

  • Snapshot URL was generated successfully (upload worked)
  • But output contained only header, no data rows
  • Test expected 15 lines, got 3 lines

Recent CI runs show ~50% failure rate, with snapshot tests frequently failing.

Proposed Solution

Increase the timeout from 5 seconds to 30 seconds to accommodate slower CI environments while still preventing goroutine leaks.

Impact

This simple change should significantly reduce the flakiness of snapshot acceptance tests without any negative side effects. The timeout is only hit when the consumer has abandoned the result, so a longer timeout doesn't affect normal operation.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions